Formatting updates to Approx Imp Doc

7af842b0 · Hashim Sharif · 01f43b47 · 7af842b0
Commit 7af842b0 authored 4 years ago by Hashim Sharif
--- a/hpvm/docs/developerdocs/approximation-implementation.rst
+++ b/hpvm/docs/developerdocs/approximation-implementation.rst
 Approximate Algorithm Implementations
 =========================================
+Perforated Convolutions
+-----------------------
+Overview
+^^^^^^^^^
+Perforated convolution skips computing the values of some rows/columns of a tensor and interpolates the skipped values from neighboring computed values, which recovers the shape and, approximately, the accuracy of the resultant tensor. This reduces the number of MAC operations performed and improves cache and memory bandwidth usage. Perforation is performed at a uniform rate, which is the percentage of rows/columns perforated relative to the total number of rows/columns. Perforation starts at an offset, which is the index of the first perforated row/column; rows/columns before that index remain unapproximated.
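As a minimal sketch of how a rate and an offset could select rows, the snippet below assumes a pattern in which every ``skip_every``-th row is perforated starting at ``offset``. The function name ``perforation_plan`` and the ``skip_every`` parameter are hypothetical and used only for illustration; the actual runtime may choose indices differently.

```python
def perforation_plan(num_rows, skip_every, offset):
    """Split row indices into computed and skipped sets.

    Assumption (not taken from the runtime): starting at `offset`, every
    `skip_every`-th row is skipped; rows before `offset` are always computed.
    """
    skipped = list(range(offset, num_rows, skip_every))
    skipped_set = set(skipped)
    computed = [r for r in range(num_rows) if r not in skipped_set]
    # Effective perforation rate: fraction of rows whose computation is skipped.
    rate = len(skipped) / num_rows
    return computed, skipped, rate


computed, skipped, rate = perforation_plan(num_rows=8, skip_every=2, offset=1)
print(computed, skipped, rate)
```

Under this assumed pattern, the work saved in the matrix multiplication is roughly proportional to the effective rate, since only the computed rows take part in it.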
+Description
+^^^^^^^^^^^
+The algorithm for perforated convolution can be broken down into three major steps:
+* **Patch matrix creation:** Based on the indices of the rows/columns to be perforated, the corresponding elements of the input tensor are used to create a new matrix called an input-patch matrix. The input-patch matrix is laid out in memory so that the convolution reduces to a simple matrix multiplication. This approach is similar to the one described in this paper.
+* **Dense matrix multiplication:** This step performs a matrix multiplication in a manner very similar to that described in this paper. Note that it operates on reduced, dense matrices.
+* **Interpolation of missing values:** This step allocates a new tensor, copies the computed elements from the reduced, dense tensor into it, and interpolates each skipped element as the arithmetic mean of its neighboring elements. The neighboring elements are the computed elements to the left and right of the skipped element for column perforation, and the computed elements above and below the skipped element for row perforation.
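The interpolation step for row perforation can be sketched as follows. This is an illustrative pure-Python version on nested lists, not the runtime's implementation; ``interpolate_rows`` is a hypothetical name, and the handling of a skipped row at the top or bottom edge (copying its single computed neighbor) is an assumption, since the text above only describes the interior case.

```python
def interpolate_rows(reduced, height, skipped):
    """Rebuild a full-height 2D result from a reduced one.

    reduced: computed rows only, in their original order
    height:  number of rows in the full result
    skipped: indices of rows whose computation was skipped
             (assumed to contain no two adjacent indices)
    """
    skipped = set(skipped)
    full = [None] * height

    # Copy the computed rows into their original positions.
    rows = iter(reduced)
    for r in range(height):
        if r not in skipped:
            full[r] = list(next(rows))

    # Fill each skipped row with the mean of the rows above and below it.
    for r in sorted(skipped):
        above = full[r - 1] if r > 0 else None
        below = full[r + 1] if r + 1 < height else None
        if above is not None and below is not None:
            full[r] = [(a + b) / 2 for a, b in zip(above, below)]
        else:
            # Edge row: only one neighbor exists, so copy it (assumption).
            full[r] = list(above if above is not None else below)
    return full


reduced = [[0, 0], [2, 4], [4, 4], [6, 8]]   # rows 0, 2, 3, 5 of a 6-row result
print(interpolate_rows(reduced, height=6, skipped=[1, 4]))
```

Column perforation would be analogous, averaging the computed elements to the left and right within each row.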
+Filter Sampling
+---------------
+Overview
+^^^^^^^^^
+Filter sampling approximates convolution by using a sampled filter, i.e. a filter with some elements removed. This reduces the number of MAC operations performed and, by shrinking the overall filter size, improves cache and memory bandwidth usage. Filter sampling is performed at a rate, which is the percentage of elements sampled relative to the total number of elements in a tensor. Sampling starts at an offset into the filter, which is the index of the filter element at which sampling begins; filter elements before this index are always retained.
+Description
+^^^^^^^^^^^
+The algorithm for convolution using a reduced filter is implemented in three major steps:
+* **Creation of sampled filter:** This step allocates a new sampled filter whose size is based on the sampling rate and offset. The appropriate elements of the original filter are scaled up by a factor of rate / (rate - 1) and copied to the newly allocated reduced filter. Scaling up the filter elements helps compensate for the accuracy lost by sampling the filter.
+* **Patch matrix creation:** Based on the indices of the original filter elements that make up the sampled filter, the corresponding elements of the input tensor are used to create a new matrix called an input-patch matrix. The input-patch matrix is laid out in memory so that the convolution reduces to a simple matrix multiplication.
+* **Dense matrix multiplication:** This step involves performing a matrix multiplication on the (sampled) filter and input patch matrices.
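The three steps above can be sketched for a single-channel 2D input as follows. This is a simplified illustration, not the runtime's code. It assumes that the ``rate`` in the factor rate / (rate - 1) acts as a stride ``skip_every`` (every ``skip_every``-th filter element is dropped, starting at ``offset``) and that all retained elements are scaled. The function names are hypothetical, and the convolution is computed without flipping the filter and without padding.

```python
def sample_filter(flat_filter, skip_every, offset):
    """Step 1: drop every `skip_every`-th element from `offset` on and
    scale the retained elements by skip_every / (skip_every - 1)."""
    scale = skip_every / (skip_every - 1)
    kept = [i for i in range(len(flat_filter))
            if i < offset or (i - offset) % skip_every != 0]
    return kept, [flat_filter[i] * scale for i in kept]


def patch_matrix(image, kh, kw, kept):
    """Step 2: one row per output position, holding only the input
    elements that line up with retained filter elements."""
    h, w = len(image), len(image[0])
    return [[image[y + i // kw][x + i % kw] for i in kept]
            for y in range(h - kh + 1)
            for x in range(w - kw + 1)]


def sampled_conv(image, filt, skip_every, offset):
    kh, kw = len(filt), len(filt[0])
    flat = [v for row in filt for v in row]
    kept, sampled = sample_filter(flat, skip_every, offset)
    patches = patch_matrix(image, kh, kw, kept)
    # Step 3: dense product of the patch matrix with the sampled filter.
    flat_out = [sum(p * f for p, f in zip(row, sampled)) for row in patches]
    out_w = len(image[0]) - kw + 1
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
filt = [[1, 1], [1, 1]]
print(sampled_conv(image, filt, skip_every=2, offset=0))
```

With a constant input and filter, the scaled, half-size filter reproduces the exact result, which illustrates why the retained elements are scaled up; on real data the result is only an approximation.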