Non-channel/filter block pruning (#119)
Block pruning: support specifying the block shape from the YAML file.

Block pruning refers to pruning 4-D structures of a specific shape, which is why it is sometimes called structure pruning or group pruning (confusing, I know). Filter and channel pruning are specific examples of block pruning with highly regular block shapes. This commit adds support for pruning blocks/groups/structures with irregular shapes that accelerate inference on a specific hardware platform. You can read more about the regularity of shapes in [Exploring the Regularity of Sparse Structure in Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).

When we want to introduce sparsity in order to reduce the compute load of a certain layer, we need to understand how the hardware (HW) and software (SW) perform the layer's operation, and how that operation is vectorized. Then we can induce sparsity that matches the vector shape. For example, Intel AVX-512 provides SIMD instructions that apply the same instruction (Single Instruction) to a vector of inputs (Multiple Data). The following single instruction performs an element-wise multiplication of two vectors of sixteen 32-bit elements:

    __m512i result = _mm512_mullo_epi32(vec_a, vec_b);

If either vec_a or vec_b is only partially sparse, we still need to execute the multiplication, and the sparsity does not reduce the cost (power, latency) of the computation. However, if either vec_a or vec_b contains only zeros, we can eliminate the instruction entirely. In this case, we say that we would like to have group sparsity of 16 elements, i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements.

Things are a bit more involved, because we also need to understand how the software maps layer operations to the hardware. For example, a 3x3 convolution can be computed as a direct convolution, as a matrix multiplication, or as a Winograd matrix operation (to name a few ways of computation). These low-level operations are then mapped to SIMD instructions. Finally, the low-level software needs to support a block-sparse storage format for the weight tensors (see for example: http://www.netlib.org/linalg/html_templates/node90.html).
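To make the block/group-sparsity idea concrete, here is a minimal NumPy sketch (not Distiller's implementation; the function name, the shapes, and the NumPy masking are illustrative assumptions) that ranks the blocks of a 4-D weight tensor by their L1 norm and zeroes out the weakest fraction:

```python
# Minimal sketch of block (group) pruning -- illustrative only.
import numpy as np


def block_sparsity_mask(weights, block_shape, fraction_to_prune):
    """Return an element-wise mask that zeroes the weakest blocks of `weights`.

    Blocks are ranked by their L1 norm and the `fraction_to_prune` weakest
    blocks are masked out.  For simplicity, every dimension of `weights` must
    be divisible by the corresponding entry of `block_shape`.
    """
    assert weights.ndim == len(block_shape) == 4
    assert all(d % b == 0 for d, b in zip(weights.shape, block_shape))

    # Split every dimension into a (number-of-blocks, block-size) pair.
    grid = [d // b for d, b in zip(weights.shape, block_shape)]
    blocked = weights.reshape([x for pair in zip(grid, block_shape) for x in pair])

    # One L1 score per block (sum of absolute values inside the block).
    scores = np.abs(blocked).sum(axis=(1, 3, 5, 7))

    n_prune = int(fraction_to_prune * scores.size)
    if n_prune == 0:
        return np.ones_like(weights)
    # Prune every block whose score is at or below the n_prune-th smallest score.
    threshold = np.sort(scores, axis=None)[n_prune - 1]
    keep = (scores > threshold).astype(weights.dtype)

    # Broadcast the per-block keep/prune decision back to element granularity.
    mask = keep
    for axis, b in enumerate(block_shape):
        mask = np.repeat(mask, b, axis=axis)
    return mask


# Toy usage: prune 50% of the (1, 16, 1, 1) blocks of a conv weight tensor,
# i.e. groups of 16 consecutive input channels, mirroring the 16-element
# vector discussed above.
w = np.random.randn(64, 64, 3, 3).astype(np.float32)
mask = block_sparsity_mask(w, block_shape=(1, 16, 1, 1), fraction_to_prune=0.5)
print("block sparsity:", 1.0 - mask.mean())
```

In Distiller itself the block shape is specified in the pruner definition of the YAML schedule (see the `examples/agp-pruning/resnet50.schedule_agp.1x1x8-blocks.yaml` file added below) and the masking is applied to the PyTorch parameter tensors. As a further illustration, of the block-sparse storage point only, SciPy's Block Sparse Row (BSR) container stores just the non-zero blocks of a 2-D matrix; the example matrix and block size below are assumptions, not Distiller code:

```python
# Sketch: a 2-D block-sparse matrix in Block Sparse Row (BSR) storage.
import numpy as np
from scipy.sparse import bsr_matrix

dense = np.zeros((8, 16), dtype=np.float32)
dense[0:4, 0:8] = 1.0                        # a single non-zero 4x8 block
sparse = bsr_matrix(dense, blocksize=(4, 8))
print(sparse.data.shape)                     # (1, 4, 8): only non-zero blocks are stored
```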
Showing 4 changed files with 293 additions and 8 deletions
- distiller/pruning/automated_gradual_pruner.py: 3 additions, 3 deletions
- distiller/pruning/ranked_structures_pruner.py: 98 additions, 4 deletions
- distiller/utils.py: 45 additions, 1 deletion
- examples/agp-pruning/resnet50.schedule_agp.1x1x8-blocks.yaml: 147 additions, 0 deletions