- Nov 16, 2019
Neta Zmora authored
Neta Zmora authored
- Nov 11, 2019
Neta Zmora authored
* pruning: add an option to virtually fold BN into Conv2D for ranking

  PruningPolicy can be configured using a new control argument, `fold_batchnorm`: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv2D modules (if Conv2D->BN edges exist in the model graph). Each weight filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods. We attenuate using the running values of the mean and variance, as is done in quantization. This control argument is only supported for Conv2D modules (i.e. other convolution variants and Linear operations are not supported). E.g.:

  ```
  policies:
    - pruner:
        instance_name: low_pruner
        args:
          fold_batchnorm: True
      starting_epoch: 0
      ending_epoch: 30
      frequency: 2
  ```

* AGP: non-functional refactoring

  distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`, which is a more appropriate function name since we don't really prune in this function.

* Simplify GEMM weights input-channel ranking logic

  Ranking weight matrices by input channels is similar to ranking 4D Conv weights by input channels, so there is no need for duplicate logic.
  - distiller/pruning/ranked_structures_pruner.py – change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`, which is a more appropriate function name since we don't really prune in this function; remove the code handling ranking of matrix rows.
  - distiller/norms.py – remove rank_cols.
  - distiller/thresholding.py – in expand_binary_map, treat the `channels` group_type the same as the `cols` group_type when dealing with 2D weights.

* AGP: add example of ranking filters with virtual BN-folding

  Also update the resnet20 AGP examples.
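To make the virtual folding in the first item concrete, here is a minimal sketch, assuming a plain PyTorch Conv2d->BatchNorm2d pair rather than Distiller's PruningPolicy internals; the function and variable names are illustrative only:

```python
import torch

def virtually_fold_bn(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.Tensor:
    """Return a copy of the Conv2d weights with the BN running statistics folded in.
    Each output filter i is scaled by gamma_i / sqrt(running_var_i + eps).
    The original conv.weight is left untouched; the folded copy is used only for ranking."""
    scale = bn.weight.detach() / torch.sqrt(bn.running_var + bn.eps)
    return conv.weight.detach() * scale.view(-1, 1, 1, 1)   # broadcast over (C_in, kH, kW)

# Rank the filters of a Conv2d->BN pair by the L1 norm of the folded weights
conv = torch.nn.Conv2d(16, 32, kernel_size=3)
bn = torch.nn.BatchNorm2d(32)
folded = virtually_fold_bn(conv, bn)
filter_ranks = folded.abs().sum(dim=(1, 2, 3))
```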
- Oct 23, 2019
Neta Zmora authored
- Sep 27, 2019
Neta Zmora authored
- Apr 01, 2019
Bar authored
Bar authored
Load optimizer from checkpoint (BREAKING - see details) (#182)

Fixes issues #70, #145 and replaces PR #74.

* checkpoint.py:
  * save_checkpoint will now save the optimizer type in addition to its state
  * load_checkpoint will now instantiate an optimizer based on the saved type and load its state
* config.py: file/dict_config now accept the resumed epoch to pass to LR schedulers
* policy.py: LRPolicy now passes the current epoch to the LR scheduler
* Classifier compression sample:
  * New flag '--resume-from' for properly resuming a saved training session, including optimizer state and epoch number
  * New flag '--reset-optimizer' to allow discarding of a loaded optimizer
* BREAKING:
  * The previous flag '--resume' is deprecated and is mapped to '--resume-from' + '--reset-optimizer'.
  * However, the old resuming behavior had an inconsistency: the epoch count would continue from the saved epoch, but the LR scheduler was set up as if we were starting from epoch 0.
  * Using '--resume-from' + '--reset-optimizer' now simply RESETS the epoch count to 0 for the whole environment.
  * This means that scheduling configurations (in YAML or code) which assumed use of '--resume' might need to be changed to reflect the fact that the epoch count now starts from 0.
  * All relevant YAML files under 'examples' were modified to reflect this change.
* Initial support for ReduceLROnPlateau (#161):
  * Allow passing **kwargs to policies via the scheduler
  * Image classification now passes the validation loss to the scheduler, to be used by ReduceLROnPlateau
  * The current implementation is experimental and subject to change
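The checkpoint.py changes above boil down to persisting the optimizer class alongside its state dict so that resuming can re-create it. A minimal sketch of that pattern in plain PyTorch (save_checkpoint/load_checkpoint here are illustrative names, not Distiller's exact signatures):

```python
import torch

def save_checkpoint(path, epoch, model, optimizer):
    # Persist the optimizer class together with its state so it can be re-created on load.
    torch.save({
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer_type': type(optimizer),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)

def load_checkpoint(path, model):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['state_dict'])
    # Instantiate an optimizer of the saved type, then restore its state.
    # The lr given here is a placeholder; load_state_dict restores the saved hyperparameters.
    optimizer = checkpoint['optimizer_type'](model.parameters(), lr=0.1)
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return model, optimizer, checkpoint['epoch']
```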
- Mar 23, 2019
Neta Zmora authored
This schedule demonstrates high-rate element-wise pruning (84.6% sparsity) of ResNet50. Top1 is 75.66 vs. the published Top1 of 76.15, i.e. a drop of 0.49%.
Neta Zmora authored
This is an improved ResNet50 AGP schedule, which generates a ResNet50 network that is 80% element-wise sparse, with a statistically insignificant drop in Top1 accuracy (-0.13%).
- Feb 10, 2019
Guy Jacob authored
* For CIFAR-10 / ImageNet only
* Refactor data_loaders.py, reduce code duplication
* Implemented custom sampler
* Integrated in image classification sample
* Since we now shuffle the test set, had to update expected results in 2 full_flow_tests that do evaluation
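A minimal sketch of the kind of custom-sampler-based train/validation split the bullets above describe, using plain torch.utils.data; the function name and split logic are an assumed illustration, not the actual data_loaders.py code:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets, transforms

def cifar10_loaders(data_dir, batch_size=128, valid_fraction=0.1, seed=0):
    # One dataset object, two samplers: shuffle the indices once, then split them
    # into a training subset and a held-out validation subset.
    dataset = datasets.CIFAR10(data_dir, train=True, download=True,
                               transform=transforms.ToTensor())
    indices = np.arange(len(dataset))
    np.random.RandomState(seed).shuffle(indices)
    split = int(len(dataset) * valid_fraction)
    valid_idx, train_idx = indices[:split], indices[split:]

    train_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader
```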
- Jan 09, 2019
Neta Zmora authored
These show a couple of the networks on the top1/sparsity curve.
- Jan 08, 2019
Bar authored
Block pruning: support specifying the block shape from the YAML file

Block pruning refers to pruning 4-D structures of a specific shape. This is why it is sometimes called structure-pruning or group-pruning (confusing, I know). A specific example of block pruning is filter or channel pruning, which has a highly regular block shape. This commit adds support for pruning blocks/groups/structures that have irregular shapes which accelerate inference on a specific hardware platform. You can read more about the regularity of shapes in [Exploring the Regularity of Sparse Structure in Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).

When we want to introduce sparsity in order to reduce the compute load of a certain layer, we need to understand how the HW and SW perform the layer's operation, and how this operation is vectorized. Then we can induce sparsity to match the vector shape. For example, Intel AVX-512 provides SIMD instructions that apply the same instruction (Single Instruction) to a vector of inputs (Multiple Data). The following single instruction performs an element-wise multiplication of two vectors of 16 32-bit elements: `__m512i result = _mm512_mullo_epi32(vec_a, vec_b);`

If either vec_a or vec_b is partially sparse, we still need to perform the multiplication and the sparsity does not help reduce the cost (power, latency) of the computation. However, if either vec_a or vec_b contains only zeros, then we can eliminate the instruction entirely. In this case, we say that we would like to have group sparsity of 16 elements, i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements.

Things are a bit more involved, because we also need to understand how the software maps layer operations to hardware. For example, a 3x3 convolution can be computed as a direct convolution, as a matrix multiply operation, or as a Winograd matrix operation (to name a few). These low-level operations are then mapped to SIMD instructions. Finally, the low-level SW needs to support a block-sparse storage format for weight tensors (see for example: http://www.netlib.org/linalg/html_templates/node90.html).
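To make the 16-element group-sparsity idea concrete, here is a small PyTorch sketch (an assumed illustration of the concept, not Distiller's block pruner) that zeros whole groups of 16 consecutive input channels whose L1 norms are smallest:

```python
import torch

def group_sparsify(weights: torch.Tensor, group_size: int = 16, fraction: float = 0.5) -> torch.Tensor:
    """Zero entire groups of `group_size` consecutive input channels so that the
    HW/SW can skip whole SIMD vectors. The `fraction` of groups with the smallest
    L1 norms is pruned. weights shape: (C_out, C_in, kH, kW)."""
    c_out, c_in, kh, kw = weights.shape
    assert c_in % group_size == 0, "input channels must be divisible by the block size"
    # Move C_in last so each row of the reshaped view is one group of input channels
    w = weights.permute(0, 2, 3, 1).reshape(-1, group_size)
    norms = w.abs().sum(dim=1)                          # L1 norm per group
    k = max(1, int(fraction * norms.numel()))
    threshold = norms.kthvalue(k).values                # k-th smallest group norm
    mask = (norms > threshold).float().unsqueeze(1)     # 0 for pruned groups
    w = (w * mask).reshape(c_out, kh, kw, c_in).permute(0, 3, 1, 2)
    return w.contiguous()

# Example: a Conv2d weight tensor with 64 input channels, pruned in blocks of 16
w_sparse = group_sparsify(torch.randn(128, 64, 3, 3), group_size=16, fraction=0.5)
```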
- Dec 19, 2018
Neta Zmora authored
Neta Zmora authored
In short: improves Top1 results. This might just be due to random variation.
- Dec 11, 2018
Neta Zmora authored
Filter pruning using L1-norm ranking, with AGP setting the pruning-rate decay.
Best Top1: 74.472 (epoch 89) vs. 76.15 baseline.
No. of parameters: 12,335,296 (of 25,502,912) = 48.37% dense (51.63% sparse).
Total MACs: 1,822,031,872 (of 4,089,184,256) = 44.56% compute = 2.24x.
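As a reminder of what L1-norm filter ranking means in practice, a minimal sketch (an assumed illustration, not Distiller's ranked-structures pruner):

```python
import torch

def l1_filter_mask(weights: torch.Tensor, fraction_to_prune: float) -> torch.Tensor:
    """Build a 0/1 mask that zeros the filters (output channels) with the smallest
    L1 norms. weights shape: (C_out, C_in, kH, kW)."""
    norms = weights.abs().sum(dim=(1, 2, 3))          # L1 norm per filter
    n_prune = int(fraction_to_prune * norms.numel())
    mask = torch.ones_like(weights)
    if n_prune > 0:
        weakest = torch.argsort(norms)[:n_prune]      # indices of the weakest filters
        mask[weakest] = 0
    return mask

w = torch.randn(64, 32, 3, 3)
w_pruned = w * l1_filter_mask(w, fraction_to_prune=0.5)
```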
- Dec 01, 2018
Neta Zmora authored
This commit contains the main fix for issue #85. It contains a couple of changes to the YAML structure-pruning API, with examples. I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).

New syntax for defining Structured AGP. I tried to make the syntax similar to fine-grained (i.e. element-wise) pruning. All you need to do is add `group_type: Filters`:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
If you want to define "leader-based" pruning dependencies, add `group_dependency: Leader`:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  group_dependency: Leader
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
Retired the old `reg_regims` API for describing one-shot structured-pruning. The new YAML API is very similar to AGP structured-pruning, which is much better than before. The new API also allows us to describe data dependencies when doing one-shot structure pruning, just like AGP structured-pruning. This commit also includes further code refactoring.

Old API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  reg_regims:
    'module.layer1.0.conv1.weight': [0.6, '3D']
    'module.layer1.1.conv1.weight': [0.6, '3D']
```
New API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  group_type: Filters
  desired_sparsity: 0.6
  weights: [
    module.layer1.0.conv1.weight,
    module.layer1.1.conv1.weight]
```
thresholding.py – separate the generation of the binary_map from the pruning_mask, so that we can cache the binary map and share it between several modules.

pruning/automated_gradual_pruner.py – major refactoring to support "leader-based" sub-graph pruning dependencies. The concept is explained in issue #85.

agp-pruning/resnet20_filters.schedule_agp.yaml
agp-pruning/resnet20_filters.schedule_agp_2.yaml
agp-pruning/resnet20_filters.schedule_agp_3.yaml
network_trimming/resnet56_cifar_activation_apoz.yaml
network_trimming/resnet56_cifar_activation_apoz_v2.yaml
- Nov 09, 2018
Neta Zmora authored
Another schedule for filter-wise pruning of ResNet20, with 64.6% sparsity, 25.4% compute reduction and Top1 of 91.47% (vs. the 91.78% baseline).
- Nov 08, 2018
Neta Zmora authored
Change the LR from 0.2 to 0.3, as was actually used to generate the results in the remark.
- Oct 31, 2018
Neta Zmora authored
Small improvement in the results
- Oct 22, 2018
Neta Zmora authored
Activation statistics can be leveraged to make pruning and quantization decisions, so we added support for collecting this data.

Two types of activation statistics are supported: summary statistics, and detailed records per activation. Currently we support the following summaries:
- Average activation sparsity, per layer
- Average L1-norm for each activation channel, per layer
- Average sparsity for each activation channel, per layer

For the detailed records, we collect some statistics per activation and store them in a record. This collection method generates more detailed data, but consumes more time, so beware.

* You can collect activation data for the different training phases: training/validation/test.
* You can access the data directly from each module that you chose to collect stats for.
* You can also create an Excel workbook with the stats.

To demonstrate the use of activation collection, we added a sample schedule which prunes weight filters by the activation APoZ, according to:
"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
https://arxiv.org/abs/1607.03250

We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning, and specifically we separated the AGP schedule from the filter-pruning criterion. We added examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner), random ranking (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), and filter L1-norm (L1RankedStructureParameterPruner).
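A minimal sketch of collecting one of the summaries above, average activation sparsity per layer, using plain PyTorch forward hooks (an assumed illustration; Distiller's own collector classes are not shown here):

```python
import torch
import torch.nn as nn
from collections import defaultdict

class ActivationSparsityCollector:
    """Accumulate the average fraction of zero activations per module via forward hooks."""
    def __init__(self, model, module_types=(nn.ReLU,)):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.handles = []
        for name, module in model.named_modules():
            if isinstance(module, module_types):
                self.handles.append(module.register_forward_hook(self._make_hook(name)))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            self.sums[name] += (output == 0).float().mean().item()
            self.counts[name] += 1
        return hook

    def averages(self):
        return {name: self.sums[name] / self.counts[name] for name in self.sums}

    def remove(self):
        for h in self.handles:
            h.remove()

# Usage: attach to a model, run a validation pass, then read the per-layer averages.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2), nn.ReLU())
collector = ActivationSparsityCollector(model)
_ = model(torch.randn(32, 10))
print(collector.averages())
collector.remove()
```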
- Oct 13, 2018
Neta Zmora authored
Using automated gradual pruning (AGP) for structured pruning is very simple and produces good results. AGP is also implemented in Google's TensorFlow framework.
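For reference, AGP follows the cubic sparsity ramp of Zhu & Gupta, "To prune, or not to prune" (2017). A small illustrative Python version of the target-sparsity computation (not Distiller's AutomatedGradualPruner code):

```python
def agp_target_sparsity(current_epoch, starting_epoch, ending_epoch,
                        initial_sparsity, final_sparsity):
    """Automated Gradual Pruning schedule (Zhu & Gupta, 2017):
    sparsity ramps from initial_sparsity to final_sparsity along a cubic curve,
    pruning quickly at first and more slowly as the target is approached."""
    if current_epoch <= starting_epoch:
        return initial_sparsity
    if current_epoch >= ending_epoch:
        return final_sparsity
    progress = (current_epoch - starting_epoch) / (ending_epoch - starting_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

# Example: ramp from 10% to 50% sparsity between epochs 0 and 30
for epoch in (0, 10, 20, 30):
    print(epoch, round(agp_target_sparsity(epoch, 0, 30, 0.10, 0.50), 3))
```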
- Oct 01, 2018
Neta Zmora authored
- Sep 20, 2018
Neta Zmora authored
This schedule demonstrates low-rate pruning (26% sparsity) acting as a regularizer to reduce the generalization error of ResNet50 on the ImageNet dataset. We improve the ResNet50 Top1 test error by 0.4% (23.462 vs. 23.85). Top5 error is improved as well: 6.82 vs. 7.13 in the baseline.
- Jun 15, 2018
Neta Zmora authored
- Jun 13, 2018
Neta Zmora authored
- Jun 07, 2018
Neta Zmora authored
Added an implementation of Baidu's RNN pruning scheme:
Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)

Added an example of word-level language model compression. The language model is based on PyTorch's example:
https://github.com/pytorch/examples/tree/master/word_language_model

Added an AGP pruning schedule and an RNN pruning schedule to demonstrate compression of the language model.
- May 13, 2018
Neta Zmora authored
Fix the path to the example schedule for ImageNet baseline training, and to the ImageNet dataset
- May 07, 2018
Neta Zmora authored
- Apr 24, 2018
Neta Zmora authored