- Nov 16, 2019
Neta Zmora authored
Neta Zmora authored
- Nov 11, 2019
Neta Zmora authored
* pruning: add an option to virtually fold BN into Conv2D for ranking

  PruningPolicy can be configured using a new control argument, `fold_batchnorm`: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv2D modules (if Conv2D->BN edges exist in the model graph). Each weight filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods. We attenuate using the running values of the mean and variance, as is done in quantization. This control argument is only supported for Conv2D modules (i.e. other convolution variants and Linear operations are not supported). E.g.:

  ```
  policies:
    - pruner:
        instance_name: low_pruner
        args:
          fold_batchnorm: True
      starting_epoch: 0
      ending_epoch: 30
      frequency: 2
  ```

* AGP: non-functional refactoring

  distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`, which is a more appropriate function name since we don't really prune in this function.

* Simplify GEMM weights input-channel ranking logic

  Ranking weight matrices by input channels is similar to ranking 4D Conv weights by input channels, so there is no need for duplicate logic.
  - distiller/pruning/ranked_structures_pruner.py – change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`, which is a more appropriate function name since we don't really prune in this function; remove the code handling ranking of matrix rows.
  - distiller/norms.py – remove rank_cols.
  - distiller/thresholding.py – in expand_binary_map, treat the `channels` group_type the same as the `cols` group_type when dealing with 2D weights.

* AGP: add example of ranking filters with virtual BN-folding

  Also update the resnet20 AGP examples.
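To make the virtual folding in the first item concrete, here is a minimal sketch, assuming a plain PyTorch Conv2d->BatchNorm2d pair rather than Distiller's PruningPolicy internals; the function and variable names are illustrative only:

```python
import torch

def virtually_fold_bn(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.Tensor:
    """Return a copy of the Conv2d weights with the BN running statistics folded in.
    Each output filter i is scaled by gamma_i / sqrt(running_var_i + eps).
    The original conv.weight is left untouched; the folded copy is used only for ranking."""
    scale = bn.weight.detach() / torch.sqrt(bn.running_var + bn.eps)
    return conv.weight.detach() * scale.view(-1, 1, 1, 1)   # broadcast over (C_in, kH, kW)

# Rank the filters of a Conv2d->BN pair by the L1 norm of the folded weights
conv = torch.nn.Conv2d(16, 32, kernel_size=3)
bn = torch.nn.BatchNorm2d(32)
folded = virtually_fold_bn(conv, bn)
filter_ranks = folded.abs().sum(dim=(1, 2, 3))
```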
- Oct 23, 2019
Neta Zmora authored
- Sep 27, 2019
Neta Zmora authored
- Apr 01, 2019
Bar authored
Bar authored
Load optimizer from checkpoint (BREAKING - see details) (#182)

Fixes issues #70, #145 and replaces PR #74.

* checkpoint.py:
  * save_checkpoint will now save the optimizer type in addition to its state
  * load_checkpoint will now instantiate an optimizer based on the saved type and load its state
* config.py: file/dict_config now accept the resumed epoch to pass to LR schedulers
* policy.py: LRPolicy now passes the current epoch to the LR scheduler
* Classifier compression sample:
  * New flag '--resume-from' for properly resuming a saved training session, including optimizer state and epoch number
  * New flag '--reset-optimizer' to allow discarding of a loaded optimizer
* BREAKING:
  * The previous flag '--resume' is deprecated and is mapped to '--resume-from' + '--reset-optimizer'.
  * However, the old resuming behavior had an inconsistency: the epoch count would continue from the saved epoch, but the LR scheduler was set up as if we were starting from epoch 0.
  * Using '--resume-from' + '--reset-optimizer' now simply RESETS the epoch count to 0 for the whole environment.
  * This means that scheduling configurations (in YAML or code) which assumed use of '--resume' might need to be changed to reflect the fact that the epoch count now starts from 0.
  * All relevant YAML files under 'examples' were modified to reflect this change.
* Initial support for ReduceLROnPlateau (#161):
  * Allow passing **kwargs to policies via the scheduler
  * Image classification now passes the validation loss to the scheduler, to be used by ReduceLROnPlateau
  * The current implementation is experimental and subject to change
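The checkpoint.py changes above boil down to persisting the optimizer class alongside its state dict so that resuming can re-create it. A minimal sketch of that pattern in plain PyTorch (save_checkpoint/load_checkpoint here are illustrative names, not Distiller's exact signatures):

```python
import torch

def save_checkpoint(path, epoch, model, optimizer):
    # Persist the optimizer class together with its state so it can be re-created on load.
    torch.save({
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer_type': type(optimizer),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)

def load_checkpoint(path, model):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['state_dict'])
    # Instantiate an optimizer of the saved type, then restore its state.
    # The lr given here is a placeholder; load_state_dict restores the saved hyperparameters.
    optimizer = checkpoint['optimizer_type'](model.parameters(), lr=0.1)
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return model, optimizer, checkpoint['epoch']
```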
- Mar 23, 2019
Neta Zmora authored
This schedule demonstrates high-rate element-wise pruning (84.6% sparsity) of ResNet50. Top1 is 75.66 vs. the published Top1 of 76.15, i.e. a drop of 0.49%.
Neta Zmora authored
This is an improved ResNet50 AGP schedule, which generates a ResNet50 network that is 80% element-wise sparse, with a statistically insignificant drop in Top1 accuracy (-0.13%).
- Feb 10, 2019
Guy Jacob authored
* For CIFAR-10 / ImageNet only
* Refactor data_loaders.py, reduce code duplication
* Implemented custom sampler
* Integrated in image classification sample
* Since we now shuffle the test set, had to update expected results in 2 full_flow_tests that do evaluation
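A minimal sketch of the kind of custom-sampler-based train/validation split the bullets above describe, using plain torch.utils.data; the function name and split logic are an assumed illustration, not the actual data_loaders.py code:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets, transforms

def cifar10_loaders(data_dir, batch_size=128, valid_fraction=0.1, seed=0):
    # One dataset object, two samplers: shuffle the indices once, then split them
    # into a training subset and a held-out validation subset.
    dataset = datasets.CIFAR10(data_dir, train=True, download=True,
                               transform=transforms.ToTensor())
    indices = np.arange(len(dataset))
    np.random.RandomState(seed).shuffle(indices)
    split = int(len(dataset) * valid_fraction)
    valid_idx, train_idx = indices[:split], indices[split:]

    train_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader
```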
- Jan 09, 2019
Neta Zmora authored
These show a couple of the networks on the top1/sparsity curve.
- Jan 08, 2019
Bar authored
Block pruning: support specifying the block shape from the YAML file

Block pruning refers to pruning 4-D structures of a specific shape. This is why it is sometimes called structure-pruning or group-pruning (confusing, I know). A specific example of block pruning is filter or channel pruning, which has a highly regular block shape. This commit adds support for pruning blocks/groups/structures that have irregular shapes which accelerate inference on a specific hardware platform. You can read more about the regularity of shapes in [Exploring the Regularity of Sparse Structure in Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).

When we want to introduce sparsity in order to reduce the compute load of a certain layer, we need to understand how the HW and SW perform the layer's operation, and how this operation is vectorized. Then we can induce sparsity to match the vector shape. For example, Intel AVX-512 provides SIMD instructions that apply the same instruction (Single Instruction) to a vector of inputs (Multiple Data). The following single instruction performs an element-wise multiplication of two vectors of 16 32-bit elements: `__m512i result = _mm512_mullo_epi32(vec_a, vec_b);`

If either vec_a or vec_b is partially sparse, we still need to perform the multiplication and the sparsity does not help reduce the cost (power, latency) of the computation. However, if either vec_a or vec_b contains only zeros, then we can eliminate the instruction entirely. In this case, we say that we would like to have group sparsity of 16 elements, i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements.

Things are a bit more involved, because we also need to understand how the software maps layer operations to hardware. For example, a 3x3 convolution can be computed as a direct convolution, as a matrix multiply operation, or as a Winograd matrix operation (to name a few). These low-level operations are then mapped to SIMD instructions. Finally, the low-level SW needs to support a block-sparse storage format for weight tensors (see for example: http://www.netlib.org/linalg/html_templates/node90.html).
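To make the 16-element group-sparsity idea concrete, here is a small PyTorch sketch (an assumed illustration of the concept, not Distiller's block pruner) that zeros whole groups of 16 consecutive input channels whose L1 norms are smallest:

```python
import torch

def group_sparsify(weights: torch.Tensor, group_size: int = 16, fraction: float = 0.5) -> torch.Tensor:
    """Zero entire groups of `group_size` consecutive input channels so that the
    HW/SW can skip whole SIMD vectors. The `fraction` of groups with the smallest
    L1 norms is pruned. weights shape: (C_out, C_in, kH, kW)."""
    c_out, c_in, kh, kw = weights.shape
    assert c_in % group_size == 0, "input channels must be divisible by the block size"
    # Move C_in last so each row of the reshaped view is one group of input channels
    w = weights.permute(0, 2, 3, 1).reshape(-1, group_size)
    norms = w.abs().sum(dim=1)                          # L1 norm per group
    k = max(1, int(fraction * norms.numel()))
    threshold = norms.kthvalue(k).values                # k-th smallest group norm
    mask = (norms > threshold).float().unsqueeze(1)     # 0 for pruned groups
    w = (w * mask).reshape(c_out, kh, kw, c_in).permute(0, 3, 1, 2)
    return w.contiguous()

# Example: a Conv2d weight tensor with 64 input channels, pruned in blocks of 16
w_sparse = group_sparsify(torch.randn(128, 64, 3, 3), group_size=16, fraction=0.5)
```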
- Dec 19, 2018
Neta Zmora authored
Neta Zmora authored
In short: improves Top1 results. This might just be due to random variation.
- Dec 11, 2018
Neta Zmora authored
Filter pruning using L1-norm ranking, with AGP setting the pruning-rate decay.
Best Top1: 74.472 (epoch 89) vs. 76.15 baseline.
No. of parameters: 12,335,296 (of 25,502,912) = 48.37% dense (51.63% sparse).
Total MACs: 1,822,031,872 (of 4,089,184,256) = 44.56% compute = 2.24x.
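As a reminder of what L1-norm filter ranking means in practice, a minimal sketch (an assumed illustration, not Distiller's ranked-structures pruner):

```python
import torch

def l1_filter_mask(weights: torch.Tensor, fraction_to_prune: float) -> torch.Tensor:
    """Build a 0/1 mask that zeros the filters (output channels) with the smallest
    L1 norms. weights shape: (C_out, C_in, kH, kW)."""
    norms = weights.abs().sum(dim=(1, 2, 3))          # L1 norm per filter
    n_prune = int(fraction_to_prune * norms.numel())
    mask = torch.ones_like(weights)
    if n_prune > 0:
        weakest = torch.argsort(norms)[:n_prune]      # indices of the weakest filters
        mask[weakest] = 0
    return mask

w = torch.randn(64, 32, 3, 3)
w_pruned = w * l1_filter_mask(w, fraction_to_prune=0.5)
```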
- Dec 01, 2018
Neta Zmora authored
This commit contains the main fix for issue #85. It contains a couple of changes to the YAML structure-pruning API, with examples. I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).

New syntax for defining Structured AGP. I tried to make the syntax similar to fine-grained (i.e. element-wise) pruning. All you need to do is add `group_type: Filters`:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
If you want to define "leader-based" pruning dependencies, add `group_dependency: Leader`:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  group_dependency: Leader
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
Retired the old `reg_regims` API for describing one-shot structured-pruning. The new YAML API is very similar to AGP structured-pruning, which is much better than before. The new API also allows us to describe data dependencies when doing one-shot structure pruning, just like AGP structured-pruning. This commit also includes further code refactoring.

Old API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  reg_regims:
    'module.layer1.0.conv1.weight': [0.6, '3D']
    'module.layer1.1.conv1.weight': [0.6, '3D']
```
New API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  group_type: Filters
  desired_sparsity: 0.6
  weights: [
    module.layer1.0.conv1.weight,
    module.layer1.1.conv1.weight]
```
thresholding.py – separate the generation of the binary_map from the pruning_mask, so that we can cache the binary map and share it between several modules.

pruning/automated_gradual_pruner.py – major refactoring to support "leader-based" sub-graph pruning dependencies. The concept is explained in issue #85.

agp-pruning/resnet20_filters.schedule_agp.yaml
agp-pruning/resnet20_filters.schedule_agp_2.yaml
agp-pruning/resnet20_filters.schedule_agp_3.yaml
network_trimming/resnet56_cifar_activation_apoz.yaml
network_trimming/resnet56_cifar_activation_apoz_v2.yaml
- Nov 09, 2018
Neta Zmora authored
Another schedule for filter-wise pruning of ResNet20, with 64.6% sparsity, 25.4% compute reduction and Top1 of 91.47% (vs. the 91.78% baseline).
- Nov 08, 2018
Neta Zmora authored
Change the LR from 0.2 to 0.3, as was actually used to generate the results in the remark.
- Oct 31, 2018
Neta Zmora authored
Small improvement in the results
- Oct 22, 2018
Neta Zmora authored
Activation statistics can be leveraged to make pruning and quantization decisions, so we added support for collecting this data.

Two types of activation statistics are supported: summary statistics, and detailed records per activation. Currently we support the following summaries:
- Average activation sparsity, per layer
- Average L1-norm for each activation channel, per layer
- Average sparsity for each activation channel, per layer

For the detailed records, we collect some statistics per activation and store them in a record. This collection method generates more detailed data, but consumes more time, so beware.

* You can collect activation data for the different training phases: training/validation/test.
* You can access the data directly from each module that you chose to collect stats for.
* You can also create an Excel workbook with the stats.

To demonstrate the use of activation collection, we added a sample schedule which prunes weight filters by the activation APoZ, according to:
"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
https://arxiv.org/abs/1607.03250

We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning, and specifically we separated the AGP schedule from the filter-pruning criterion. We added examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner), random ranking (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), and filter L1-norm (L1RankedStructureParameterPruner).
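A minimal sketch of collecting one of the summaries above, average activation sparsity per layer, using plain PyTorch forward hooks (an assumed illustration; Distiller's own collector classes are not shown here):

```python
import torch
import torch.nn as nn
from collections import defaultdict

class ActivationSparsityCollector:
    """Accumulate the average fraction of zero activations per module via forward hooks."""
    def __init__(self, model, module_types=(nn.ReLU,)):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.handles = []
        for name, module in model.named_modules():
            if isinstance(module, module_types):
                self.handles.append(module.register_forward_hook(self._make_hook(name)))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            self.sums[name] += (output == 0).float().mean().item()
            self.counts[name] += 1
        return hook

    def averages(self):
        return {name: self.sums[name] / self.counts[name] for name in self.sums}

    def remove(self):
        for h in self.handles:
            h.remove()

# Usage: attach to a model, run a validation pass, then read the per-layer averages.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2), nn.ReLU())
collector = ActivationSparsityCollector(model)
_ = model(torch.randn(32, 10))
print(collector.averages())
collector.remove()
```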
- Oct 13, 2018
Neta Zmora authored
Using automated gradual pruning (AGP) for structured pruning is very simple and produces good results. AGP is also implemented in Google's TensorFlow framework.
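For reference, AGP follows the cubic sparsity ramp of Zhu & Gupta, "To prune, or not to prune" (2017). A small illustrative Python version of the target-sparsity computation (not Distiller's AutomatedGradualPruner code):

```python
def agp_target_sparsity(current_epoch, starting_epoch, ending_epoch,
                        initial_sparsity, final_sparsity):
    """Automated Gradual Pruning schedule (Zhu & Gupta, 2017):
    sparsity ramps from initial_sparsity to final_sparsity along a cubic curve,
    pruning quickly at first and more slowly as the target is approached."""
    if current_epoch <= starting_epoch:
        return initial_sparsity
    if current_epoch >= ending_epoch:
        return final_sparsity
    progress = (current_epoch - starting_epoch) / (ending_epoch - starting_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3

# Example: ramp from 10% to 50% sparsity between epochs 0 and 30
for epoch in (0, 10, 20, 30):
    print(epoch, round(agp_target_sparsity(epoch, 0, 30, 0.10, 0.50), 3))
```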
- Oct 01, 2018
Neta Zmora authored
- Sep 20, 2018
Neta Zmora authored
This schedule demonstrates low-rate pruning (26% sparsity) acting as a regularizer to reduce the generalization error of ResNet50 on the ImageNet dataset. We improve the ResNet50 Top1 test error by 0.4% (23.462 vs. 23.85). Top5 error is improved as well: 6.82 vs. 7.13 in the baseline.
- Jun 15, 2018
Neta Zmora authored
- Jun 13, 2018
Neta Zmora authored
- Jun 07, 2018
Neta Zmora authored
Added an implementation of Baidu's RNN pruning scheme:
Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)

Added an example of word-level language model compression. The language model is based on PyTorch's example:
https://github.com/pytorch/examples/tree/master/word_language_model

Added an AGP pruning schedule and an RNN pruning schedule to demonstrate compression of the language model.
- May 13, 2018
Neta Zmora authored
Fix the path to the example schedule for ImageNet baseline training, and to the ImageNet dataset
- May 07, 2018
Neta Zmora authored
- Apr 24, 2018
Neta Zmora authored