Commits · 32a7e4bfcf9fcdea76c3d778efb62b664fe6b088 · llvm / distiller

Apr 30, 2020

Knowledge distillation fixes (#503) · 32a7e4bf

Guy Jacob authored 4 years ago

Fixed two long-standing bugs in knowledge distillation:
 * Distillation loss needs to be scaled by T^2 (#122)
 * Use tensor.clone instead of new_tensor when caching student logits (#234)
Updated example results and uploaded the script to generate them
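
The T^2 scaling mentioned above can be sketched as follows. This is a minimal pure-Python illustration and not Distiller's code: the function names are hypothetical, and the loss is assumed to be the KL divergence between temperature-softened teacher and student distributions. Soft-target gradients shrink roughly as 1/T^2, so multiplying by T^2 keeps the loss magnitude comparable across temperatures.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, temperature):
    """Soft-target loss, scaled by T^2 (hypothetical helper, not Distiller's API)."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
```

Without the `temperature ** 2` factor, raising the temperature would silently reduce the weight of the distillation term relative to the ordinary classification loss.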

32a7e4bf

Apr 20, 2020

Add example code showing schedule specification using code. · 5b01a40c

Neta Zmora authored 4 years ago

This script shows how to specify a compression-schedule directly
using Distiller's API, instead of using a YAML specification

examples/scheduling_api/direct_api_pruning.py

5b01a40c

Feb 17, 2020

Uncomment mistakenly commented line in ResNet18 PTQ LAPQ yaml · 7e0d22d2
Guy Jacob authored 5 years ago

7e0d22d2

Post-Train Quant LAPQ Refactoring (#473) · 394e3bc6

Guy Jacob authored 5 years ago

* Move image classification specific setup code to separate script at
  examples/classifier_compression/ptq_lapq.py
* Make ptq_coordinate_search function completely independent of
  command line arguments
* Change LAPQ command line args function to update a
  pre-existing parser (changed CLAs prefix to 'lapq' for more clarity)
* Enable LAPQ from compress_classifier.py (trigger with --qe-lapq)
* Add pointers in documentation

394e3bc6

Feb 06, 2020

Convert Distiller PTQ models to "native" PyTorch PTQ (#458) · cdc1775f

Guy Jacob authored 5 years ago

* New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
* Can also be called from PostTrainLinearQuantizer instance:
    quantizer.convert_to_pytorch()
* Can also trigger from command line in image classification sample
* Can save/load converted modules via apputils.load/save_checkpoint
* Added Jupyter notebook tutorial

* Converted modules have only the absolutely necessary quant-dequant
  operations. For a fully quantized model, this means just quantization
  of model input and de-quantization of model output. If a user keeps
  specific internal layers in FP32, quant-dequant operations are added
  as needed
* Can configure either 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we
  take care of preventing overflows (aka "reduce_range" in the PyTorch
  API)

cdc1775f

Feb 03, 2020
- Fix path in LAPQ yaml · bf18da16
  Guy Jacob authored 5 years ago
  
  bf18da16
Feb 02, 2020

Loss Aware Post Train Quantization search (#432) · 0b493fd3

Lev Zlotnik authored 5 years ago

"Loss Aware Post-Training Quantization" (Nahshan et al., 2019)

Paper: https://arxiv.org/abs/1911.07190 
Reference implementation:
  https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq

Proper documentation is still TODO, for now see the example YAML file
at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml'

* Implemented in distiller/quantization/ptq_coordinate_search.py
* At the moment that file contains both the model-independent algorithm
  implementation and the image-classification specific sample script.
  Still TODO: Refactor that

* Post train quantization changes (range_linear):
  * Added getters/setters for quantization parameters (scale/zero_point)
    and clipping values
  * Add option to save backup of FP32 weights to allow re-quantization
    after quantizer was created.
  * Add option to clip weights in addition to activations
  * Fix fusions to not occur only when activations aren't quantized
  * RangeLinearFakeQuantWrapper:
    * Make inputs quantization optional
    * In case of ReLU + ACIQ, clip according to input stats

* Data loaders:
  * Add option to not load train set at all from disk (to speed up
    loading time in post-training runs)
  * Modified "image_classifier.py" accordingly

0b493fd3

Jan 15, 2020

Fix scale factor calculation in symmetric quantization (#463) · 78255ee0

Guy Jacob authored 5 years ago

(we use 8-bit values below, but this applies to any bit-width)
* We use the notion of "full" and "restricted" quantized range for
  symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full"
  range when saturating after quantization, but calculated the scale
  factor as if the range was restricted. This means we weren't making
  full utilization of the quantized range.
* On the other hand, in some other implementations of quantization (e.g.
  TensorFlow), the "restricted" range is used.
* So, we make it an option to use either the proper "full" range
  (q_min = -128) or "restricted" range (q_min = -127).
* LinearQuantMode.SYMMETRIC now means the "full" range is used, and
  added LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
  range.
* Updated tests and documentation.
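
A small sketch of the difference between the two ranges, for the 8-bit case used above. The function names are hypothetical, and the choice of (2^n - 1) / 2 levels for the "full" range and 2^(n-1) - 1 levels for the "restricted" range is an assumption about how the scale factor is derived, not a copy of Distiller's code.

```python
def symmetric_scale(sat_val, num_bits=8, restricted=False):
    """Scale factor mapping floats in [-sat_val, sat_val] to integers.

    Assumption: q = round(scale * x), with 127 levels per side in the
    restricted range and 127.5 in the full range (for 8 bits).
    """
    if restricted:
        levels = 2 ** (num_bits - 1) - 1
    else:
        levels = (2 ** num_bits - 1) / 2
    return levels / sat_val


def quantize(x, sat_val, num_bits=8, restricted=False):
    """Quantize one float and saturate to the selected integer range."""
    scale = symmetric_scale(sat_val, num_bits, restricted)
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max if restricted else -q_max - 1
    return max(q_min, min(q_max, round(x * scale)))
```

With `sat_val=1.0`, the most negative input lands on the lowest code of whichever range is selected, so the extra negative code is only reachable in the "full" range.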

78255ee0

Dec 30, 2019
- Separate LinearQuantMode for weights/activations (#451) · 47175961
  Guy Jacob authored 5 years ago
  
  In PostTrainLinearQuantizer and QuantAwareTrainRangeLinearQuantizer
  
  47175961
Dec 29, 2019
- Add Mobilenet v1 baseline training script · 012417a5
  Neta Zmora authored 5 years ago
  
  012417a5
Dec 26, 2019
- Fix broken links in image classification sample readme · b2dc35ba
  Guy Jacob authored 5 years ago
  
  b2dc35ba
Dec 09, 2019
- Update examples README · 17df7c44
  Guy Jacob authored 5 years ago
  
  17df7c44
- Update Examples Documentation (#441) · b8f34117
  Guy Jacob authored 5 years ago
  
  * Make it easier to find sample apps for different workload types
  * Add READMEs for sample apps that didn't have any
  * Update READMEs with experiment results where applicable
  
  b8f34117
- Updated README.md in object_detection · 830aa356
  Lev Zlotnik authored 5 years ago
  
  Added tables of results for 85% sparsity
  
  830aa356
Dec 08, 2019

Enable weights/activations-only PTQ for conv/linear modules (#439) · 952028d0

Guy Jacob authored 5 years ago

* Weights-only PTQ:
  * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in
    which case it'll act as a simple pass-through during forward
  * In RangeLinearQuantParamLayerWrapper, if bits_activations is None
    and num_bits_params > 0, perform quant and de-quant of the
    parameters instead of just quant.
* Activations-only PTQ:
  * Enable activations only quantization for conv/linear modules. When
    PostTrainLinearQuantizer detects # bits != None for activations 
    and # bits == None for weights, a fake-quantization wrapper will
    be used.
* Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command
  line arguments to invoke weights/activations-only quantization,
  respectively.
* Minor refactoring for clarity in PostTrainLinearQuantizer's replace_*
  functions
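
The "quant and de-quant" (fake quantization) behaviour described above can be illustrated roughly as below. This is a sketch under stated assumptions: both helper names are hypothetical, a symmetric restricted-range scheme is used only for brevity, and `None` bits is treated as a plain pass-through as the commit message describes.

```python
def bits_or_none(cli_value):
    """Map the CLI convention above (0 means 'do not quantize') to None."""
    return None if cli_value == 0 else cli_value


def fake_quantize(values, num_bits):
    """Quantize then immediately de-quantize a list of floats.

    The result stays in floating point but only takes values that are
    representable with num_bits. With num_bits=None nothing changes.
    """
    if num_bits is None:
        return list(values)
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    q_max = 2 ** (num_bits - 1) - 1
    scale = q_max / max_abs
    quantized = [max(-q_max, min(q_max, round(v * scale))) for v in values]
    return [q / scale for q in quantized]
```

For example, `fake_quantize(weights, bits_or_none(0))` returns the weights untouched, while a small bit-width collapses them onto a coarse grid.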

952028d0

Update PTQ ResNet-50 command line results · 326d172f
Guy Jacob authored 5 years ago
```
Results changed following commit 9e7ef987 (#402)
```

326d172f

Dec 03, 2019
- Bugfix - Save stats to file when 'qe-calibration' arg is used (#437) · 3df06fb4
  SunYiran authored 5 years ago
  
  3df06fb4
Dec 02, 2019

object detection: remove unsupported summaries · 8e3f04cc
Neta Zmora authored 5 years ago
```
compute-summary and png-summary currently work with image classifiers
only.
```
8e3f04cc

object detection: fix model summary generation · 8002b13f

Neta Zmora authored 5 years ago

When multi-processing, we want only one process to generate the
summary, while the other processes do nothing (lazy bums!)

8002b13f

Object Detection Compression (#343) · 697b3cfe

Lev Zlotnik authored 5 years ago

Add an example of compressing OD pytorch models.

In this example we compress torchvision's object detection models - FasterRCNN / MaskRCNN / KeypointRCNN.
We've modified the reference code for object detection to allow easy compression scheduling with YAML configuration.

697b3cfe

Nov 28, 2019

AMC: fix problems reported in issue #429 · 47af2cfa

Neta Zmora authored 5 years ago

- define ALMOST_ONE
- define op_type
- remove sanity assert (need to understand what tolerance value to use
in the assert)

Co-authored-by: csc12138
Co-authored-by: wangyidong3

47af2cfa

AMC: fix problems reported in issue #429 · 48244f3b

Neta Zmora authored 5 years ago

- define ALMOST_ONE
- define op_type
- remove sanity assert (need to understand what tolerance value to use
in the assert)

48244f3b

Nov 17, 2019
- Add README.md files for some pruning examples · 0c175c94
  Neta Zmora authored 5 years ago
  
  0c175c94
Nov 16, 2019
- Add README.md files for APG and DropFilter · 70e26735
  Neta Zmora authored 5 years ago
  
  70e26735
- Remove duplicate YAML file · 49933144
  Neta Zmora authored 5 years ago
  
  49933144
Nov 13, 2019

image_classifier.py: PTQ stats collection and eval in same run (#346) · fb98377e

Bar authored 5 years ago

* Previous implementation:
  * Stats collection required a separate run with `--qe-calibration`.
  * Specifying `--quantize-eval` without `--qe-stats-file` triggered
    dynamic quantization.
  * Running with `--quantize-eval --qe-calibration <num>` only ran
    stats collection and ignored --quantize-eval.

* New implementation:
  * Running `--quantize-eval --qe-calibration <num>` will now 
    perform stats collection according to the calibration flag,
    and then quantize the model with the collected stats (and
    run evaluation).
  * Specifying `--quantize-eval` without `--qe-stats-file` will
    trigger the same flow as in the bullet above, as if 
    `--qe-calibration 0.05` was used (i.e. 5% of the test set will
    be used for stats).
  * Added new flag: `--qe-dynamic`. From now, to do dynamic 
    quantization, need to explicitly run:
    `--quantize-eval --qe-dynamic`
  * As before, can still run `--qe-calibration` without 
    `--quantize-eval` to perform "stand-alone" stats collection
  * The following flags, which all represent different ways to
    control creation of stats or use of existing stats, are now
    mutually exclusive:
    `--qe-calibration`, `--qe-stats-file`, `--qe-dynamic`,
    `--qe-config-file`
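
The flag interaction described above can be sketched with `argparse`. The flag names come from the commit message; the argument types, the `resolve_stats_mode` helper and its return values are assumptions made for illustration, and this is not the actual parser from image_classifier.py.

```python
import argparse


def build_parser():
    """Toy parser with the four stats-related flags mutually exclusive."""
    parser = argparse.ArgumentParser(description="PTQ flags sketch")
    parser.add_argument("--quantize-eval", action="store_true")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--qe-calibration", type=float, metavar="PORTION")
    group.add_argument("--qe-stats-file", type=str)
    group.add_argument("--qe-dynamic", action="store_true")
    group.add_argument("--qe-config-file", type=str)
    return parser


def resolve_stats_mode(args):
    """Return (mode, calibration_portion) for the parsed arguments."""
    if args.qe_dynamic:
        return ("dynamic", None)
    if args.qe_stats_file:
        return ("stats-file", None)
    if args.qe_config_file:
        return ("config-file", None)
    if args.qe_calibration is not None:
        return ("calibrate", args.qe_calibration)
    if args.quantize_eval:
        # Assumed default: behave as if --qe-calibration 0.05 was given
        return ("calibrate", 0.05)
    return ("none", None)
```

Passing two flags from the group, such as `--qe-dynamic --qe-stats-file stats.yaml`, makes `argparse` exit with a usage error instead of silently preferring one of them.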

fb98377e

Nov 11, 2019

Pruning with virtual Batch-norm statistics folding (#415) · c849a25f

Neta Zmora authored 5 years ago

* pruning: add an option to virtually fold BN into Conv2D for ranking

PruningPolicy can be configured using a new control argument fold_batchnorm: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv-2D modules (if Conv2D->BN edges exist in the model graph).  Each weights filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods.  We attenuate using the running values of the mean and variance, as is done in quantization.
This control argument is only supported for Conv-2D modules (i.e. other convolution operation variants and Linear operations are not supported).
e.g.:
policies:
  - pruner:
      instance_name : low_pruner
      args:
        fold_batchnorm: True
    starting_epoch: 0
    ending_epoch: 30
    frequency: 2
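
To see why folding can change the ranking, here is a minimal sketch of the weight-scaling part of BN folding followed by an L1 ranking. The helper names are hypothetical and filters are represented as flat lists of floats; the bias term (which involves beta and the running mean) is omitted because it does not affect the weight norms.

```python
import math


def fold_bn_into_filters(filters, gamma, running_var, eps=1e-5):
    """Scale each filter by gamma / sqrt(running_var + eps) of its BN channel."""
    folded = []
    for weights, g, var in zip(filters, gamma, running_var):
        factor = g / math.sqrt(var + eps)
        folded.append([w * factor for w in weights])
    return folded


def rank_filters_by_l1(filters):
    """Return filter indices ordered from smallest to largest L1 norm."""
    norms = [sum(abs(w) for w in weights) for weights in filters]
    return sorted(range(len(filters)), key=lambda i: norms[i])
```

A filter with large raw weights but a small gamma contributes little after BatchNorm, so it can move to the front of the pruning order once the folded weights are ranked.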

* AGP: non-functional refactoring

distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity`
to `_set_param_mask_by_sparsity_target`, which is a more appropriate function
name as we don’t really prune in this function

* Simplify GEMM weights input-channel ranking logic

Ranking weight-matrices by input channels is similar to ranking 4D
Conv weights by input channels, so there is no need for duplicate logic.

distiller/pruning/ranked_structures_pruner.py
-change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`,
which is a more appropriate function name as we don’t really prune in this
function
-remove the code handling ranking of matrix rows

distiller/norms.py – remove rank_cols.

distiller/thresholding.py – in expand_binary_map treat `channels` group_type
the same as the `cols` group_type when dealing with 2D weights

* AGP: add example of ranking filters with virtual BN-folding

Also update resnet20 AGP examples

c849a25f

Nov 07, 2019

Fix Early-exit code · fc62caab

Neta Zmora authored 5 years ago

Fix the EE code so that it works with the current 'master' branch,
and add a test for high-level EE regression

fc62caab

Nov 06, 2019
- Bugfix: Deepcopy args when creating ClassifierCompressor (#421) · 78144d4c
  Guy Jacob authored 5 years ago
  
  Co-authored-by: Bar <29775567+barrh@users.noreply.github.com>
  Co-authored-by: Guy Jacob <guy.jacob@intel.com>
  
  78144d4c
Oct 31, 2019
- Re-enable NLP quant notebooks after following #402 (#412) · fdb12eb1
  Guy Jacob authored 5 years ago
  
  * Add blacklist to quantizer. In PTQ put Dropout on the blacklist.
  * Update notebooks to use 2-phase stats collection
  * Other small fixes
  
  fdb12eb1
- Fix typo in yaml · cc04fa9f
  Guy Jacob authored 5 years ago
  
  cc04fa9f
- Fix typo in yaml · 93cd6d7d
  Guy Jacob authored 5 years ago
  
  93cd6d7d
Oct 23, 2019

inspect_ckpt.py: support for very large models · 31c1bd89

Neta Zmora authored 5 years ago

Force loading on the CPU, which typically has more memory than a
single GPU.  This is useful for models that cannot be loaded onto
a single GPU.

31c1bd89

Update MobileNet v1 baseline training configuration file · 2c2a9417
Neta Zmora authored 5 years ago

2c2a9417

Fix AMC notebooks' sample command-line examples · 5059419b

Neta Zmora authored 5 years ago

As documented in issue #395, some of the command-line examples in the
AMC notebooks are incorrect.
Also, fix some bugs that were introduced with the refactoring of the
low-level pruning API

5059419b

Oct 07, 2019

Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987

Guy Jacob authored 5 years ago


* Greedy search script for post-training quantization settings
  * Iterates over each layer in the model in order. For each layer,
    checks a user-defined set of quantization settings and chooses
    the best one based on validation accuracy
  * Provided sample that searches for best activations-clipping
    mode per layer, on image classification models

* Proper handling of mixed-quantization settings in post-train quant:
  * By default, the quantization settings for each layer apply only
    to output quantization
  * Propagate quantization settings for activations tensors through
    the model during execution
  * For non-quantized inputs to layers that require quantized inputs,
    fall-back to quantizing according to the settings used for the
    output
  * In addition, provide mechanism to override inputs quantization
    settings via the YAML configuration file
  * By default all modules are quantized now. For module types that
    don't have a dedicated quantized implementation, "fake"
    quantization is performed

* Misc. Changes
  * Fuse ReLU/ReLU6 to predecessor during post-training quantization
  * Fixes to ACIQ clipping in the half-range case
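
The greedy search described above (visit layers in order, try each candidate setting, keep the best by validation accuracy) can be sketched generically as follows. Everything here is hypothetical: `evaluate` stands in for a validation run, and the setting labels in the usage note are placeholders rather than Distiller option names.

```python
def greedy_search(layer_names, candidate_settings, evaluate):
    """Pick one setting per layer, in order, keeping earlier choices fixed.

    evaluate(settings_by_layer) must return a score where higher is better,
    for example validation accuracy of the model quantized with those settings.
    """
    chosen = {}
    for name in layer_names:
        best_setting, best_score = None, float("-inf")
        for setting in candidate_settings:
            trial = dict(chosen)
            trial[name] = setting
            score = evaluate(trial)
            if score > best_score:
                best_setting, best_score = setting, score
        chosen[name] = best_setting
    return chosen
```

A toy call such as `greedy_search(["conv1", "fc"], ["NONE", "AVG"], evaluate)` needs one evaluation per layer and candidate, so the cost grows linearly with the number of layers rather than exponentially as in an exhaustive search.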

Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
Co-authored-by: Guy Jacob <guy.jacob@intel.com>

9e7ef987

AMC: fix the replay_buffer_size when using Coach and DDPG · 738d57f4
Neta Zmora authored 5 years ago

738d57f4

Oct 06, 2019

Low-level pruning API refactor (#401) · 05d5592e

Neta Zmora authored 5 years ago

Some refactoring of the low-level pruning API

Added distiller/norms.py - for calculating norms of various sub-tensors.

ranked_structures_pruner.py:
-Removed l1_magnitude, l2_magnitude. Use instead distiller.norms.l1_norm
-Lots of refactoring
-replaced LpRankedStructureParameterPruner.ch_binary_map_to_mask with
distiller.thresholding.expand_binary_map
-FMReconstructionChannelPruner.rank_and_prune_channels used L2-norm
by default and now uses L1-norm (i.e. magnitude_fn=l2_magnitude was
replaced with magnitude_fn=distiller.norms.l1_norm)

thresholding.py:
-Delegated lots of the work to the new norms.py.
-Removed support for 4D (entire convolution layers) since that has not been
maintained for a long time. This may break some old scripts that remove entire
layers.
-added expand_binary_map() explicitly so others can use it. Might need to
move to a different file
-removed threshold_policy()

utils.py:
-use distiller.norms.xxx for sparsity stats
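
The switch from L2 to L1 ranking noted above can change which structures get pruned. The sketch below is a generic illustration with hypothetical helper names; it does not reproduce distiller.norms or distiller.thresholding.expand_binary_map, and filters are modelled as flat lists of floats.

```python
import math


def l1_norm(values):
    return sum(abs(v) for v in values)


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def binary_map_by_norm(filters, fraction_to_prune, norm_fn=l1_norm):
    """Return one bit per filter: 0 for the lowest-norm filters, 1 otherwise."""
    norms = [norm_fn(f) for f in filters]
    num_to_prune = int(fraction_to_prune * len(filters))
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    pruned = set(order[:num_to_prune])
    return [0 if i in pruned else 1 for i in range(len(filters))]


def expand_to_mask(binary_map, filters):
    """Expand the per-filter bits into an element-wise mask of the same shape."""
    return [[bit] * len(f) for bit, f in zip(binary_map, filters)]
```

For the two filters `[3, 0, 0, 0]` and `[1, 1, 1, 1]`, the L1 norm ranks the first one lowest while the L2 norm ranks the second one lowest, so scripts that relied on the old default may select different channels.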

05d5592e

Sep 27, 2019
- agp-pruning/resnet50.schedule_agp.filters.yaml - fix cmd-line · 2dcf3ff3
  Neta Zmora authored 5 years ago
  
  2dcf3ff3
- update examples/baseline_networks/README.md · 0cfcf0d7
  Neta Zmora authored 5 years ago
  
  0cfcf0d7