Commits · b8b4cf32fa13acd47e76f1c8186e2b879971138c · llvm / distiller

Nov 14, 2019

Guy Jacob authored 5 years ago

* summary_graph.py:
  * Change ONNX op.uniqueName() to op.debugName()
  * Removed scope-naming workaround which isn't needed in PyTorch 1.3
* Tests:
  * Naming of trace entries changed in 1.3. Fixed the SummaryGraph unit
    test that checks these names
  * Adjusted expected values in full_flow_tests
  * Adjusted tolerance in test_sim_bn_fold
  * Filter some new warnings

Unverified

b8b4cf32

Nov 13, 2019

image_classifier.py: PTQ stats collection and eval in same run (#346) · fb98377e

Bar authored 5 years ago

* Previous implementation:
  * Stats collection required a separate run with `--qe-calibration`.
  * Specifying `--quantize-eval` without `--qe-stats-file` triggered
    dynamic quantization.
  * Running with `--quantize-eval --qe-calibration <num>` only ran
    stats collection and ignored `--quantize-eval`.

* New implementation:
  * Running `--quantize-eval --qe-calibration <num>` will now 
    perform stats collection according to the calibration flag,
    and then quantize the model with the collected stats (and
    run evaluation).
  * Specifying `--quantize-eval` without `--qe-stats-file` will
    trigger the same flow as in the bullet above, as if 
    `--qe-calibration 0.05` was used (i.e. 5% of the test set will
    be used for stats).
  * Added new flag: `--qe-dynamic`. From now on, dynamic quantization
    must be requested explicitly:
    `--quantize-eval --qe-dynamic`
  * As before, `--qe-calibration` can still be run without
    `--quantize-eval` to perform "stand-alone" stats collection
  * The following flags, which all represent different ways to
    control creation of stats or use of existing stats, are now
    mutually exclusive:
    `--qe-calibration`, `--qe-stats-file`, `--qe-dynamic`,
    `--qe-config-file`

fb98377e

Nov 11, 2019

Pruning with virtual Batch-norm statistics folding (#415) · c849a25f

Neta Zmora authored 5 years ago

* pruning: add an option to virtually fold BN into Conv2D for ranking

PruningPolicy can be configured using a new control argument, `fold_batchnorm`: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv-2D modules (if Conv2D->BN edges exist in the model graph).  Each weight filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods.  We attenuate using the running values of the mean and variance, as is done in quantization.
This control argument is only supported for Conv-2D modules (i.e. other convolution operation variants and Linear operations are not supported).
For example:
policies:
  - pruner:
      instance_name : low_pruner
      args:
        fold_batchnorm: True
    starting_epoch: 0
    ending_epoch: 30
    frequency: 2
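
To make the effect on ranking concrete, here is a small pure-Python sketch of ranking filters by L1 norm after a virtual fold. It assumes the standard BN-folding scale `gamma / sqrt(running_var + eps)` for the weights; the function names and the flattened-list representation of filters are hypothetical and are not taken from Distiller.

```python
import math


def fold_bn_scale(gamma, running_var, eps=1e-5):
    """Per-filter factor gamma / sqrt(running_var + eps) used when folding BN."""
    return [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]


def rank_filters_l1(filters, gamma=None, running_var=None, eps=1e-5):
    """Rank filter indices by ascending L1 norm, optionally after virtual folding.

    `filters` is a list of flattened filters (one list of floats per output
    channel). The stored weights are never modified: folding only changes
    the values used for ranking.
    """
    scales = [1.0] * len(filters)
    if gamma is not None:
        scales = fold_bn_scale(gamma, running_var, eps)
    norms = [sum(abs(w * s) for w in f) for f, s in zip(filters, scales)]
    return sorted(range(len(filters)), key=lambda i: norms[i])
```

With folding enabled, a filter with large raw weights but a small gamma can rank below a filter with small raw weights, which is the situation the option is meant to capture.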

* AGP: non-functional refactoring

distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity`
to `_set_param_mask_by_sparsity_target`, which is a more appropriate function
name as we don’t really prune in this function

* Simplify GEMM weights input-channel ranking logic

Ranking weight-matrices by input channels is similar to ranking 4D
Conv weights by input channels, so there is no need for duplicate logic.

distiller/pruning/ranked_structures_pruner.py
-change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`,
which is a more appropriate function name as we don’t really prune in this
function
-remove the code handling ranking of matrix rows

distiller/norms.py – remove rank_cols.

distiller/thresholding.py – in expand_binary_map treat `channels` group_type
the same as the `cols` group_type when dealing with 2D weights

* AGP: add example of ranking filters with virtual BN-folding

Also update resnet20 AGP examples

Unverified

c849a25f

Nov 10, 2019

early-exit: further refactoring and resnet50-imagenet · 795590c8

Neta Zmora authored 5 years ago

Refactor EE code and place in a separate file.
Fix resnet50-earlyexit (the inputs of the nn.Linear layers were wrong).

Caveats:
1. resnet50-earlyexit still needs to be tested for performance.
2. there is still too much EE code dispersed in apputils/image_classifier.py
and compress_classifier.py

795590c8

tests/full_flow_tests.py: improve early-exit test robustness · 49a2a967

Neta Zmora authored 5 years ago

EE runs emit more statistics than the regular classification pipeline,
so validating more of the log output makes the correctness check more
robust.

49a2a967

Nov 07, 2019

Fix Early-exit code · fc62caab

Neta Zmora authored 5 years ago

Fix the EE code so that it works with the current 'master' branch,
and add a test for high-level EE regression

fc62caab

Nov 06, 2019
- PTQ: Exposed get/set ops for post-train quantization params (#418) · 4478a73d
  Lev Zlotnik authored 5 years ago
  
  4478a73d
Oct 07, 2019

Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987

Guy Jacob authored 5 years ago


* Greedy search script for post-training quantization settings
  * Iterates over each layer in the model in order. For each layer,
    checks a user-defined set of quantization settings and chooses
    the best one based on validation accuracy
  * Provided sample that searches for best activations-clipping
    mode per layer, on image classification models
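
The greedy procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the script added in this commit: `greedy_search`, its parameters and the toy settings in the usage note are hypothetical, and `evaluate` stands in for a real validation run.

```python
def greedy_search(layers, candidates, evaluate):
    """Pick one setting per layer, in order, keeping earlier choices fixed.

    layers     -- ordered list of layer names
    candidates -- list of settings to try for every layer
    evaluate   -- callable(dict layer -> setting) returning validation accuracy
    """
    chosen = {}
    for layer in layers:
        best_setting, best_acc = None, float("-inf")
        for setting in candidates:
            trial = dict(chosen)      # earlier layers keep their chosen settings
            trial[layer] = setting
            acc = evaluate(trial)
            if acc > best_acc:
                best_setting, best_acc = setting, acc
        chosen[layer] = best_setting
    return chosen
```

The search costs `len(layers) * len(candidates)` evaluations, which is why it is greedy rather than exhaustive: a choice made for an early layer is never revisited.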

* Proper handling of mixed-quantization settings in post-train quant:
  * By default, the quantization settings for each layer apply only
    to output quantization
  * Propagate quantization settings for activations tensors through
    the model during execution
  * For non-quantized inputs to layers that require quantized inputs,
    fall-back to quantizing according to the settings used for the
    output
  * In addition, provide mechanism to override inputs quantization
    settings via the YAML configuration file
  * By default all modules are quantized now. For module types that
    don't have a dedicated quantized implementation, "fake"
    quantization is performed

* Misc. Changes
  * Fuse ReLU/ReLU6 to predecessor during post-training quantization
  * Fixes to ACIQ clipping in the half-range case

Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
Co-authored-by: Guy Jacob <guy.jacob@intel.com>

Unverified

9e7ef987

Oct 06, 2019

Low-level pruning API refactor (#401) · 05d5592e

Neta Zmora authored 5 years ago

Some refactoring of the low-level pruning API

Added distiller/norms.py - for calculating norms of various sub-tensors.

ranked_structures_pruner.py:
-Removed l1_magnitude, l2_magnitude. Use instead distiller.norms.l1_norm
-Lots of refactoring
-replaced LpRankedStructureParameterPruner.ch_binary_map_to_mask with
distiller.thresholding.expand_binary_map
-FMReconstructionChannelPruner.rank_and_prune_channels used L2-norm
by default and now uses L1-norm (i.e. magnitude_fn=l2_magnitude was
replaced with magnitude_fn=distiller.norms.l1_norm)

thresholding.py:
-Delegated lots of the work to the new norms.py.
-Removed support for 4D (entire convolution layers) since that has not been
maintained for a long time. This may break some old scripts that remove entire
layers.
-added expand_binary_map() explicitly so others can use it. Might need to
move to a different file
-removed threshold_policy()

utils.py:
-use distiller.norms.xxx for sparsity stats

Unverified

05d5592e

Sep 10, 2019

ACIQ clipping in post-training quantization (#173) · 534072d8

Yury Nahshan authored 5 years ago

ACIQ clipping method, as described in:

Post training 4-bit quantization of convolutional networks for rapid-deployment
(Ron Banner, Yury Nahshan, Daniel Soudry)
(NeurIPS 2019)

https://arxiv.org/abs/1810.05723



Co-authored-by: Yury Nahshan <yury.nahshan@intel.com>
Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>

534072d8

Sep 01, 2019

AMC: add pruning of FC layers · 3f7a9408

Neta Zmora authored 5 years ago

FMReconstructionChannelPruner: add support for nn.Linear layers
utils.py: add non_zero_channels()
thinning: support removing channels from FC layers preceding Conv layers
test_pruning.py: add test_row_pruning()
scheduler: init from a dictionary of Maskers
coach_if.py – fix imports of Clipped-PPO and TD3

3f7a9408

Aug 22, 2019

apputils/checkpoint.py: load_checkpoint can be called w/o specifying the model · b41c4d2d

Neta Zmora authored 5 years ago

This is inspired by @barrh’s PR https://github.com/NervanaSystems/distiller/pull/246,
but it proceeds at a “slower-integration-pace” and w/o changing APIs.

1. create_model() attaches model attributes (arch, dataset, is_parallel) to created models.
2. save_checkpoint() stores the new model attributes with checkpoint metadata
3. load_checkpoint() can be invoked with model=None, in which case we attempt
to create the model from the stored checkpoint metadata.
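
The three steps above can be sketched in plain Python. Everything here is a stand-in: `TinyModel`, `save_with_metadata` and `load_with_optional_model` are hypothetical names with simplified signatures, and the checkpoint is just a dict rather than a file written by Distiller.

```python
class TinyModel:
    """Placeholder for a real network; it only carries the metadata."""


def create_model(arch, dataset, is_parallel=False):
    model = TinyModel()
    # Attach the attributes needed to re-create the model later.
    model.arch, model.dataset, model.is_parallel = arch, dataset, is_parallel
    return model


def save_with_metadata(model, state):
    """Return a checkpoint dict that stores the model attributes with the state."""
    return {"state": state, "arch": model.arch, "dataset": model.dataset,
            "is_parallel": model.is_parallel}


def load_with_optional_model(checkpoint, model=None):
    """Re-create the model from checkpoint metadata when none is passed in."""
    if model is None:
        try:
            model = create_model(checkpoint["arch"], checkpoint["dataset"],
                                 checkpoint["is_parallel"])
        except KeyError as err:
            raise ValueError("checkpoint lacks model metadata: {}".format(err))
    model.state = checkpoint["state"]
    return model
```

Older checkpoints without the metadata keys still load in this sketch as long as the caller passes a model explicitly.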

b41c4d2d

Aug 21, 2019
- test_pruning.py: adjust test after relaxing thinning checkpoint loading · 961bfc89
  Neta Zmora authored 5 years ago
  
  961bfc89
Aug 20, 2019

full_flow_tests.py: Added a pruning test · 67db927d

Neta Zmora authored 5 years ago

This test uses MNIST for faster execution and tests various
pruners and their scheduling.

67db927d

Aug 07, 2019
- full_flow_tests.py: relax verification of sensitivity.png · bbdb19de
  Neta Zmora authored 5 years ago
  
  bbdb19de
- [Quantizer] Fix handling when default bits_activations == None (#345) · ce3528e4
  Guy Jacob authored 5 years ago
  
  Unverified
  
  ce3528e4
Aug 06, 2019

AMC and other refactoring - large merge (#339) · 02054da1

Neta Zmora authored 5 years ago

* An implementation of AMC (the previous implementation
code has moved to a new location under
/distiller/examples/auto_compression/amc).  AMC is aligned
with the ‘master’ branch of Coach.
* compress_classifier.py is refactored.  The base code moved
to /distiller/apputils/image_classifier.py.  Further refactoring
will follow.
We want to provide a simple and small API to the basic features of
a classifier-compression application.
This will help applications that want to use the main features of a
classifier-compression application, without the standard training
regimen.
AMC is one example of a stand-alone application that needs to leverage
the capabilities of a classifier-compression application, but is currently
coupled to `compress_classifier.py`.
`multi-finetune.py` is another example.
* ranked_structures_pruner.py:
** Added support for grouping channels/filters
Sometimes we want to prune a group of structures: e.g. groups of
8-channels.  This feature does not force the groups to be adjacent,
so it is more like a set of structures.  E.g. in the case of pruning
channels from a 64-channels convolution, grouped by 8 channels, we 
will prune exactly one of 0/8/16/24/32/40/48/56 channels.  I.e. 
always a multiple of 8-channels, excluding the set of all 64 channels.
** Added FMReconstructionChannelPruner – this is channel
pruning using L1-magnitude to rank and select channels to
remove, and feature-map reconstruction to improve the
resilience to the pruning.
* Added a script to run multiple instances of an 
experiment, in different processes:
 examples/classifier_compression/multi-run.py
* Set the seed value even when not specified by the command-line
arguments, so that we can try and recreate the session.
* Added pruning ranking noise -
Ranking noise introduces Gaussian noise when ranking channels/filters
using Lp-norm.  The noise is introduced using the epsilon-greedy
methodology, where ranking using exact Lp-norm is considered greedy.
* Added configurable rounding of pruning level: choose whether to
round up or down when rounding the number of structures to prune
(rounding is always to an integer).
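
The grouping and rounding rules above combine as in the following sketch. It is an assumption-laden illustration rather than the pruner's code: the function name, its parameters and the choice to clamp the result below the total are all hypothetical, chosen to match the 64-channel example in the text.

```python
import math


def num_structures_to_prune(total, fraction, group_size=1, round_up=False):
    """Number of channels/filters to prune, as a multiple of group_size.

    The result never reaches `total`, so at least one group survives.
    """
    if total % group_size != 0:
        raise ValueError("total must be divisible by group_size")
    n_groups = total // group_size
    exact = fraction * n_groups
    groups = math.ceil(exact) if round_up else math.floor(exact)
    groups = max(0, min(groups, n_groups - 1))
    return groups * group_size
```

For a 64-channel convolution grouped by 8, a 30% target gives 16 channels when rounding down and 24 when rounding up, and a 100% target is capped at 56.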

Unverified

02054da1

Jul 29, 2019

DistillerModuleList conversion: Handle models w. duplicate modules (#338) · db531db8

Guy Jacob authored 5 years ago

* By duplicate modules we mean:
  self.relu1 = nn.ReLU()
  self.relu2 = self.relu1
* The issue:
  The second module ('relu2') will not be returned by
  torch.nn.Module.named_modules/children()
* When converting to DistillerModuleList, in order to maintain the
  original order of modules and in order to have a correct mapping
  of names before/after the conversion - we need to take the duplicates
  into account
* Implemented an internal version of named_modules/children that includes
  duplicates
* Added test case for this + refactored the module list conversion tests
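
The difference between the two traversals can be shown without PyTorch. In this sketch `Node` is a hypothetical stand-in for a module, `named_nodes` mimics a traversal that remembers objects it has already visited, and `named_nodes_with_duplicates` reports every attribute name. None of these names come from Distiller.

```python
class Node:
    """Minimal stand-in for a module: a name -> child mapping, no torch needed."""

    def __init__(self, **children):
        self.children = dict(children)


def named_nodes(node, prefix="", memo=None):
    """Yield (name, node), skipping objects that were already visited."""
    memo = set() if memo is None else memo
    if id(node) in memo:
        return
    memo.add(id(node))
    yield prefix, node
    for name, child in node.children.items():
        child_prefix = name if not prefix else prefix + "." + name
        yield from named_nodes(child, child_prefix, memo)


def named_nodes_with_duplicates(node, prefix=""):
    """Same traversal, but report every attribute name, duplicates included."""
    yield prefix, node
    for name, child in node.children.items():
        child_prefix = name if not prefix else prefix + "." + name
        yield from named_nodes_with_duplicates(child, child_prefix)
```

When one object is registered under two names, the first traversal reports only the first name, while the second keeps both and so preserves the original order and naming.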

Unverified

db531db8

Jul 22, 2019

Fix non 1:1 mapping between model w. ModuleList and SummaryGraph (#328) · b614330c

Guy Jacob authored 5 years ago

The PyTorch trace mechanism doesn't "see" torch.nn.ModuleList modules
(since they don't have a forward function). As a result, the mapping
from module names at the Python model definition level to the
scope-names at the trace level is not 1:1. This makes it impossible for
us to map back from SummaryGraph ops to their respective nn.Modules,
which is required for flows like BatchNorm folding and stats fusion in
post-training quantization.

In #313 we handled this issue specifically in DistillerLSTM, but it
makes much more sense to have a generic and automatic solution for this
issue, which doesn't require the user to modify the model. This is such
a solution.
    
* Implemented DistillerModuleList, a replacement for nn.ModuleList
  which results in full and unique scope-names
* See documentation for this class in summary_graph.py for extensive
  details on the issue and solution
* When generating a SummaryGraph, the model is scanned and all instances
  of torch.nn.ModuleList are replaced with DistillerModuleList
* Add tests for new functionality
* Partially revert changes made to DistillerLSTM in commit 43548deb:
  Keep the refactored _create_cells_list function, but have it create
  a standard torch.nn.ModuleList (since we're handling the ModuleList
  issue automatically now, there is no need to confuse users with
  ad-hoc list implementations)

Unverified

b614330c

Jul 21, 2019
- Fix bug in test that resulted in duplicate modules in a model · 4ec96d90
  Guy Jacob authored 5 years ago
  
  4ec96d90
Jul 10, 2019

Post-Train Quantization: BN folding and "net-aware quantization" (#313) · 43548deb

Guy Jacob authored 5 years ago

* "Net-aware quantization" - using the term coined in
  https://arxiv.org/abs/1811.09886 (section 3.2.2).
  Refers to considering sequences of modules when quantizing. This 
  isn't exactly layer fusion - we modify activation stats prior to
  setting quantization parameters, to make sure that when a module
  is followed by certain activation functions, only the relevant
  ranges are quantized. We do this for:
    * ReLU - Clip all negative values
    * Tanh / Sigmoid - Clip according to the (approximated) saturation
      values for these functions. We use [-4, 4] for tanh and [-6, 6]
      for sigmoid.
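
The stats adjustment described above can be sketched as a small function. The clipping ranges are the ones stated in the commit message; the function name, the string activation labels and the (min, max) tuple representation of stats are hypothetical simplifications.

```python
def clip_stats_for_activation(stat_min, stat_max, activation):
    """Narrow a module's output range according to the activation after it."""
    # Approximate saturation bounds quoted in the commit message.
    saturation = {"tanh": 4.0, "sigmoid": 6.0}
    if activation == "relu":
        # Negative outputs are zeroed by ReLU, so they need no quantization range.
        return max(stat_min, 0.0), max(stat_max, 0.0)
    if activation in saturation:
        bound = saturation[activation]
        return max(stat_min, -bound), min(stat_max, bound)
    return stat_min, stat_max
```

Narrowing the range before the quantization parameters are computed means the available integer levels are spent only on values that can still influence the activation's output.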

* Perform batch-norm folding before post-training quantization.
  Batch-norm parameters are folded into the parameters of the previous
  layer and the BN layer is replaced with an identity module.

* Both BN folding and "net-aware" are now automatically executed
  in PostTrainLinearQuantizer (details of this change below)

* BN folding enabled by new generic mechanism to "fuse" module
  sequences (at the Python API level)
    * First module in sequence is replaced/modified by a user-provided
      function, rest of modules replaced with nn.Identity

* Quantizer changes:
  * Optionally create adjacency map during prepare_model
  * Subclasses may enforce adjacency map creation
  * Refactoring: Replace _prepare_model_impl with pre and post
    override-able "callbacks", so core functionality is always executed

* PostTrainLinearQuantizer Changes:
  * Enforce creation of adjacency map. This means users must now pass a
    dummy input to PostTrainLinearQuantizer.prepare_model
  * Before module replacement - Apply BN folding and stats updates according
    to net-aware quantization

* Updated the language model quantization tutorial to reflect the new
  functionality

* Updated the image classification post-train quantization samples
  (command line and YAML)

* Other changes:
  * Distiller LSTM implementation:
    Replace the ModuleList for cells with a plain list. The PyTorch trace
    mechanism doesn't "see" ModuleList objects, it only sees the 
    contained modules. This means that the "scopeName" of these modules
    isn't complete, which makes it impossible to match op names in 
    SummaryGraph to modules in the Python model.
  * ActivationStatsCollector: Ignore nn.Identity modules

Unverified

43548deb

Jul 04, 2019

Switch to PyTorch 1.1.0 (#306) · 032b1f74

Guy Jacob authored 5 years ago

* PyTorch 1.1.0 now required
  - Moved other dependencies to up-to-date versions as well
* Adapt LR scheduler to PyTorch 1.1 API changes:
  - Move lr_scheduler.step() calls so that they follow the validate
    calls during training
  - Pass both loss and top1 to the lr_scheduler.step() caller
    (Resolves issue #240)
* Adapt thinning for PyTorch 1.1 semantic changes
  - **KNOWN ISSUE**: When a thinning recipe is applied, in certain
    cases PyTorch displays this warning:
    "UserWarning: non-inplace resize is deprecated".
    To be fixed later
* SummaryGraph: Workaround for new scope name issue from PyTorch 1.1.0
* Adapt to updated PyTest version:
  - Stop using deprecated 'message' parameter of pytest.raises(),
    use pytest.fail() instead
  - Make sure only a single test case per pytest.raises context
* Move PyTorch version check to root __init__.py 
  - This means the version is checked when Distiller is first
    imported. A RuntimeError is raised if the version is wrong.
* Updates to parameter_histograms notebook:
  - Replace deprecated normed argument with density
  - Add sparsity rate to plot title
  - Load model in CPU
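
The import-time version check mentioned in this commit could look roughly like the sketch below. It assumes an exact-match policy and takes the installed version as a string argument so that it runs without PyTorch; the function name and the error message are hypothetical.

```python
def check_required_version(installed, required="1.1.0"):
    """Raise RuntimeError unless the installed version matches `required`."""
    # Drop local build suffixes such as "+cu100" before comparing.
    core = installed.split("+")[0]
    if core != required:
        raise RuntimeError(
            "Wrong PyTorch version: found {}, need {}".format(installed, required))
    return core
```

Calling such a function from a package's root `__init__.py` makes the failure appear at import time instead of deep inside a training run.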

Unverified

032b1f74

Jul 03, 2019

Dump outputs in run log dir instead of script dir · 34f9a55b

Guy Jacob authored 5 years ago

34f9a55b

SummaryGraph: Changes in adjacency_map and predecessors/successors · 8cf7900d

Guy Jacob authored 5 years ago

* Add op name and type to adjacency map
* Make module name de-norm optional in predecessors and successors
  functions (inc. in _f variants)
* More tests

8cf7900d

Bugfix in normalize_module_name · 8e14ef0b

Guy Jacob authored 5 years ago

Previous code looked for the patterns 'module.' and '.module' separately
and removed the first instance.
Issues with this:
* Too broad. If a user gives some module a name that has the prefix or
  suffix 'module', that pre/suffix would be removed.
* Doesn't catch the corner case of the "root" module in a model

Modified to split name by the module separator '.', and then remove the
first instance of the name 'module'
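
The fixed behaviour described above can be sketched in a few lines. The function name `strip_module_component` is hypothetical, and this is a reading of the commit message rather than a copy of the Distiller implementation.

```python
def strip_module_component(name, sep="."):
    """Drop the first path component that is exactly 'module'."""
    parts = name.split(sep)
    if "module" in parts:
        parts.remove("module")  # list.remove drops only the first match
    return sep.join(parts)
```

Because whole components are compared, a name such as `my_module.conv` is left untouched, and the root name `module` normalizes to an empty string.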

8e14ef0b

Jul 01, 2019
- get_dummy_input: extend to return tuples of tensors + add tests · 498b3cb8
  Guy Jacob authored 5 years ago
  
  498b3cb8
Jun 23, 2019

Simulated BN fold module changes · 84240654

Guy Jacob authored 5 years ago

* Support case where BN module has no learnable parameters
  (affine == False)
* Support conv1d and conv3d

84240654

SummaryGraph: Add adjacency map + numerous changes (#291) · b60a33ef

Guy Jacob authored 5 years ago

* Adjacency map - map from each op to its predecessor and successor ops
* More robust handling of Gemm nodes scope names (instead of
  increment_instance())
* More consistent handling of ops with the same scope name
* Handle pad + avg pool sequences generated by ONNX trace optimization
  (results in one less op in the graph, hence the changes in tests)
* Minor refactoring in predecessors() and successors() functions
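
As a rough picture of what an adjacency map holds, the sketch below builds one from a list of directed edges between op names. The function name and the dict layout are assumptions for illustration; SummaryGraph's actual data structure may differ.

```python
from collections import defaultdict


def build_adjacency_map(edges):
    """Map each op name to the ops directly before and after it."""
    adjacency = defaultdict(lambda: {"predecessors": [], "successors": []})
    for src, dst in edges:
        adjacency[src]["successors"].append(dst)
        adjacency[dst]["predecessors"].append(src)
    return dict(adjacency)
```

Looking up an op then answers both "what feeds this op" and "what consumes its output" without re-walking the graph.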

Unverified

b60a33ef

Simulated BN folding during training (module impl only) (#274) · 4c7d4890

Lev Zlotnik authored 5 years ago

Implementation of simulated BatchNorm folding as per
https://arxiv.org/pdf/1806.08342.pdf

* NOTE: This is just the module implementation for now, not yet integrated
  with QuantAwareTrainRangeLinearQuantizer

4c7d4890

Jun 03, 2019

[Breaking] PTQ: Removed special handling of clipping overrides · 3cde6c5e

Lev Zlotnik authored 5 years ago

* In PostTrainLinearQuantizer - moved 'clip_acts' and 'clip_n_stds'
  to overrides, removed 'no_clip_layers' parameter from __init__
* The 'no_clip_layers' command line argument REMAINS, handled in 
  PostTrainLinearQuantizer.from_args()
* Removed old code from comments, fixed warnings in 
  test_post_train_quant.py
* Updated tests
* Update post-train quant sample YAML

3cde6c5e

May 30, 2019

MNIST support · f8085cf4

Neta Zmora authored 5 years ago

-Added a test for MNIST
-Added classification_get_dummy_input() to apputils/data_loaders.py
and wrapped it with get_dummy_input() for (temporary) backward
compatibility.
- Changed simplenet_mnist so that it supports thinning

f8085cf4

May 27, 2019

Bug fix for shared module (#268) · d6efbe40

Lev Zlotnik authored 5 years ago

* Fixed bug where a shared module which was supposed to be skipped wasn't skipped on the second reference

* Added tests for new bug fix

Unverified

d6efbe40

May 20, 2019

Thinning: added support for group-wise convolutions · 6b832025

Neta Zmora authored 5 years ago

Group-wise convolutions with num_groups == num_in_channels, as
configured in MobileNet for example, create attribute and shape dependency
chains that are longer than convolutions with num_groups == 1.
For example in the sequence below, changing the number of filters of the
weights of Conv1, triggers changes in BN1, Conv2, BN2, and Conv3
(g indicates the number of groups):

Conv1(g=1) ==> BN1 ==> Conv2(g=32) ==> BN2 ==> Conv3(g=1)

Changing the number of filters used in Conv1 affects the parameters and
attributes of BN1 and Conv2: the number of input channels of Conv2 is
changed, as explained in
https://nervanasystems.github.io/distiller/tutorial-struct_pruning.html.

However, since Conv2 has num_groups == num_in_channels, we have to change
num_groups, which triggers a change in num_out_channels.  This is akin to
changing the number of filters of Conv2, which triggers a change in BN2
and Conv3.
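
The dependency chain described above can be traced with a small sketch. Layers are represented as plain dicts, and `thin_chain` is a hypothetical helper, not the thinning code itself; it only shows how far a filter-count change has to propagate.

```python
def thin_chain(chain, new_filters):
    """Propagate a new filter count from the first conv through the chain.

    Returns the names of all layers that had to change.
    """
    first = chain[0]
    first["out"] = new_filters
    changed = [first["name"]]
    for layer in chain[1:]:
        changed.append(layer["name"])
        if layer["type"] == "bn":
            layer["features"] = new_filters
        elif layer["groups"] > 1 and layer["groups"] == layer["in"]:
            # Depth-wise conv: groups, inputs and outputs shrink together,
            # so the change keeps propagating to the layers that follow.
            layer["in"] = layer["out"] = layer["groups"] = new_filters
        else:
            # Regular conv: only the input channels change; the chain ends here.
            layer["in"] = new_filters
            break
    return changed
```

For the Conv1 ==> BN1 ==> Conv2(g=32) ==> BN2 ==> Conv3 sequence, all five layers are touched, whereas with a regular Conv2 the walk would stop after three.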

models/mobilenet.py:
Changed the code that flattens the output of the
features-extractors and prepares it as input to the classifier.
The code was written using hard-coded shape values, which made it
impossible to use in thinned models (where dimensions are changed).

tests/test_pruning.py:
Added test for thinning MobileNet (grouped convolutions)

6b832025

May 19, 2019
- Bugfix in bias handling in quant-aware training (fixes issue #248) · 4c163690
  Guy Jacob authored 5 years ago
  
  4c163690
May 16, 2019

Refactoring: utils.get_dummy_input() · bf1e6a0d

Neta Zmora authored 5 years ago

Remove the multiple instances of code that generates
dummy input per dataset.

bf1e6a0d

Fix test_onnx.py · af5c7219

Neta Zmora authored 5 years ago

A wrong model was used

af5c7219

Refactor export to ONNX functionality (#258) · 54304810

Bar authored 5 years ago

Introduced a new utility function to export image-classifiers
to ONNX: export_img_classifier_to_onnx.
The functionality is not new, just refactored.

In the sample application compress_classifier.py added 
--export-onnx as a stand-alone cmd-line flag for specifically exporting 
ONNX models.
This new flag can take an optional argument which is used to name the
exported onnx model file.
The option to export models was removed from the --summary argument.
Multiple --summary options can now be passed together.
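
One way to get both behaviours with `argparse` is sketched below. The flag names come from the commit message; the default file name `model.onnx` and the summary names used in the usage note are placeholders, and this is not claimed to be how compress_classifier.py defines its arguments.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="export_flags_sketch")
    # nargs="?" makes the file name optional; `const` is used when the flag
    # appears without a value. "model.onnx" is an assumed default name.
    parser.add_argument("--export-onnx", nargs="?", const="model.onnx",
                        default=None)
    # action="append" lets --summary be given several times.
    parser.add_argument("--summary", action="append", default=[])
    return parser
```

With this parser, `--summary sparsity --summary compute` collects both names into one list, and a bare `--export-onnx` falls back to the assumed default file name.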

Added a basic test for exporting ONNX.

54304810

May 15, 2019

SummaryGraph: fix ‘weights_vol’ attribute for conv and linear layers · 08b5cd95

Neta Zmora authored 5 years ago

The weights_vol attribute reflects the size (volume) of an SG node’s
weights tensor.  The calculation of the weights volume was wrong.
This does not have any significant impact because this attribute is
not used.

08b5cd95

Revert "SummaryGraph: fix ‘weights_vol’ attribute for conv and linear layers" · a0ebeb7e

Neta Zmora authored 5 years ago

This reverts commit a3f2ce2d.

a0ebeb7e

SummaryGraph: fix ‘weights_vol’ attribute for conv and linear layers · a3f2ce2d

Neta Zmora authored 5 years ago

The weights_vol attribute reflects the size (volume) of an SG node’s
weights tensor.  The calculation of the weights volume was wrong.
This does not have any significant impact because this attribute is
not used.

a3f2ce2d