  1. Jan 16, 2019
    • compress_classifier.py refactoring (#126) · cfbc3798
      Bar authored
      * Support for multi-phase activations logging
      
      Enable logging activations during both training and validation in the
      same session.
      
      * Refactoring: Move parser to its own file
      
      * Parser is moved from compress_classifier into its own file.
      * Torch version check is moved to precede main() call.
      * Move main definition to the top of the file.
      * Modify parser choices to be case-insensitive
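
      A brief, hypothetical sketch of one way to make argparse choices case-insensitive
      (not necessarily how the refactored parser does it): normalize the user's input
      before argparse matches it against lowercase choices.

```python
# Hypothetical sketch: case-insensitive argparse choices via input normalization.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--dataset', type=lambda s: s.lower(),
                    choices=['cifar10', 'imagenet'],
                    help='dataset name (matched case-insensitively)')

args = parser.parse_args(['--dataset', 'CIFAR10'])
assert args.dataset == 'cifar10'
```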
  2. Jan 15, 2019
  3. Jan 13, 2019
  4. Jan 10, 2019
    • Enable compute (training/inference) on the CPU · 007b6903
      Gal Novik authored
      In compress_classifier.py we added a new application argument, --cpu, which
      forces compute (training/inference) to run on the CPU even when you invoke
      compress_classifier.py on a machine that has Nvidia GPUs.
      
      If your machine lacks Nvidia GPUs, compute now runs on the CPU automatically
      (and you do not need the new flag).
      
      Caveat: we did not fully test the CPU support for the code in the Jupyter 
      notebooks.  If you find a bug, we apologize and appreciate your feedback.
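
      Below is a minimal sketch (not Distiller's actual implementation) of how a --cpu
      flag can drive device selection in a PyTorch script, falling back to the CPU
      automatically when no CUDA device is available.

```python
# Illustrative sketch: a --cpu flag that forces computation onto the CPU.
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--cpu', action='store_true',
                    help='force training/inference to run on the CPU')
args = parser.parse_args()

# Use the GPU only when CUDA is available and the user did not request the CPU.
use_cuda = torch.cuda.is_available() and not args.cpu
device = torch.device('cuda' if use_cuda else 'cpu')

model = torch.nn.Linear(10, 2).to(device)
inputs = torch.randn(4, 10, device=device)
outputs = model(inputs)
```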
  5. Dec 19, 2018
  6. Dec 16, 2018
  7. Dec 14, 2018
    • AMC: more refactoring · 1ab288ae
      Neta Zmora authored
      Added a notebook for visualizing the discovery of compressed networks.
      Added one-epoch fine-tuning at the end of every episode, which is
      required for very sensitive models like Plain20.
  8. Dec 11, 2018
  9. Dec 06, 2018
  10. Dec 04, 2018
    • Range-Based Linear Quantization Features (#95) · 907a6f04
      Guy Jacob authored
      * Asymmetric post-training quantization (only symmetric was supported until now)
      * Quantization aware training for range-based (min-max) symmetric and asymmetric quantization
      * Per-channel quantization support in both training and post-training
      * Added tests and examples
      * Updated documentation
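
      A minimal sketch of asymmetric range-based (min-max) linear quantization, to
      illustrate the idea behind the feature; it is not Distiller's quantization API.

```python
# Illustrative sketch of asymmetric range-based (min-max) linear quantization.
import torch

def asymmetric_quantize(t, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include zero in the range so exact zeros stay exact after quantization.
    t_min = min(t.min().item(), 0.0)
    t_max = max(t.max().item(), 0.0)
    scale = max((t_max - t_min) / (qmax - qmin), 1e-8)  # guard against a zero range
    zero_point = round(qmin - t_min / scale)
    q = torch.clamp(torch.round(t / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = torch.randn(3, 4)
q, scale, zp = asymmetric_quantize(x)
x_hat = dequantize(q, scale, zp)  # x_hat approximates x to within one scale step
```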
  11. Nov 24, 2018
  12. Nov 22, 2018
    • Fix Issue 79 (#81) · acbb4b4d
      Neta Zmora authored
      * Fix issue #79
      
      Change the default values so that the following scheduler meta-data keys
      are always defined: 'starting_epoch', 'ending_epoch', 'frequency'
      
      * compress_classifier.py: add a new argument
      
      Allow specifying, from the command-line arguments, the range of pruning levels
      scanned when doing sensitivity analysis (a sketch of such an argument follows
      below).
      
      * Add regression test for issue #79
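
      A minimal sketch of exposing the sensitivity-analysis sparsity range on the
      command line; the flag name and defaults here are illustrative, not necessarily
      Distiller's exact interface.

```python
# Illustrative sketch: a command-line argument for the sensitivity-analysis range.
import argparse
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument('--sense-range', type=float, nargs=3,
                    default=[0.0, 0.95, 0.05],
                    metavar=('START', 'STOP', 'STEP'),
                    help='pruning levels to scan during sensitivity analysis')
args = parser.parse_args([])

start, stop, step = args.sense_range
sparsity_levels = np.arange(start, stop, step)  # e.g. 0.0, 0.05, ..., 0.90
```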
  13. Nov 21, 2018
  14. Nov 20, 2018
    • Bug fix: value of best_top1 stored in the checkpoint may be wrong (#77) · 6242afed
      Neta Zmora authored
      * Bug fix: value of best_top1 stored in the checkpoint may be wrong
      
      If you invoke compress_classifier.py with --num-best-scores=n, where n>1,
      then the value of best_top1 stored in checkpoints is wrong.
    • Bug fix: Resuming from checkpoint ignored the masks stored in the checkpoint (#76) · 78e98a51
      Neta Zmora authored
      When we resume from a checkpoint, we usually want to continue using the checkpoint’s
      masks.  I say “usually” because I can see a situation where we want to prune a model
      and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the
      masks.  This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training
      (https://arxiv.org/abs/1607.04381).  But I didn’t want to add another argument to
      ```compress_classifier.py``` for the time being – so we ignore DSD.
      
      There are two possible situations when we resume a checkpoint that has a serialized
      ```CompressionScheduler``` with pruning masks:
      1. We are planning on using a new ```CompressionScheduler``` that is defined in a
      schedule YAML file.  In this case, we want to copy the masks from the serialized
      ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are
      constructing from the YAML file.  This is one fix.
      2. We are resuming a checkpoint, but without using a YAML schedule file.
      In this case we want to use the ```CompressionScheduler``` that we loaded from the
      checkpoint file.  All this ```CompressionScheduler``` does is keep applying the masks
      as we train, so that we don’t lose them.  This is the second fix.
      
      For DSD, we would need a new flag that would override using the ```CompressionScheduler```
      that we load from the checkpoint.
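
      A minimal, self-contained sketch of the two resume paths described above; the
      names below (TinyScheduler, zeros_mask_dict, build_scheduler) are illustrative
      stand-ins, not Distiller's API.

```python
# Illustrative sketch of the two ways to handle masks when resuming a checkpoint.
class TinyScheduler:
    """Stand-in for CompressionScheduler: it just holds per-parameter masks."""
    def __init__(self, masks=None):
        self.zeros_mask_dict = dict(masks or {})

def build_scheduler(resumed_scheduler, yaml_scheduler=None):
    if yaml_scheduler is not None:
        # Case 1: a new scheduler was built from a YAML schedule file; copy the
        # masks loaded from the checkpoint into it.
        if resumed_scheduler is not None:
            yaml_scheduler.zeros_mask_dict.update(resumed_scheduler.zeros_mask_dict)
        return yaml_scheduler
    # Case 2: no YAML schedule; keep using the checkpoint's scheduler so its
    # masks keep being applied while training continues.
    return resumed_scheduler

ckpt_sched = TinyScheduler({'conv1.weight': 'mask-tensor'})
sched = build_scheduler(ckpt_sched, yaml_scheduler=TinyScheduler())
assert sched.zeros_mask_dict == {'conv1.weight': 'mask-tensor'}
```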
  15. Nov 08, 2018
  16. Nov 06, 2018
  17. Nov 05, 2018
    • Dynamic Network Surgery (#69) · 60a4f44a
      Neta Zmora authored
      Added an implementation of:
      
      Dynamic Network Surgery for Efficient DNNs, Yiwen Guo, Anbang Yao, Yurong Chen.
      NIPS 2016, https://arxiv.org/abs/1608.04493.
      
      - Added SplicingPruner: A pruner that both prunes and splices connections.
      - Included an example schedule on ResNet20 CIFAR.
      - New features for compress_classifier.py:
         1. Added the "--masks-sparsity" argument which, when enabled, logs the
            sparsity of the weight masks during training.
         2. Added a new command-line argument to report the top N best accuracy
            scores, instead of just the highest score. This is sometimes useful
            when pruning a pre-trained model that has its best Top1 accuracy in
            the first few pruning epochs.
      - New features for PruningPolicy:
         1. The pruning policy can use two copies of the weights: one is used during
            the forward pass, the other during the backward pass (see the sketch
            after this message). This is controlled by the “mask_on_forward_only”
            argument.
         2. If we enable “mask_on_forward_only”, we probably want to permanently apply
            the mask at some point (usually once the pruning phase is done).
            This is controlled by the “keep_mask” argument.
         3. We introduce a first implementation of scheduling at the training-iteration
            granularity (i.e. at the mini-batch granularity). Until now we could only
            schedule pruning at the epoch granularity. This is controlled by the
            “mini_batch_pruning_frequency” argument (disabled by setting it to zero).
      
         Some of the abstractions may have leaked from PruningPolicy into
         CompressionScheduler; we need to re-examine this in the future.
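
      A minimal, illustrative sketch of the “mask_on_forward_only” idea used by
      splicing-style pruning: the forward pass sees masked weights, while gradients
      still flow to the dense weight tensor, so pruned connections keep being updated
      and can later be spliced back in. This is not Distiller's actual implementation.

```python
# Illustrative sketch: mask weights on the forward pass only (straight-through).
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(8, 4)
mask = (torch.rand_like(layer.weight) > 0.5).float()  # 1 = keep, 0 = prune

def forward_masked(x):
    # Forward value equals weight * mask, but the detached term leaves the
    # gradient w.r.t. the dense weight tensor unchanged.
    w_eff = layer.weight - (layer.weight * (1.0 - mask)).detach()
    return F.linear(x, w_eff, layer.bias)

loss = forward_masked(torch.randn(2, 8)).sum()
loss.backward()
assert layer.weight.grad is not None  # the dense weights still receive gradients
```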
  18. Nov 01, 2018
  19. Oct 22, 2018
    • Activation statistics collection (#61) · 54a5867e
      Neta Zmora authored
      Activation statistics can be leveraged to make pruning and quantization decisions, so
      we added support for collecting these data.
      - Two types of activation statistics are supported: summary statistics, and detailed
      records per activation.
      Currently we support the following summaries:
      - Average activation sparsity, per layer
      - Average L1-norm for each activation channel, per layer
      - Average sparsity for each activation channel, per layer
      
      For the detailed records we collect some statistics per activation and store them in a
      record. This collection method generates more detailed data, but consumes more time,
      so beware.
      
      * You can collect activation data for the different training phases: training/validation/test.
      * You can access the data directly from each module that you chose to collect stats for.  
      * You can also create an Excel workbook with the stats.
      
      To demonstrate use of activation collection we added a sample schedule which prunes 
      weight filters by the activation APoZ according to:
      "Network Trimming: A Data-Driven Neuron Pruning Approach towards 
      Efficient Deep Architectures",
      Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
      https://arxiv.org/abs/1607.03250
      
      We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning,
      and specifically we separated the AGP schedule from the filter pruning criterion.  We added
      examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner),
      random (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), 
      and filter L1-norm (L1RankedStructureParameterPruner).
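
      A minimal sketch of collecting an APoZ-style summary (average fraction of zero
      activations per channel) with a PyTorch forward hook; Distiller's collectors are
      richer, this only illustrates the mechanism.

```python
# Illustrative sketch: per-channel zero-activation statistics via a forward hook.
import torch
import torch.nn as nn

apoz_sums, batch_counts = {}, {}

def apoz_hook(module, inputs, output):
    # Fraction of zero activations per output channel, averaged over N, H, W.
    zeros = (output == 0).float().mean(dim=(0, 2, 3))
    name = module_names[module]
    apoz_sums[name] = apoz_sums.get(name, 0) + zeros
    batch_counts[name] = batch_counts.get(name, 0) + 1

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
module_names = {m: n for n, m in model.named_modules()}
model[1].register_forward_hook(apoz_hook)   # collect stats after the ReLU

model(torch.randn(8, 3, 32, 32))
apoz = {n: s / batch_counts[n] for n, s in apoz_sums.items()}
```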
  20. Sep 20, 2018
  21. Sep 16, 2018
  22. Sep 03, 2018
  23. Aug 09, 2018
    • Generalize the loss value returned from before_backward_pass callbacks (#38) · a43b9f10
      Guy Jacob authored
      * Instead of a single additive value (which so far represented only the
        regularizer loss), callbacks return a new overall loss
      * Policy callbacks also return the individual loss components used to
        calculate the new overall loss.
      * Add a boolean flag to the Scheduler's callback so applications can choose
        whether they want the individual loss components, or just the new overall
        loss
      * In compress_classifier.py, log the individual loss components
      * Add test for the loss-from-callback flow
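
      A minimal, illustrative sketch of a before_backward_pass-style callback that
      returns a new overall loss together with its components; the names below are
      not Distiller's exact API.

```python
# Illustrative sketch: a callback returning an overall loss plus its components.
from collections import namedtuple
import torch

LossComponent = namedtuple('LossComponent', ['name', 'value'])

def before_backward_pass(base_loss, regularizer_loss, return_loss_components=False):
    overall_loss = base_loss + regularizer_loss
    if return_loss_components:
        return overall_loss, [LossComponent('regularizer', regularizer_loss)]
    return overall_loss

overall, parts = before_backward_pass(torch.tensor(1.25), torch.tensor(0.05), True)
```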
  24. Jul 31, 2018
  25. Jul 25, 2018
  26. Jul 22, 2018
    • PACT quantizer (#30) · df9a00ce
      Gal Novik authored
      * Adding PACT quantization method
      * Move the logic that modifies the optimizer (due to changes the quantizer makes) into the Quantizer itself
      * Updated documentation and tests
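
      A minimal sketch of the core PACT idea: clip activations to a learnable upper
      bound alpha and fake-quantize the clipped range. This is illustrative only, not
      Distiller's PACT quantizer.

```python
# Illustrative sketch of PACT-style activation clipping with a learnable alpha.
import torch
import torch.nn as nn

class PACTActivation(nn.Module):
    def __init__(self, num_bits=4, init_alpha=6.0):
        super().__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(init_alpha))  # learnable clip value

    def forward(self, x):
        # Clip to [0, alpha]; alpha receives gradients where activations are clipped.
        y = torch.minimum(torch.clamp(x, min=0.0), self.alpha)
        # Straight-through fake quantization of the clipped range.
        scale = (2 ** self.num_bits - 1) / self.alpha.detach()
        return y + (torch.round(y * scale) / scale - y).detach()

act = PACTActivation()
out = act(torch.randn(2, 8) * 3)
out.sum().backward()
print(act.alpha.grad)  # alpha is part of the graph, so it can be learned
```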
  27. Jul 17, 2018
  28. Jul 13, 2018
    • ADC (Automatic Deep Compression) example + features, tests, bug fixes (#28) · 718f777b
      Neta Zmora authored
      This is a merge of the ADC branch and master.
      ADC (using a DDPG RL agent to compress image classifiers) is still WiP and requires
      an unreleased version of Coach (https://github.com/NervanaSystems/coach).
      
      Small features in this commit:
      - Added model_find_module() - find a module object given its name (see the
        sketch after this message)
      - Add channel ranking and pruning: pruning/ranked_structures_pruner.py
      - Add a CIFAR10 VGG16 model: models/cifar10/vgg_cifar.py
      - Thinning: change the level of some log messages - some of the messages were
        moved to 'debug' level because they are not usually interesting.
      - Add a function to print nicely formatted integers - distiller/utils.py
      - Sensitivity analysis for channel removal
      - compress_classifier.py: handle keyboard interrupts
      - compress_classifier.py: fix re-raise of exceptions, so they maintain the call stack
      
      - Added tests:
      -- test_summarygraph.py: test_simplenet() - added a regression test to target a bug
         that occurs when taking the predecessor of the first node in a graph
      -- test_ranking.py - test_ch_ranking, test_ranked_channel_pruning
      -- test_model_summary.py - test_png_generation, test_summary (sparsity/compute/model/modules)
      
      - Bug fixes in this commit:
      -- Thinning bug fix: handle a zero-sized 'indices' tensor.
         During the thinning process, the 'indices' tensor can become zero-sized
         and will have an undefined length, so we need to check for this situation
         when assessing the number of elements in 'indices'.
      -- Language model: adjust main.py to the new distiller.model_summary API
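
      A minimal sketch of a "find module by name" helper in the spirit of
      model_find_module(); it is not Distiller's exact implementation.

```python
# Illustrative sketch: look up a sub-module by its fully-qualified name.
import torch.nn as nn

def find_module(model, name):
    for module_name, module in model.named_modules():
        if module_name == name:
            return module
    return None

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
assert find_module(model, '0') is model[0]
```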
  29. Jul 11, 2018
    • More robust handling of data-parallel/serial graphs (#27) · b64be690
      Neta Zmora authored
      Remove the complicated logic trying to handle data-parallel models as
      serially-processed models, and vice versa.
      
      * Function distiller.utils.make_non_parallel_copy() does the heavy lifting of
      replacing all instances of nn.DataParallel in a model with instances of
      DoNothingModuleWrapper (sketched after this message).
      The DoNothingModuleWrapper wrapper does nothing but forward to the
      wrapped module. This is a trick we use to transform a data-parallel model
      into a serially-processed model.
      
      * SummaryGraph uses a copy of the model after the model is processed by
      distiller.make_non_parallel_copy(), which renders the model non-data-parallel.
      
      * The same goes for model_performance_summary().
      
      * Model inputs are explicitly placed on the Cuda device, since now all models are
      executed on the CPU. Previously, if a model was not created using
      nn.DataParallel, then the model was not explicitly placed on the Cuda device.
      
      * The logic in distiller.CompressionScheduler that attempted to load a
      data-parallel model and process it serially, or load a serial model and
      process it data-parallel, was removed. This removes a lot of fuzziness and makes
      the code more robust: we do not needlessly try to be heroes.
      
      * model summaries - remove pytorch 0.4 warning
      
      * create_model: remove redundant .cuda() call
      
      * Tests: support both parallel and serial tests
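
      A minimal sketch of the idea behind make_non_parallel_copy(): deep-copy the model
      and replace every nn.DataParallel child with a pass-through wrapper. The code
      below is illustrative, not Distiller's exact implementation.

```python
# Illustrative sketch: replace nn.DataParallel instances with a pass-through wrapper.
from copy import deepcopy
import torch.nn as nn

class DoNothingModuleWrapper(nn.Module):
    """Do nothing except forward to the wrapped module."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

def make_non_parallel_copy(model):
    def replace(container):
        for name, child in container.named_children():
            if isinstance(child, nn.DataParallel):
                setattr(container, name, DoNothingModuleWrapper(child.module))
            else:
                replace(child)
    model_copy = deepcopy(model)
    replace(model_copy)
    return model_copy

model = nn.Sequential(nn.DataParallel(nn.Linear(4, 2)), nn.ReLU())
serial = make_non_parallel_copy(model)
assert isinstance(serial[0], DoNothingModuleWrapper)
```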
  30. Jun 30, 2018
    • Bug fix: add support for thinning the optimizer · b21f449b
      Neta Zmora authored
      You no longer need to use --momentum=0 when removing structures
      dynamically.
      The SGD momentum update (velocity) is dependent on the weights, which
      PyTorch optimizers cache internally. This caching is not a problem for
      filter/channel removal (thinning) because although we dynamically
      change the shapes of the weights tensors, we don't change the weights
      tensors themselves.
      PyTorch's SGD creates tensors to store the momentum updates, and these
      tensors have the same shape as the weights tensors. When we change the
      weights tensors, we need to make the appropriate changes in the Optimizer,
      or disable the momentum.
      We added a new function - thinning.optimizer_thinning() - to do this
      (a sketch of the idea follows at the end of this message).
      This function is brittle, as it is tested only on optim.SGD and relies on the
      internal representation of the SGD optimizer, which can change without notice.
      For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'],
      which also depend on the shape of the weight tensors.
      We needed to pass the Optimizer instance to the Thinning policies
      (ChannelRemover, FilterRemover) via the callbacks, which required us
      to change the callback interface.
      In the future we plan a bigger change to the callback API, to allow
      passing of arbitrary context from the training environment to Distiller.
      
      Also in this commit:
      * compress_classifier.py had special handling for resnet layer-removal, which
      is used in examples/ssl/ssl_4D-removal_training.yaml.
      This is a brittle and ugly hack. Until we have a more elegant solution, I'm
      removing support for layer-removal.
      * Added to the tests invocation of forward and backward passes over a model.
      This tests more of the real flows, which use the optimizer and construct
      gradient tensors.
      * Added a test of a special case of convolution filter-pruning which occurs
      when the next layer is fully-connected (linear)
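
      A minimal sketch of the idea behind thinning.optimizer_thinning(): when a weight
      tensor shrinks (filters or channels are removed), the SGD momentum buffer cached
      for that parameter must be shrunk along the same dimension. The helper below is
      illustrative; Distiller's function handles more cases.

```python
# Illustrative sketch: shrink the cached SGD momentum buffer to match thinned weights.
import torch

def thin_sgd_momentum(optimizer, param, keep_indices, dim=0):
    state = optimizer.state.get(param, {})
    buf = state.get('momentum_buffer')
    if buf is not None:
        state['momentum_buffer'] = torch.index_select(buf, dim, keep_indices)

w = torch.nn.Parameter(torch.randn(8, 4))
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)
w.sum().backward()
opt.step()                                    # creates the momentum buffer
keep = torch.tensor([0, 2, 4, 6])             # filters that survive thinning
w.data = torch.index_select(w.data, 0, keep)  # thin the weights...
thin_sgd_momentum(opt, w, keep, dim=0)        # ...and the cached momentum
```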
  31. Jun 21, 2018
  32. Jun 19, 2018
    • Make PNG summary compatible with latest SummaryGraph class changes (#7) · 9e57219e
      Guy Jacob authored
      * Modify 'create_png' to use the correct data structures (dicts instead of
        lists, etc.)
      * Handle the case where an op was called not from a module. This relates to:
        * ONNX->"User-Friendly" name conversion, to account for such cases
        * Detection of an existing op with the same name
        In both cases, use the ONNX op type in addition to the op name
      * Return an "empty" shape instead of None when ONNX couldn't infer
        a parameter's shape
      * Expose option of PNG summary with parameters to user
  33. May 17, 2018
    • Fix system tests failure · a7ed8cad
      Neta Zmora authored
      The latest changes to the logger caused the CI tests to fail,
      because the tests assume that the logging.conf file is present in the
      same directory as the sample application script.
      The sample application looked in the current working directory (cwd) instead,
      and therefore did not find the log configuration file.
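
      A minimal sketch of the intent of the fix: resolve logging.conf relative to the
      application script's directory rather than the current working directory. The
      file name and layout are illustrative.

```python
# Illustrative sketch: load logging.conf from the script's own directory.
import logging.config
import os

script_dir = os.path.dirname(os.path.abspath(__file__))
log_conf_path = os.path.join(script_dir, 'logging.conf')
if os.path.isfile(log_conf_path):
    logging.config.fileConfig(log_conf_path)
```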
  34. May 16, 2018