- Oct 22, 2018
-
-
Neta Zmora authored
Activation statistics can be leveraged to make pruning and quantization decisions, so we added support for collecting these data.
Two types of activation statistics are supported: summary statistics, and detailed records per activation. Currently we support the following summaries:
- Average activation sparsity, per layer
- Average L1-norm for each activation channel, per layer
- Average sparsity for each activation channel, per layer
For the detailed records we collect several statistics per activation and store them in a record. This collection method generates more detailed data, but consumes more time, so beware.
* You can collect activation data for the different training phases: training/validation/test.
* You can access the data directly from each module for which you chose to collect stats.
* You can also create an Excel workbook with the stats.
(A rough sketch of hook-based statistics collection appears after this note.)
To demonstrate the use of activation collection we added a sample schedule which prunes weight filters by their activation APoZ, according to:
"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016, https://arxiv.org/abs/1607.03250
We also refactored the AGP code (AutomatedGradualPruner) to support structured pruning, and specifically we separated the AGP schedule from the filter-pruning criterion. We added examples of ranking filter importance by activation APoZ (ActivationAPoZRankedFilterPruner), random ranking (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), and filter L1-norm (L1RankedStructureParameterPruner).
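As a rough illustration of this kind of per-layer activation statistic (and not Distiller's collector API), APoZ-style sparsity can be gathered with plain PyTorch forward hooks; the model layout and names below are made up for the example:

    import torch
    import torch.nn as nn

    # Minimal sketch: accumulate the average fraction of zero activations (APoZ)
    # per ReLU layer using forward hooks.
    apoz_sums, apoz_counts = {}, {}

    def make_hook(name):
        def hook(module, inputs, output):
            zeros = (output == 0).float().mean().item()   # fraction of zeros in this batch
            apoz_sums[name] = apoz_sums.get(name, 0.0) + zeros
            apoz_counts[name] = apoz_counts.get(name, 0) + 1
        return hook

    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3), nn.ReLU())
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]

    with torch.no_grad():
        model(torch.randn(8, 3, 32, 32))          # one "validation" batch

    apoz = {name: s / apoz_counts[name] for name, s in apoz_sums.items()}
    for h in handles:
        h.remove()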
-
Neta Zmora authored
-
- Oct 18, 2018
-
-
Neta Zmora authored
ONNX export in PyTorch doesn't know how to handle DataParallel layers, so we need to make sure that we remove all instances of nn.DataParallel from the model before exporting it. The previous ONNX implementation forgot to deal with the case of DataParallel layers that do not wrap the entire model (as in VGG, where only the feature-extractor layers are data-parallel).
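A rough sketch of the kind of clean-up this requires (not the exact Distiller code) is to recursively unwrap every nn.DataParallel instance, including nested ones such as VGG's feature extractor, before calling the exporter:

    import torch
    import torch.nn as nn

    def strip_data_parallel(module):
        # Recursively replace every nn.DataParallel wrapper with the module it wraps.
        if isinstance(module, nn.DataParallel):
            module = module.module
        for name, child in module.named_children():
            setattr(module, name, strip_data_parallel(child))
        return module

    # Usage sketch:
    # model = strip_data_parallel(model)
    # torch.onnx.export(model, torch.randn(1, 3, 224, 224), "model.onnx")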
-
Neta Zmora authored
We should only add softmax when we explicitly require it (as when exporting to ONNX), because CrossEntropyLoss implicitly computes softmax on the logits it receives as input. This code was left there by mistake and should never have been pushed to git.
-
- Oct 13, 2018
-
-
Neta Zmora authored
Automated gradual pruning (AGP) is very simple to use for structured pruning and produces good results. AGP is also implemented in Google's TensorFlow framework.
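For reference, the AGP schedule (Zhu & Gupta, "To prune, or not to prune") ramps the target sparsity from an initial to a final value with a cubic curve; a small sketch of the formula:

    def agp_sparsity(step, s_initial, s_final, start_step, n_steps):
        # Automated Gradual Pruning target sparsity: cubic ramp from s_initial
        # to s_final over n_steps pruning steps, starting at start_step.
        t = min(max(step - start_step, 0), n_steps)
        return s_final + (s_initial - s_final) * (1.0 - t / float(n_steps)) ** 3

    # e.g. agp_sparsity(50, s_initial=0.0, s_final=0.5, start_step=0, n_steps=100) == 0.4375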
-
Neta Zmora authored
When running inference in ONNX, we often want to add a softmax layer to TorchVision's models.
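One way to do this (a sketch, not necessarily how the sample script does it) is to append an explicit softmax before exporting, since TorchVision classifiers return raw logits:

    import torch
    import torch.nn as nn
    import torchvision

    model = torchvision.models.resnet18(pretrained=False).eval()
    model_with_softmax = nn.Sequential(model, nn.Softmax(dim=1))
    torch.onnx.export(model_with_softmax, torch.randn(1, 3, 224, 224), "resnet18.onnx")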
-
- Oct 11, 2018
-
-
Neta Zmora authored
When using a schedule with epochs that have nothing scheduled for them, apply_mask() is not invoked at the end of mini-batches, and pruned weights might be unmasked by the optimizer weight updates. See explanation in issue #53 discussion
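The essence of the problem, sketched here with plain PyTorch (this is not the scheduler's actual code): a pruning mask has to be re-applied after every weight update, otherwise weight-decay and momentum updates revive weights that were pruned to zero.

    import torch

    weight = torch.nn.Parameter(torch.randn(10, 10))
    mask = (weight.abs() > 0.5).float()               # some pruning mask
    optimizer = torch.optim.SGD([weight], lr=0.1, weight_decay=1e-4)

    loss = (weight ** 2).sum()
    loss.backward()
    optimizer.step()                                  # may make masked weights non-zero again
    with torch.no_grad():
        weight.mul_(mask)                             # apply_mask(): keep pruned weights at zero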
-
- Oct 04, 2018
-
-
Neta Zmora authored
Temporary fix for dependency on distiller class hierarchy when serializing a model that contains a thinning recipe.
-
- Oct 03, 2018
-
-
Neta Zmora authored
Recent versions of Jupyter Notebook use a different syntax for launching the server so that it listens on all network interfaces (this is useful if you are running the Jupyter server on one machine, and connect to it from a browser on a different machine). So:
    jupyter-notebook --ip=* --no-browser
is replaced by:
    jupyter-notebook --ip=0.0.0.0 --no-browser
-
Neta Zmora authored
Remove function to_var() which is not used by any code.
-
Neta Zmora authored
We need AverageValueMeter's support for numpy arrays.
-
Neta Zmora authored
Showing various details about the performance of ResNet50
-
Neta Zmora authored
Also show fitting of the histograms to the Gaussian and Laplacian distributions.
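The fitting itself can be done with SciPy; a minimal sketch of the idea (the notebook's actual code may differ):

    import numpy as np
    from scipy import stats

    weights = np.random.randn(10000)              # stand-in for a real weight/activation tensor
    mu, sigma = stats.norm.fit(weights)           # Gaussian fit
    loc, b = stats.laplace.fit(weights)           # Laplace fit
    print("Gaussian: mu=%.3f sigma=%.3f; Laplace: loc=%.3f b=%.3f" % (mu, sigma, loc, b))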
- Oct 01, 2018
-
-
Neta Zmora authored
-
- Sep 29, 2018
-
-
Neta Zmora authored
Somehow 4 copies of the same test were pasted into this file: removed 3 instances.
-
- Sep 26, 2018
-
-
Neta Zmora authored
* Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example of ResNet20 channel pruning.
  - The idea is to regularize the variance of the distribution of the parameter structures: some structures will go to zero completely, while the rest should have high values, leading to a high variance (a rough sketch of the idea appears below).
  - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups).
  - When tested on ResNet20, GSS did not show any improvement over SSL.
* Added a sample of filter pruning for ResNet20 CIFAR using SSL ("Learning Structured Sparsity in Deep Neural Networks").
* Added an example of pruning 45% of the compute (1.8x MAC reduction) while suffering a 0.8% accuracy loss, on ResNet20 CIFAR.
* Added a ResNet50 ImageNet example of L1-magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase. This is an example of using pruning as a regularizer.
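A rough sketch of the core idea behind such a variance regularizer (not the GroupVarianceRegularizer implementation itself): compute one magnitude per parameter group (e.g. per filter) and reward a large variance across the groups by subtracting it from the loss.

    import torch

    def group_variance_penalty(weight, strength=1e-4):
        # weight: Conv2d weight of shape (out_channels, in_channels, kH, kW); one group per filter.
        group_norms = weight.view(weight.size(0), -1).norm(p=2, dim=1)   # one L2 norm per filter
        return -strength * group_norms.var()      # negative term: high variance lowers the loss

    # total_loss = task_loss + group_variance_penalty(conv_layer.weight)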
-
- Sep 21, 2018
-
-
Yi-Syuan Chen authored
-
- Sep 20, 2018
-
-
Neta Zmora authored
Clean up a bit of code
-
Neta Zmora authored
-
Neta Zmora authored
In this experiment we increase the regularization_strength of some of the channel regularization terms. We want to increase the compute compression, while allowing some reduction in accuracy.
-
Neta Zmora authored
This schedule demonstrates low-rate pruning (26% sparsity) acting as a regularizer to reduce the generalization error of ResNet50 on the ImageNet dataset. We improve the ResNet50 Top1 test error by 0.4% (23.462 vs. 23.85). Top5 error is improved as well: 6.82 vs. 7.13 in the baseline.
-
- Sep 16, 2018
-
-
Neta Zmora authored
* Clean up PyTorch 0.3 compatibility code. We don't need this anymore, and PyTorch 1.0 is just around the corner.
* Explicitly place the inputs tensor on the GPU(s).
-
Neta Zmora authored
* A temporary fix for issue #36.
  The thinning code assumes that the sgraph it is using is not data-parallel, because it (currently) accesses the layer-name keys using a "normalized" name (with "module." removed). The bug is that in thinning.py#L73 we create a model with data_parallel=True and then give it to sgraph, while in other places the thinning code uses "normalized" keys (for example in thinning.py#L264). The temporary fix configures data_parallel=False in thinning.py#L73.
  A long-term solution should have SummaryGraph know how to handle both parallel and non-parallel models. This can be done by having SummaryGraph convert the layer names it receives in the API to the data_parallel=False form using normalize_layer_name, and use the de-normalized format when returning results (a minimal sketch of such a helper is shown below).
* Fix the documentation error from issue #36.
* Move some logs to debug, and show in logging.conf how to enable DEBUG logs.
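A minimal sketch of what such a layer-name normalization helper boils down to (the actual Distiller function may be more involved):

    def normalize_layer_name(layer_name):
        # Strip the "module." prefixes that nn.DataParallel adds to layer names,
        # so keys from data-parallel and non-parallel models match.
        return layer_name.replace('module.', '')

    # e.g. normalize_layer_name('module.features.module.34') -> 'features.34'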
-
Neta Zmora authored
-
- Sep 03, 2018
-
-
Guy Jacob authored
* Implemented as a Policy
* Integrated in image classification sample
* Updated docs and README
-
- Aug 29, 2018
-
-
Neta Zmora authored
-
- Aug 27, 2018
-
-
Neta Zmora authored
Sometimes the gmin/gmax in group color-normalization ends up with a zero dimensional tensor, which needs to be accessed using .item()
-
Neta Zmora authored
Sometimes the gmin/gmax in group color-normalization ends up with a zero dimensional tensor, which needs to be accessed using .item()
-
Neta Zmora authored
-
- Aug 09, 2018
-
-
Guy Jacob authored
* Instead of a single additive value (which so far represented only the regularizer loss), callbacks return a new overall loss.
* Policy callbacks also return the individual loss components used to calculate the new overall loss (a rough sketch of the new callback contract is shown below).
* Add a boolean flag to the Scheduler's callback so applications can choose whether they want the individual loss components or just the new overall loss.
* In compress_classifier.py, log the individual loss components.
* Add a test for the loss-from-callback flow.
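In rough terms (the names here are illustrative, not the exact Policy API), the new callback contract looks something like this:

    from collections import namedtuple

    PolicyLoss = namedtuple('PolicyLoss', ['overall_loss', 'loss_components'])
    LossComponent = namedtuple('LossComponent', ['name', 'value'])

    def before_backward_pass(task_loss, regularizer_loss, return_loss_components=True):
        # Return the new overall loss, and optionally the components that built it.
        overall = task_loss + regularizer_loss
        components = [LossComponent('Regularizer Loss', regularizer_loss)] if return_loss_components else []
        return PolicyLoss(overall, components)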
-
- Aug 07, 2018
-
-
Neta Zmora authored
* Fix bug: thresholding matrix columns should use dim=0 (issue #39). See issue #39 for a description of the bug from @vinutah. A small example of the dim distinction is shown below.
* Thresholding test: fix device assignment.
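For a 2-D weights matrix in PyTorch, per-column statistics are reductions along dim=0 and per-row statistics along dim=1; a two-line illustration:

    import torch

    W = torch.randn(4, 6)                # 4 rows x 6 columns
    col_l1 = W.abs().sum(dim=0)          # one value per column -> shape (6,)
    row_l1 = W.abs().sum(dim=1)          # one value per row    -> shape (4,)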
-
- Jul 31, 2018
-
-
Haim Barad authored
Enabling Early Exit strategy in image classifier example
-
- Jul 25, 2018
-
-
Neta Zmora authored
We are using this file for more and more use-cases and we need to keep it readable and clean. I've tried to move code that is not in the main control-path to specific functions.
-
Neta Zmora authored
Also added a script to analyze model-spaces.
-
Neta Zmora authored
This is a convenience function used by clients of the scheduler, and it might change location in the future.
-
Neta Zmora authored
Due to the various uses of these functions, we need to pass an ever-growing number of arguments to them, and the API is becoming bloated and unstable. Also added the option to log the confusion matrix.
-
- Jul 22, 2018
-
-
Gal Novik authored
-
Neta Zmora authored
-
Neta Zmora authored
-