- Jul 22, 2018
Gal Novik authored
* Add the PACT quantization method
* Move the logic that modifies the optimizer (needed because of changes the quantizer makes) into the Quantizer itself
* Update documentation and tests
- Jul 21, 2018
Neta Zmora authored
MagnitudeParameterPruner: document and test

This is in response to a question in issue #19.
- Jul 19, 2018
Guy Jacob authored
- Jul 17, 2018
Guy Jacob authored
Guy Jacob authored
* Add Quantizer unit tests
* Require 'bits_overrides' to be an OrderedDict, to support overlapping patterns in a predictable manner, and update the documentation to reflect this
* Quantizer class cleanup:
  * Use "public" nn.Module APIs instead of protected attributes
  * Call the builtins set/get/delattr instead of the class special methods (__***__)
* Fix issues reported in #24
* Bug in RangeLinearQuantParamLayerWrapper - add an explicit override of pre_quantized_forward accepting a single input (#15)
* Add a DoReFa test to full_flow_tests
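
To see why an OrderedDict is required, here is a minimal sketch of pattern-based overrides (hypothetical names and keys, not Distiller's actual API): with a plain dict, the iteration order of overlapping patterns was unspecified on the Python versions of the day, so the winning match could change between runs.

    from collections import OrderedDict
    import re

    # First pattern that matches a module name wins, so insertion order
    # resolves overlapping patterns predictably.
    bits_overrides = OrderedDict([
        ('conv1', {'acts': 8, 'wts': 8}),   # exact layer: keep at 8 bits
        ('conv.*', {'acts': 4, 'wts': 4}),  # all other convs: 4 bits
    ])

    def bits_for_module(name, overrides, default=32):
        # Iterate in insertion order; first match wins.
        for pattern, bits in overrides.items():
            if re.fullmatch(pattern, name):
                return bits
        return {'acts': default, 'wts': default}

    print(bits_for_module('conv1', bits_overrides))  # {'acts': 8, 'wts': 8}
    print(bits_for_module('conv2', bits_overrides))  # {'acts': 4, 'wts': 4}
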
- Jul 15, 2018
Neta Zmora authored
This is now tested and supported when using CNNs and PyTorch 0.4
Neta Zmora authored
There are two different "namespaces" referring to module names: normalized and de-normalized. Normalized module names have the same format for both data-parallel and data-serial models. De-normalized module names are the "raw" PyTorch module names that reflect the full model graph; so if there is a container module such as nn.DataParallel in the model, then a sub-module's name will have the "module" substring somewhere in it.

SummaryGraph operates by converting the PyTorch model to ONNX, and I've had issues handling nn.DataParallel in this process. Therefore, SummaryGraph uses only normalized names internally. PruningRecipe, on the other hand, uses de-normalized names because it needs to operate on the model itself. This is a sticky situation that can create really annoying bugs and makes for some ugly code. Nonetheless, this is the best I can do right now, and I'll probably revisit this soon to make it nicer. For now, I'm pushing this commit that fixes the distinction between the two namespaces, and fixes related bugs, in the hope that it is not too brittle.

* append_module_directive - now uses denormalize_module_name to ensure recipe module names are denormalized.
* append_param_directive - because we are dealing with parameters, I can't use denormalize_module_name as easily as in append_module_directive. The clean solution is kept for later :-(
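
A minimal sketch of the two namespaces (simplified and assumed; the real helpers live in distiller/utils.py and handle more cases):

    import torch.nn as nn

    def normalize_module_name(name):
        # Strip the 'module' path component that nn.DataParallel inserts,
        # so a layer has one canonical name in parallel and serial models.
        return '.'.join(part for part in name.split('.') if part != 'module')

    def denormalize_module_name(model, normalized_name):
        # Map a normalized name back to the "raw" name in this model.
        for raw_name, _ in model.named_modules():
            if normalize_module_name(raw_name) == normalized_name:
                return raw_name
        return normalized_name  # not found; return unchanged

    serial = nn.Sequential(nn.Conv2d(3, 16, 3))
    parallel = nn.DataParallel(serial)
    print([name for name, _ in parallel.named_modules()])  # ['', 'module', 'module.0']
    print(denormalize_module_name(parallel, '0'))          # 'module.0'
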
Neta Zmora authored
Neta Zmora authored
Also add a warning when we can't find a node whose predecessors we're looking for.
- Jul 13, 2018
Neta Zmora authored
Neta Zmora authored
This is a merge of the ADC branch and master. ADC (using a DDPG RL agent to compress image classifiers) is still WIP and requires an unreleased version of Coach (https://github.com/NervanaSystems/coach).

Small features in this commit:
- Added model_find_module() - find a module object given its name (see the sketch below)
- Add channel ranking and pruning: pruning/ranked_structures_pruner.py
- Add a CIFAR10 VGG16 model: models/cifar10/vgg_cifar.py
- Thinning: change the level of some log messages - some of the messages were moved to 'debug' level because they are not usually interesting
- Add a function to print nicely formatted integers - distiller/utils.py
- Sensitivity analysis for channel removal
- compress_classifier.py - handle keyboard interrupts
- compress_classifier.py - fix re-raise of exceptions, so they maintain the call stack

Added tests:
- test_summarygraph.py: test_simplenet() - added a regression test to target a bug that occurs when taking the predecessor of the first node in a graph
- test_ranking.py - test_ch_ranking, test_ranked_channel_pruning
- test_model_summary.py - test_png_generation, test_summary (sparsity/compute/model/modules)

Bug fixes in this commit:
- Thinning bug fix: handle a zero-sized 'indices' tensor. During the thinning process, the 'indices' tensor can become zero-sized, and will have an undefined length. Therefore, we need to check for this situation when assessing the number of elements in 'indices'.
- Language model: adjust main.py to the new distiller.model_summary API
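
A hedged sketch of what a helper like model_find_module() might look like (assumed implementation, not necessarily Distiller's):

    import torch.nn as nn

    def model_find_module(model, module_name):
        # Walk the module tree; return the sub-module with the given
        # fully-qualified name, or None if no such module exists.
        for name, module in model.named_modules():
            if name == module_name:
                return module
        return None

    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())
    print(model_find_module(model, '0'))        # the Conv2d module
    print(model_find_module(model, 'no.such'))  # None
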
- Jul 11, 2018
Neta Zmora authored
- Raise IOError instead of a crude exit() when a file is not found in the file-system
- Test that the correct exception is raised when opening a non-existent checkpoint file
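
A sketch of such a test (the loader shown is a hypothetical stand-in for illustration; the real checkpoint-loading code is in Distiller):

    import pytest

    def load_checkpoint(model, chkpt_file):
        # Simplified stand-in for the real loader: opening a missing file
        # raises IOError (FileNotFoundError) instead of calling exit().
        with open(chkpt_file, 'rb'):
            pass

    def test_load_checkpoint_missing_file():
        # Opening a non-existent checkpoint must raise, not kill the process.
        with pytest.raises(IOError):
            load_checkpoint(None, 'no_such_checkpoint.pth.tar')
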
Neta Zmora authored
Neta Zmora authored
Remove the complicated logic trying to handle data-parallel models as serially-processed models, and vice versa.

* Function distiller.utils.make_non_parallel_copy() does the heavy lifting of replacing all instances of nn.DataParallel in a model with instances of DoNothingModuleWrapper. The DoNothingModuleWrapper wrapper does nothing but forward to the wrapped module. This is a trick we use to transform a data-parallel model to a serially-processed model.
* SummaryGraph uses a copy of the model after the model is processed by distiller.make_non_parallel_copy(), which renders the model non-data-parallel.
* The same goes for model_performance_summary().
* Model inputs are explicitly placed on the Cuda device, since now all models are executed on the CPU. Previously, if a model was not created using nn.DataParallel, then the model was not explicitly placed on the Cuda device.
* The logic in distiller.CompressionScheduler that attempted to load a data-parallel model and process it serially, or load a serial model and process it data-parallel, was removed. This removes a lot of fuzziness and makes the code more robust: we do not needlessly try to be heroes.
* Model summaries - remove the PyTorch 0.4 warning
* create_model: remove redundant .cuda() call
* Tests: support both parallel and serial tests
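
Sketched from the description above (simplified; the real distiller.utils.make_non_parallel_copy() may differ in details):

    import copy
    import torch.nn as nn

    class DoNothingModuleWrapper(nn.Module):
        # A transparent wrapper: forwards straight to the wrapped module.
        def __init__(self, module):
            super(DoNothingModuleWrapper, self).__init__()
            self.module = module

        def forward(self, *args, **kwargs):
            return self.module(*args, **kwargs)

    def make_non_parallel_copy(model):
        # Deep-copy the model, then replace every nn.DataParallel
        # instance with a DoNothingModuleWrapper around its module.
        def replace_data_parallel(container):
            for name, module in container.named_children():
                if isinstance(module, nn.DataParallel):
                    setattr(container, name,
                            DoNothingModuleWrapper(module.module))
                elif len(list(module.children())) > 0:
                    replace_data_parallel(module)

        new_model = copy.deepcopy(model)
        if isinstance(new_model, nn.DataParallel):
            new_model = DoNothingModuleWrapper(new_model.module)
        replace_data_parallel(new_model)
        return new_model
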
- Jul 09, 2018
Neta Zmora authored
The checkpoint file examples/ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar did not contain the "thinning recipe", while the weight tensors stored within the checkpoint file had already been shrunk/thinned, and this caused a mismatch.

PyTorch models are defined in code. This includes the network architecture and connectivity (which layers are used and what is the forward path), but also the sizes of the parameter tensors and inputs/outputs. When the model is created, the parameter tensors are also created, as defined or inferred from the code. When a checkpoint is loaded, the parameter tensors are read from the checkpoint and copied to the model's tensors. Therefore, the tensors in the checkpoint and in the model must have the same shape. If a model has been "thinned" and saved to a checkpoint, then the checkpoint tensors are "smaller" than the ones defined by the model. A "thinning recipe" is used to make changes to the model before copying the tensors from the checkpoint. In this case, the "thinning recipe" was missing.
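
An assumed illustration of the failure mode (not Distiller code): a checkpoint saved from a "thinned" model cannot be loaded into the unmodified architecture, because load_state_dict() requires matching shapes.

    import torch.nn as nn

    model = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
    thinned = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)  # 8 filters removed

    try:
        model.load_state_dict(thinned.state_dict())
    except RuntimeError as e:
        # Fails with a size-mismatch error: checkpoint weight is
        # [8, 3, 3, 3] while the model expects [16, 3, 3, 3].
        print(e)
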
- Jul 08, 2018
Neta Zmora authored
* connectivity_summary() does not use SummaryGraph correctly: we recently changed the internal representation of SummaryGraph.ops, but connectivity_summary() and connectivity_summary_verbose() were not updated. Fixed that.
* Extend the API of create_png(): add 'rankdir' and external styles to the signatures of create_png() and create_pydot_graph(). These are explained in the docstrings.
* Added documentation to the PNG drawing functions
* Added tests to catch trivial connectivity_summary() bugs
Robert Muchsel authored
- Jul 05, 2018
Robert Muchsel authored
Guy Jacob authored
* More strict and explicit check of the parameter's type in weights_sparsity_summary
* Expose 'param_dims' in weights_sparsity_tbl_summary as well
* Some PEP8-related fixes
- Jul 03, 2018
Neta Zmora authored
Add a link to the Wiki
- Jul 01, 2018
Guy Jacob authored
* Scale of bias and parentheses were wrong
- Jun 30, 2018
Neta Zmora authored
Neta Zmora authored
You no longer need to use --momentum=0 when removing structures dynamically.

The SGD momentum update (velocity) is dependent on the weights, which PyTorch optimizers cache internally. This caching is not a problem for filter/channel removal (thinning) because although we dynamically change the shapes of the weights tensors, we don't change the weights tensors themselves.

PyTorch's SGD creates tensors to store the momentum updates, and these tensors have the same shape as the weights tensors. When we change the weights tensors, we need to make the appropriate changes in the Optimizer, or disable the momentum. We added a new function - thinning.optimizer_thinning() - to do this. This function is brittle as it is tested only on optim.SGD and relies on the internal representation of the SGD optimizer, which can change w/o notice. For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'], which also depend on the shape of the weight tensors. (See the sketch below.)

We needed to pass the Optimizer instance to Thinning policies (ChannelRemover, FilterRemover) via the callbacks, which required us to change the callback interface. In the future we plan a bigger change to the callback API, to allow passing of arbitrary context from the training environment to Distiller.

Also in this commit:
* compress_classifier.py had special handling for resnet layer-removal, which is used in examples/ssl/ssl_4D-removal_training.yaml. This is a brittle and ugly hack. Until we have a more elegant solution, I'm removing support for layer-removal.
* Added to the tests invocation of forward and backward passes over a model. This tests more of the real flows, which use the optimizer and construct gradient tensors.
* Added a test of a special case of convolution filter-pruning which occurs when the next layer is fully-connected (linear).
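
A hedged sketch of the idea behind thinning.optimizer_thinning() (assumed and simplified, SGD only): the 'momentum_buffer' kept per parameter has the parameter's shape, so it must be thinned with the same indices along the same dimension.

    import torch
    import torch.nn as nn
    import torch.optim as optim

    def thin_sgd_momentum(optimizer, param, dim, keep_indices):
        # Thin the cached momentum buffer to match the thinned weights.
        # (A real implementation would thin the bias and its buffer too.)
        state = optimizer.state.get(param, {})
        buf = state.get('momentum_buffer')
        if buf is not None:
            state['momentum_buffer'] = torch.index_select(buf, dim, keep_indices)

    conv = nn.Conv2d(3, 8, 3)
    opt = optim.SGD(conv.parameters(), lr=0.1, momentum=0.9)

    # One forward/backward/step so the momentum buffers actually exist.
    conv(torch.randn(1, 3, 8, 8)).sum().backward()
    opt.step()

    keep = torch.tensor([0, 1, 2, 3])  # e.g. keep the first 4 filters
    conv.weight.data = torch.index_select(conv.weight.data, 0, keep)
    thin_sgd_momentum(opt, conv.weight, 0, keep)
    print(opt.state[conv.weight]['momentum_buffer'].shape)  # [4, 3, 3, 3]
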
- Jun 29, 2018
Neta Zmora authored
Used the wrong indentation when parsing RegularizationPolicy
- Jun 26, 2018
Neta Zmora authored
Refactor the tests so that they can be applied to more models.
Neta Zmora authored
The channel-thinning code did not correctly handle channel removal when the Convolution layer has a bias tensor.
Neta Zmora authored
* Fix bug: taking the len() of a zero-dimensional 'indices' tensor is not legal. Use nelement() instead. A zero-dim 'indices' tensor occurs when the pruning is very aggressive and leaves only one channel or filter in the tensor.
* Protect against pruning all of a layer's channels/filters: raise ValueError if trying to create (thru thinning) a Convolution layer with zero channels or filters.
* Tests:
  * Some PEP8 cleanup.
  * Add some test documentation.
  * Refactored some test code to tests/common.py.
  * Added testing of pruning all the channels/filters in a Convolution.
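
An illustration of the zero-dim 'indices' pitfall:

    import torch

    # After aggressive pruning only one channel survives, and a squeeze
    # can leave the 'indices' tensor zero-dimensional.
    indices = torch.nonzero(torch.tensor([0., 1., 0.])).squeeze()
    print(indices.dim())       # 0 -> a zero-dim (scalar) tensor

    try:
        len(indices)           # TypeError: len() of a 0-d tensor
    except TypeError as e:
        print(e)

    print(indices.nelement())  # 1 -> works regardless of dimensionality
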
- Jun 25, 2018
Gal Novik authored
Neta Zmora authored
- Jun 22, 2018
Thomas Fan authored
Reviewed and looking good. We have to set a convention for naming files.
- Jun 21, 2018
Guy Jacob authored
Guy Jacob authored
Guy Jacob authored
Guy Jacob authored
Guy Jacob authored
Guy Jacob authored
Neta Zmora authored
Fixed a bug in module name normalization, for modules with a name ending in ".module" (e.g. "features.module" in the case of VGG). Made the tests more robust, and also refactored the common code to distiller/utils.py
- Jun 19, 2018
Guy Jacob authored
* Modify 'create_png' to use the correct data structures (dicts instead of lists, etc.)
* Handle the case where an op was called not from a module. This relates to:
  * ONNX -> "user-friendly" name conversion, to account for cases where
  * Detection of an existing op with the same name
  In both cases, use the ONNX op type in addition to the op name.
* Return an "empty" shape instead of None when ONNX couldn't infer a parameter's shape
* Expose the option of a PNG summary with parameters to the user