- Aug 06, 2019
Neta Zmora authored
* An implementation of AMC (the previous implementation code has moved to a new location under /distiller/examples/auto_compression/amc). AMC is aligned with the ‘master’ branch of Coach.
* compress_classifier.py is refactored. The base code moved to /distiller/apputils/image_classifier.py. Further refactoring will follow. We want to provide a simple and small API for the basic features of a classifier-compression application. This will help applications that want to use the main features of a classifier-compression application without the standard training regimen. AMC is one example of a stand-alone application that needs to leverage the capabilities of a classifier-compression application, but is currently coupled to `compress_classifier.py`. `multi-finetune.py` is another example.
* ranked_structures_pruner.py:
  ** Added support for grouping channels/filters. Sometimes we want to prune a group of structures, e.g. groups of 8 channels. This feature does not force the groups to be adjacent, so it is more like a set of structures. E.g. when pruning channels from a 64-channel convolution, grouped by 8 channels, we will prune exactly one of 0/8/16/24/32/40/48/56 channels, i.e. always a multiple of 8 channels, excluding the set of all 64 channels (see the sketch after this list).
  ** Added FMReconstructionChannelPruner – channel pruning that uses L1-magnitude to rank and select the channels to remove, and feature-map reconstruction to improve resilience to the pruning.
* Added a script to run multiple instances of an experiment in different processes: examples/classifier_compression/multi-run.py
* Set the seed value even when it is not specified by the command-line arguments, so that we can try to recreate the session.
* Added pruning ranking noise – ranking noise introduces Gaussian noise when ranking channels/filters using an Lp-norm. The noise is introduced using the epsilon-greedy methodology, where ranking using the exact Lp-norm is considered greedy.
* Added configurable rounding of the pruning level: choose whether to round up or down when rounding the number of structures to prune (rounding is always to an integer).
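A minimal sketch of the grouped, noise-perturbed L1 ranking described above. This is not Distiller's ranked_structures_pruner.py code; the function and parameter names (group_size, noise_epsilon, noise_sigma, round_up) are illustrative assumptions.

```python
import math
import torch

def rank_and_select_channels(weight, fraction_to_prune, group_size=8,
                             noise_epsilon=0.0, noise_sigma=0.05, round_up=False):
    """Return indices of input channels to prune from a 4-D conv weight.

    Illustrative only: grouped selection (multiples of group_size),
    epsilon-greedy Gaussian ranking noise, and configurable up/down
    rounding, as described in the commit message.
    """
    num_channels = weight.size(1)
    # L1 magnitude of each input channel, summed over filters and kernel elements
    l1_per_channel = weight.abs().sum(dim=(0, 2, 3))

    # Epsilon-greedy ranking noise: with probability noise_epsilon, perturb the
    # exact L1 ranking with multiplicative Gaussian noise ("exploration");
    # otherwise rank greedily with the exact norm.
    if noise_epsilon > 0 and torch.rand(1).item() < noise_epsilon:
        l1_per_channel = l1_per_channel * (1 + noise_sigma * torch.randn(num_channels))

    # Round the number of channels to prune to a multiple of group_size,
    # never pruning the entire set of channels.
    raw = fraction_to_prune * num_channels
    groups = math.ceil(raw / group_size) if round_up else math.floor(raw / group_size)
    n_prune = min(groups * group_size, num_channels - group_size)
    if n_prune <= 0:
        return torch.empty(0, dtype=torch.long)

    # The channels with the smallest (possibly noisy) L1 magnitude are pruned
    return torch.argsort(l1_per_channel)[:n_prune]
```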
- Apr 01, 2019
Bar authored
Load optimizer from checkpoint (BREAKING - see details) (#182)
* Fixes issues #70, #145 and replaces PR #74
* checkpoint.py:
  * save_checkpoint will now save the optimizer type in addition to its state
  * load_checkpoint will now instantiate an optimizer based on the saved type and load its state (see the sketch after this list)
* config.py: file/dict_config now accept the resumed epoch to pass to LR schedulers
* policy.py: LRPolicy now passes the current epoch to the LR scheduler
* Classifier compression sample:
  * New flag '--resume-from' for properly resuming a saved training session, incl. optimizer state and epoch number
  * Flag '--reset-optimizer' added to allow discarding of a loaded optimizer
* BREAKING:
  * The previous flag '--resume' is deprecated and is mapped to '--resume-from' + '--reset-optimizer'.
  * However, the old resuming behavior had an inconsistency: the epoch count would continue from the saved epoch, but the LR scheduler was set up as if we were starting from epoch 0.
  * Using '--resume-from' + '--reset-optimizer' now simply RESETS the epoch count to 0 for the whole environment.
  * This means that scheduling configurations (in YAML or code) which assumed use of '--resume' might need to be changed to reflect the fact that the epoch count now starts from 0.
  * All relevant YAML files under 'examples' were modified to reflect this change.
* Initial support for ReduceLROnPlateau (#161):
  * Allow passing **kwargs to policies via the scheduler
  * Image classification now passes the validation loss to the scheduler, to be used by ReduceLROnPlateau
  * The current implementation is experimental and subject to change
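A hedged sketch of the checkpointing idea described above: saving the optimizer's concrete type alongside its state, and re-instantiating that type on load. The function names and checkpoint keys below are illustrative, not Distiller's actual checkpoint.py API.

```python
import torch

def save_checkpoint_sketch(path, epoch, model, optimizer):
    # Persist the optimizer's type along with its state so that loading can
    # reconstruct the same kind of optimizer (keys are illustrative).
    torch.save({'epoch': epoch,
                'state_dict': model.state_dict(),
                'optimizer_type': type(optimizer),
                'optimizer_state_dict': optimizer.state_dict()}, path)

def load_checkpoint_sketch(path, model):
    chkpt = torch.load(path)
    model.load_state_dict(chkpt['state_dict'])
    # Instantiate an optimizer of the saved type with a dummy learning rate;
    # load_state_dict then restores the real hyper-parameters and state
    # (momentum buffers, step counts, etc.).
    optimizer = chkpt['optimizer_type'](model.parameters(), lr=0.1)
    optimizer.load_state_dict(chkpt['optimizer_state_dict'])
    return model, optimizer, chkpt['epoch']
```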
- Sep 26, 2018
Neta Zmora authored
* Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example of ResNet20 channel pruning.
  - The idea is to regularize the variance of the distribution of the parameter structures. Some structures are zeroed completely, while the rest should keep high values, leading to a high variance.
  - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups); a sketch of the idea follows this list.
  - When tested on ResNet20, GSS did not show any improvement over SSL.
* Added a sample of filter pruning for ResNet20 CIFAR using SSL ("Learning Structured Sparsity in Deep Neural Networks").
* Added an example of pruning 45% of the compute (1.8x MAC reduction) while suffering a 0.8% accuracy loss on ResNet20 CIFAR.
* Added a ResNet50 ImageNet example of L1-magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase. This is an example of pruning used as a regularizer.
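A minimal sketch of a group-variance regularization term in the spirit of GSS, assuming the groups are the input channels of a convolution weight tensor. This is illustrative only, not Distiller's GroupVarianceRegularizer.

```python
import torch

def group_variance_regularization(conv_weight, strength):
    """Reward high variance between per-channel magnitudes (GSS-style sketch).

    conv_weight has shape (out_channels, in_channels, kH, kW). Returning the
    NEGATIVE variance means that minimizing the total loss maximizes the
    variance of the groups, pushing some channels toward zero while others
    keep large magnitudes.
    """
    # Per-input-channel L2 magnitude: shape (in_channels,)
    group_norms = conv_weight.pow(2).sum(dim=(0, 2, 3)).sqrt()
    return -strength * group_norms.var()
```

In training, such a term would simply be added to the task loss, e.g. `loss = criterion(output, target) + group_variance_regularization(conv.weight, strength)`.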
- Sep 20, 2018
Neta Zmora authored
In this experiment we increase the regularization strength of some of the channel regularization terms. We want to increase the compute compression, while allowing some reduction in accuracy.
- Jul 22, 2018
Neta Zmora authored
- Jul 09, 2018
Neta Zmora authored
The checkpoint file examples/ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar did not contain the "thinning recipe", while the weight tensors stored within the checkpoint file had already been shrunk/thinned, and this caused a mismatch.

PyTorch models are defined in code. This includes the network architecture and connectivity (which layers are used and what the forward path is), but also the sizes of the parameter tensors and inputs/outputs. When the model is created, the parameter tensors are created as well, as defined or inferred from the code. When a checkpoint is loaded, the parameter tensors are read from the checkpoint and copied into the model's tensors. Therefore, the tensors in the checkpoint and in the model must have the same shape. If a model has been "thinned" and saved to a checkpoint, then the checkpoint tensors are "smaller" than the ones defined by the model. A "thinning recipe" is used to make changes to the model before copying the tensors from the checkpoint (see the sketch below). In this case, the "thinning recipe" was missing.
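A minimal sketch of why the recipe matters: the model's tensors must be resized according to the recipe before load_state_dict() can copy the smaller checkpoint tensors. The recipe format shown here (a name-to-shape mapping stored under a 'thinning_recipe' key) is a hypothetical simplification, not Distiller's actual recipe structure.

```python
import torch

def load_thinned_checkpoint(model, checkpoint_path):
    """Apply a (hypothetical) thinning recipe before copying checkpoint tensors.

    Without resizing the model's parameters first, load_state_dict() fails
    with a size mismatch, because the checkpoint tensors are smaller than
    the tensors the model code defines.
    """
    chkpt = torch.load(checkpoint_path)
    recipe = chkpt.get('thinning_recipe')   # hypothetical key and format
    if recipe is not None:
        named_params = dict(model.named_parameters())
        for name, new_shape in recipe.items():
            # Shrink the model's parameter to the thinned shape so that the
            # subsequent copy from the checkpoint matches element-for-element.
            param = named_params[name]
            param.data = param.data.new_zeros(new_shape)
    model.load_state_dict(chkpt['state_dict'])
    return model
```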
- Jun 30, 2018
Neta Zmora authored
You no longer need to use --momentum=0 when removing structures dynamically.

The SGD momentum update (velocity) is dependent on the weights, which PyTorch optimizers cache internally. This caching is not a problem for filter/channel removal (thinning), because although we dynamically change the shapes of the weight tensors, we don't change the weight tensors themselves. PyTorch's SGD creates tensors to store the momentum updates, and these tensors have the same shape as the weight tensors, so when we change the weight tensors we need to make the appropriate changes in the Optimizer, or disable the momentum. We added a new function - thinning.optimizer_thinning() - to do this (a sketch of the idea appears below). This function is brittle: it is tested only on optim.SGD and relies on the internal representation of the SGD optimizer, which can change without notice. For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'], which also depend on the shape of the weight tensors.

We needed to pass the Optimizer instance to the thinning policies (ChannelRemover, FilterRemover) via the callbacks, which required changing the callback interface. In the future we plan a bigger change to the callback API, to allow passing arbitrary context from the training environment to Distiller.

Also in this commit:
* compress_classifier.py had special handling for ResNet layer removal, which is used in examples/ssl/ssl_4D-removal_training.yaml. This is a brittle and ugly hack. Until we have a more elegant solution, I'm removing support for layer removal.
* Added to the tests an invocation of forward and backward passes over a model. This tests more of the real flows, which use the optimizer and construct gradient tensors.
* Added a test of a special case of convolution filter pruning which occurs when the next layer is fully-connected (linear).
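A minimal sketch of the momentum-buffer adjustment behind thinning.optimizer_thinning(). The helper below is illustrative, assumes optim.SGD's internal 'momentum_buffer' state key, and is not Distiller's implementation.

```python
import torch

def thin_sgd_momentum(optimizer, param, kept_indices, dim=0):
    """Shrink SGD's cached momentum buffer to match a thinned weight tensor.

    When filters (dim=0) or channels (dim=1) are physically removed from a
    weight tensor, the cached 'momentum_buffer' must be shrunk along the same
    dimension, otherwise the next optimizer step fails on a shape mismatch.
    Like Distiller's function, this relies on optim.SGD's internal state
    layout, which can change without notice.
    """
    state = optimizer.state.get(param, {})
    buf = state.get('momentum_buffer')
    if buf is not None:
        # Keep only the rows/columns of the velocity that correspond to the
        # surviving filters/channels.
        state['momentum_buffer'] = torch.index_select(buf, dim, kept_indices)
```

Usage would mirror the thinning step itself: `kept_indices` is the LongTensor of surviving filter (dim=0) or channel (dim=1) indices used to shrink the weight tensor.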
- Jun 10, 2018
Neta Zmora authored
* Large update containing a new thinning algorithm.
Thinning a model is the process of taking a dense network architecture, whose parameters have structure-sparsity (filters or channels) in the weight tensors of convolution layers, and making changes to the network architecture and parameters in order to completely remove the structures. The new architecture is smaller (condensed), with fewer channels and filters in some of the convolution layers. Linear and BatchNormalization layers are also adjusted as required.
To perform thinning, we create a SummaryGraph (‘sgraph’) of our model. We use the ‘sgraph’ to infer the data dependencies between the modules in the PyTorch network. This entire process is not trivial and will be documented elsewhere.
Large refactoring of SummaryGraph to support the new thinning requirement of traversing successors and predecessors.
- Operations (ops) are now stored in a dictionary, so that they can be accessed quickly by name.
- Refactored the Operation construction code.
- Added support for searching a node’s predecessors and successors. You can search for all predecessors/successors by depth, or by type.
- create_png now supports an option to display the parameter nodes.
Updated schedules with the new thinning syntax.
* Thinning: support iterative thinning of models.
There is a caveat with this commit: when using this code you will need to train with SGD momentum=0. The momentum update is dependent on the weights, and because we dynamically change the weight shapes, we need to either make the appropriate changes in the Optimizer, or disable the momentum. For now, we disable the momentum.
* Thinning: move the application of FilterRemover to on_minibatch_begin.
* Thinning: fix syntax error.
* Word-level language model compression.
Added an implementation of Baidu’s RNN pruning scheme:
Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
Added an example of word-level language model compression. The language model is based on PyTorch’s example: https://github.com/pytorch/examples/tree/master/word_language_model
Added an AGP pruning schedule and an RNN pruning schedule to demonstrate compression of the language model (a sketch of the AGP sparsity schedule follows this entry).
* thinning: remove dead code
* remove resnet18 filter pruning since the scheduler script is incomplete
* thinning: fix indentation error
* thinning: remove dead code
* thinning: updated resnet20-CIFAR filter-removal reference checkpoints
* thinning: updated resnet20-CIFAR filter-removal reference schedules. These are for use with the new thinning schedule algorithm.
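The AGP schedules mentioned above follow the cubic ramp of Zhu & Gupta ("To prune, or not to prune", 2017). A minimal sketch of that target-sparsity computation; the function and parameter names are illustrative, not Distiller's API.

```python
def agp_target_sparsity(step, initial_sparsity, final_sparsity,
                        start_step, end_step):
    """Automated Gradual Pruning (AGP) cubic sparsity schedule.

    Sparsity ramps from initial_sparsity at start_step to final_sparsity at
    end_step, pruning aggressively early on and slowing down as training
    progresses.
    """
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3
```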
- May 01, 2018
Neta Zmora authored
- Apr 28, 2018
Neta Zmora authored
- Apr 24, 2018
Neta Zmora authored