- Aug 06, 2019
Neta Zmora authored
* An implementation of AMC (the previous implementation code has moved to a new location under /distiller/examples/auto_compression/amc). AMC is aligned with the ‘master’ branch of Coach.
* compress_classifier.py is refactored. The base code moved to /distiller/apputils/image_classifier.py. Further refactoring will follow. We want to provide a simple and small API for the basic features of a classifier-compression application. This will help applications that want to use the main features of a classifier-compression application without the standard training regimen. AMC is one example of a stand-alone application that needs to leverage the capabilities of a classifier-compression application, but is currently coupled to `compress_classifier.py`. `multi-finetune.py` is another example.
* ranked_structures_pruner.py:
  ** Added support for grouping channels/filters. Sometimes we want to prune a group of structures, e.g. groups of 8 channels. This feature does not force the groups to be adjacent, so it is more like a set of structures. E.g. when pruning channels from a 64-channel convolution, grouped by 8 channels, we will prune exactly one of 0/8/16/24/32/40/48/56 channels, i.e. always a multiple of 8 channels, excluding the set of all 64 channels (see the sketch after this list).
  ** Added FMReconstructionChannelPruner – channel pruning that uses L1-magnitude to rank and select the channels to remove, and feature-map reconstruction to improve resilience to the pruning.
* Added a script to run multiple instances of an experiment in different processes: examples/classifier_compression/multi-run.py
* Set the seed value even when it is not specified by the command-line arguments, so that we can try to recreate the session.
* Added pruning ranking noise – ranking noise introduces Gaussian noise when ranking channels/filters using an Lp-norm. The noise is introduced using the epsilon-greedy methodology, where ranking using the exact Lp-norm is considered greedy.
* Added configurable rounding of the pruning level: choose whether to round up or down when rounding the number of structures to prune (rounding is always to an integer).
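A minimal sketch of the grouped, noise-perturbed L1 ranking described above. This is not Distiller's ranked_structures_pruner.py code; the function and parameter names (group_size, noise_epsilon, noise_sigma, round_up) are illustrative assumptions.

```python
import math
import torch

def rank_and_select_channels(weight, fraction_to_prune, group_size=8,
                             noise_epsilon=0.0, noise_sigma=0.05, round_up=False):
    """Return indices of input channels to prune from a 4-D conv weight.

    Illustrative only: grouped selection (multiples of group_size),
    epsilon-greedy Gaussian ranking noise, and configurable up/down
    rounding, as described in the commit message.
    """
    num_channels = weight.size(1)
    # L1 magnitude of each input channel, summed over filters and kernel elements
    l1_per_channel = weight.abs().sum(dim=(0, 2, 3))

    # Epsilon-greedy ranking noise: with probability noise_epsilon, perturb the
    # exact L1 ranking with multiplicative Gaussian noise ("exploration");
    # otherwise rank greedily with the exact norm.
    if noise_epsilon > 0 and torch.rand(1).item() < noise_epsilon:
        l1_per_channel = l1_per_channel * (1 + noise_sigma * torch.randn(num_channels))

    # Round the number of channels to prune to a multiple of group_size,
    # never pruning the entire set of channels.
    raw = fraction_to_prune * num_channels
    groups = math.ceil(raw / group_size) if round_up else math.floor(raw / group_size)
    n_prune = min(groups * group_size, num_channels - group_size)
    if n_prune <= 0:
        return torch.empty(0, dtype=torch.long)

    # The channels with the smallest (possibly noisy) L1 magnitude are pruned
    return torch.argsort(l1_per_channel)[:n_prune]
```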
- Apr 01, 2019
Bar authored
Load optimizer from checkpoint (BREAKING - see details) (#182)
* Fixes issues #70, #145 and replaces PR #74
* checkpoint.py:
  * save_checkpoint will now save the optimizer type in addition to its state
  * load_checkpoint will now instantiate an optimizer based on the saved type and load its state (see the sketch after this list)
* config.py: file/dict_config now accept the resumed epoch to pass to LR schedulers
* policy.py: LRPolicy now passes the current epoch to the LR scheduler
* Classifier compression sample:
  * New flag '--resume-from' for properly resuming a saved training session, incl. optimizer state and epoch number
  * Flag '--reset-optimizer' added to allow discarding of a loaded optimizer
* BREAKING:
  * The previous flag '--resume' is deprecated and is mapped to '--resume-from' + '--reset-optimizer'.
  * However, the old resuming behavior had an inconsistency: the epoch count would continue from the saved epoch, but the LR scheduler was set up as if we were starting from epoch 0.
  * Using '--resume-from' + '--reset-optimizer' now simply RESETS the epoch count to 0 for the whole environment.
  * This means that scheduling configurations (in YAML or code) which assumed use of '--resume' might need to be changed to reflect the fact that the epoch count now starts from 0.
  * All relevant YAML files under 'examples' were modified to reflect this change.
* Initial support for ReduceLROnPlateau (#161):
  * Allow passing **kwargs to policies via the scheduler
  * Image classification now passes the validation loss to the scheduler, to be used by ReduceLROnPlateau
  * The current implementation is experimental and subject to change
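A hedged sketch of the checkpointing idea described above: saving the optimizer's concrete type alongside its state, and re-instantiating that type on load. The function names and checkpoint keys below are illustrative, not Distiller's actual checkpoint.py API.

```python
import torch

def save_checkpoint_sketch(path, epoch, model, optimizer):
    # Persist the optimizer's type along with its state so that loading can
    # reconstruct the same kind of optimizer (keys are illustrative).
    torch.save({'epoch': epoch,
                'state_dict': model.state_dict(),
                'optimizer_type': type(optimizer),
                'optimizer_state_dict': optimizer.state_dict()}, path)

def load_checkpoint_sketch(path, model):
    chkpt = torch.load(path)
    model.load_state_dict(chkpt['state_dict'])
    # Instantiate an optimizer of the saved type with a dummy learning rate;
    # load_state_dict then restores the real hyper-parameters and state
    # (momentum buffers, step counts, etc.).
    optimizer = chkpt['optimizer_type'](model.parameters(), lr=0.1)
    optimizer.load_state_dict(chkpt['optimizer_state_dict'])
    return model, optimizer, chkpt['epoch']
```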
- Sep 26, 2018
Neta Zmora authored
* Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example of ResNet20 channel pruning.
  - The idea is to regularize the variance of the distribution of the parameter structures. Some structures are zeroed completely, while the rest should keep high values, leading to a high variance.
  - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups); a sketch of the idea follows this list.
  - When tested on ResNet20, GSS did not show any improvement over SSL.
* Added a sample of filter pruning for ResNet20 CIFAR using SSL ("Learning Structured Sparsity in Deep Neural Networks").
* Added an example of pruning 45% of the compute (1.8x MAC reduction) while suffering a 0.8% accuracy loss on ResNet20 CIFAR.
* Added a ResNet50 ImageNet example of L1-magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase. This is an example of pruning used as a regularizer.
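A minimal sketch of a group-variance regularization term in the spirit of GSS, assuming the groups are the input channels of a convolution weight tensor. This is illustrative only, not Distiller's GroupVarianceRegularizer.

```python
import torch

def group_variance_regularization(conv_weight, strength):
    """Reward high variance between per-channel magnitudes (GSS-style sketch).

    conv_weight has shape (out_channels, in_channels, kH, kW). Returning the
    NEGATIVE variance means that minimizing the total loss maximizes the
    variance of the groups, pushing some channels toward zero while others
    keep large magnitudes.
    """
    # Per-input-channel L2 magnitude: shape (in_channels,)
    group_norms = conv_weight.pow(2).sum(dim=(0, 2, 3)).sqrt()
    return -strength * group_norms.var()
```

In training, such a term would simply be added to the task loss, e.g. `loss = criterion(output, target) + group_variance_regularization(conv.weight, strength)`.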
- Sep 20, 2018
Neta Zmora authored
In this experiment we increase the regularization strength of some of the channel regularization terms. We want to increase the compute compression, while allowing some reduction in accuracy.
- Jul 22, 2018
Neta Zmora authored
- Jul 09, 2018
Neta Zmora authored
The checkpoint file examples/ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar did not contain the "thinning recipe", while the weight tensors stored within the checkpoint file had already been shrunk/thinned, and this caused a mismatch.

PyTorch models are defined in code. This includes the network architecture and connectivity (which layers are used and what the forward path is), but also the sizes of the parameter tensors and inputs/outputs. When the model is created, the parameter tensors are created as well, as defined or inferred from the code. When a checkpoint is loaded, the parameter tensors are read from the checkpoint and copied into the model's tensors. Therefore, the tensors in the checkpoint and in the model must have the same shape. If a model has been "thinned" and saved to a checkpoint, then the checkpoint tensors are "smaller" than the ones defined by the model. A "thinning recipe" is used to make changes to the model before copying the tensors from the checkpoint (see the sketch below). In this case, the "thinning recipe" was missing.
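A minimal sketch of why the recipe matters: the model's tensors must be resized according to the recipe before load_state_dict() can copy the smaller checkpoint tensors. The recipe format shown here (a name-to-shape mapping stored under a 'thinning_recipe' key) is a hypothetical simplification, not Distiller's actual recipe structure.

```python
import torch

def load_thinned_checkpoint(model, checkpoint_path):
    """Apply a (hypothetical) thinning recipe before copying checkpoint tensors.

    Without resizing the model's parameters first, load_state_dict() fails
    with a size mismatch, because the checkpoint tensors are smaller than
    the tensors the model code defines.
    """
    chkpt = torch.load(checkpoint_path)
    recipe = chkpt.get('thinning_recipe')   # hypothetical key and format
    if recipe is not None:
        named_params = dict(model.named_parameters())
        for name, new_shape in recipe.items():
            # Shrink the model's parameter to the thinned shape so that the
            # subsequent copy from the checkpoint matches element-for-element.
            param = named_params[name]
            param.data = param.data.new_zeros(new_shape)
    model.load_state_dict(chkpt['state_dict'])
    return model
```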
- Jun 30, 2018
Neta Zmora authored
You no longer need to use --momentum=0 when removing structures dynamically.

The SGD momentum update (velocity) is dependent on the weights, which PyTorch optimizers cache internally. This caching is not a problem for filter/channel removal (thinning), because although we dynamically change the shapes of the weight tensors, we don't change the weight tensors themselves. PyTorch's SGD creates tensors to store the momentum updates, and these tensors have the same shape as the weight tensors, so when we change the weight tensors we need to make the appropriate changes in the Optimizer, or disable the momentum. We added a new function - thinning.optimizer_thinning() - to do this (a sketch of the idea appears below). This function is brittle: it is tested only on optim.SGD and relies on the internal representation of the SGD optimizer, which can change without notice. For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'], which also depend on the shape of the weight tensors.

We needed to pass the Optimizer instance to the thinning policies (ChannelRemover, FilterRemover) via the callbacks, which required changing the callback interface. In the future we plan a bigger change to the callback API, to allow passing arbitrary context from the training environment to Distiller.

Also in this commit:
* compress_classifier.py had special handling for ResNet layer removal, which is used in examples/ssl/ssl_4D-removal_training.yaml. This is a brittle and ugly hack. Until we have a more elegant solution, I'm removing support for layer removal.
* Added to the tests an invocation of forward and backward passes over a model. This tests more of the real flows, which use the optimizer and construct gradient tensors.
* Added a test of a special case of convolution filter pruning which occurs when the next layer is fully-connected (linear).
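A minimal sketch of the momentum-buffer adjustment behind thinning.optimizer_thinning(). The helper below is illustrative, assumes optim.SGD's internal 'momentum_buffer' state key, and is not Distiller's implementation.

```python
import torch

def thin_sgd_momentum(optimizer, param, kept_indices, dim=0):
    """Shrink SGD's cached momentum buffer to match a thinned weight tensor.

    When filters (dim=0) or channels (dim=1) are physically removed from a
    weight tensor, the cached 'momentum_buffer' must be shrunk along the same
    dimension, otherwise the next optimizer step fails on a shape mismatch.
    Like Distiller's function, this relies on optim.SGD's internal state
    layout, which can change without notice.
    """
    state = optimizer.state.get(param, {})
    buf = state.get('momentum_buffer')
    if buf is not None:
        # Keep only the rows/columns of the velocity that correspond to the
        # surviving filters/channels.
        state['momentum_buffer'] = torch.index_select(buf, dim, kept_indices)
```

Usage would mirror the thinning step itself: `kept_indices` is the LongTensor of surviving filter (dim=0) or channel (dim=1) indices used to shrink the weight tensor.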
- Jun 10, 2018
Neta Zmora authored
* Large update containing a new thinning algorithm.
Thinning a model is the process of taking a dense network architecture, whose parameters have structure-sparsity (filters or channels) in the weight tensors of convolution layers, and making changes to the network architecture and parameters in order to completely remove the structures. The new architecture is smaller (condensed), with fewer channels and filters in some of the convolution layers. Linear and BatchNormalization layers are also adjusted as required.
To perform thinning, we create a SummaryGraph (‘sgraph’) of our model. We use the ‘sgraph’ to infer the data dependencies between the modules in the PyTorch network. This entire process is not trivial and will be documented elsewhere.
Large refactoring of SummaryGraph to support the new thinning requirement of traversing successors and predecessors.
- Operations (ops) are now stored in a dictionary, so that they can be accessed quickly by name.
- Refactored the Operation construction code.
- Added support for searching a node’s predecessors and successors. You can search for all predecessors/successors by depth, or by type.
- create_png now supports an option to display the parameter nodes.
Updated schedules with the new thinning syntax.
* Thinning: support iterative thinning of models.
There is a caveat with this commit: when using this code you will need to train with SGD momentum=0. The momentum update is dependent on the weights, and because we dynamically change the weight shapes, we need to either make the appropriate changes in the Optimizer, or disable the momentum. For now, we disable the momentum.
* Thinning: move the application of FilterRemover to on_minibatch_begin.
* Thinning: fix syntax error.
* Word-level language model compression.
Added an implementation of Baidu’s RNN pruning scheme:
Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
Added an example of word-level language model compression. The language model is based on PyTorch’s example: https://github.com/pytorch/examples/tree/master/word_language_model
Added an AGP pruning schedule and an RNN pruning schedule to demonstrate compression of the language model (a sketch of the AGP sparsity schedule follows this entry).
* thinning: remove dead code
* remove resnet18 filter pruning since the scheduler script is incomplete
* thinning: fix indentation error
* thinning: remove dead code
* thinning: updated resnet20-CIFAR filter-removal reference checkpoints
* thinning: updated resnet20-CIFAR filter-removal reference schedules. These are for use with the new thinning schedule algorithm.
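The AGP schedules mentioned above follow the cubic ramp of Zhu & Gupta ("To prune, or not to prune", 2017). A minimal sketch of that target-sparsity computation; the function and parameter names are illustrative, not Distiller's API.

```python
def agp_target_sparsity(step, initial_sparsity, final_sparsity,
                        start_step, end_step):
    """Automated Gradual Pruning (AGP) cubic sparsity schedule.

    Sparsity ramps from initial_sparsity at start_step to final_sparsity at
    end_step, pruning aggressively early on and slowing down as training
    progresses.
    """
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3
```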
- May 01, 2018
Neta Zmora authored
- Apr 28, 2018
Neta Zmora authored
- Apr 24, 2018
Neta Zmora authored