- Nov 08, 2018
-
Haim Barad authored
* Updated stats computation - fixes issues with validation stats
* Clarification of output (docs)
* Update
* Moved validation stats to separate function
-
Guy Jacob authored
-
- Nov 06, 2018
-
Neta Zmora authored
By default, when we create a model, we wrap it with DataParallel to benefit from data-parallelism across GPUs (mainly for convolution layers). But sometimes we don't want the sample application to do this: for example, when we receive a model that was trained serially. This commit adds a new argument to the application that prevents the use of DataParallel.
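A minimal sketch of how such a switch might look in a PyTorch sample application; the flag name --load-serialized is assumed here for illustration and is not taken from the commit:

```python
import argparse

import torch
import torch.nn as nn
import torchvision.models as models

parser = argparse.ArgumentParser()
# Hypothetical flag name -- the commit only says "a new argument" was added.
parser.add_argument('--load-serialized', action='store_true',
                    help='load the model without wrapping it in nn.DataParallel')
args = parser.parse_args()

model = models.resnet18()
if torch.cuda.is_available():
    if args.load_serialized:
        # Serially-trained/processed model: keep it unwrapped.
        model = model.cuda()
    else:
        # Default: exploit data-parallelism across all visible GPUs.
        model = nn.DataParallel(model).cuda()
```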
-
Haim Barad authored
* Fixed validation stats and added new summary stats
* Trimmed some comments.
* Improved figure for documentation
* Minor updates
-
- Nov 05, 2018
-
Neta Zmora authored
Added an implementation of: Dynamic Network Surgery for Efficient DNNs, Yiwen Guo, Anbang Yao, Yurong Chen. NIPS 2016, https://arxiv.org/abs/1608.04493.
- Added SplicingPruner: a pruner that both prunes and splices connections.
- Included an example schedule on ResNet20 CIFAR.
- New features for compress_classifier.py:
  1. Added the "--masks-sparsity" argument which, when enabled, logs the sparsity of the weight masks during training.
  2. Added a new command-line argument to report the top N best accuracy scores, instead of just the highest score. This is sometimes useful when pruning a pre-trained model that reaches its best Top1 accuracy in the first few pruning epochs.
- New features for PruningPolicy (see the sketch below):
  1. The pruning policy can use two copies of the weights: one is used during the forward pass, the other during the backward pass. This is controlled by the “mask_on_forward_only” argument.
  2. If we enable “mask_on_forward_only”, we probably want to permanently apply the mask at some point (usually once the pruning phase is done). This is controlled by the “keep_mask” argument.
  3. We introduce a first implementation of scheduling at the training-iteration granularity (i.e. at the mini-batch granularity). Until now we could only schedule pruning at the epoch granularity. This is controlled by the “mini_batch_pruning_frequency” argument (disable by setting it to zero).
Some of the abstractions may have leaked from PruningPolicy to CompressionScheduler. We need to reexamine this in the future.
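A conceptual sketch (not the Distiller implementation) of the mask-on-forward-only idea behind splicing: the binary mask is applied only when computing the forward pass, while the dense weights keep receiving gradient updates, so pruned connections can later be spliced back in:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplicedConv2d(nn.Conv2d):
    """Dynamic Network Surgery-style masking sketch: the mask is applied only
    on the forward pass, so masked weights still receive gradients and can
    re-enter the network ("splicing")."""
    def __init__(self, *args, **kwargs):
        super(SplicedConv2d, self).__init__(*args, **kwargs)
        self.register_buffer('mask', torch.ones_like(self.weight))

    def forward(self, x):
        # The dense self.weight is never overwritten; only its masked view is used.
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

    def update_mask(self, prune_threshold, splice_threshold):
        # Hysteresis: prune small weights, splice back weights that have regrown.
        w = self.weight.detach().abs()
        self.mask[w < prune_threshold] = 0.
        self.mask[w > splice_threshold] = 1.
```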
-
- Nov 01, 2018
-
Guy Jacob authored
* Added command-line arguments for this and other post-training quantization settings in the image classification sample.
-
- Oct 22, 2018
-
Neta Zmora authored
Activation statistics can be leveraged to make pruning and quantization decisions, so we added support for collecting this data.
- Two types of activation statistics are supported: summary statistics, and detailed records per activation.
  Currently we support the following summaries:
  - Average activation sparsity, per layer
  - Average L1-norm for each activation channel, per layer
  - Average sparsity for each activation channel, per layer
  For the detailed records we collect some statistics per activation and store them in a record. This collection method generates more detailed data, but consumes more time, so beware.
* You can collect activation data for the different training phases: training/validation/test.
* You can access the data directly from each module for which you chose to collect stats.
* You can also create an Excel workbook with the stats.
To demonstrate the use of activation collection, we added a sample schedule which prunes weight filters by activation APoZ (see the sketch below), according to: "Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016, https://arxiv.org/abs/1607.03250.
We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning; specifically, we separated the AGP schedule from the filter pruning criterion. We added examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner), random ranking (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), and filter L1-norm (L1RankedStructureParameterPruner).
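For reference, APoZ (Average Percentage of Zeros) can be estimated with a plain forward hook roughly as follows; this is a minimal sketch, not the collector API added in this commit:

```python
import torch
import torch.nn as nn

apoz_stats = {}

def apoz_hook(name):
    def hook(module, inputs, output):
        # Fraction of zero (post-ReLU) activations per output channel,
        # averaged over the batch and spatial dimensions.
        act = output.detach()
        zeros = (act == 0).float()
        per_channel = zeros.transpose(0, 1).contiguous().view(act.size(1), -1).mean(dim=1)
        apoz_stats.setdefault(name, []).append(per_channel.cpu())
    return hook

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
model[1].register_forward_hook(apoz_hook('relu1'))
with torch.no_grad():
    model(torch.randn(8, 3, 32, 32))
print(apoz_stats['relu1'][0])  # per-channel APoZ for this mini-batch
```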
-
- Sep 20, 2018
-
Neta Zmora authored
-
- Sep 16, 2018
-
Neta Zmora authored
* Clean up PyTorch 0.3 compatibility code. We don't need this anymore, and PyTorch 1.0 is just around the corner.
* Explicitly place the inputs tensor on the GPU(s).
-
- Sep 03, 2018
-
Guy Jacob authored
* Implemented as a Policy
* Integrated in image classification sample
* Updated docs and README
-
- Aug 09, 2018
-
Guy Jacob authored
* Instead of a single additive value (which so far represented only the regularizer loss), callbacks return a new overall loss.
* Policy callbacks also return the individual loss components used to calculate the new overall loss (see the sketch below).
* Add a boolean flag to the Scheduler's callback so applications can choose whether to get the individual loss components or just the new overall loss.
* In compress_classifier.py, log the individual loss components.
* Add a test for the loss-from-callback flow.
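A hypothetical illustration of this contract; the container names PolicyLoss and LossComponent are used here for illustration and may not match the actual code:

```python
from collections import namedtuple

PolicyLoss = namedtuple('PolicyLoss', ['overall_loss', 'loss_components'])
LossComponent = namedtuple('LossComponent', ['name', 'value'])

class L2RegularizationPolicy(object):
    """Toy policy: adds an L2 penalty and reports it as a named loss component."""
    def __init__(self, model, strength=1e-4):
        self.model = model
        self.strength = strength

    def before_backward_pass(self, loss, return_loss_components=False):
        reg = self.strength * sum(p.pow(2).sum() for p in self.model.parameters())
        overall_loss = loss + reg
        components = [LossComponent('l2_regularizer', reg)] if return_loss_components else []
        return PolicyLoss(overall_loss, components)
```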
-
- Jul 31, 2018
-
Haim Barad authored
Enabling Early Exit strategy in image classifier example
-
- Jul 25, 2018
-
Neta Zmora authored
We are using this file for more and more use-cases and we need to keep it readable and clean. I've tried to move code that is not in the main control-path to specific functions.
-
Neta Zmora authored
Due to the various uses of these functions, we need to pass an ever-growing number of arguments to them, and the API is becoming bloated and unstable. Also added the option to log the confusion matrix.
-
- Jul 22, 2018
-
Gal Novik authored
* Adding the PACT quantization method (see the sketch below)
* Move the logic that modifies the optimizer (due to changes the quantizer makes) into the Quantizer itself
* Updated documentation and tests
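For context, a self-contained sketch of the PACT idea: clip activations to a learnable upper bound alpha, then quantize linearly with a straight-through estimator. This illustrates the method, not the code added in this commit:

```python
import torch
import torch.nn as nn

class PACTActivation(nn.Module):
    """PACT sketch: learnable clipping level `alpha` plus linear quantization."""
    def __init__(self, num_bits=4, init_alpha=6.0):
        super(PACTActivation, self).__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(float(init_alpha)))

    def forward(self, x):
        # Clipping to [0, alpha], written so that gradients also flow to alpha.
        y = 0.5 * (x.abs() - (x - self.alpha).abs() + self.alpha)
        # Linear quantization with a straight-through estimator.
        scale = (2 ** self.num_bits - 1) / self.alpha
        y_q = torch.round(y * scale) / scale
        return y + (y_q - y).detach()
```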
-
- Jul 17, 2018
-
Guy Jacob authored
-
- Jul 13, 2018
-
Neta Zmora authored
This is a merge of the ADC branch and master. ADC (using a DDPG RL agent to compress image classifiers) is still WiP and requires an unreleased version of Coach (https://github.com/NervanaSystems/coach).
Small features in this commit:
- Added model_find_module() - find a module object given its name (see the sketch below)
- Add channel ranking and pruning: pruning/ranked_structures_pruner.py
- Add a CIFAR10 VGG16 model: models/cifar10/vgg_cifar.py
- Thinning: change the level of some log messages - some of the messages were moved to 'debug' level because they are not usually interesting
- Add a function to print nicely formatted integers - distiller/utils.py
- Sensitivity analysis for channel removal
- compress_classifier.py - handle keyboard interrupts
- compress_classifier.py - fix re-raise of exceptions, so they maintain the call stack
- Added tests:
  -- test_summarygraph.py: test_simplenet() - added a regression test to target a bug that occurs when taking the predecessor of the first node in a graph
  -- test_ranking.py - test_ch_ranking, test_ranked_channel_pruning
  -- test_model_summary.py - test_png_generation, test_summary (sparsity/compute/model/modules)
- Bug fixes in this commit:
  -- Thinning bug fix: handle a zero-sized 'indices' tensor. During the thinning process, the 'indices' tensor can become zero-sized and will have an undefined length, so we need to check for this situation when assessing the number of elements in 'indices'
  -- Language model: adjust main.py to the new distiller.model_summary API
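The module-lookup helper mentioned above boils down to something like this sketch (not necessarily the exact implementation):

```python
import torch.nn as nn

def model_find_module(model, module_to_find):
    """Return the sub-module whose fully-qualified name matches, or None."""
    for name, module in model.named_modules():
        if name == module_to_find:
            return module
    return None

# Example (assuming a torchvision ResNet):
#   conv = model_find_module(model, 'layer1.0.conv1')
```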
-
- Jul 11, 2018
-
Neta Zmora authored
Remove the complicated logic trying to handle data-parallel models as serially-processed models, and vice versa.
* Function distiller.utils.make_non_parallel_copy() does the heavy lifting of replacing all instances of nn.DataParallel in a model with instances of DoNothingModuleWrapper. The DoNothingModuleWrapper wrapper does nothing but forward to the wrapped module. This is a trick we use to transform a data-parallel model into a serially-processed model (see the sketch below).
* SummaryGraph uses a copy of the model after the model is processed by distiller.make_non_parallel_copy(), which renders the model non-data-parallel.
* The same goes for model_performance_summary().
* Model inputs are explicitly placed on the CUDA device, since now all models are executed on the CPU. Previously, if a model was not created using nn.DataParallel, then the model was not explicitly placed on the CUDA device.
* The logic in distiller.CompressionScheduler that attempted to load a data-parallel model and process it serially, or load a serial model and process it data-parallel, was removed. This removes a lot of fuzziness and makes the code more robust: we do not needlessly try to be heroes.
* Model summaries: remove the PyTorch 0.4 warning.
* create_model: remove a redundant .cuda() call.
* Tests: support both parallel and serial tests.
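A rough sketch of the DataParallel-stripping trick described above, assuming the wrapper simply forwards to the wrapped module:

```python
import copy
import torch.nn as nn

class DoNothingModuleWrapper(nn.Module):
    """Pass-through replacement for nn.DataParallel: just forwards to the
    wrapped module."""
    def __init__(self, module):
        super(DoNothingModuleWrapper, self).__init__()
        self.module = module

    def forward(self, *inputs, **kwargs):
        return self.module(*inputs, **kwargs)

def make_non_parallel_copy(model):
    """Deep-copy `model` and replace every nn.DataParallel instance with a
    DoNothingModuleWrapper (sketch of the behavior described above)."""
    def replace_data_parallel(container):
        for name, child in container.named_children():
            if isinstance(child, nn.DataParallel):
                setattr(container, name, DoNothingModuleWrapper(child.module))
            else:
                replace_data_parallel(child)

    new_model = copy.deepcopy(model)
    if isinstance(new_model, nn.DataParallel):
        new_model = DoNothingModuleWrapper(new_model.module)
    replace_data_parallel(new_model)
    return new_model
```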
-
- Jun 30, 2018
-
Neta Zmora authored
-
Neta Zmora authored
You no longer need to use --momentum=0 when removing structures dynamically.
The SGD momentum update (velocity) is dependent on the weights, which PyTorch optimizers cache internally. This caching is not a problem for filter/channel removal (thinning) because although we dynamically change the shapes of the weights tensors, we don't change the weights tensors themselves.
PyTorch's SGD creates tensors to store the momentum updates, and these tensors have the same shape as the weights tensors. When we change the weights tensors, we need to make the appropriate changes in the Optimizer, or disable the momentum. We added a new function - thinning.optimizer_thinning() - to do this (see the sketch below). This function is brittle, as it is tested only on optim.SGD and relies on the internal representation of the SGD optimizer, which can change without notice. For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'], which also depend on the shape of the weight tensors.
We needed to pass the Optimizer instance to the thinning policies (ChannelRemover, FilterRemover) via the callbacks, which required us to change the callback interface. In the future we plan a bigger change to the callback API, to allow passing arbitrary context from the training environment to Distiller.
Also in this commit:
* compress_classifier.py had special handling for resnet layer-removal, which is used in examples/ssl/ssl_4D-removal_training.yaml. This is a brittle and ugly hack. Until we have a more elegant solution, I'm removing support for layer-removal.
* Added to the tests an invocation of forward and backward passes over a model. This tests more of the real flows, which use the optimizer and construct gradient tensors.
* Added a test of a special case of convolution filter-pruning which occurs when the next layer is fully-connected (linear).
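A minimal illustration of what adjusting SGD's cached momentum has to do when a parameter tensor is physically shrunk; it relies on optim.SGD's 'momentum_buffer' state key and is a sketch only, not the optimizer_thinning() implementation:

```python
import torch

def shrink_sgd_momentum(optimizer, param, retained_indices, dim):
    """After `param` has been thinned to keep only `retained_indices` along
    dimension `dim`, shrink its cached SGD momentum buffer the same way so
    the shapes stay consistent (SGD-only sketch)."""
    state = optimizer.state.get(param, {})
    buf = state.get('momentum_buffer')
    if buf is not None and buf.shape != param.shape:
        state['momentum_buffer'] = torch.index_select(buf, dim, retained_indices)
```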
-
- Jun 21, 2018
-
Guy Jacob authored
-
- Jun 19, 2018
-
Guy Jacob authored
* Modify 'create_png' to use the correct data structures (dicts instead of lists, etc.)
* Handle the case where an op was called not from a module. This relates to:
  * ONNX -> "user-friendly" name conversion, to account for these cases
  * Detection of an existing op with the same name
  In both cases, use the ONNX op type in addition to the op name.
* Return an "empty" shape instead of None when ONNX couldn't infer a parameter's shape
* Expose the option of a PNG summary with parameters to the user
-
- May 17, 2018
-
Neta Zmora authored
The latest changes to the logger caused the CI tests to fail, because the test assumes that the logging.conf file is present in the same directory as the sample application script. The sample application used cwd() instead, and so did not find the log configuration file.
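A sketch of the kind of fix this implies: resolve logging.conf relative to the script file rather than the current working directory (assumed form, not the exact code):

```python
import logging.config
import os

# Look for logging.conf next to this script, not in the current working directory.
script_dir = os.path.dirname(os.path.abspath(__file__))
logging.config.fileConfig(os.path.join(script_dir, 'logging.conf'))
```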
-
- May 16, 2018
-
Neta Zmora authored
Soon we will be reusing this function in other sample apps, so let's move it to app_utils.
-
Neta Zmora authored
The 'master' branch now uses PyTorch 0.4, which has API changes that are not backward compatible with PyTorch 0.3. Now that we've upgraded Distiller's internal implementation to be compatible with PyTorch 0.4, we've added a check that you are using the correct PyTorch version. Note that we only perform this check in the sample image classifier compression application.
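The version check could look roughly like this sketch (not the exact code):

```python
import sys

import torch
from pkg_resources import parse_version

# Fail early if the installed PyTorch is older than the required 0.4.0.
if parse_version(torch.__version__) < parse_version('0.4.0'):
    print("This application requires PyTorch 0.4.0 or later; found {}".format(torch.__version__))
    sys.exit(1)
```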
-
Neta Zmora authored
Eventually we will want to use this code in other sample applications, so let's move the logger configuration code to a separate function. There's a bit of ugly hacking in this current implementation because I've added variable members to logging.logger. These are actually config-once variables that convey the logging directory and filename. I did not want to add more names to the global namespace, so I hacked a temporary solution in which logging.logger acts as a conveyor and private namespace. We'll get that cleaned up as we do more refactoring.
-
Neta Zmora authored
This is a niche feature which lets you print the names of the modules in a model from the command line. Non-leaf nodes are excluded from this list. Other caveats are documented in the code.
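Listing only the leaf modules boils down to something like the following sketch:

```python
import torch.nn as nn

def print_leaf_module_names(model):
    """Print only leaf modules (modules with no children); the root module
    (empty name) is skipped."""
    for name, module in model.named_modules():
        if name and not list(module.children()):
            print(name)

# Example:
#   print_leaf_module_names(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()))
```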
-
Neta Zmora authored
Data parallel models may execute faster on multiple GPUs, but rendering them creates visually complex and illegible graphs. Therefore, when creating models for a PNG summary, we opt to use non-parallel models.
-
Neta Zmora authored
-
- May 14, 2018
-
Guy Jacob authored
-
- Apr 24, 2018
-
Neta Zmora authored
-