- May 29, 2019
-
-
Neta Zmora authored
Also added a simple network model for MNIST, under distiller/models/mnist.
-
- May 26, 2019
-
-
Neta Zmora authored
Added set_seed() to Distiller, and added support for seeding the PRNGs when setting --deterministic mode (prior to this change, the seed was always set to zero when running in deterministic mode). The PRNGs of PyTorch (CPU and CUDA devices), NumPy and Python are seeded. Added support for ```--seed``` to classifier_compression.py.
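A minimal sketch of what seeding all of the relevant PRNGs typically involves (the actual Distiller set_seed() implementation may differ):
```python
import random
import numpy as np
import torch

def set_seed(seed=0):
    # Seed Python's, NumPy's and PyTorch's PRNGs (CPU and all CUDA devices)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```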
-
- May 21, 2019
-
-
Bar authored
Function log_execution_env_state is used to gather information about the execution environment and store this together with the experiment log. Recently we've added saving the compression schedule YAML file in the same logs directory. This commit expands the log_execution_env_state interface to accept a list of paths to arbitrary files that may contribute to the experiment configuration and that you (the experiment owner) deem important for recreating the experiment. In the sample classifier_compression.py app, we now store both the compression schedule YAML file and quantization statistics collateral file (qe_stats_file).
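A self-contained sketch of the underlying idea, with illustrative names only (this is not the actual log_execution_env_state implementation): copy each supplied configuration file into the experiment's log directory so the run can be reproduced later.
```python
import os
import shutil

def archive_config_files(config_paths, logdir):
    # Copy every listed configuration file (e.g. the compression schedule YAML
    # and the qe_stats_file) into a 'configs' sub-directory of the log directory.
    configs_dir = os.path.join(logdir, 'configs')
    os.makedirs(configs_dir, exist_ok=True)
    for path in config_paths:
        if path and os.path.isfile(path):
            shutil.copy(path, configs_dir)
```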
-
- May 16, 2019
-
-
Neta Zmora authored
The previous PR merge introduced a couple of small errors when using the --summary flag.
-
Bar authored
Introduced a new utility function to export image classifiers to ONNX: export_img_classifier_to_onnx. The functionality is not new, just refactored. In the sample application compress_classifier.py, added --export-onnx as a stand-alone cmd-line flag specifically for exporting ONNX models. This new flag can take an optional argument which is used to name the exported ONNX model file. The option to export models was removed from the --summary argument. We now allow multiple --summary options to be passed together. Added a basic test for exporting ONNX.
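For reference, a minimal sketch of exporting an image classifier to ONNX with the stock PyTorch API (export_img_classifier_to_onnx wraps similar functionality; the model and file name below are placeholders):
```python
import torch
import torchvision

model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)        # NCHW dummy batch
torch.onnx.export(model, dummy_input, "resnet18.onnx",
                  input_names=["input"], output_names=["logits"])
```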
-
- May 15, 2019
-
-
Guy Jacob authored
Added a collector for activation histograms (sub-class of ActivationStatsCollector). It is stats-based, meaning it requires pre-computed min/max stats per tensor. This is done in order to prevent the need to save all of the activation tensors throughout the run. The stats are expected in the format generated by QuantCalibrationStatsCollector.
Details:
* Implemented ActivationHistogramsCollector
* Added a Jupyter notebook showcasing activation histograms
* Implemented a helper function that performs the stats-collection pass and the histograms pass in one go
* Also added a separate helper function just for quantization stats collection
* Integrated in the image classification sample
* data_loaders.py: added an option to use a fixed subset throughout the same session. We use it to keep the same subset between the stats-collection and histogram-collection phases.
* Other changes:
  * Calling assign_layer_fq_names in the base class of collectors. We do this since the collectors, as implemented so far, assume this is done, so it makes sense to do it in the base class instead of expecting the user to do it.
  * Enforcing a non-parallel model for the quantization stats and histograms collectors
  * Jupyter notebooks: added a utility function to enable loggers in notebooks. This allows us to see any logging done by Distiller APIs called from notebooks.
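A minimal sketch of the stats-based idea, assuming pre-computed per-tensor min/max values: a fixed-range histogram can be accumulated batch by batch, so the activation tensors themselves never need to be stored (names here are illustrative, not Distiller's actual API):
```python
import torch

def update_histogram(hist, activation, t_min, t_max, bins=1024):
    # torch.histc bins values into `bins` equal-width buckets over [t_min, t_max]
    return hist + torch.histc(activation.float().cpu(), bins=bins, min=t_min, max=t_max)

hist = torch.zeros(1024)
for _ in range(3):                    # stand-in for iterating a data loader
    act = torch.randn(32, 64)         # stand-in for a layer's activation
    hist = update_histogram(hist, act, t_min=-4.0, t_max=4.0)
```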
-
- May 14, 2019
-
-
Bar authored
-
- Apr 18, 2019
-
-
Bar authored
Also:
* The single-worker limitation is not needed anymore; it was fixed in PyTorch in v0.4.0 (https://github.com/pytorch/pytorch/pull/4640)
* compress_classifier.py: if run in evaluation mode (--eval), enable deterministic mode
* Call utils.set_deterministic at data-loader creation if the deterministic argument is set (don't assume the user calls it outside)
* Disable cuDNN benchmark mode in utils.set_deterministic (https://pytorch.org/docs/stable/notes/randomness.html#cudnn)
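A sketch of the kind of settings involved, following the PyTorch reproducibility notes (the real utils.set_deterministic may do more than this):
```python
import torch

def set_deterministic():
    # Make cuDNN choose deterministic algorithms, and disable benchmark mode,
    # which auto-tunes (and may vary) the selected convolution algorithms.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```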
-
- Apr 11, 2019
-
-
Guy Jacob authored
* Replace the optional 'best_top1' parameter with a generic optional dict which the caller can populate as needed
* Saved in the checkpoint under the key 'extras'
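A minimal sketch of the idea, with illustrative field names (not the actual checkpoint.py code): arbitrary caller-supplied metadata travels in the checkpoint under 'extras'.
```python
import torch

def save_checkpoint(model, optimizer, epoch, path, extras=None):
    torch.save({'epoch': epoch,
                'state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'extras': extras or {}},
               path)

# e.g. save_checkpoint(model, optimizer, 10, 'best.pth.tar', extras={'best_top1': 76.1})
```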
-
- Apr 08, 2019
-
-
Neta Zmora authored
Add finer control over the pruning logic, to accommodate more pruning use-cases. The full description of the new logic is available in the updated [documentation of the CompressionScheduler](https://nervanasystems.github.io/distiller/schedule.html#pruning-fine-control), which is also part of this PR.
In this PR:
* Added a new callback to the CompressionScheduler: compression_scheduler.before_parameter_optimization, which is invoked after the gradients are computed, but before the weights are updated by the optimizer.
* We provide an option to mask the gradients before the weights are updated by the optimizer. We register to the parameter backward hook in order to mask the gradients. This gives us finer control over the parameter updates.
* Added several DropFilter schedules. DropFilter is a method to regularize networks, and it can also be used to "prepare" a network for permanent filter pruning.
* Added documentation of pruning fine-control
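A minimal, self-contained sketch of masking gradients via a tensor backward hook, so masked weights receive no update from the optimizer (the parameter and mask below are stand-ins, not Distiller's pruning machinery):
```python
import torch

param = torch.nn.Parameter(torch.randn(4, 4))
mask = (torch.rand(4, 4) > 0.5).float()       # 1 = keep, 0 = pruned

# The hook runs on the parameter's gradient during backward()
param.register_hook(lambda grad: grad * mask)

loss = (param ** 2).sum()
loss.backward()
assert torch.all(param.grad[mask == 0] == 0)  # pruned positions get zero gradient
```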
-
- Apr 01, 2019
-
-
Bar authored
Load optimizer from checkpoint (BREAKING - see details) (#182)
* Fixes issues #70, #145 and replaces PR #74
* checkpoint.py
  * save_checkpoint will now save the optimizer type in addition to its state
  * load_checkpoint will now instantiate an optimizer based on the saved type and load its state
* config.py: file/dict_config now accept the resumed epoch to pass to LR schedulers
* policy.py: LRPolicy now passes the current epoch to the LR scheduler
* Classifier compression sample:
  * New flag '--resume-from' for properly resuming a saved training session, incl. optimizer state and epoch #
  * Flag '--reset-optimizer' added to allow discarding of a loaded optimizer
* BREAKING:
  * The previous flag '--resume' is deprecated and is mapped to '--resume-from' + '--reset-optimizer'.
  * But the old resuming behavior had an inconsistency: the epoch count would continue from the saved epoch, while the LR scheduler was set up as if we were starting from epoch 0.
  * Using '--resume-from' + '--reset-optimizer' now simply RESETS the epoch count to 0 for the whole environment.
  * This means that scheduling configurations (in YAML or code) which assumed use of '--resume' might need to be changed to reflect the fact that the epoch count now starts from 0.
  * All relevant YAML files under 'examples' were modified to reflect this change.
* Initial support for ReduceLROnPlateau (#161):
  * Allow passing **kwargs to policies via the scheduler
  * Image classification now passes the validation loss to the scheduler, to be used by ReduceLROnPlateau
  * The current implementation is experimental and subject to change
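A minimal sketch of saving the optimizer type alongside its state and re-instantiating it on resume (the real checkpoint.py logic handles more cases; names here are illustrative):
```python
import torch

def save(model, optimizer, path):
    torch.save({'state_dict': model.state_dict(),
                'optimizer_type': type(optimizer),
                'optimizer_state_dict': optimizer.state_dict()},
               path)

def load(model, path):
    chkpt = torch.load(path)
    model.load_state_dict(chkpt['state_dict'])
    # Re-create an optimizer of the saved type, then restore its state
    # (hyper-parameters such as lr are restored by load_state_dict).
    optimizer = chkpt['optimizer_type'](model.parameters(), lr=0.1)
    optimizer.load_state_dict(chkpt['optimizer_state_dict'])
    return optimizer
```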
-
- Mar 17, 2019
-
-
Neta Zmora authored
In several places we hit an error state and exited using exit() instead of raising a ValueError; this is now fixed.
-
- Mar 12, 2019
-
-
Bar authored
"Peformance" --> "Performance"
-
- Mar 06, 2019
-
-
Neta Zmora authored
A recent commit changed the sorting of the best performing training epochs to be based on the sparsity level of the model, then its Top1 and Top5 scores. When we create thinned models, the sparsity remains low (even zero), while the physical size of the network is smaller. This commit changes the sorting criteria to be based on the count of non-zero (NNZ) parameters. This captures both the sparsity and the parameter-size objectives:
- When sparsity is high, the number of NNZ params is low (params_nnz_cnt = (1 - sparsity) * params_cnt).
- When we remove structures (thinning), the sparsity may remain constant, but the count of params (params_cnt) is lower, and therefore, once again, params_nnz_cnt is lower.
Therefore, params_nnz_cnt is a good proxy for capturing a sparsity objective and/or a thinning objective.
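A minimal sketch of the metric itself (illustrative, not the exact Distiller code):
```python
import torch

def params_nnz_cnt(model):
    # Count the non-zero elements across all of the model's parameter tensors
    return sum(int((p != 0).sum()) for p in model.parameters())
```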
-
- Mar 03, 2019
-
-
Neta Zmora authored
Based on a commit and ideas from @barrh: https://github.com/NervanaSystems/distiller/pull/150/commits/1623db3cdc3a95ab620e2dc6863cff23a91087bd
The sample application compress_classifier.py logs details about the best performing epoch(s) and stores the best epoch in a checkpoint file named ```best.pth.tar``` by default (if you use the ```--name``` application argument, the checkpoint name will be prefixed by ```best```).
Until this fix, the performance of a model was judged solely on its Top1 accuracy. This can be a problem when performing gradual pruning of a pre-trained model, because many times a model's Top1 accuracy increases with light pruning and this is registered as the best performing training epoch. However, we are really interested in the best performing trained model _after_ the pruning phase is done. Even during training, we may be interested in the checkpoint of the best performing model with the highest sparsity.
This fix stores a list of the performance results from all the trained epochs so far. This list is sorted using a hierarchical key: (sparsity, top1, top5, epoch), so that the list is first sorted by sparsity, then top1, followed by top5 and epoch.
But what if you want to sort using a different metric? For example, when quantizing you may want to score the best performance by the total number of bits used to represent the model parameters and feature-maps. In such a case you may want to replace ```sparsity``` by this new metric. Because this is a sample application, we don't load it with all possible control logic, and anyone can make local changes to this logic. To keep your code separated from the main application logic, we plan to refactor the application code sometime in the next few months.
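A minimal sketch of sorting with such a hierarchical key (field names are illustrative):
```python
import operator
from collections import namedtuple

PerfResult = namedtuple('PerfResult', ['sparsity', 'top1', 'top5', 'epoch'])

results = [PerfResult(0.0, 91.2, 99.0, 1),
           PerfResult(35.0, 90.8, 98.9, 7),
           PerfResult(35.0, 91.0, 99.0, 9)]

# Sort by (sparsity, top1, top5, epoch); the best entry ends up last
best = sorted(results, key=operator.attrgetter('sparsity', 'top1', 'top5', 'epoch'))[-1]
print(best)   # PerfResult(sparsity=35.0, top1=91.0, top5=99.0, epoch=9)
```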
-
Neta Zmora authored
Release 0.3 broke the exports to PNG and ONNX, and this is the fix.
-
- Feb 28, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
-
- Feb 26, 2019
-
-
Lev Zlotnik authored
Not backward compatible - re-installation is required
* Fixes for PyTorch==1.0.0
* Refactoring folder structure
* Update installation section in docs
-
- Feb 17, 2019
-
-
Neta Zmora authored
A small change to support ranking weight filters by the mean mean-value of the feature-map channels. Mean mean-value refers to computing the average value (across many input images) of the mean-value of each channel.
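A minimal sketch of the "mean mean-value" computation for a batch of feature maps shaped (N, C, H, W): take each channel's mean over its spatial dimensions, then average those per-channel means over the N input images (the tensors below are stand-ins):
```python
import torch

fmaps = torch.randn(8, 16, 32, 32)                      # N=8 images, C=16 channels
per_image_channel_means = fmaps.mean(dim=(2, 3))        # shape (N, C)
mean_mean_value = per_image_channel_means.mean(dim=0)   # shape (C,): one score per channel
```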
-
- Feb 14, 2019
-
-
Bar authored
Modified log_execution_env_state() to store the configuration file in the output directory, under a 'configs' sub-directory that it creates. At this time, the only configuration file is the one passed via args.compress.
-
Neta Zmora authored
To use automated compression you need to install several optional packages which are not required for other use-cases. This fix hides the import requirements for users who do not want to install the extra packages.
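A minimal sketch of hiding an optional dependency: attempt the import, and only raise a clear error if the optional feature is actually used (the module and message below are illustrative):
```python
try:
    import gym   # optional package, needed only for automated compression
except ImportError:
    gym = None

def run_automated_compression():
    if gym is None:
        raise ImportError("Automated compression requires optional packages "
                          "that are not installed.")
    # ... automated-compression logic would go here ...
```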
-
- Feb 13, 2019
-
-
Neta Zmora authored
Merging the 'amc' branch with 'master'. This updates the automated compression code in 'master', and adds a greedy filter-pruning algorithm.
-
- Feb 11, 2019
-
-
Guy Jacob authored
Summary of changes:
1. Post-train quantization based on pre-collected statistics
2. Quantized concat, element-wise addition/multiplication and embeddings
3. Move post-train quantization command-line args out of sample code
4. Configure post-train quantization from YAML for more fine-grained control
(See PR #136 for more detailed change descriptions)
-
- Feb 10, 2019
-
-
Guy Jacob authored
* For CIFAR-10 / ImageNet only
* Refactor data_loaders.py, reduce code duplication
* Implemented custom sampler
* Integrated in image classification sample
* Since we now shuffle the test set, had to update expected results in 2 full_flow_tests that do evaluation
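A minimal sketch of drawing a fixed, shuffled subset with stock PyTorch utilities (Distiller's custom sampler is more elaborate; the dataset and subset size here are placeholders):
```python
import numpy as np
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

dataset = datasets.CIFAR10('./data', train=False, download=True,
                           transform=transforms.ToTensor())
indices = np.random.RandomState(0).permutation(len(dataset))[:1000]  # fixed 1000-sample subset
loader = DataLoader(dataset, batch_size=64, sampler=SubsetRandomSampler(indices))
```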
-
- Jan 31, 2019
-
-
Neta Zmora authored
-
- Jan 16, 2019
-
-
Bar authored
* Support for multi-phase activations logging: enable logging activations during both training and validation in the same session.
* Refactoring: move the parser to its own file
  * The parser is moved from compress_classifier into its own file.
  * The Torch version check is moved to precede the main() call.
  * The main definition is moved to the top of the file.
  * Parser choices are now case-insensitive.
-
- Jan 15, 2019
-
-
Neta Zmora authored
Fix a mismatch between the location of the model and the computation.
-
- Jan 13, 2019
-
-
Neta Zmora authored
-
- Jan 10, 2019
-
-
Gal Novik authored
In compress_classifier.py we added a new application argument, --cpu, which you can use to force compute (training/inference) to run on the CPU when you invoke compress_classifier.py on a machine which has Nvidia GPUs. If your machine lacks Nvidia GPUs, then compute will now run on the CPU (and you do not need the new flag). Caveat: we did not fully test the CPU support for the code in the Jupyter notebooks. If you find a bug, we apologize and appreciate your feedback.
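A minimal sketch of the device-selection logic such a flag implies (only the --cpu argument name comes from the commit; the rest is illustrative):
```python
import torch

def select_device(force_cpu=False):
    # Fall back to the CPU when requested, or when no CUDA device is available
    if force_cpu or not torch.cuda.is_available():
        return torch.device('cpu')
    return torch.device('cuda')

model = torch.nn.Linear(10, 2).to(select_device(force_cpu=True))
```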
-
- Dec 19, 2018
-
-
Neta Zmora authored
If compression_scheduler==None, then we need to set the value of losses[OVERALL_LOSS_KEY] (so it is the same as losses[OBJECTIVE_LOSS_KEY]). This was overlooked.
-
- Dec 16, 2018
-
-
Taras Sereda authored
-
- Dec 14, 2018
-
-
Neta Zmora authored
Added notebook for visualizing the discovery of compressed networks. Added one-epoch fine-tuning at the end of every episode, which is required for very sensitive models like Plain20.
-
- Dec 11, 2018
-
-
Haim Barad authored
Revert to PyTorch 0.4.0. Also fixed some NumPy calls (for statistics) that needed to be moved back to the CPU.
-
Yi-Syuan Chen authored
-
- Dec 06, 2018
-
-
Guangli Li authored
Update the examples of the earlyexit arguments, which were not consistent with their descriptions.
-
- Dec 04, 2018
-
-
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training
* Added tests and examples
* Updated documentation
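A minimal sketch of range-based asymmetric (min-max) quantization to unsigned 8-bit, showing only the arithmetic (not Distiller's quantizer API): the float range [min, max] is mapped onto [0, 255] with a scale and a zero-point.
```python
import torch

def asymmetric_quantize(x, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = torch.round(qmin - x_min / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    return q, scale, zero_point

x = torch.randn(4, 4)
q, scale, zp = asymmetric_quantize(x)
x_hat = (q - zp) * scale        # dequantize to inspect the round-trip error
```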
-
Neta Zmora authored
-
- Dec 01, 2018
-
-
Neta Zmora authored
This commit contains the main fix for issue #85. It contains a couple of changes to the YAML structure-pruning API, with examples. I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).
New syntax for defining Structured AGP. I tried to make the syntax similar to fine-grained (i.e. element-wise) pruning. All you need to do is add: ```group_type: Filters```.
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity: 0.10
  final_sparsity: 0.50
  group_type: Filters
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
If you want to define "leader-based" pruning dependencies, add ```group_dependency: Leader```:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity: 0.10
  final_sparsity: 0.50
  group_type: Filters
  group_dependency: Leader
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
Retired the old ```reg_regims``` API for describing one-shot structured pruning. The new YAML API is very similar to AGP structured pruning, which is much better than before. The new API also allows us to describe data dependencies when doing one-shot structure pruning, just like AGP structured pruning. This commit also includes further code refactoring.
Old API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  reg_regims:
    'module.layer1.0.conv1.weight': [0.6, '3D']
    'module.layer1.1.conv1.weight': [0.6, '3D']
```
New API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  group_type: Filters
  desired_sparsity: 0.6
  weights: [module.layer1.0.conv1.weight, module.layer1.1.conv1.weight]
```
thresholding.py - separate the generation of the binary_map from the pruning_mask, so that we can cache the binary map and share it between several modules.
pruning/automated_gradual_pruner.py - major refactoring to support "leader-based" sub-graph pruning dependencies. The concept is explained in issue #85.
Updated example schedules:
agp-pruning/resnet20_filters.schedule_agp.yaml
agp-pruning/resnet20_filters.schedule_agp_2.yaml
agp-pruning/resnet20_filters.schedule_agp_3.yaml
network_trimming/resnet56_cifar_activation_apoz.yaml
network_trimming/resnet56_cifar_activation_apoz_v2.yaml
-
- Nov 24, 2018
-
-
Guy Jacob authored
-