- Dec 06, 2018
-
-
Neta Zmora authored
- Moved the language model and structure pruning tutorials from the Wiki to the HTML documentation. We love the ease of the Wiki, but GitHub doesn't let Google crawl Wiki pages, and users can't open PRs against them. - Updated the pruning algorithms documentation
-
- Dec 04, 2018
-
-
Neta Zmora authored
-
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training
* Added tests and examples
* Updated documentation
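As a rough illustration of the range-based (min-max) asymmetric scheme mentioned above, here is a minimal PyTorch sketch; the function names and the 8-bit default are assumptions for illustration, not Distiller's actual API:
```
import torch

def asymmetric_quantize(x, num_bits=8):
    """Range-based (min-max) asymmetric quantization of a float tensor.
    The observed range [min, max] is mapped onto [0, 2**num_bits - 1],
    so a zero-point is needed to represent the float value 0.0 exactly."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min = min(float(x.min()), 0.0)   # make sure 0.0 is inside the range
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    scale = scale if scale > 0 else 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int64)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.float() - zero_point) * scale
```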
-
Neta Zmora authored
As @cattpku notes in issue #89, we should follow the [PyTorch ONNX limitations](https://pytorch.org/docs/stable/onnx.html#supported-operators) specification. For image classifiers, this change is purely syntactic.
-
Neta Zmora authored
- Dec 03, 2018
-
-
Neta Zmora authored
Top1: 75.52% (-0.63% from the TorchVision dense ResNet50). Total sparsity: 82.6%
-
- Dec 02, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
In a previous commit, we chose to override the TorchVision ResNet models with our own modified versions, because collecting activation statistics after ReLU layers requires that each ReLU instance is used only once in the graph. However, overriding the TorchVision models means that someone who accesses the TorchVision model dictionary directly won't find the ResNet models among its keys. That is not very friendly, and we should not break imported modules. This commit overrides the TorchVision ResNet models only when they are accessed via the distiller.models.create_model API, w/o changing the TorchVision dictionary.
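A minimal sketch of the dispatch pattern this describes; the registry and function below are hypothetical stand-ins for distiller.models.create_model, not its real signature:
```
import torchvision.models as torchvision_models

# Hypothetical registry of patched ResNet constructors (each ReLU used once).
# In Distiller these live under distiller.models; the exact path is assumed here.
PATCHED_RESNETS = {}  # e.g. {'resnet50': patched_resnet50, ...}

def create_model(arch, pretrained=False):
    """Return a patched ResNet when we have one; otherwise fall back to the
    unmodified TorchVision factory.  torchvision.models itself is never
    monkey-patched, so direct users of TorchVision are unaffected."""
    if arch in PATCHED_RESNETS:
        return PATCHED_RESNETS[arch](pretrained=pretrained)
    return getattr(torchvision_models, arch)(pretrained=pretrained)
```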
-
Guy Jacob authored
-
Guy Jacob authored
-
Neta Zmora authored
Replace "concat" layers with "element-wise addition" layers, to match the text.
-
- Dec 01, 2018
-
-
Neta Zmora authored
This commit contains the main fix for issue #85. It contains a couple of changes to the YAML structure-pruning API, with examples. I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).

New syntax for defining structured AGP. I tried to make the syntax similar to fine-grained (i.e. element-wise) pruning. All you need to do is add ```group_type: Filters```:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
If you want to define "leader-based" pruning dependencies, add ```group_dependency: Leader```:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  group_dependency: Leader
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
Retired the old ```reg_regims``` API for describing one-shot structured pruning. The new YAML API is very similar to AGP structured pruning, which is much better than before. The new API also allows us to describe data dependencies when doing one-shot structure pruning, just like AGP structured pruning. This commit also includes further code refactoring.

Old API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  reg_regims:
    'module.layer1.0.conv1.weight': [0.6, '3D']
    'module.layer1.1.conv1.weight': [0.6, '3D']
```
New API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  group_type: Filters
  desired_sparsity: 0.6
  weights: [
    module.layer1.0.conv1.weight,
    module.layer1.1.conv1.weight]
```
thresholding.py – separate the generation of the binary_map from the pruning_mask, so that we can cache the binary map and share it between several modules.

pruning/automated_gradual_pruner.py – major refactoring to support "leader-based" sub-graph pruning dependencies. The concept is explained in issue #85.

agp-pruning/resnet20_filters.schedule_agp.yaml
agp-pruning/resnet20_filters.schedule_agp_2.yaml
agp-pruning/resnet20_filters.schedule_agp_3.yaml
network_trimming/resnet56_cifar_activation_apoz.yaml
network_trimming/resnet56_cifar_activation_apoz_v2.yaml
-
- Nov 29, 2018
-
-
Bar authored
Add support for models that contain named empty layers (i.e. layers whose value is None, so they have type NoneType). The fix catches the AttributeError that is raised when such a named NoneType layer is encountered, and ignores the layer.
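A generic illustration of the catch-and-ignore pattern described above (illustrative names, not the actual fix):
```
def collect_weight_shapes(model):
    """Gather weight shapes for a model summary, skipping any entry whose
    attribute access fails because the "layer" is actually None."""
    shapes = {}
    for name, module in model.named_modules():
        try:
            shapes[name] = tuple(module.weight.shape)
        except AttributeError:
            # Named empty layer (NoneType) or a parameter-less layer: ignore it.
            continue
    return shapes
```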
-
- Nov 25, 2018
-
-
Neta Zmora authored
Thanks to @haim-barad for notifying us about this.
-
Neta Zmora authored
When ranking filters by APoZ, we need to change the sort order to match that of module.apoz_channels.value(). This follows from the previous bug fix.
-
Neta Zmora authored
Instead of returning the average percentage of zeros, we returned the average percentage of non-zeros. So we inverted the results, and also multiplied by 100, because the name of the statistic says "percentage", not "fraction" (not very important, but still...).
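For reference, a minimal sketch of what the corrected statistic computes; the function name and per-channel reduction are illustrative, not the collector's actual code:
```
import torch

def apoz_per_channel(activation):
    """Average Percentage of Zeros (APoZ) per channel of a convolution
    activation with shape (batch, channels, height, width).
    Note: a percentage in [0, 100], and it counts zeros, not non-zeros."""
    zeros = (activation == 0).sum(dim=(0, 2, 3)).float()
    total = activation.size(0) * activation.size(2) * activation.size(3)
    return 100.0 * zeros / total
```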
-
Neta Zmora authored
-
- Nov 24, 2018
-
-
Guy Jacob authored
-
Neta Zmora authored
Thanks to Dan Alistarh for bringing this issue to my attention. The activations of Linear layers have shape (batch_size, output_size), while those of Convolution layers have shape (batch_size, num_channels, width, height), and this distinction in shape was not handled correctly. This commit also fixes sparsity computation for very large activations, as seen in VGG16, which led to memory exhaustion. One solution is to use smaller batch sizes, but this commit takes a different approach: it counts zeros "manually", using less memory. Also in this commit:
- Added a "caveats" section to the documentation.
- Added more tests.
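A minimal sketch of the shape handling and the memory-frugal zero counting described above (function and variable names are illustrative, not the collector's actual code):
```
import torch

def output_sparsity(activation):
    """Per-channel (conv) or per-feature (linear) fraction of zeros.

    Linear outputs have shape (batch, features); convolution outputs have
    shape (batch, channels, height, width) - the two cases must be reduced
    over different dimensions.  Zeros are counted with a boolean sum rather
    than by materializing extra float tensors, which keeps memory usage low
    even for very large activations (e.g. VGG16's early layers)."""
    if activation.dim() == 2:                      # Linear: (batch, features)
        zeros = (activation == 0).sum(dim=0).float()
        return zeros / activation.size(0)
    elif activation.dim() == 4:                    # Conv: (batch, C, H, W)
        zeros = (activation == 0).sum(dim=(0, 2, 3)).float()
        total = activation.size(0) * activation.size(2) * activation.size(3)
        return zeros / total
    raise ValueError("unexpected activation shape: %s" % (tuple(activation.shape),))
```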
-
- Nov 22, 2018
-
-
Neta Zmora authored
The super() method of the wrong subclass was used. In this case there were no practical implications, but we should move to the less error-prone Python 3.x syntax, which does not require us to name the subclass. I changed the super() invocations in the entire file and ran two schedules for ResNet56, and actually got better results than before. I don't think these results are related to this change, and I cannot explain them. Nonetheless, I am committing the new results, because I also fixed the command-line parameters of resnet56_cifar_filter_rank_v2.yaml, which contained a copy & paste error.
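For context, a generic sketch of the two super() styles; the class names are made up for illustration:
```
class RankedStructurePruner(object):
    def __init__(self, name):
        self.name = name

class L1RankedStructurePruner(RankedStructurePruner):
    def __init__(self, name):
        # Error-prone Python 2.x style: the class is named explicitly, so a
        # copy-and-paste from another subclass silently targets the wrong class:
        #   super(SomeOtherPruner, self).__init__(name)
        # Python 3.x style -- no class name to get wrong:
        super().__init__(name)
```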
-
Neta Zmora authored
* Fix issue #79: change the default values so that the following scheduler meta-data keys are always defined: 'starting_epoch', 'ending_epoch', 'frequency'.
* compress_classifier.py: add a new argument allowing the range of pruning levels scanned during sensitivity analysis to be specified from the command line.
* Add a regression test for issue #79.
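Roughly, the idea is that a YAML policy may omit these keys and still get sane values. A minimal sketch of such defaulting logic, with illustrative fallback values (not Distiller's actual defaults):
```
def fill_policy_defaults(policy_meta, num_epochs):
    """Ensure the scheduling keys every policy needs are present.
    The specific fallback values here are illustrative."""
    policy_meta.setdefault('starting_epoch', 0)
    policy_meta.setdefault('ending_epoch', num_epochs)
    policy_meta.setdefault('frequency', 1)
    return policy_meta
```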
-
- Nov 21, 2018
-
-
Neta Zmora authored
In our patched ResNet version, we change TorchVision's code so that each ReLU module instance is used only once in the network.
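The distinction, roughly, in a generic sketch (not TorchVision's actual code): one shared nn.ReLU instance makes per-instance activation statistics ambiguous, so the patched version gives every call site its own module.
```
import torch.nn as nn

class SharedReLUBlock(nn.Module):
    """TorchVision style: one ReLU instance applied twice, so hooks on
    self.relu see a mix of two different activation tensors."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv2(self.relu(self.conv1(x))))

class DistinctReLUBlock(nn.Module):
    """Patched style: each ReLU is used exactly once, so activation
    statistics can be attributed to a specific point in the graph."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu1 = nn.ReLU(inplace=True)
        self.relu2 = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu2(self.conv2(self.relu1(self.conv1(x))))
```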
-
Neta Zmora authored
When detecting a module that is used multiple times, stop execution and print an explanation to the user.
-
Neta Zmora authored
Trying to simplify the code.
-
Neta Zmora authored
-
Neta Zmora authored
Add docs/conditional_computation.md which was accidentally left out of an earlier commit.
-
- Nov 20, 2018
-
-
Neta Zmora authored
* Bug fix: the value of best_top1 stored in the checkpoint may be wrong. If you invoke compress_classifier.py with --num-best-scores=n, with n>1, then the value of best_top1 stored in checkpoints is wrong.
-
Neta Zmora authored
When we resume from a checkpoint, we usually want to continue using the checkpoint's masks. I say "usually" because I can see a situation where we want to prune a model and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the masks. This is what's done in Song Han's Dense-Sparse-Dense (DSD) training (https://arxiv.org/abs/1607.04381). But I didn't want to add another argument to ```compress_classifier.py``` for the time being – so we ignore DSD.

There are two possible situations when we resume a checkpoint that has a serialized ```CompressionScheduler``` with pruning masks:
1. We are planning on using a new ```CompressionScheduler``` that is defined in a schedule YAML file. In this case, we want to copy the masks from the serialized ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are constructing from the YAML file. This is one fix.
2. We are resuming a checkpoint, but without using a YAML schedule file. In this case we want to use the ```CompressionScheduler``` that we loaded from the checkpoint file. All this ```CompressionScheduler``` does is keep applying the masks as we train, so that we don't lose them. This is the second fix.

For DSD, we would need a new flag that would override using the ```CompressionScheduler``` that we load from the checkpoint.
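A rough sketch of the mask hand-off in case 1; the attribute names (e.g. zeros_mask_dict, mask) follow my understanding of Distiller's scheduler and should be treated as assumptions:
```
def transfer_masks(loaded_scheduler, new_scheduler):
    """Case 1: a new scheduler was built from a YAML file, but we want to
    keep applying the masks that were serialized in the checkpoint."""
    for name, masker in loaded_scheduler.zeros_mask_dict.items():
        if name in new_scheduler.zeros_mask_dict:
            new_scheduler.zeros_mask_dict[name].mask = masker.mask
    return new_scheduler
```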
-
- Nov 09, 2018
-
-
Neta Zmora authored
Another schedule for filter-wise pruning of ResNet20, with 64.6% sparsity, 25.4% compute reduction and Top1 of 91.47% (vs. the 91.78% baseline).
-
- Nov 08, 2018
-
-
Neta Zmora authored
Top1 is 75.492 (at epoch 93) vs. the published TorchVision baseline Top1 of 76.15 (-0.66). Total sparsity: 80.05%.
-
Neta Zmora authored
Change the LR from 0.2 to 0.3, which is what was actually used to generate the results mentioned in the remark.
-
Haim Barad authored
* Updated stats computation – fixes issues with validation stats
* Clarification of output (docs)
* Update
* Moved validation stats to a separate function
-
Neta Zmora authored
-
Guy Jacob authored
-
- Nov 07, 2018
-
-
Neta Zmora authored
Add missing files from previous commit
-
Neta Zmora authored
-
- Nov 06, 2018
-
-
Neta Zmora authored
We recently changed the signature of the scheduler's on_minibatch_begin() callback (a 'meta' argument was added), and the callback client in the thinning module was not updated.
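For illustration only, the shape of the change; the exact parameter list is an assumption based on the description above, not the real Distiller signature:
```
class ThinningCallbackClient(object):
    """Hypothetical callback client; the parameter list is illustrative."""
    def on_minibatch_begin(self, model, epoch, minibatch_id,
                           minibatches_per_epoch, meta):
        # 'meta' is the newly added argument; an older client that did not
        # declare it raised a TypeError when the scheduler invoked it.
        pass
```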
-
Neta Zmora authored
By default, when we create a model we wrap it with DataParallel to benefit from data-parallelism across GPUs (mainly for convolution layers). But sometimes we don't want the sample application to do this: for example when we receive a model that was trained serially. This commit adds a new argument to the application to prevent the use of DataParallel.
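Roughly what the new argument controls; the helper and flag names below are illustrative, not the application's actual option:
```
import torch
import torch.nn as nn

def maybe_parallelize(model, use_data_parallel=True):
    """Wrap the model in DataParallel unless the caller opts out
    (e.g. the checkpoint was trained serially)."""
    if use_data_parallel and torch.cuda.device_count() > 1:
        return nn.DataParallel(model)
    return model
```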
-