- Oct 22, 2018
-
-
Neta Zmora authored
Activation statistics can be leveraged to make pruning and quantization decisions, so we added support for collecting these data.
Two types of activation statistics are supported: summary statistics, and detailed records per activation. Currently we support the following summaries:
- Average activation sparsity, per layer
- Average L1-norm for each activation channel, per layer
- Average sparsity for each activation channel, per layer
For the detailed records we collect several statistics per activation and store them in a record. This collection method generates more detailed data, but consumes more time, so beware.
* You can collect activation data for the different training phases: training/validation/test.
* You can access the data directly from each module for which you chose to collect stats.
* You can also create an Excel workbook with the stats.
(A rough sketch of hook-based statistics collection appears after this note.)
To demonstrate the use of activation collection we added a sample schedule which prunes weight filters by their activation APoZ, according to:
"Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures", Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016, https://arxiv.org/abs/1607.03250
We also refactored the AGP code (AutomatedGradualPruner) to support structured pruning, and specifically we separated the AGP schedule from the filter-pruning criterion. We added examples of ranking filter importance by activation APoZ (ActivationAPoZRankedFilterPruner), random ranking (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), and filter L1-norm (L1RankedStructureParameterPruner).
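As a rough illustration of this kind of per-layer activation statistic (and not Distiller's collector API), APoZ-style sparsity can be gathered with plain PyTorch forward hooks; the model layout and names below are made up for the example:

    import torch
    import torch.nn as nn

    # Minimal sketch: accumulate the average fraction of zero activations (APoZ)
    # per ReLU layer using forward hooks.
    apoz_sums, apoz_counts = {}, {}

    def make_hook(name):
        def hook(module, inputs, output):
            zeros = (output == 0).float().mean().item()   # fraction of zeros in this batch
            apoz_sums[name] = apoz_sums.get(name, 0.0) + zeros
            apoz_counts[name] = apoz_counts.get(name, 0) + 1
        return hook

    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3), nn.ReLU())
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]

    with torch.no_grad():
        model(torch.randn(8, 3, 32, 32))          # one "validation" batch

    apoz = {name: s / apoz_counts[name] for name, s in apoz_sums.items()}
    for h in handles:
        h.remove()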
-
Neta Zmora authored
-
- Oct 18, 2018
-
-
Neta Zmora authored
ONNX export in PyTorch doesn't know how to handle DataParallel layers, so we need to make sure that we remove all instances of nn.DataParallel from the model before exporting it. The previous ONNX implementation forgot to deal with the case of DataParallel layers that do not wrap the entire model (as in VGG, where only the feature-extractor layers are data-parallel).
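A rough sketch of the kind of clean-up this requires (not the exact Distiller code) is to recursively unwrap every nn.DataParallel instance, including nested ones such as VGG's feature extractor, before calling the exporter:

    import torch
    import torch.nn as nn

    def strip_data_parallel(module):
        # Recursively replace every nn.DataParallel wrapper with the module it wraps.
        if isinstance(module, nn.DataParallel):
            module = module.module
        for name, child in module.named_children():
            setattr(module, name, strip_data_parallel(child))
        return module

    # Usage sketch:
    # model = strip_data_parallel(model)
    # torch.onnx.export(model, torch.randn(1, 3, 224, 224), "model.onnx")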
-
Neta Zmora authored
We should only add softmax when we explicitly require it (as when exporting to ONNX), because CrossEntropyLoss implicitly computes softmax on the logits it receives as input. This code was left there by mistake and should never have been pushed to git.
-
- Oct 13, 2018
-
-
Neta Zmora authored
Automated gradual pruning (AGP) is very simple to use for structured pruning and produces good results. AGP is also implemented in Google's TensorFlow framework.
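For reference, the AGP schedule (Zhu & Gupta, "To prune, or not to prune") ramps the target sparsity from an initial to a final value with a cubic curve; a small sketch of the formula:

    def agp_sparsity(step, s_initial, s_final, start_step, n_steps):
        # Automated Gradual Pruning target sparsity: cubic ramp from s_initial
        # to s_final over n_steps pruning steps, starting at start_step.
        t = min(max(step - start_step, 0), n_steps)
        return s_final + (s_initial - s_final) * (1.0 - t / float(n_steps)) ** 3

    # e.g. agp_sparsity(50, s_initial=0.0, s_final=0.5, start_step=0, n_steps=100) == 0.4375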
-
Neta Zmora authored
When running inference in ONNX, we often want to add a softmax layer to TorchVision's models.
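One way to do this (a sketch, not necessarily how the sample script does it) is to append an explicit softmax before exporting, since TorchVision classifiers return raw logits:

    import torch
    import torch.nn as nn
    import torchvision

    model = torchvision.models.resnet18(pretrained=False).eval()
    model_with_softmax = nn.Sequential(model, nn.Softmax(dim=1))
    torch.onnx.export(model_with_softmax, torch.randn(1, 3, 224, 224), "resnet18.onnx")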
-
- Oct 11, 2018
-
-
Neta Zmora authored
When using a schedule with epochs that have nothing scheduled for them, apply_mask() is not invoked at the end of mini-batches, and pruned weights might be unmasked by the optimizer weight updates. See explanation in issue #53 discussion
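The essence of the problem, sketched here with plain PyTorch (this is not the scheduler's actual code): a pruning mask has to be re-applied after every weight update, otherwise weight-decay and momentum updates revive weights that were pruned to zero.

    import torch

    weight = torch.nn.Parameter(torch.randn(10, 10))
    mask = (weight.abs() > 0.5).float()               # some pruning mask
    optimizer = torch.optim.SGD([weight], lr=0.1, weight_decay=1e-4)

    loss = (weight ** 2).sum()
    loss.backward()
    optimizer.step()                                  # may make masked weights non-zero again
    with torch.no_grad():
        weight.mul_(mask)                             # apply_mask(): keep pruned weights at zero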
-
- Oct 04, 2018
-
-
Neta Zmora authored
Temporary fix for dependency on distiller class hierarchy when serializing a model that contains a thinning recipe.
-
- Oct 03, 2018
-
-
Neta Zmora authored
Recent versions of Jupyter Notebook use a different syntax for launching the server so that it listens on all network interfaces (this is useful if you are running the Jupyter server on one machine, and connect to it from a browser on a different machine). So:
    jupyter-notebook --ip=* --no-browser
is replaced by:
    jupyter-notebook --ip=0.0.0.0 --no-browser
-
Neta Zmora authored
Remove function to_var() which is not used by any code.
-
Neta Zmora authored
We need AverageValueMeter's support for numpy arrays.
-
Neta Zmora authored
Showing various details about the performance of ResNet50
-
Neta Zmora authored
Also show fitting of the histograms to the Gaussian and Laplacian distributions.
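The fitting itself can be done with SciPy; a minimal sketch of the idea (the notebook's actual code may differ):

    import numpy as np
    from scipy import stats

    weights = np.random.randn(10000)              # stand-in for a real weight/activation tensor
    mu, sigma = stats.norm.fit(weights)           # Gaussian fit
    loc, b = stats.laplace.fit(weights)           # Laplace fit
    print("Gaussian: mu=%.3f sigma=%.3f; Laplace: loc=%.3f b=%.3f" % (mu, sigma, loc, b))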
- Oct 01, 2018
-
-
Neta Zmora authored
-
- Sep 29, 2018
-
-
Neta Zmora authored
Somehow 4 copies of the same test were pasted into this file: removed 3 instances.
-
- Sep 26, 2018
-
-
Neta Zmora authored
* Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example of ResNet20 channel pruning.
  - The idea is to regularize the variance of the distribution of the parameter structures: some structures will go to zero completely, while the rest should have high values, leading to a high variance (a rough sketch of the idea appears below).
  - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups).
  - When tested on ResNet20, GSS did not show any improvement over SSL.
* Added a sample of filter pruning for ResNet20 CIFAR using SSL ("Learning Structured Sparsity in Deep Neural Networks").
* Added an example of pruning 45% of the compute (1.8x MAC reduction) while suffering a 0.8% accuracy loss, on ResNet20 CIFAR.
* Added a ResNet50 ImageNet example of L1-magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase. This is an example of using pruning as a regularizer.
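A rough sketch of the core idea behind such a variance regularizer (not the GroupVarianceRegularizer implementation itself): compute one magnitude per parameter group (e.g. per filter) and reward a large variance across the groups by subtracting it from the loss.

    import torch

    def group_variance_penalty(weight, strength=1e-4):
        # weight: Conv2d weight of shape (out_channels, in_channels, kH, kW); one group per filter.
        group_norms = weight.view(weight.size(0), -1).norm(p=2, dim=1)   # one L2 norm per filter
        return -strength * group_norms.var()      # negative term: high variance lowers the loss

    # total_loss = task_loss + group_variance_penalty(conv_layer.weight)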
-
- Sep 21, 2018
-
-
Yi-Syuan Chen authored
-
- Sep 20, 2018
-
-
Neta Zmora authored
Clean up a bit of code
-
Neta Zmora authored
-
Neta Zmora authored
In this experiment we increase the regularization_strength of some of the channel regularization terms. We want to increase the compute compression, while allowing some reduction in accuracy.
-
Neta Zmora authored
This schedule demonstrates low-rate pruning (26% sparsity) acting as a regularizer to reduce the generalization error of ResNet50 on the ImageNet dataset. We improve the ResNet50 Top1 test error by 0.4% (23.462 vs. 23.85). Top5 error is improved as well: 6.82 vs. 7.13 in the baseline.
-
- Sep 16, 2018
-
-
Neta Zmora authored
* Clean up PyTorch 0.3 compatibility code. We don't need this anymore, and PyTorch 1.0 is just around the corner.
* Explicitly place the inputs tensor on the GPU(s).
-
Neta Zmora authored
* A temporary fix for issue #36.
  The thinning code assumes that the sgraph it is using is not data-parallel, because it (currently) accesses the layer-name keys using a "normalized" name (with "module." removed). The bug is that in thinning.py#L73 we create a model with data_parallel=True and then give it to sgraph, while in other places the thinning code uses "normalized" keys (for example in thinning.py#L264). The temporary fix configures data_parallel=False in thinning.py#L73.
  A long-term solution should have SummaryGraph know how to handle both parallel and non-parallel models. This can be done by having SummaryGraph convert the layer names it receives in the API to the data_parallel=False form using normalize_layer_name, and use the de-normalized format when returning results (a minimal sketch of such a helper is shown below).
* Fix the documentation error from issue #36.
* Move some logs to debug, and show in logging.conf how to enable DEBUG logs.
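A minimal sketch of what such a layer-name normalization helper boils down to (the actual Distiller function may be more involved):

    def normalize_layer_name(layer_name):
        # Strip the "module." prefixes that nn.DataParallel adds to layer names,
        # so keys from data-parallel and non-parallel models match.
        return layer_name.replace('module.', '')

    # e.g. normalize_layer_name('module.features.module.34') -> 'features.34'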
-
Neta Zmora authored
-
- Sep 03, 2018
-
-
Guy Jacob authored
* Implemented as a Policy
* Integrated in image classification sample
* Updated docs and README
-
- Aug 29, 2018
-
-
Neta Zmora authored
-
- Aug 27, 2018
-
-
Neta Zmora authored
Sometimes the gmin/gmax in group color-normalization ends up with a zero dimensional tensor, which needs to be accessed using .item()
-
Neta Zmora authored
Sometimes the gmin/gmax in group color-normalization ends up with a zero dimensional tensor, which needs to be accessed using .item()
-
Neta Zmora authored
-
- Aug 09, 2018
-
-
Guy Jacob authored
* Instead of a single additive value (which so far represented only the regularizer loss), callbacks return a new overall loss.
* Policy callbacks also return the individual loss components used to calculate the new overall loss (a rough sketch of the new callback contract is shown below).
* Add a boolean flag to the Scheduler's callback so applications can choose whether they want the individual loss components or just the new overall loss.
* In compress_classifier.py, log the individual loss components.
* Add a test for the loss-from-callback flow.
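In rough terms (the names here are illustrative, not the exact Policy API), the new callback contract looks something like this:

    from collections import namedtuple

    PolicyLoss = namedtuple('PolicyLoss', ['overall_loss', 'loss_components'])
    LossComponent = namedtuple('LossComponent', ['name', 'value'])

    def before_backward_pass(task_loss, regularizer_loss, return_loss_components=True):
        # Return the new overall loss, and optionally the components that built it.
        overall = task_loss + regularizer_loss
        components = [LossComponent('Regularizer Loss', regularizer_loss)] if return_loss_components else []
        return PolicyLoss(overall, components)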
-
- Aug 07, 2018
-
-
Neta Zmora authored
* Fix bug: thresholding matrix columns should use dim=0 (issue #39). See issue #39 for a description of the bug from @vinutah. A small example of the dim distinction is shown below.
* Thresholding test: fix device assignment.
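For a 2-D weights matrix in PyTorch, per-column statistics are reductions along dim=0 and per-row statistics along dim=1; a two-line illustration:

    import torch

    W = torch.randn(4, 6)                # 4 rows x 6 columns
    col_l1 = W.abs().sum(dim=0)          # one value per column -> shape (6,)
    row_l1 = W.abs().sum(dim=1)          # one value per row    -> shape (4,)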
-
- Jul 31, 2018
-
-
Haim Barad authored
Enabling Early Exit strategy in image classifier example
-
- Jul 25, 2018
-
-
Neta Zmora authored
We are using this file for more and more use-cases and we need to keep it readable and clean. I've tried to move code that is not in the main control-path to specific functions.
-
Neta Zmora authored
Also added a script to analyze model-spaces.
-
Neta Zmora authored
This is a convenience function used by clients of the scheduler, and it might change location in the future.
-
Neta Zmora authored
Due to the various uses of these functions, we need to pass an ever-growing number of arguments to them, and the API is becoming bloated and unstable. Also added the option to log the confusion matrix.
-
- Jul 22, 2018
-
-
Gal Novik authored
-
Neta Zmora authored
-
Neta Zmora authored
-