- Oct 27, 2020
-
-
Guy Jacob authored
-
- Jan 15, 2020
-
-
Guy Jacob authored
(we use 8-bit values below, but this applies to any bit-width)
* We use the notion of "full" and "restricted" quantized range for symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full" range when saturating after quantization, but calculated the scale factor as if the range was restricted. This means we weren't making full use of the quantized range.
* On the other hand, some other quantization implementations (e.g. TensorFlow) use the "restricted" range.
* So, we make it an option to use either the proper "full" range (q_min = -128) or the "restricted" range (q_min = -127); the sketch below illustrates the difference.
* LinearQuantMode.SYMMETRIC now means the "full" range is used, and LinearQuantMode.SYMMETRIC_RESTRICTED was added for using the "restricted" range.
* Updated tests and documentation.
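A minimal sketch of how the scale factor and clamping limits differ between the "full" and "restricted" ranges. The scale convention for the full range shown here is one common choice and is not necessarily the exact formula used in the library:

```python
import torch

def symmetric_qparams(abs_max, num_bits=8, restricted=False):
    # "Restricted" range: [-(2^(b-1) - 1), 2^(b-1) - 1] -> [-127, 127] for 8 bits.
    # "Full" range:       [-2^(b-1),       2^(b-1) - 1] -> [-128, 127] for 8 bits.
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max if restricted else -(q_max + 1)
    # Restricted: map |x|_max to 127. Full: map it using 128, so the extra
    # negative level is actually utilized (a common convention).
    scale = (q_max if restricted else q_max + 1) / abs_max
    return scale, q_min, q_max

def fake_quantize_symmetric(x, abs_max, num_bits=8, restricted=False):
    scale, q_min, q_max = symmetric_qparams(abs_max, num_bits, restricted)
    return torch.clamp(torch.round(x * scale), q_min, q_max) / scale

x = torch.randn(4) * 3
print(fake_quantize_symmetric(x, abs_max=x.abs().max().item(), restricted=False))
print(fake_quantize_symmetric(x, abs_max=x.abs().max().item(), restricted=True))
```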
-
- Aug 08, 2019
-
-
Guy Jacob authored
-
- Aug 04, 2019
-
-
Guy Jacob authored
-
- Jul 10, 2019
-
-
Guy Jacob authored
* "Net-aware quantization" - using the term coined in https://arxiv.org/abs/1811.09886. (section 3.2.2). Refers to considering sequences of modules when quantizing. This isn't exactly layer fusion - we modify activation stats prior to setting quantization parameters, to make sure that when a module is followed by certain activation functions, only the relevant ranges are quantized. We do this for: * ReLU - Clip all negative values * Tanh / Sigmoid - Clip according to the (approximated) saturation values for these functions. We use [-4, 4] for tanh and [-6, 6] for sigmoid. * Perform batch-norm folding before post-training quantization. Batch-norm parameters are folded into the parameters of the previous layer and the BN layer is replaced with an identity module. * Both BN folding and "net-aware" are now automatically executed in PostTrainLinearQuantizer (details of this change below) * BN folding enabled by new generic mechanism to "fuse" module sequences (at the Python API level) * First module in sequence is replaced/modified by a user-provided function, rest of moudles replaced with nn.Identity * Quantizer changes: * Optionally create adjacency map during prepare_model * Subclasses may enforce adjacency map creation * Refatcoring: Replace _prepare_model_impl with pre and post override-able "callbacks", so core functionality is always executed * PostTrainLinearQuantizer Changes: * Enforce creation of adjacency map. This means users must now pass a dummy input to PostTrainLinearQuantizer.prepare_model * Before module replacement - Apply BN folding and stats updates according to net-aware quantization * Updated the language model quantization tutorial to reflect the new functionality * Updated the image classification post-train quantization samples (command line and YAML) * Other changes: * Distller LSTM implementation: Replace the ModuleList for cells with a plain list. The PyTorch trace mechanism doesn't "see" ModuleList objects, it only sees the contained modules. This means that the "scopeName" of these modules isn't complete, which makes it impossible to match op names in SummaryGraph to modules in the Python model. * ActivationStatsCollector: Ignore nn.Identity modules
-
- Jul 08, 2019
-
-
Guy Jacob authored
-
- Jun 10, 2019
-
-
Neta Zmora authored
Some links have changed with the latest version of mkdocs. This closes issues #280 and #65 (reopened).
-
- May 19, 2019
- Apr 14, 2019
-
-
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: Passing clip_acts as a boolean flag (either on the command line or in the function signature) will now fail. An error message listing the valid values is displayed.
* Implemented clipping activations at mean + N * std (N is user-configurable; see the sketch below)
* Additional tests
* Updated docs
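A small sketch of the mean + N * std clipping idea, assuming we clip the observed activation range against statistics of the activation tensor (illustrative only, not the library's exact statistics pipeline):

```python
import torch

def clip_range_mean_n_std(activations: torch.Tensor, n_stds: float = 3.0):
    """Clip the activation range at mean +/- N * std instead of using the
    raw observed min/max (N is user-configurable)."""
    mean, std = activations.mean(), activations.std()
    clip_min = torch.maximum(activations.min(), mean - n_stds * std)
    clip_max = torch.minimum(activations.max(), mean + n_stds * std)
    return clip_min, clip_max

acts = torch.randn(1000) * 2
print(clip_range_mean_n_std(acts, n_stds=2.0))
```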
-
Guy Jacob authored
-
- Apr 08, 2019
-
-
Neta Zmora authored
Unfortunately, we maintain 2 copies of the documentation images (one for the documentation source; another for the generated documentation). We need to solve this, as it makes the repository disproportionately large.
-
Neta Zmora authored
Add finer control over the pruning logic, to accommodate more pruning use-cases. The full description of the new logic is available in the updated [documentation of the CompressionScheduler](https://nervanasystems.github.io/distiller/schedule.html#pruning-fine-control), which is also part of this PR. In this PR:
* Added a new callback to the CompressionScheduler: compression_scheduler.before_parameter_optimization, which is invoked after the gradients are computed, but before the weights are updated by the optimizer.
* We provide an option to mask the gradients before the weights are updated by the optimizer. We register to the parameter backward hook in order to mask the gradients. This gives us finer control over the parameter updates (see the sketch below).
* Added several DropFilter schedules. DropFilter is a method to regularize networks, and it can also be used to "prepare" a network for permanent filter pruning.
* Added documentation of pruning fine-control
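A minimal sketch of gradient masking via a parameter backward hook. The helper name `apply_gradient_mask` is made up for illustration; the mechanism (Tensor.register_hook) is standard PyTorch:

```python
import torch
import torch.nn as nn

def apply_gradient_mask(param: nn.Parameter, mask: torch.Tensor):
    """Zero out gradients of pruned weights so the optimizer cannot 'revive' them."""
    def hook(grad):
        return grad * mask  # the hook's return value replaces the gradient
    return param.register_hook(hook)

# Usage sketch: keep only weights whose magnitude is above a threshold.
layer = nn.Linear(16, 4)
mask = (layer.weight.abs() > 0.1).float()
handle = apply_gradient_mask(layer.weight, mask)

loss = layer(torch.randn(8, 16)).sum()
loss.backward()
assert torch.all(layer.weight.grad[mask == 0] == 0)
handle.remove()
```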
-
- Apr 01, 2019
-
-
Lev Zlotnik authored
* Bias handling:
  * Add 'bits_bias' parameter to explicitly specify # of bits for bias, similar to weights and activations.
  * BREAKING: Remove the now redundant 'quantize_bias' boolean parameter
* Custom overrides:
  * Expand the semantics of the overrides dict to allow overriding of other parameters in addition to bit-widths
  * Functions registered in the quantizer's 'replacement_factory' can define keyword arguments. Non bit-width entries in the overrides dict will be checked against the function signature and passed (see the sketch below)
* BREAKING:
  * Changed the name of 'bits_overrides' to simply 'overrides'
  * Bit-width overrides must now be defined using the full parameter names - 'bits_activations/weights/bias' instead of the short-hands 'acts' and 'wts' which were used so far.
* Added/updated relevant tests
* Modified all quantization YAMLs under 'examples' to reflect these changes
* Updated docs
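A sketch of how non bit-width override entries could be checked against a replacement function's signature. Both `replace_relu` and `split_override_entry` are hypothetical names used only for illustration:

```python
import inspect
import torch.nn as nn

# Hypothetical replacement function, in the spirit of those registered in a
# quantizer's replacement_factory.
def replace_relu(module, name, qbits_map, clip_acts=False):
    return nn.ReLU6() if clip_acts else module

def split_override_entry(replace_fn, override_entry):
    """Separate bit-width settings from extra kwargs that the replacement
    function actually declares in its signature."""
    bit_keys = {'bits_activations', 'bits_weights', 'bits_bias'}
    params = inspect.signature(replace_fn).parameters
    bits = {k: v for k, v in override_entry.items() if k in bit_keys}
    extra = {k: v for k, v in override_entry.items()
             if k not in bit_keys and k in params}
    return bits, extra

bits, extra = split_override_entry(
    replace_relu, {'bits_activations': 8, 'bits_weights': 4, 'clip_acts': True})
print(bits)   # {'bits_activations': 8, 'bits_weights': 4}
print(extra)  # {'clip_acts': True}
```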
-
- Mar 29, 2019
-
-
Songyi Blair Han authored
-
- Feb 26, 2019
-
-
Lev Zlotnik authored
Not backward compatible - re-installation is required
* Fixes for PyTorch==1.0.0
* Refactoring folder structure
* Update installation section in docs
-
- Feb 11, 2019
-
-
Guy Jacob authored
Summary of changes:
(1) Post-train quantization based on pre-collected statistics (a stats-collection sketch follows below)
(2) Quantized concat, element-wise addition / multiplication and embeddings
(3) Move post-train quantization command line args out of sample code
(4) Configure post-train quantization from YAML for more fine-grained control
(See PR #136 for more detailed change descriptions)
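To illustrate item (1), here is a sketch of an offline pass that records per-module activation ranges with forward hooks; Distiller's stats collection records more than plain min/max, and `collect_min_max_stats` plus the `data_loader` yielding `(inputs, targets)` pairs are assumptions for this example:

```python
import torch
import torch.nn as nn

def collect_min_max_stats(model: nn.Module, data_loader):
    """Record per-module activation min/max over a calibration set; the stats
    can later be used to set post-training quantization parameters."""
    stats, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            s = stats.setdefault(name, {'min': float('inf'), 'max': float('-inf')})
            s['min'] = min(s['min'], output.min().item())
            s['max'] = max(s['max'], output.max().item())
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear, nn.ReLU)):
            handles.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for inputs, _ in data_loader:
            model(inputs)

    for h in handles:
        h.remove()
    return stats
```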
-
- Dec 11, 2018
-
-
Guy Jacob authored
-
- Dec 06, 2018
-
-
Neta Zmora authored
Add missing image files [arghhhh.]
-
Neta Zmora authored
Add missing files :-(
-
Neta Zmora authored
- Moved the language model and structured pruning tutorials from the Wiki to the HTML documentation. We love the ease of the Wiki, but GitHub doesn't let Google crawl these pages, and users can't open PRs on Wiki pages.
- Updated the pruning algorithms documentation
-
- Dec 04, 2018
-
-
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training (see the sketch below)
* Added tests and examples
* Updated documentation
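A sketch of range-based asymmetric quantization parameters and of per-output-channel weight ranges. The formulas follow a common convention and may differ from the library's exact rounding details:

```python
import torch

def asymmetric_qparams(x_min, x_max, num_bits=8):
    """Range-based (min/max) asymmetric scale and zero-point."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must contain 0
    n_levels = 2 ** num_bits - 1                     # 255 for 8 bits
    scale = n_levels / (x_max - x_min)
    zero_point = round(-x_min * scale)
    return scale, zero_point

def per_channel_weight_ranges(weight: torch.Tensor):
    """Per-output-channel min/max (channel dim 0) of a conv/linear weight,
    as used for per-channel quantization."""
    flat = weight.reshape(weight.shape[0], -1)
    return flat.min(dim=1).values, flat.max(dim=1).values

w = torch.randn(16, 3, 3, 3)
print(asymmetric_qparams(w.min().item(), w.max().item()))
print(per_channel_weight_ranges(w))
```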
-
- Nov 25, 2018
-
-
Neta Zmora authored
-
- Nov 24, 2018
-
-
Neta Zmora authored
Thanks to Dan Alistarh for bringing this issue to my attention. The activations of Linear layers have shape (batch_size, output_size) and those of Convolution layers have shape (batch_size, num_channels, width, height), and this distinction in shape was not correctly handled.

This commit also fixes sparsity computation for very large activations, as seen in VGG16, which leads to memory exhaustion. One solution is to use smaller batch sizes, but this commit uses a different solution, which counts zeros “manually” and uses less space (see the sketch below).

Also in this commit:
- Added a “caveats” section to the documentation.
- Added more tests.
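A sketch of the low-memory zero-counting idea: iterate over batch slices instead of materializing one full-size comparison tensor. This is illustrative only, not Distiller's exact code:

```python
import torch

def count_zeros_lowmem(activations: torch.Tensor, chunk_size: int = 1) -> int:
    """Count zero elements one batch-slice at a time to keep peak memory low
    for very large activation tensors."""
    zeros = 0
    for chunk in torch.split(activations, chunk_size, dim=0):
        zeros += int((chunk == 0).sum().item())
    return zeros

def sparsity(activations: torch.Tensor) -> float:
    return count_zeros_lowmem(activations) / activations.numel()

acts = torch.relu(torch.randn(8, 64, 56, 56))
print(sparsity(acts))
```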
-
- Nov 21, 2018
-
-
Neta Zmora authored
-
- Nov 07, 2018
-
-
Neta Zmora authored
Add missing files from previous commit
-
Neta Zmora authored
-
- Nov 04, 2018
-
-
Neta Zmora authored
-
- Oct 03, 2018
-
-
Neta Zmora authored
Latest versions of Jupyter notebooks have a different syntax for launching the server such that it listens on all network interfaces (this is useful if you are running the Jupyter server on one machine, and connect to it from a browser on a different machine). So `jupyter-notebook --ip=* --no-browser` is replaced by `jupyter-notebook --ip=0.0.0.0 --no-browser`.
-
- Sep 16, 2018
-
-
Neta Zmora authored
* A temporary fix for issue #36
  The thinning code assumes that the sgraph it is using is not data-parallel, because it (currently) accesses the layer-name keys using a "normalized" name ("module." is removed). The bug is that in thinning.py#L73 we create a data_parallel=True model and then give it to sgraph, while in other places the thinning code uses "normalized" keys (for example in thinning.py#L264). The temporary fix configures data_parallel=False in thinning.py#L73.
  A long-term solution should have SummaryGraph know how to handle both parallel and non-parallel models. This can be done by having SummaryGraph convert layer names it receives in the API to data_parallel=False using normalize_layer_name (see the sketch below). When returning results, use the de-normalized format.
* Fix the documentation error from issue #36
* Move some logs to debug level and show in logging.conf how to enable DEBUG logs.
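A sketch of the idea behind normalizing layer names, i.e. stripping the "module." prefixes that nn.DataParallel wrapping introduces so names from a data-parallel model can be matched against the plain model (not the actual normalize_layer_name implementation):

```python
def normalize_layer_name(layer_name: str) -> str:
    """Drop 'module' path components added by nn.DataParallel wrapping."""
    return '.'.join(part for part in layer_name.split('.') if part != 'module')

assert normalize_layer_name('module.features.0.weight') == 'features.0.weight'
assert normalize_layer_name('features.0.weight') == 'features.0.weight'
```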
-
- Sep 03, 2018
-
-
Guy Jacob authored
* Implemented as a Policy
* Integrated in image classification sample
* Updated docs and README
-
- Jul 22, 2018
-
-
Gal Novik authored
* Adding the PACT quantization method (see the sketch below)
* Move the logic that modifies the optimizer due to changes the quantizer makes into the Quantizer itself
* Updated documentation and tests
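A minimal sketch of a PACT-style activation, following the PACT paper (https://arxiv.org/abs/1805.06085): clamp to [0, alpha] with a learnable clipping value alpha, then fake-quantize with a straight-through estimator. This is not Distiller's actual module:

```python
import torch
import torch.nn as nn

class PactClip(nn.Module):
    def __init__(self, num_bits=4, init_alpha=6.0):
        super().__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x):
        # Equivalent to clamp(x, 0, alpha), written so gradients flow to alpha.
        y = 0.5 * (x.abs() - (x - self.alpha).abs() + self.alpha)
        # Fake-quantize the clipped range onto 2^k - 1 levels.
        n_levels = 2 ** self.num_bits - 1
        scale = n_levels / self.alpha
        y_q = torch.round(y * scale) / scale
        # Straight-through estimator: quantized forward, identity backward.
        return y + (y_q - y).detach()

act = PactClip(num_bits=4)
print(act(torch.randn(8) * 4))
```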
-
- Jul 17, 2018
-
-
Guy Jacob authored
* Add Quantizer unit tests
* Require 'bits_overrides' to be an OrderedDict to support overlapping patterns in a predictable manner (see the sketch below) + update documentation to reflect this
* Quantizer class cleanup
  * Use "public" nn.Module APIs instead of protected attributes
  * Call the builtins set/get/delattr instead of the class special methods (__***__)
* Fix issues reported in #24
* Bug in RangeLinearQuantParamLayerWrapper - add explicit override of pre_quantized_forward accepting a single input (#15)
* Add DoReFa test to full_flow_tests
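A sketch of why the ordering matters for overlapping patterns: with a first-match-wins lookup over an OrderedDict, a more specific pattern listed first reliably takes precedence over a broader one. The `lookup_bits` helper and the 'acts'/'wts' keys are used only for illustration:

```python
import re
from collections import OrderedDict

def lookup_bits(layer_name, bits_overrides, default):
    """Return the first override whose pattern matches the layer name."""
    for pattern, bits in bits_overrides.items():
        if re.match(pattern, layer_name):
            return bits
    return default

overrides = OrderedDict([
    ('conv1$', {'acts': 8, 'wts': 8}),   # exact first layer: keep 8 bits
    ('conv.*', {'acts': 4, 'wts': 4}),   # all other convs: 4 bits
])
print(lookup_bits('conv1', overrides, {'acts': 8, 'wts': 8}))  # -> 8-bit entry
print(lookup_bits('conv2', overrides, {'acts': 8, 'wts': 8}))  # -> 4-bit entry
```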
-
- Jul 01, 2018
-
-
Guy Jacob authored
* Scale of bias and parentheses were wrong
-
- Jun 21, 2018
- Jun 14, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
- May 22, 2018
-
-
Neta Zmora authored
Two places in the documentation gave the wrong path to the example Alexnet sensitivity pruning schedule.
-
- May 14, 2018
-
-
Guy Jacob authored
-