- Aug 07, 2019
Guy Jacob authored

- Jul 10, 2019
Guy Jacob authored

Guy Jacob authored
* "Net-aware quantization" - using the term coined in https://arxiv.org/abs/1811.09886. (section 3.2.2). Refers to considering sequences of modules when quantizing. This isn't exactly layer fusion - we modify activation stats prior to setting quantization parameters, to make sure that when a module is followed by certain activation functions, only the relevant ranges are quantized. We do this for: * ReLU - Clip all negative values * Tanh / Sigmoid - Clip according to the (approximated) saturation values for these functions. We use [-4, 4] for tanh and [-6, 6] for sigmoid. * Perform batch-norm folding before post-training quantization. Batch-norm parameters are folded into the parameters of the previous layer and the BN layer is replaced with an identity module. * Both BN folding and "net-aware" are now automatically executed in PostTrainLinearQuantizer (details of this change below) * BN folding enabled by new generic mechanism to "fuse" module sequences (at the Python API level) * First module in sequence is replaced/modified by a user-provided function, rest of moudles replaced with nn.Identity * Quantizer changes: * Optionally create adjacency map during prepare_model * Subclasses may enforce adjacency map creation * Refatcoring: Replace _prepare_model_impl with pre and post override-able "callbacks", so core functionality is always executed * PostTrainLinearQuantizer Changes: * Enforce creation of adjacency map. This means users must now pass a dummy input to PostTrainLinearQuantizer.prepare_model * Before module replacement - Apply BN folding and stats updates according to net-aware quantization * Updated the language model quantization tutorial to reflect the new functionality * Updated the image classification post-train quantization samples (command line and YAML) * Other changes: * Distller LSTM implementation: Replace the ModuleList for cells with a plain list. The PyTorch trace mechanism doesn't "see" ModuleList objects, it only sees the contained modules. This means that the "scopeName" of these modules isn't complete, which makes it impossible to match op names in SummaryGraph to modules in the Python model. * ActivationStatsCollector: Ignore nn.Identity modules
- Jun 03, 2019
Lev Zlotnik authored
* In PostTrainLinearQuantizer - moved 'clip_acts' and 'clip_n_stds' to overrides; removed the 'no_clip_layers' parameter from __init__
  * The 'no_clip_layers' command line argument REMAINS, handled in PostTrainLinearQuantizer.from_args()
* Removed old code from comments, fixed warnings in test_post_train_quant.py
* Updated tests
* Updated post-train quant sample YAML
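As a rough illustration of per-layer 'clip_acts' / 'clip_n_stds' overrides: the layer-name keys and string values below are assumptions for illustration only; consult the updated sample YAML for the exact accepted values.

```python
from collections import OrderedDict

# Hypothetical per-layer overrides (keys/values are illustrative assumptions):
# disable activation clipping on the final classifier, where outliers matter,
# and clip the first conv's activations at mean + 3 standard deviations.
overrides = OrderedDict([
    ('fc',    OrderedDict([('clip_acts', 'NONE')])),
    ('conv1', OrderedDict([('clip_acts', 'N_STD'), ('clip_n_stds', 3)])),
])
```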
- May 19, 2019
Guy Jacob authored
* Added scale factor approximation in post-training quantization using integer multiply + shift. # of bits for the integer multiplier is user configurable
* Updated documentation
* Updated post-train quant command line examples readme file
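A back-of-the-envelope sketch of the integer multiply + shift approximation: replace a floating-point scale with scale ~ mult / 2**shift. The helper below is hypothetical and not Distiller's implementation; it only shows the idea.

```python
def approx_scale_as_mult_shift(fp_scale, mult_bits=16, max_shift=31):
    """Approximate fp_scale as an integer multiplier and a right-shift,
    i.e. fp_scale ~= mult / 2**shift, with mult fitting in mult_bits bits."""
    assert fp_scale > 0
    shift = 0
    # Grow the shift while the rounded multiplier still fits in mult_bits bits,
    # to keep as much precision as possible.
    while shift < max_shift and round(fp_scale * (1 << (shift + 1))) < (1 << mult_bits):
        shift += 1
    mult = int(round(fp_scale * (1 << shift)))
    return mult, shift

mult, shift = approx_scale_as_mult_shift(0.0237)
# A float multiply x * 0.0237 can then be replaced by the
# integer-only computation (x * mult) >> shift.
```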
- Apr 14, 2019
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: clip_acts as a boolean flag (either in the command line or in the function signature) will fail. An error message listing the valid values is displayed.
* Implemented clipping activations at mean + N * std (N is user configurable)
* Additional tests
* Updated docs
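A small sketch of the mean + N * std clipping idea. The function name is hypothetical and the actual quantizer derives these values from collected activation statistics rather than from a live tensor; this only shows the arithmetic.

```python
import torch

def clip_range_mean_n_stds(acts: torch.Tensor, n_stds: float = 3.0):
    """Derive an activation clipping range as mean +/- N * std,
    never exceeding the actually observed min/max."""
    mean, std = acts.mean(), acts.std()
    low = torch.max(acts.min(), mean - n_stds * std)
    high = torch.min(acts.max(), mean + n_stds * std)
    return low.item(), high.item()
```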
- Apr 01, 2019
Lev Zlotnik authored
* Bias handling:
  * Added a 'bits_bias' parameter to explicitly specify the # of bits for bias, similar to weights and activations
  * BREAKING: Removed the now redundant 'quantize_bias' boolean parameter
* Custom overrides:
  * Expanded the semantics of the overrides dict to allow overriding of other parameters in addition to bit-widths
  * Functions registered in the quantizer's 'replacement_factory' can define keyword arguments. Non bit-width entries in the overrides dict are checked against the function signature and passed to it
* BREAKING:
  * Changed the name of 'bits_overrides' to simply 'overrides'
  * Bit-width overrides must now be defined using the full parameter names - 'bits_activations/weights/bias' instead of the short-hands 'acts' and 'wts' used so far
* Added/updated relevant tests
* Modified all quantization YAMLs under 'examples' to reflect these changes
* Updated docs
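For reference, a hedged sketch of what the renamed 'overrides' dict might look like after this change, using the full parameter names. The layer names are placeholders, and the exact rules for matching override keys to module names may differ from what is shown here.

```python
from collections import OrderedDict

# Full parameter names instead of the old 'acts' / 'wts' short-hands.
overrides = OrderedDict([
    # Keep the first conv at higher precision.
    ('conv1', OrderedDict([('bits_activations', 8),
                           ('bits_weights', 8),
                           ('bits_bias', 32)])),
    # Quantize the classifier more aggressively.
    ('fc', OrderedDict([('bits_activations', 4),
                        ('bits_weights', 4)])),
])
```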
- Mar 27, 2019
Guy Jacob authored

- Feb 11, 2019
Guy Jacob authored
Summary of changes:
(1) Post-train quantization based on pre-collected statistics
(2) Quantized concat, element-wise addition / multiplication and embeddings
(3) Moved post-train quantization command line args out of sample code
(4) Configure post-train quantization from YAML for more fine-grained control
(See PR #136 for more detailed change descriptions)
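A rough sketch of item (1): collecting activation ranges offline on a calibration set so the quantizer can later derive its parameters from them. The hook-based collection and the helper name collect_minmax_stats are illustrative assumptions, not Distiller's collector API.

```python
import torch
import torch.nn as nn

def collect_minmax_stats(model: nn.Module, calib_loader, device='cpu'):
    """Record per-module output min/max over a calibration set."""
    stats = {}

    def make_hook(name):
        def hook(module, inputs, output):
            lo, hi = output.min().item(), output.max().item()
            entry = stats.setdefault(name, {'min': lo, 'max': hi})
            entry['min'] = min(entry['min'], lo)
            entry['max'] = max(entry['max'], hi)
        return hook

    handles = [m.register_forward_hook(make_hook(name))
               for name, m in model.named_modules()
               if isinstance(m, (nn.Conv2d, nn.Linear, nn.ReLU))]
    model.eval().to(device)
    with torch.no_grad():
        for inputs, _ in calib_loader:
            model(inputs.to(device))
    for h in handles:
        h.remove()
    # The saved ranges are later fed to the post-training quantizer
    # instead of being re-collected online.
    return stats
```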
- Dec 04, 2018
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training
* Added tests and examples
* Updated documentation
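To make the symmetric / asymmetric distinction concrete, here is a minimal sketch of range-based quantization parameters. Conventions for scale and zero-point vary between implementations; this is not Distiller's exact code.

```python
def symmetric_params(sat_val, num_bits=8):
    """Map [-sat_val, sat_val] to signed integers; the zero-point is always 0."""
    n = 2 ** (num_bits - 1) - 1
    return n / sat_val, 0  # (scale, zero_point)

def asymmetric_params(sat_min, sat_max, num_bits=8):
    """Map [sat_min, sat_max] to unsigned integers with a non-zero zero-point,
    so an asymmetric float range doesn't waste quantization levels."""
    sat_min, sat_max = min(sat_min, 0.0), max(sat_max, 0.0)  # keep 0.0 representable
    n = 2 ** num_bits - 1
    scale = n / (sat_max - sat_min)
    zero_point = round(-sat_min * scale)
    return scale, zero_point

# Per-channel quantization simply computes these parameters separately for
# each output channel of a weight tensor instead of once per tensor.
```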
- Dec 02, 2018

- Jul 22, 2018
Gal Novik authored
* Added the PACT quantization method
* Moved the logic that modifies the optimizer due to changes the quantizer makes into the Quantizer itself
* Updated documentation and tests
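A compact sketch of the PACT idea from the paper: clip activations to [0, alpha] with a learnable alpha, then fake-quantize with a straight-through estimator. This is an illustration of the method, not Distiller's actual implementation; the class name is hypothetical.

```python
import torch
import torch.nn as nn

class PACTActivation(nn.Module):
    """Clip activations to [0, alpha] with a learnable alpha, then fake-quantize."""
    def __init__(self, num_bits=4, init_alpha=6.0):
        super().__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(float(init_alpha)))

    def forward(self, x):
        # PACT clipping: equivalent to clamp(x, 0, alpha), written so that
        # gradients w.r.t. alpha match the paper's formulation.
        y = 0.5 * (x.abs() - (x - self.alpha).abs() + self.alpha)
        # Fake-quantize to num_bits levels; the straight-through estimator
        # keeps gradients flowing through the rounding.
        scale = (2 ** self.num_bits - 1) / self.alpha
        y_q = torch.round(y * scale) / scale
        return y + (y_q - y).detach()

# Because alpha is a new trainable parameter (typically with an L2 regularizer),
# the quantizer has to add it to the optimizer - the optimizer-handling logic
# this commit moves into the Quantizer itself.
```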
- Jun 21, 2018