- Oct 07, 2019
Guy Jacob authored
* Greedy search script for post-training quantization settings
  * Iterates over each layer in the model in order. For each layer, checks a user-defined set of quantization settings and chooses the best one based on validation accuracy (see the sketch below)
  * Provided sample that searches for the best activations-clipping mode per layer, on image classification models
* Proper handling of mixed quantization settings in post-train quant:
  * By default, the quantization settings for each layer apply only to output quantization
  * Propagate quantization settings for activation tensors through the model during execution
  * For non-quantized inputs to layers that require quantized inputs, fall back to quantizing according to the settings used for the output
  * In addition, provide a mechanism to override input quantization settings via the YAML configuration file
* By default, all modules are now quantized. For module types that don't have a dedicated quantized implementation, "fake" quantization is performed
* Misc. changes:
  * Fuse ReLU/ReLU6 into the predecessor during post-training quantization
  * Fixes to ACIQ clipping in the half-range case

Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
Co-authored-by: Guy Jacob <guy.jacob@intel.com>
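For reference, a minimal sketch of the greedy per-layer search idea described above. The `apply_setting` and `evaluate` callables are hypothetical placeholders (apply one candidate setting to one layer; run validation), not the actual script's API:

    # Sketch of a greedy per-layer search over quantization settings.
    def greedy_search(model, layers, candidate_settings, apply_setting, evaluate):
        chosen = {}
        for layer in layers:                      # iterate layers in model order
            best_acc, best_setting = float("-inf"), None
            for setting in candidate_settings:    # e.g. different clipping modes
                apply_setting(model, layer, setting)
                acc = evaluate(model)             # validation accuracy
                if acc > best_acc:
                    best_acc, best_setting = acc, setting
            apply_setting(model, layer, best_setting)  # keep the winner, move on
            chosen[layer] = best_setting
        return chosen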
- Sep 10, 2019
Yury Nahshan authored
ACIQ clipping method, as described in: "Post training 4-bit quantization of convolutional networks for rapid-deployment" (Ron Banner, Yury Nahshan, Daniel Soudry), NeurIPS 2019, https://arxiv.org/abs/1810.05723

Co-authored-by: Yury Nahshan <yury.nahshan@intel.com>
Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
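ACIQ derives the MSE-optimal clipping value analytically. Purely to illustrate the objective being optimized, the same value can be approximated numerically by scanning candidate clip values (NumPy sketch, not the Distiller implementation):

    import numpy as np

    def mse_optimal_clip(x, num_bits, num_candidates=100):
        """Illustrative numerical search for a symmetric clipping value that
        minimizes quantization MSE (ACIQ computes this analytically)."""
        best_alpha, best_mse = None, np.inf
        max_abs = np.abs(x).max()
        for alpha in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
            scale = (2 ** (num_bits - 1) - 1) / alpha          # symmetric, signed
            x_clipped = np.clip(x, -alpha, alpha)
            x_quant = np.round(x_clipped * scale) / scale      # quantize-dequantize
            mse = np.mean((x - x_quant) ** 2)
            if mse < best_mse:
                best_alpha, best_mse = alpha, mse
        return best_alpha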
- Aug 08, 2019
Guy Jacob authored
- Aug 07, 2019
Guy Jacob authored
- Jul 10, 2019
Guy Jacob authored
Guy Jacob authored
* "Net-aware quantization" - using the term coined in https://arxiv.org/abs/1811.09886 (section 3.2.2). Refers to considering sequences of modules when quantizing. This isn't exactly layer fusion - we modify activation stats prior to setting quantization parameters, to make sure that when a module is followed by certain activation functions, only the relevant ranges are quantized. We do this for:
  * ReLU - Clip all negative values
  * Tanh / Sigmoid - Clip according to the (approximated) saturation values for these functions. We use [-4, 4] for tanh and [-6, 6] for sigmoid.
* Perform batch-norm folding before post-training quantization. Batch-norm parameters are folded into the parameters of the previous layer and the BN layer is replaced with an identity module (see the sketch below).
* Both BN folding and "net-aware" are now automatically executed in PostTrainLinearQuantizer (details of this change below)
* BN folding enabled by a new generic mechanism to "fuse" module sequences (at the Python API level)
  * The first module in the sequence is replaced/modified by a user-provided function, the rest of the modules are replaced with nn.Identity
* Quantizer changes:
  * Optionally create adjacency map during prepare_model
  * Subclasses may enforce adjacency map creation
  * Refactoring: Replace _prepare_model_impl with pre and post override-able "callbacks", so core functionality is always executed
* PostTrainLinearQuantizer changes:
  * Enforce creation of adjacency map. This means users must now pass a dummy input to PostTrainLinearQuantizer.prepare_model
  * Before module replacement - apply BN folding and stats updates according to net-aware quantization
* Updated the language model quantization tutorial to reflect the new functionality
* Updated the image classification post-train quantization samples (command line and YAML)
* Other changes:
  * Distiller LSTM implementation: Replace the ModuleList for cells with a plain list. The PyTorch trace mechanism doesn't "see" ModuleList objects, it only sees the contained modules. This means that the "scopeName" of these modules isn't complete, which makes it impossible to match op names in SummaryGraph to modules in the Python model.
  * ActivationStatsCollector: Ignore nn.Identity modules
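The batch-norm folding mentioned above follows the standard folding arithmetic; a minimal PyTorch sketch for a Conv2d followed by BatchNorm2d (illustrative only, not Distiller's actual fusion code):

    import torch
    import torch.nn as nn

    def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
        """Fold BN parameters into the preceding conv; the BN module can then
        be replaced with nn.Identity. Standard folding formula."""
        w = conv.weight.data
        b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
        gamma, beta = bn.weight.data, bn.bias.data
        mean, var, eps = bn.running_mean, bn.running_var, bn.eps

        std = torch.sqrt(var + eps)
        conv.weight.data = w * (gamma / std).reshape(-1, 1, 1, 1)
        conv.bias = nn.Parameter((b - mean) * gamma / std + beta)
        return conv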
- Jun 03, 2019
Lev Zlotnik authored
* In PostTrainLinearQuantizer - moved 'clip_acts' and 'clip_n_stds' to overrides, removed the 'no_clip_layers' parameter from __init__
  * The 'no_clip_layers' command line argument REMAINS, handled in PostTrainLinearQuantizer.from_args()
* Removed old code from comments, fixed warnings in test_post_train_quant.py
* Updated tests
* Updated post-train quant sample YAML
- May 20, 2019
Guy Jacob authored
This NCF implementation is based on the implementation found in the MLPerf Training GitHub repository, specifically on the last revision of the code before the switch to the extended dataset. See:
https://github.com/mlperf/training/tree/fe17e837ed12974d15c86d5173fe8f2c188434d5/recommendation/pytorch

We've made several modifications to the code:
* Removed all MLPerf-specific code, including logging
* In ncf.py:
  * Added calls to Distiller compression APIs
  * Added progress indication in the training and evaluation flows
* In neumf.py:
  * Added option to split the final FC layer
  * Replaced all functional calls with modules so they can be detected by Distiller
* In dataset.py:
  * Sped up data loading - on the first run, data is loaded from CSVs and then pickled; on subsequent runs the pickle is loaded. This is much faster than the original implementation, but still very slow (see the sketch below).
  * Added progress indication during the data load process
* Removed some irrelevant content from README.md
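The CSV-to-pickle caching described above is a common pattern; a generic sketch, assuming a pandas-readable CSV and a hypothetical cache file name (not the actual dataset.py code):

    import os
    import pickle
    import pandas as pd

    def load_ratings(csv_path, cache_path="ratings.pkl"):
        """Load from the pickle cache if present; otherwise parse the CSV
        (slow) and cache the result for subsequent runs."""
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
        data = pd.read_csv(csv_path)          # expensive on the first run
        with open(cache_path, "wb") as f:
            pickle.dump(data, f)
        return data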
- May 19, 2019
Guy Jacob authored
* Added scale factor approximation in post-training quantization using integer multiply + shift. The number of bits for the integer multiplier is user configurable (see the sketch below).
* Updated documentation
* Updated post-train quant command line examples readme file
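To illustrate the integer multiply + shift approximation: a floating-point scale s is replaced by m / 2^shift, with the multiplier m constrained to the configured number of bits. A generic sketch (the exact rounding Distiller uses may differ):

    def approximate_scale(scale: float, mult_bits: int):
        """Approximate `scale` as multiplier / 2**shift, with the multiplier
        fitting in `mult_bits` bits."""
        shift = 0
        # Increase the shift while the multiplier still fits in mult_bits bits.
        while round(scale * (1 << (shift + 1))) < (1 << mult_bits) and shift < 31:
            shift += 1
        multiplier = int(round(scale * (1 << shift)))
        return multiplier, shift

    def scale_int(x: int, multiplier: int, shift: int) -> int:
        # At inference time, `x * scale` becomes an integer multiply + shift.
        return (x * multiplier) >> shift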
- Apr 14, 2019
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: Passing clip_acts as a boolean flag (either on the command line or in the function signature) will fail. An error message listing the valid values is displayed.
* Implemented clipping activations at mean + N * std (N is user configurable; see the sketch below)
* Additional tests
* Updated docs
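Clipping at mean + N * std replaces the raw min/max statistics with tighter saturation values; a NumPy sketch of the idea (the `n_stds` argument stands for the user-configurable N):

    import numpy as np

    def clip_stats(acts: np.ndarray, n_stds: float):
        """Return saturation values at mean +/- N * std of the activation
        statistics, instead of the raw min/max."""
        mean, std = acts.mean(), acts.std()
        sat_min = max(acts.min(), mean - n_stds * std)
        sat_max = min(acts.max(), mean + n_stds * std)
        return sat_min, sat_max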
- Apr 01, 2019
Lev Zlotnik authored
* Bias handling:
  * Add 'bits_bias' parameter to explicitly specify the number of bits for bias, similar to weights and activations
  * BREAKING: Remove the now-redundant 'quantize_bias' boolean parameter
* Custom overrides:
  * Expand the semantics of the overrides dict to allow overriding of other parameters in addition to bit-widths
  * Functions registered in the quantizer's 'replacement_factory' can define keyword arguments. Non-bit-width entries in the overrides dict are checked against the function signature and passed in (see the sketch below).
* BREAKING:
  * Changed the name of 'bits_overrides' to simply 'overrides'
  * Bit-width overrides must now be defined using the full parameter names - 'bits_activations' / 'bits_weights' / 'bits_bias' instead of the short-hands 'acts' and 'wts' which were used so far
* Added/updated relevant tests
* Modified all quantization YAMLs under 'examples' to reflect these changes
* Updated docs
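A sketch of the idea of matching non-bit-width override entries against a replacement function's keyword arguments via its signature. The `replace_linear` function and the override keys below are hypothetical examples, not Distiller's exact API:

    import inspect

    def filter_kwargs_for(fn, overrides: dict) -> dict:
        """Keep only override entries that match keyword arguments in fn's
        signature; bit-width entries are handled separately."""
        params = inspect.signature(fn).parameters
        bit_width_keys = {"bits_activations", "bits_weights", "bits_bias"}
        return {k: v for k, v in overrides.items()
                if k not in bit_width_keys and k in params}

    # Hypothetical replacement function registered in a replacement factory:
    def replace_linear(module, name, qbits_map, clip_acts="NONE", clip_n_stds=None):
        ...

    overrides = {"bits_activations": 8, "bits_weights": 4,
                 "clip_acts": "AVG", "clip_n_stds": 2}
    extra_kwargs = filter_kwargs_for(replace_linear, overrides)
    # -> {"clip_acts": "AVG", "clip_n_stds": 2}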
- Mar 27, 2019
Guy Jacob authored
- Feb 11, 2019
Guy Jacob authored
Summary of changes:
(1) Post-train quantization based on pre-collected statistics (see the sketch below)
(2) Quantized concat, element-wise addition / multiplication and embeddings
(3) Move post-train quantization command line args out of sample code
(4) Configure post-train quantization from YAML for more fine-grained control

(See PR #136 for more detailed change descriptions)
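Pre-collected statistics are typically per-tensor ranges recorded over a calibration set; a generic PyTorch sketch using forward hooks (an illustration only, not Distiller's stats collector):

    import torch
    import torch.nn as nn

    def collect_minmax_stats(model: nn.Module, calib_loader):
        """Record min/max of each leaf module's output over a calibration set."""
        stats = {}

        def make_hook(name):
            def hook(module, inputs, output):
                lo, hi = output.min().item(), output.max().item()
                if name not in stats:
                    stats[name] = {"min": lo, "max": hi}
                else:
                    stats[name]["min"] = min(stats[name]["min"], lo)
                    stats[name]["max"] = max(stats[name]["max"], hi)
            return hook

        handles = [m.register_forward_hook(make_hook(n))
                   for n, m in model.named_modules()
                   if len(list(m.children())) == 0]
        with torch.no_grad():
            for inputs, _ in calib_loader:
                model(inputs)
        for h in handles:
            h.remove()
        return stats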
- Dec 04, 2018
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now; see the sketch below)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training
* Added tests and examples
* Updated documentation
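The range-based asymmetric parameters follow the standard min/max formulation; per-channel quantization applies the same computation per output channel. A generic sketch, not the exact Distiller code:

    import torch

    def asymmetric_qparams(x_min: float, x_max: float, num_bits: int):
        """Standard range-based asymmetric quantization parameters.
        Assumes x_max > x_min."""
        x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # range must contain 0
        n_levels = 2 ** num_bits - 1
        scale = (x_max - x_min) / n_levels
        zero_point = round(-x_min / scale)
        return scale, zero_point

    def quantize(x: torch.Tensor, scale: float, zero_point: int, num_bits: int):
        q = torch.round(x / scale) + zero_point
        return torch.clamp(q, 0, 2 ** num_bits - 1)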
- Dec 02, 2018
- Jul 22, 2018
Gal Novik authored
* Added the PACT quantization method (see the sketch below)
* Moved the logic that modifies the optimizer due to changes the quantizer makes into the Quantizer itself
* Updated documentation and tests
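A minimal PyTorch sketch of the PACT idea: activations are clipped to a learnable upper bound alpha and then linearly quantized, with a straight-through estimator for the rounding. Simplified relative to the paper and to the actual implementation (alpha is typically also L2-regularized):

    import torch
    import torch.nn as nn

    class PACTActivation(nn.Module):
        """Clip activations to [0, alpha] with a learnable alpha, then quantize."""
        def __init__(self, num_bits: int = 4, init_alpha: float = 6.0):
            super().__init__()
            self.num_bits = num_bits
            self.alpha = nn.Parameter(torch.tensor(init_alpha))

        def forward(self, x):
            # Equivalent to clamp(x, 0, alpha), but differentiable w.r.t. alpha.
            y = 0.5 * (x.abs() - (x - self.alpha).abs() + self.alpha)
            scale = (2 ** self.num_bits - 1) / self.alpha
            y_q = torch.round(y * scale) / scale
            # Straight-through estimator for the rounding step.
            return y + (y_q - y).detach()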
- Jun 21, 2018