- Oct 07, 2019
Guy Jacob authored
* Greedy search script for post-training quantization settings
  * Iterates over each layer in the model in order. For each layer, checks a user-defined set of quantization settings and chooses the best one based on validation accuracy (see the sketch below)
  * Provided sample that searches for the best activations-clipping mode per layer, on image classification models
* Proper handling of mixed quantization settings in post-train quant:
  * By default, the quantization settings for each layer apply only to output quantization
  * Propagate quantization settings for activation tensors through the model during execution
  * For non-quantized inputs to layers that require quantized inputs, fall back to quantizing according to the settings used for the output
  * In addition, provide a mechanism to override input quantization settings via the YAML configuration file
* By default, all modules are now quantized. For module types that don't have a dedicated quantized implementation, "fake" quantization is performed
* Misc. changes:
  * Fuse ReLU/ReLU6 into the predecessor during post-training quantization
  * Fixes to ACIQ clipping in the half-range case

Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
Co-authored-by: Guy Jacob <guy.jacob@intel.com>
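For reference, a minimal sketch of the greedy per-layer search idea described above. The `apply_setting` and `evaluate` callables are hypothetical placeholders (apply one candidate setting to one layer; run validation), not the actual script's API:

    # Sketch of a greedy per-layer search over quantization settings.
    def greedy_search(model, layers, candidate_settings, apply_setting, evaluate):
        chosen = {}
        for layer in layers:                      # iterate layers in model order
            best_acc, best_setting = float("-inf"), None
            for setting in candidate_settings:    # e.g. different clipping modes
                apply_setting(model, layer, setting)
                acc = evaluate(model)             # validation accuracy
                if acc > best_acc:
                    best_acc, best_setting = acc, setting
            apply_setting(model, layer, best_setting)  # keep the winner, move on
            chosen[layer] = best_setting
        return chosen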
- Sep 10, 2019
Yury Nahshan authored
ACIQ clipping method, as described in: "Post training 4-bit quantization of convolutional networks for rapid-deployment" (Ron Banner, Yury Nahshan, Daniel Soudry), NeurIPS 2019, https://arxiv.org/abs/1810.05723

Co-authored-by: Yury Nahshan <yury.nahshan@intel.com>
Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
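ACIQ derives the MSE-optimal clipping value analytically. Purely to illustrate the objective being optimized, the same value can be approximated numerically by scanning candidate clip values (NumPy sketch, not the Distiller implementation):

    import numpy as np

    def mse_optimal_clip(x, num_bits, num_candidates=100):
        """Illustrative numerical search for a symmetric clipping value that
        minimizes quantization MSE (ACIQ computes this analytically)."""
        best_alpha, best_mse = None, np.inf
        max_abs = np.abs(x).max()
        for alpha in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
            scale = (2 ** (num_bits - 1) - 1) / alpha          # symmetric, signed
            x_clipped = np.clip(x, -alpha, alpha)
            x_quant = np.round(x_clipped * scale) / scale      # quantize-dequantize
            mse = np.mean((x - x_quant) ** 2)
            if mse < best_mse:
                best_alpha, best_mse = alpha, mse
        return best_alpha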
- Aug 08, 2019
Guy Jacob authored
- Aug 07, 2019
Guy Jacob authored
- Jul 10, 2019
Guy Jacob authored
Guy Jacob authored
* "Net-aware quantization" - using the term coined in https://arxiv.org/abs/1811.09886 (section 3.2.2). Refers to considering sequences of modules when quantizing. This isn't exactly layer fusion - we modify activation stats prior to setting quantization parameters, to make sure that when a module is followed by certain activation functions, only the relevant ranges are quantized. We do this for:
  * ReLU - Clip all negative values
  * Tanh / Sigmoid - Clip according to the (approximated) saturation values for these functions. We use [-4, 4] for tanh and [-6, 6] for sigmoid.
* Perform batch-norm folding before post-training quantization. Batch-norm parameters are folded into the parameters of the previous layer and the BN layer is replaced with an identity module (see the sketch below).
* Both BN folding and "net-aware" are now automatically executed in PostTrainLinearQuantizer (details of this change below)
* BN folding enabled by a new generic mechanism to "fuse" module sequences (at the Python API level)
  * The first module in the sequence is replaced/modified by a user-provided function, the rest of the modules are replaced with nn.Identity
* Quantizer changes:
  * Optionally create adjacency map during prepare_model
  * Subclasses may enforce adjacency map creation
  * Refactoring: Replace _prepare_model_impl with pre and post override-able "callbacks", so core functionality is always executed
* PostTrainLinearQuantizer changes:
  * Enforce creation of adjacency map. This means users must now pass a dummy input to PostTrainLinearQuantizer.prepare_model
  * Before module replacement - apply BN folding and stats updates according to net-aware quantization
* Updated the language model quantization tutorial to reflect the new functionality
* Updated the image classification post-train quantization samples (command line and YAML)
* Other changes:
  * Distiller LSTM implementation: Replace the ModuleList for cells with a plain list. The PyTorch trace mechanism doesn't "see" ModuleList objects, it only sees the contained modules. This means that the "scopeName" of these modules isn't complete, which makes it impossible to match op names in SummaryGraph to modules in the Python model.
  * ActivationStatsCollector: Ignore nn.Identity modules
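The batch-norm folding mentioned above follows the standard folding arithmetic; a minimal PyTorch sketch for a Conv2d followed by BatchNorm2d (illustrative only, not Distiller's actual fusion code):

    import torch
    import torch.nn as nn

    def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
        """Fold BN parameters into the preceding conv; the BN module can then
        be replaced with nn.Identity. Standard folding formula."""
        w = conv.weight.data
        b = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
        gamma, beta = bn.weight.data, bn.bias.data
        mean, var, eps = bn.running_mean, bn.running_var, bn.eps

        std = torch.sqrt(var + eps)
        conv.weight.data = w * (gamma / std).reshape(-1, 1, 1, 1)
        conv.bias = nn.Parameter((b - mean) * gamma / std + beta)
        return conv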
- Jun 03, 2019
Lev Zlotnik authored
* In PostTrainLinearQuantizer - moved 'clip_acts' and 'clip_n_stds' to overrides, removed the 'no_clip_layers' parameter from __init__
  * The 'no_clip_layers' command line argument REMAINS, handled in PostTrainLinearQuantizer.from_args()
* Removed old code from comments, fixed warnings in test_post_train_quant.py
* Updated tests
* Updated post-train quant sample YAML
- May 20, 2019
Guy Jacob authored
This NCF implementation is based on the implementation found in the MLPerf Training GitHub repository, specifically on the last revision of the code before the switch to the extended dataset. See:
https://github.com/mlperf/training/tree/fe17e837ed12974d15c86d5173fe8f2c188434d5/recommendation/pytorch

We've made several modifications to the code:
* Removed all MLPerf-specific code, including logging
* In ncf.py:
  * Added calls to Distiller compression APIs
  * Added progress indication in the training and evaluation flows
* In neumf.py:
  * Added option to split the final FC layer
  * Replaced all functional calls with modules so they can be detected by Distiller
* In dataset.py:
  * Sped up data loading - on the first run, data is loaded from CSVs and then pickled; on subsequent runs the pickle is loaded. This is much faster than the original implementation, but still very slow (see the sketch below).
  * Added progress indication during the data load process
* Removed some irrelevant content from README.md
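The CSV-to-pickle caching described above is a common pattern; a generic sketch, assuming a pandas-readable CSV and a hypothetical cache file name (not the actual dataset.py code):

    import os
    import pickle
    import pandas as pd

    def load_ratings(csv_path, cache_path="ratings.pkl"):
        """Load from the pickle cache if present; otherwise parse the CSV
        (slow) and cache the result for subsequent runs."""
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
        data = pd.read_csv(csv_path)          # expensive on the first run
        with open(cache_path, "wb") as f:
            pickle.dump(data, f)
        return data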
- May 19, 2019
Guy Jacob authored
* Added scale factor approximation in post-training quantization using integer multiply + shift. The number of bits for the integer multiplier is user configurable (see the sketch below).
* Updated documentation
* Updated post-train quant command line examples readme file
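To illustrate the integer multiply + shift approximation: a floating-point scale s is replaced by m / 2^shift, with the multiplier m constrained to the configured number of bits. A generic sketch (the exact rounding Distiller uses may differ):

    def approximate_scale(scale: float, mult_bits: int):
        """Approximate `scale` as multiplier / 2**shift, with the multiplier
        fitting in `mult_bits` bits."""
        shift = 0
        # Increase the shift while the multiplier still fits in mult_bits bits.
        while round(scale * (1 << (shift + 1))) < (1 << mult_bits) and shift < 31:
            shift += 1
        multiplier = int(round(scale * (1 << shift)))
        return multiplier, shift

    def scale_int(x: int, multiplier: int, shift: int) -> int:
        # At inference time, `x * scale` becomes an integer multiply + shift.
        return (x * multiplier) >> shift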
- Apr 14, 2019
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: Passing clip_acts as a boolean flag (either on the command line or in the function signature) will fail. An error message listing the valid values is displayed.
* Implemented clipping activations at mean + N * std (N is user configurable; see the sketch below)
* Additional tests
* Updated docs
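Clipping at mean + N * std replaces the raw min/max statistics with tighter saturation values; a NumPy sketch of the idea (the `n_stds` argument stands for the user-configurable N):

    import numpy as np

    def clip_stats(acts: np.ndarray, n_stds: float):
        """Return saturation values at mean +/- N * std of the activation
        statistics, instead of the raw min/max."""
        mean, std = acts.mean(), acts.std()
        sat_min = max(acts.min(), mean - n_stds * std)
        sat_max = min(acts.max(), mean + n_stds * std)
        return sat_min, sat_max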
- Apr 01, 2019
Lev Zlotnik authored
* Bias handling:
  * Add 'bits_bias' parameter to explicitly specify the number of bits for bias, similar to weights and activations
  * BREAKING: Remove the now-redundant 'quantize_bias' boolean parameter
* Custom overrides:
  * Expand the semantics of the overrides dict to allow overriding of other parameters in addition to bit-widths
  * Functions registered in the quantizer's 'replacement_factory' can define keyword arguments. Non-bit-width entries in the overrides dict are checked against the function signature and passed in (see the sketch below).
* BREAKING:
  * Changed the name of 'bits_overrides' to simply 'overrides'
  * Bit-width overrides must now be defined using the full parameter names - 'bits_activations' / 'bits_weights' / 'bits_bias' instead of the short-hands 'acts' and 'wts' which were used so far
* Added/updated relevant tests
* Modified all quantization YAMLs under 'examples' to reflect these changes
* Updated docs
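A sketch of the idea of matching non-bit-width override entries against a replacement function's keyword arguments via its signature. The `replace_linear` function and the override keys below are hypothetical examples, not Distiller's exact API:

    import inspect

    def filter_kwargs_for(fn, overrides: dict) -> dict:
        """Keep only override entries that match keyword arguments in fn's
        signature; bit-width entries are handled separately."""
        params = inspect.signature(fn).parameters
        bit_width_keys = {"bits_activations", "bits_weights", "bits_bias"}
        return {k: v for k, v in overrides.items()
                if k not in bit_width_keys and k in params}

    # Hypothetical replacement function registered in a replacement factory:
    def replace_linear(module, name, qbits_map, clip_acts="NONE", clip_n_stds=None):
        ...

    overrides = {"bits_activations": 8, "bits_weights": 4,
                 "clip_acts": "AVG", "clip_n_stds": 2}
    extra_kwargs = filter_kwargs_for(replace_linear, overrides)
    # -> {"clip_acts": "AVG", "clip_n_stds": 2}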
- Mar 27, 2019
Guy Jacob authored
- Feb 11, 2019
Guy Jacob authored
Summary of changes:
(1) Post-train quantization based on pre-collected statistics (see the sketch below)
(2) Quantized concat, element-wise addition / multiplication and embeddings
(3) Move post-train quantization command line args out of sample code
(4) Configure post-train quantization from YAML for more fine-grained control

(See PR #136 for more detailed change descriptions)
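Pre-collected statistics are typically per-tensor ranges recorded over a calibration set; a generic PyTorch sketch using forward hooks (an illustration only, not Distiller's stats collector):

    import torch
    import torch.nn as nn

    def collect_minmax_stats(model: nn.Module, calib_loader):
        """Record min/max of each leaf module's output over a calibration set."""
        stats = {}

        def make_hook(name):
            def hook(module, inputs, output):
                lo, hi = output.min().item(), output.max().item()
                if name not in stats:
                    stats[name] = {"min": lo, "max": hi}
                else:
                    stats[name]["min"] = min(stats[name]["min"], lo)
                    stats[name]["max"] = max(stats[name]["max"], hi)
            return hook

        handles = [m.register_forward_hook(make_hook(n))
                   for n, m in model.named_modules()
                   if len(list(m.children())) == 0]
        with torch.no_grad():
            for inputs, _ in calib_loader:
                model(inputs)
        for h in handles:
            h.remove()
        return stats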
- Dec 04, 2018
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now; see the sketch below)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training
* Added tests and examples
* Updated documentation
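The range-based asymmetric parameters follow the standard min/max formulation; per-channel quantization applies the same computation per output channel. A generic sketch, not the exact Distiller code:

    import torch

    def asymmetric_qparams(x_min: float, x_max: float, num_bits: int):
        """Standard range-based asymmetric quantization parameters.
        Assumes x_max > x_min."""
        x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # range must contain 0
        n_levels = 2 ** num_bits - 1
        scale = (x_max - x_min) / n_levels
        zero_point = round(-x_min / scale)
        return scale, zero_point

    def quantize(x: torch.Tensor, scale: float, zero_point: int, num_bits: int):
        q = torch.round(x / scale) + zero_point
        return torch.clamp(q, 0, 2 ** num_bits - 1)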
- Dec 02, 2018
- Jul 22, 2018
Gal Novik authored
* Added the PACT quantization method (see the sketch below)
* Moved the logic that modifies the optimizer due to changes the quantizer makes into the Quantizer itself
* Updated documentation and tests
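A minimal PyTorch sketch of the PACT idea: activations are clipped to a learnable upper bound alpha and then linearly quantized, with a straight-through estimator for the rounding. Simplified relative to the paper and to the actual implementation (alpha is typically also L2-regularized):

    import torch
    import torch.nn as nn

    class PACTActivation(nn.Module):
        """Clip activations to [0, alpha] with a learnable alpha, then quantize."""
        def __init__(self, num_bits: int = 4, init_alpha: float = 6.0):
            super().__init__()
            self.num_bits = num_bits
            self.alpha = nn.Parameter(torch.tensor(init_alpha))

        def forward(self, x):
            # Equivalent to clamp(x, 0, alpha), but differentiable w.r.t. alpha.
            y = 0.5 * (x.abs() - (x - self.alpha).abs() + self.alpha)
            scale = (2 ** self.num_bits - 1) / self.alpha
            y_q = torch.round(y * scale) / scale
            # Straight-through estimator for the rounding step.
            return y + (y_q - y).detach()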
- Jun 21, 2018