  1. Oct 07, 2019
    • Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987
      Guy Jacob authored
      
      * Greedy search script for post-training quantization settings
        * Iterates over each layer in the model in order. For each layer,
          checks a user-defined set of quantization settings and chooses
          the best one based on validation accuracy
        * Provided sample that searches for best activations-clipping
          mode per layer, on image classification models
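The greedy loop described above can be sketched as follows; `candidate_settings`, `apply_setting`, and `evaluate` are hypothetical stand-ins for the script's machinery, not Distiller APIs:

```python
# Hypothetical sketch of the greedy per-layer search: for each layer in
# order, try every candidate quantization setting and keep the one that
# gives the best validation accuracy.
def greedy_search(layers, candidate_settings, apply_setting, evaluate):
    chosen = {}
    for layer in layers:
        best_acc, best_setting = float('-inf'), None
        for setting in candidate_settings:
            apply_setting(layer, setting)   # configure just this layer
            acc = evaluate()                # validation accuracy
            if acc > best_acc:
                best_acc, best_setting = acc, setting
        apply_setting(layer, best_setting)  # lock in the winner
        chosen[layer] = best_setting
    return chosen
```

Note that earlier layers' choices are frozen before later layers are searched, so the result is greedy rather than globally optimal.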
      
      * Proper handling of mixed-quantization settings in post-train quant:
        * By default, the quantization settings for each layer apply only
          to output quantization
        * Propagate quantization settings for activations tensors through
          the model during execution
        * For non-quantized inputs to layers that require quantized inputs,
          fall back to quantizing according to the settings used for the
          output
        * In addition, provide mechanism to override inputs quantization
          settings via the YAML configuration file
        * By default all modules are quantized now. For module types that
          don't have a dedicated quantized implementation, "fake"
          quantization is performed
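The fall-back rule for input quantization can be summarized in a small helper; the names and the settings representation are illustrative only, not Distiller's internals:

```python
# Resolve the quantization settings for one input tensor of a layer:
# an explicit YAML override wins; otherwise use the settings the tensor
# carried from the producing layer; otherwise fall back to this layer's
# own output settings. (Illustrative sketch, not Distiller code.)
def resolve_input_settings(propagated, output_settings, override=None):
    if override is not None:
        return override
    return propagated if propagated is not None else output_settings
```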
      
      * Misc. Changes
        * Fuse ReLU/ReLU6 to predecessor during post-training quantization
        * Fixes to ACIQ clipping in the half-range case
      
      Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
      Co-authored-by: Guy Jacob <guy.jacob@intel.com>
  2. Sep 10, 2019
  3. Aug 08, 2019
  4. Aug 07, 2019
  5. Jul 10, 2019
    • Update post-train quant command line example · 112163eb
      Guy Jacob authored
    • Post-Train Quantization: BN folding and "net-aware quantization" (#313) · 43548deb
      Guy Jacob authored
      * "Net-aware quantization" - using the term coined in
        https://arxiv.org/abs/1811.09886. (section 3.2.2).
        Refers to considering sequences of modules when quantizing. This 
        isn't exactly layer fusion - we modify activation stats prior to
        setting quantization parameters, to make sure that when a module
        is followed by certain activation functions, only the relevant
        ranges are quantized. We do this for:
          * ReLU - Clip all negative values
          * Tanh / Sigmoid - Clip according to the (approximated) saturation
            values for these functions. We use [-4, 4] for tanh and [-6, 6]
            for sigmoid.
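The stats adjustment can be illustrated roughly as below; the saturation ranges follow the description above, but the function and the `(min, max)` stats layout are assumptions of this sketch, not Distiller's data structures:

```python
# Clip a layer's recorded (min, max) activation range to the region its
# successor activation function actually distinguishes: ReLU discards
# negatives; tanh/sigmoid saturate outside approximately [-4, 4] and
# [-6, 6] respectively. (Illustrative sketch, not Distiller's API.)
SATURATION = {'relu': (0.0, None), 'tanh': (-4.0, 4.0), 'sigmoid': (-6.0, 6.0)}

def clip_stats_for_successor(stats, act_fn):
    lo, hi = SATURATION[act_fn]
    return {
        'min': stats['min'] if lo is None else max(stats['min'], lo),
        'max': stats['max'] if hi is None else min(stats['max'], hi),
    }
```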
      
      * Perform batch-norm folding before post-training quantization.
        Batch-norm parameters are folded into the parameters of the previous
        layer and the BN layer is replaced with an identity module.
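Per output channel, the folding amounts to scaling the preceding layer's weights and shifting its bias; a minimal single-channel sketch (names are illustrative, not the Distiller implementation):

```python
import math

# With scale = gamma / sqrt(var + eps), the identity
#   BN(x . w + b) == x . w' + b'
# holds for w' = scale * w and b' = scale * (b - mean) + beta,
# so the BN layer can be replaced with an identity module.
def fold_bn_channel(w_row, b, gamma, beta, mean, var, eps=1e-5):
    scale = gamma / math.sqrt(var + eps)
    return [scale * wi for wi in w_row], scale * (b - mean) + beta
```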
      
      * Both BN folding and "net-aware" are now automatically executed
        in PostTrainLinearQuantizer (details of this change below)
      
      * BN folding enabled by new generic mechanism to "fuse" module
        sequences (at the Python API level)
          * First module in sequence is replaced/modified by a user-provided
            function; the rest of the modules are replaced with nn.Identity
      
      * Quantizer changes:
        * Optionally create adjacency map during prepare_model
        * Subclasses may enforce adjacency map creation
        * Refactoring: Replace _prepare_model_impl with overridable pre and
          post "callbacks", so core functionality is always executed
      
      * PostTrainLinearQuantizer Changes:
        * Enforce creation of adjacency map. This means users must now pass a
          dummy input to PostTrainLinearQuantizer.prepare_model
        * Before module replacement - Apply BN folding and stats updates according
          to net-aware quantization
      
      * Updated the language model quantization tutorial to reflect the new
        functionality
      
      * Updated the image classification post-train quantization samples
        (command line and YAML)
      
      * Other changes:
        * Distiller LSTM implementation:
          Replace the ModuleList for cells with a plain list. The PyTorch trace
          mechanism doesn't "see" ModuleList objects; it only sees the
          contained modules. This means that the "scopeName" of these modules
          isn't complete, which makes it impossible to match op names in
          SummaryGraph to modules in the Python model.
        * ActivationStatsCollector: Ignore nn.Identity modules
  6. Jun 03, 2019
    • [Breaking] PTQ: Removed special handling of clipping overrides · 3cde6c5e
      Lev Zlotnik authored
      * In PostTrainLinearQuantizer - moved 'clip_acts' and 'clip_n_stds'
        to overrides, removed 'no_clip_layers' parameter from __init__
      * The 'no_clip_layers' command line argument REMAINS, handled in 
        PostTrainLinearQuantizer.from_args()
      * Removed old code from comments, fixed warnings in 
        test_post_train_quant.py
      * Updated tests
      * Update post-train quant sample YAML
  7. May 20, 2019
    • NCF scripts with Distiller integration · 4385084a
      Guy Jacob authored
      This NCF implementation is based on the implementation found in the MLPerf
      Training GitHub repository, specifically on the last revision of the code
      before the switch to the extended dataset. See:
      https://github.com/mlperf/training/tree/fe17e837ed12974d15c86d5173fe8f2c188434d5/recommendation/pytorch
      
      We've made several modifications to the code:
      * Removed all MLPerf specific code including logging
      * In ncf.py:
        * Added calls to Distiller compression APIs
        * Added progress indication in training and evaluation flows
      * In neumf.py:
        * Added option to split final FC layer
        * Replaced all functional calls with modules so they can be detected
          by Distiller
      * In dataset.py:
        * Speed up data loading - on the first run, data is loaded from the
          CSVs and then pickled. On subsequent runs the pickle is loaded. This
          is much faster than the original implementation, but still very slow.
        * Added progress indication during data load process
      * Removed some irrelevant content from README.md
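The load-once-then-pickle pattern can be sketched generically as below; the paths and parse function are illustrative, not the actual dataset.py code:

```python
import os
import pickle

# Parse the CSV only on the first run and cache the result as a pickle;
# subsequent runs deserialize the pickle, which is much faster than
# re-parsing. (Generic sketch of the caching pattern described above.)
def load_cached(csv_path, cache_path, parse_csv):
    if os.path.exists(cache_path):          # fast path: pre-parsed pickle
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    data = parse_csv(csv_path)              # slow path: parse the CSVs
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f)                # cache for subsequent runs
    return data
```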
  8. May 19, 2019
  9. Apr 14, 2019
    • Post-train quant: Extend acts clipping functionality (#225) · 437e270b
      Guy Jacob authored
      * Some refactoring to enable multiple clipping methods
      * BREAKING: passing clip_acts as a boolean flag (either on the command
        line or in the function signature) will now fail. An error message
        listing the valid values is displayed.
      * Implemented clipping activations at mean + N * std
        (N is user configurable)
      * Additional tests
      * Updated docs
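The mean + N * std mode can be illustrated as below; whether the lower bound mirrors the upper one around the mean is an assumption of this sketch, not necessarily Distiller's exact behavior:

```python
# Clipping range derived from recorded activation stats: extend N
# standard deviations from the mean (N is user configurable), then
# clamp values to that range. (Illustrative sketch only.)
def n_std_clip_bounds(mean, std, n_stds):
    return mean - n_stds * std, mean + n_stds * std

def clip(x, mean, std, n_stds):
    lo, hi = n_std_clip_bounds(mean, std, n_stds)
    return min(max(x, lo), hi)
```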
  10. Apr 01, 2019
    • Quantizer: Specify # bias bits + custom overrides (BREAKING) (#178) · 5271625a
      Lev Zlotnik authored
      * Bias handling:
        * Add 'bits_bias' parameter to explicitly specify # of bits for bias,
          similar to weights and activations.
        * BREAKING: Remove the now redundant 'quantize_bias' boolean parameter
      * Custom overrides:
        * Expand the semantics of the overrides dict to allow overriding of
          other parameters in addition to bit-widths
        * Functions registered in the quantizer's 'replacement_factory' can
          define keyword arguments. Non bit-width entries in the overrides
          dict will be checked against the function signature and passed
        * BREAKING:
          * Changed the name of 'bits_overrides' to simply 'overrides'
          * Bit-width overrides must now be defined using the full parameter
            names - 'bits_activations/weights/bias' instead of the short-hands
            'acts' and 'wts' which were used so far.
        * Added/updated relevant tests
        * Modified all quantization YAMLs under 'examples' to reflect 
          these changes
        * Updated docs
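After the rename, an overrides dict might look like this; the layer names and the specific override values are hypothetical, chosen only to illustrate the new full parameter names:

```python
# Hypothetical overrides dict using the full parameter names
# ('bits_activations/weights/bias' instead of 'acts'/'wts').
# Non bit-width entries ('clip_acts' here) are matched against the
# replacement function's keyword arguments, per the description above.
overrides = {
    'conv1': {'bits_activations': 8, 'bits_weights': 4, 'bits_bias': 8},
    'fc':    {'bits_weights': 8, 'clip_acts': 'NONE'},
}
```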
  11. Mar 27, 2019
  12. Feb 11, 2019
    • Post-train quant based on stats + additional modules quantized (#136) · 28a8ee18
      Guy Jacob authored
      Summary of changes:
      (1) Post-train quantization based on pre-collected statistics
      (2) Quantized concat, element-wise addition / multiplication and embeddings
      (3) Move post-train quantization command line args out of sample code
      (4) Configure post-train quantization from YAML for more fine-grained control
      
      (See PR #136 for more detailed changes descriptions)
  13. Dec 04, 2018
    • Range-Based Linear Quantization Features (#95) · 907a6f04
      Guy Jacob authored
      * Asymmetric post-training quantization (only symmetric was supported until now)
      * Quantization aware training for range-based (min-max) symmetric and asymmetric quantization
      * Per-channel quantization support in both training and post-training
      * Added tests and examples
      * Updated documentation
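The asymmetric scheme can be illustrated generically; this is a textbook sketch of asymmetric range-based linear quantization, not Distiller's exact code:

```python
# Map the observed range [t_min, t_max] onto the full unsigned n-bit
# grid; the zero_point shifts the grid so t_min lands near level 0.
# Symmetric quantization, by contrast, uses max(|t_min|, |t_max|) with
# no zero_point, wasting levels when the range is not sign-symmetric.
def asymmetric_params(t_min, t_max, num_bits):
    scale = (2 ** num_bits - 1) / (t_max - t_min)
    zero_point = round(-t_min * scale)
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits):
    q = round(x * scale) + zero_point
    return max(0, min(q, 2 ** num_bits - 1))  # clamp to the grid
```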
  14. Dec 02, 2018
  15. Jul 22, 2018
    • PACT quantizer (#30) · df9a00ce
      Gal Novik authored
      * Adding PACT quantization method
      * Moved the logic that modifies the optimizer (needed due to changes the quantizer makes) into the Quantizer itself
      * Updated documentation and tests
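PACT (Choi et al.) replaces ReLU with a clipped activation whose upper bound alpha is learned during training; a pure-Python sketch of the forward computation (the quantization step here is illustrative, and alpha's gradient handling - which makes it trainable - is omitted):

```python
# PACT forward pass: y = clip(x, 0, alpha) with a learnable alpha,
# followed by linear quantize-dequantize of [0, alpha] onto
# 2**num_bits - 1 levels. (Sketch only, not the Distiller module.)
def pact(x, alpha, num_bits):
    y = min(max(x, 0.0), alpha)
    scale = (2 ** num_bits - 1) / alpha
    return round(y * scale) / scale
```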
  16. Jun 21, 2018