  1. Oct 07, 2019
      Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987
      Guy Jacob authored
      
      * Greedy search script for post-training quantization settings
        * Iterates over each layer in the model in order. For each layer,
          checks a user-defined set of quantization settings and chooses
          the best one based on validation accuracy
        * Provided sample that searches for best activations-clipping
          mode per layer, on image classification models
      
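      A minimal sketch of the per-layer greedy loop described above, assuming
      user-supplied callables for applying settings and for evaluation
      (apply_fn, eval_fn and the dict of chosen settings are illustrative,
      not the script's actual API):

      ```python
      import copy

      def greedy_search(model, layer_names, candidate_settings, apply_fn, eval_fn):
          """For each layer, in order, try every candidate quantization setting
          and keep the one that gives the best validation accuracy; choices
          made for earlier layers stay fixed."""
          chosen = {}
          for name in layer_names:
              best_acc, best_setting = float('-inf'), None
              for setting in candidate_settings:
                  candidate = copy.deepcopy(model)
                  # quantize with the settings chosen so far plus the one under test
                  apply_fn(candidate, {**chosen, name: setting})
                  acc = eval_fn(candidate)  # e.g. top-1 accuracy on a held-out set
                  if acc > best_acc:
                      best_acc, best_setting = acc, setting
              chosen[name] = best_setting
          return chosen
      ```
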
      * Proper handling of mixed-quantization settings in post-train quant:
        * By default, the quantization settings for each layer apply only
          to output quantization
        * Propagate quantization settings for activations tensors through
          the model during execution
        * For non-quantized inputs to layers that require quantized inputs,
          fall back to quantizing according to the settings used for the
          output
        * In addition, provide a mechanism to override input quantization
          settings via the YAML configuration file
        * By default all modules are quantized now. For module types that
          don't have a dedicated quantized implementation, "fake"
          quantization is performed
      
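      The input handling above boils down to a simple precedence: an explicit
      YAML override wins, an already-quantized input keeps the settings
      propagated from its producer, and an un-quantized input falls back to
      the layer's output settings. A rough sketch of that rule, with made-up
      names (not Distiller's actual code):

      ```python
      def settings_for_input(incoming_settings, output_settings, input_override=None):
          """Pick quantization settings for one input tensor of a layer.

          incoming_settings: settings propagated from the tensor's producer,
                             or None if the tensor arrives un-quantized.
          output_settings:   the settings configured for this layer's output.
          input_override:    optional per-input settings from the YAML file.
          """
          if input_override is not None:      # explicit YAML override wins
              return input_override
          if incoming_settings is not None:   # reuse the producer's settings
              return incoming_settings
          return output_settings              # fall back to the output settings
      ```
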
      * Misc. Changes
        * Fuse ReLU/ReLU6 to predecessor during post-training quantization
        * Fixes to ACIQ clipping in the half-range case
      
      Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
      Co-authored-by: Guy Jacob <guy.jacob@intel.com>
  2. Jul 10, 2019
      Post-Train Quantization: BN folding and "net-aware quantization" (#313) · 43548deb
      Guy Jacob authored
      * "Net-aware quantization" - using the term coined in
        https://arxiv.org/abs/1811.09886 (Section 3.2.2).
        Refers to considering sequences of modules when quantizing. This 
        isn't exactly layer fusion - we modify activation stats prior to
        setting quantization parameters, to make sure that when a module
        is followed by certain activation functions, only the relevant
        ranges are quantized. We do this for:
          * ReLU - Clip all negative values
          * Tanh / Sigmoid - Clip according to the (approximated) saturation
            values for these functions. We use [-4, 4] for tanh and [-6, 6]
            for sigmoid.
      
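      A simplified sketch of that stats adjustment (the real code operates on
      Distiller's collected activation-stats structures; the plain dict and
      function names here are only for illustration):

      ```python
      import torch.nn as nn

      # Ranges that matter downstream, per successor type (None = leave that end alone)
      CLIP_FOR_SUCCESSOR = {
          nn.ReLU:    (0.0, None),   # negative values are irrelevant after ReLU
          nn.Tanh:    (-4.0, 4.0),   # approximate input saturation range of tanh
          nn.Sigmoid: (-6.0, 6.0),   # approximate input saturation range of sigmoid
      }

      def clip_stats_for_successor(stats, successor):
          """Narrow a layer's recorded (min, max) activation stats when it feeds
          directly into ReLU / Tanh / Sigmoid, so the quantization parameters
          cover only the range that is actually relevant downstream."""
          for succ_type, (lo, hi) in CLIP_FOR_SUCCESSOR.items():
              if isinstance(successor, succ_type):
                  if lo is not None:
                      stats['min'] = max(stats['min'], lo)
                  if hi is not None:
                      stats['max'] = min(stats['max'], hi)
          return stats
      ```
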
      * Perform batch-norm folding before post-training quantization.
        Batch-norm parameters are folded into the parameters of the previous
        layer and the BN layer is replaced with an identity module.
      
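      The folding arithmetic itself, as a self-contained sketch for a Conv2d
      followed by BatchNorm2d (this is not Distiller's fusion code; per the
      item above, the BN module is subsequently replaced with nn.Identity):

      ```python
      import torch
      import torch.nn as nn

      @torch.no_grad()
      def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
          """Fold inference-time batch-norm into the preceding conv:
               W' = W * gamma / sqrt(running_var + eps)
               b' = (b - running_mean) * gamma / sqrt(running_var + eps) + beta
          """
          scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
          fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                            conv.stride, conv.padding, conv.dilation, conv.groups,
                            bias=True)
          fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
          bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
          fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
          return fused
      ```
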
      * Both BN folding and "net-aware" are now automatically executed
        in PostTrainLinearQuantizer (details of this change below)
      
      * BN folding enabled by new generic mechanism to "fuse" module
        sequences (at the Python API level)
          * First module in the sequence is replaced/modified by a user-provided
            function, rest of the modules are replaced with nn.Identity
      
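      In spirit, that mechanism looks like the following (an illustrative
      sketch; the names do not match the actual API):

      ```python
      import torch.nn as nn

      def fuse_module_sequence(model, names, fuse_fn):
          """Replace the first module in `names` with whatever `fuse_fn` returns
          for the sequence, and every remaining module with nn.Identity()."""
          modules = dict(model.named_modules())
          fused = fuse_fn(*(modules[n] for n in names))
          _replace(model, names[0], fused)
          for n in names[1:]:
              _replace(model, n, nn.Identity())

      def _replace(model, dotted_name, new_module):
          """Set a sub-module addressed by its dotted name, e.g. 'layer1.0.bn1'."""
          parent = model
          *path, leaf = dotted_name.split('.')
          for part in path:
              parent = getattr(parent, part)
          setattr(parent, leaf, new_module)
      ```

      Combined with the folding sketch above, BN folding would then amount to
      something like fuse_module_sequence(model, ['conv1', 'bn1'], fold_bn_into_conv).
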
      * Quantizer changes:
        * Optionally create adjacency map during prepare_model
        * Subclasses may enforce adjacency map creation
        * Refactoring: Replace _prepare_model_impl with pre and post
          overridable "callbacks", so core functionality is always executed
      
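      Roughly the template-method shape of that refactoring (hook and method
      names below are hypothetical, only _prepare_model_impl comes from the
      commit itself):

      ```python
      class QuantizerSkeleton:
          """Illustration only: the core model transformation always runs, and
          subclasses customize behaviour via the pre/post hooks."""

          def prepare_model(self, dummy_input=None):
              self._pre_prepare_model(dummy_input)  # subclass hook (e.g. build adjacency map)
              self._apply_replacements()            # core functionality, always executed
              self._post_prepare_model()            # subclass hook

          def _pre_prepare_model(self, dummy_input):
              pass

          def _post_prepare_model(self):
              pass

          def _apply_replacements(self):
              pass  # shared module-replacement logic would live here
      ```
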
      * PostTrainLinearQuantizer Changes:
        * Enforce creation of adjacency map. This means users must now pass a
          dummy input to PostTrainLinearQuantizer.prepare_model
        * Before module replacement - Apply BN folding and stats updates according
          to net-aware quantization
      
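      A minimal usage sketch reflecting the new requirement (constructor
      arguments kept to defaults here; exact keyword arguments may differ):

      ```python
      import torch
      from torchvision.models import resnet18
      from distiller.quantization import PostTrainLinearQuantizer

      model = resnet18(pretrained=True).eval()
      quantizer = PostTrainLinearQuantizer(model)

      # A dummy input is now required so prepare_model can trace the model and
      # build the adjacency map used for BN folding and net-aware quantization
      dummy_input = torch.rand(1, 3, 224, 224)
      quantizer.prepare_model(dummy_input)
      ```
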
      * Updated the language model quantization tutorial to reflect the new
        functionality
      
      * Updated the image classification post-train quantization samples
        (command line and YAML)
      
      * Other changes:
        * Distiller LSTM implementation:
          Replace the ModuleList for cells with a plain list. The PyTorch trace
          mechanism doesn't "see" ModuleList objects; it only sees the
          contained modules. This means that the "scopeName" of these modules
          isn't complete, which makes it impossible to match op names in
          SummaryGraph to modules in the Python model.
        * ActivationStatsCollector: Ignore nn.Identity modules