    Post-Train Quantization: BN folding and "net-aware quantization" (#313) · 43548deb
    Guy Jacob authored
    * "Net-aware quantization" - using the term coined in
      https://arxiv.org/abs/1811.09886. (section 3.2.2).
      Refers to considering sequences of modules when quantizing. This 
      isn't exactly layer fusion - we modify activation stats prior to
      setting quantization parameters, to make sure that when a module
      is followed by certain activation functions, only the relevant
      ranges are quantized. We do this for:
        * ReLU - Clip all negative values
        * Tanh / Sigmoid - Clip according to the (approximated) saturation
          values for these functions. We use [-4, 4] for tanh and [-6, 6]
          for sigmoid.
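    The stats adjustment amounts to intersecting each layer's recorded output
    range with the range its successor can actually pass through. A minimal
    sketch of the idea (the stats layout and function name here are
    illustrative, not Distiller's actual API):

    ```python
    # Illustrative sketch: clip a layer's recorded output range according to
    # the activation function that follows it ("net-aware" quantization).
    SATURATION_RANGES = {'tanh': (-4.0, 4.0), 'sigmoid': (-6.0, 6.0)}

    def clip_stats_for_successor(stats, successor):
        """stats is assumed to be a dict like {'min': float, 'max': float}."""
        lo, hi = stats['min'], stats['max']
        if successor == 'relu':
            lo = max(lo, 0.0)                      # ReLU discards negative values
        elif successor in SATURATION_RANGES:
            sat_lo, sat_hi = SATURATION_RANGES[successor]
            lo, hi = max(lo, sat_lo), min(hi, sat_hi)
        return {'min': lo, 'max': hi}
    ```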
    
    * Perform batch-norm folding before post-training quantization.
      Batch-norm parameters are folded into the parameters of the previous
      layer and the BN layer is replaced with an identity module.
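    The folding itself is the standard transformation: for a layer computing
    W*x + b followed by BatchNorm with learned gamma/beta and running mean mu
    and variance var, the folded layer uses W' = W * gamma / sqrt(var + eps)
    and b' = (b - mu) * gamma / sqrt(var + eps) + beta. A hedged sketch for the
    Linear + BatchNorm1d case (the helper name is illustrative, not Distiller's
    actual utility):

    ```python
    import torch
    import torch.nn as nn

    def fold_bn_into_linear(linear: nn.Linear, bn: nn.BatchNorm1d) -> nn.Linear:
        """Return a Linear layer equivalent to linear followed by bn (eval mode)."""
        with torch.no_grad():
            scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
            folded = nn.Linear(linear.in_features, linear.out_features, bias=True)
            folded.weight.copy_(linear.weight * scale.unsqueeze(1))
            bias = linear.bias if linear.bias is not None else torch.zeros_like(bn.running_mean)
            folded.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
        return folded
    ```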
    
    * Both BN folding and "net-aware" are now automatically executed
      in PostTrainLinearQuantizer (details of this change below)
    
    * BN folding enabled by new generic mechanism to "fuse" module
      sequences (at the Python API level)
        * First module in sequence is replaced/modified by a user-provided
          function; the rest of the modules are replaced with nn.Identity
          (see the sketch below)
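    A rough sketch of what such a fusion mechanism can look like (names and
    signature are assumptions, not Distiller's actual API):

    ```python
    import torch.nn as nn

    def fuse_module_sequence(model, module_names, fuse_fn):
        """Replace the first named module with fuse_fn(*modules) and the rest
        with nn.Identity. Illustrative only."""
        modules = dict(model.named_modules())
        fused = fuse_fn(*(modules[name] for name in module_names))
        _set_module(model, module_names[0], fused)
        for name in module_names[1:]:
            _set_module(model, name, nn.Identity())

    def _set_module(model, dotted_name, new_module):
        parent_name, _, child_name = dotted_name.rpartition('.')
        parent = dict(model.named_modules())[parent_name] if parent_name else model
        setattr(parent, child_name, new_module)
    ```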
    
    * Quantizer changes:
      * Optionally create adjacency map during prepare_model
      * Subclasses may enforce adjacency map creation
      * Refactoring: Replace _prepare_model_impl with overridable pre and post
        "callbacks", so the core functionality is always executed (see the
        sketch below)
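    The pre/post "callback" structure follows roughly this pattern (a sketch of
    the idea, not the actual Quantizer code):

    ```python
    class Quantizer:
        def prepare_model(self, dummy_input=None):
            self._pre_prepare_model(dummy_input)  # subclass hook (e.g. BN folding, stats updates)
            self._apply_module_replacement()      # core functionality, always executed
            self._post_prepare_model()            # subclass hook

        def _pre_prepare_model(self, dummy_input):
            pass

        def _post_prepare_model(self):
            pass

        def _apply_module_replacement(self):
            # walk the model and swap modules according to the replacement map
            ...
    ```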
    
    * PostTrainLinearQuantizer Changes:
      * Enforce creation of adjacency map. This means users must now pass a
        dummy input to PostTrainLinearQuantizer.prepare_model
      * Before module replacement - Apply BN folding and stats updates according
        to net-aware quantization
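    In practice, post-training quantization setup now looks roughly like this
    (the model and constructor arguments are placeholders; the exact signature
    may differ):

    ```python
    import torch
    from distiller.quantization import PostTrainLinearQuantizer

    model = MyModel().eval()                     # MyModel is a placeholder
    quantizer = PostTrainLinearQuantizer(model)  # plus whatever bit-width/stats args you use

    # A dummy input is now required so the quantizer can trace the model,
    # build the adjacency map, and apply BN folding / net-aware stats updates.
    dummy_input = torch.randn(1, 3, 224, 224)
    quantizer.prepare_model(dummy_input)
    ```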
    
    * Updated the language model quantization tutorial to reflect the new
      functionality
    
    * Updated the image classification post-train quantization samples
      (command line and YAML)
    
    * Other changes:
      * Distiller LSTM implementation:
        Replace the ModuleList holding the cells with a plain list. The PyTorch
        trace mechanism doesn't "see" ModuleList objects, only the modules they
        contain. As a result, the "scopeName" of these modules is incomplete,
        which makes it impossible to match op names in SummaryGraph to modules
        in the Python model (see the sketch after this list).
      * ActivationStatsCollector: Ignore nn.Identity modules
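    For reference, the gist of the LSTM change (an illustrative pattern, not
    the exact Distiller code): the cells are kept in a plain Python list, with
    each cell also registered as a named attribute so it remains a proper
    submodule and gets a full scope name when traced.

    ```python
    import torch.nn as nn

    class StackedLSTMCells(nn.Module):
        """Illustrative only: plain list of cells, each registered by name."""
        def __init__(self, input_size, hidden_size, num_layers):
            super().__init__()
            self.cells = []
            for i in range(num_layers):
                cell = nn.LSTMCell(input_size if i == 0 else hidden_size, hidden_size)
                setattr(self, 'cell_{}'.format(i), cell)  # registers the submodule
                self.cells.append(cell)
    ```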