  1. Oct 27, 2020
  2. Jan 15, 2020
    • Fix scale factor calculation in symmetric quantization (#463) · 78255ee0
      Guy Jacob authored
      (we use 8-bit values below, but this applies to any bit-width)
      * We use the notion of "full" and "restricted" quantized range for
        symmetric quantization (see section 2.2 in
      * "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
      * Until now, when doing symmetric quantization we assumed a "full"
        range when saturating after quantization, but calculated the scale
        factor as if the range was restricted. This means we weren't making
        full utilization of the quantized range.
      * On the other hand, in some other implementations of quantization (e.g.
        TensorFlow), the "restricted" range is used.
      * So, we make it an option to use either the proper "full" range
        (q_min = -128) or "restricted" range (q_min = -127).
      * LinearQuantMode.SYMMETRIC now means the "full" range is used, and
        added LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
        range.
      * Updated tests and documentation.
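The difference between the two ranges can be sketched as follows. This is a minimal illustration, not Distiller's code: the function names are hypothetical, and treating the "full" range as (2^bits - 1) / 2 levels on each side of zero is an assumption of this sketch.

```python
def symmetric_scale(num_bits, sat_val, restricted=False):
    """Scale factor mapping [-sat_val, sat_val] onto the quantized range."""
    if sat_val <= 0:
        raise ValueError("sat_val must be positive")
    if restricted:
        levels = 2 ** (num_bits - 1) - 1   # 127 for 8 bits
    else:
        levels = (2 ** num_bits - 1) / 2   # 127.5 for 8 bits (assumed)
    return levels / sat_val


def quantize_symmetric(x, num_bits, sat_val, restricted=False):
    """Scale, round, then saturate to the chosen quantized range."""
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max if restricted else -q_max - 1
    q = round(x * symmetric_scale(num_bits, sat_val, restricted))
    return max(q_min, min(q_max, q))
```

With 8 bits, a value far below -sat_val saturates to -128 in the "full" range and to -127 in the "restricted" range.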
  3. Aug 08, 2019
  4. Aug 04, 2019
  5. Jul 10, 2019
    • Post-Train Quantization: BN folding and "net-aware quantization" (#313) · 43548deb
      Guy Jacob authored
      * "Net-aware quantization" - using the term coined in (section 3.2.2).
        Refers to considering sequences of modules when quantizing. This 
        isn't exactly layer fusion - we modify activation stats prior to
        setting quantization parameters, to make sure that when a module
        is followed by certain activation functions, only the relevant
        ranges are quantized. We do this for:
          * ReLU - Clip all negative values
          * Tanh / Sigmoid - Clip according to the (approximated) saturation
            values for these functions. We use [-4, 4] for tanh and [-6, 6]
            for sigmoid.
      * Perform batch-norm folding before post-training quantization.
        Batch-norm parameters are folded into the parameters of the previous
        layer and the BN layer is replaced with an identity module.
      * Both BN folding and "net-aware" are now automatically executed
        in PostTrainLinearQuantizer (details of this change below)
      * BN folding enabled by new generic mechanism to "fuse" module
        sequences (at the Python API level)
          * First module in sequence is replaced/modified by a user-provided
            function, rest of modules replaced with nn.Identity
      * Quantizer changes:
        * Optionally create adjacency map during prepare_model
        * Subclasses may enforce adjacency map creation
        * Refactoring: Replace _prepare_model_impl with pre and post
          override-able "callbacks", so core functionality is always executed
      * PostTrainLinearQuantizer Changes:
        * Enforce creation of adjacency map. This means users must now pass a
          dummy input to PostTrainLinearQuantizer.prepare_model
        * Before module replacement - Apply BN folding and stats updates according
          to net-aware quantization
      * Updated the language model quantization tutorial to reflect the new
      * Updated the image classification post-train quantization samples
        (command line and YAML)
      * Other changes:
        * Distiller LSTM implementation:
          Replace the ModuleList for cells with a plain list. The PyTorch trace
          mechanism doesn't "see" ModuleList objects, it only sees the 
          contained modules. This means that the "scopeName" of these modules
          isn't complete, which makes it impossible to match op names in 
          SummaryGraph to modules in the Python model.
        * ActivationStatsCollector: Ignore nn.Identity modules
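The two ideas in this commit, adjusting collected output ranges for the following activation and folding batch-norm into the previous layer, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, statistics are reduced to a single (min, max) pair, and the folded layer is a linear layer stored as nested lists. It is not the PostTrainLinearQuantizer implementation.

```python
import math

# Approximate saturation ranges quoted in the commit message.
SATURATION_RANGES = {"tanh": (-4.0, 4.0), "sigmoid": (-6.0, 6.0)}


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def net_aware_range(out_min, out_max, next_activation):
    """Narrow a module's output range to what the next activation uses."""
    if next_activation == "relu":
        # Negative outputs are zeroed by ReLU, so they need no levels.
        return max(out_min, 0.0), max(out_max, 0.0)
    if next_activation in SATURATION_RANGES:
        lo, hi = SATURATION_RANGES[next_activation]
        return clamp(out_min, lo, hi), clamp(out_max, lo, hi)
    return out_min, out_max


def fold_batch_norm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into the preceding linear layer.

    weights holds one row per output channel; all other arguments hold
    one value per output channel.
    """
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)
        folded_w.append([w * s for w in row])
        folded_b.append((b - m) * s + bt)
    return folded_w, folded_b
```

After folding, the layer with the new weights and biases produces the same output as the original layer followed by batch-norm, which is why the BN layer can be replaced with an identity module.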
  6. Jul 08, 2019
  7. Jun 10, 2019
  8. May 19, 2019
  9. Apr 14, 2019
  10. Apr 08, 2019
    • Refine pruning logic (#222) · 816a943d
      Neta Zmora authored
      Add finer control over the pruning logic, to accommodate more pruning
      The full description of the new logic is available in the updated documentation
      of the CompressionScheduler, which is also part of this PR.
      In this PR:
      * Added a new callback to the CompressionScheduler:
      compression_scheduler.before_parameter_optimization which is invoked
      after the gradients are computed, but before the weights are updated
      by the optimizer.
      * We provide an option to mask the gradients, before the weights are updated by the optimizer. 
      We register to the parameter backward hook in order to mask the gradients.
      This gives us finer control over the parameter updates.
      * Added several DropFilter schedules.
      DropFilter is a method to regularize networks, and it can also be
      used to "prepare" a network for permanent filter pruning.
      * Added documentation of pruning fine-control
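The effect of masking gradients before the optimizer step can be sketched with plain lists. This is an illustration of the arithmetic only, with hypothetical function names. In PyTorch the masking itself would normally live in a hook registered on the parameter, for example through `Tensor.register_hook`.

```python
def mask_gradients(grads, mask):
    """Zero the gradient of every pruned (mask == 0) weight."""
    return [g * m for g, m in zip(grads, mask)]


def sgd_step(weights, grads, lr):
    """Plain SGD update, standing in for the real optimizer."""
    return [w - lr * g for w, g in zip(weights, grads)]


weights = [0.0, 0.8, 0.0, -0.3]   # two weights already pruned to zero
mask = [0.0, 1.0, 0.0, 1.0]       # 0 marks a pruned weight
grads = [0.5, -0.2, 0.1, 0.4]     # gradients after the backward pass

# Mask after the gradients are computed, before the weights are updated.
updated = sgd_step(weights, mask_gradients(grads, mask), lr=0.1)
```

Because the masked gradients are zero, the pruned weights stay at zero after the update, while the remaining weights are trained as usual.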
  11. Apr 01, 2019
    • Quantizer: Specify # bias bits + custom overrides (BREAKING) (#178) · 5271625a
      Lev Zlotnik authored
      * Bias handling:
        * Add 'bits_bias' parameter to explicitly specify # of bits for bias,
          similar to weights and activations.
        * BREAKING: Remove the now redundant 'quantize_bias' boolean parameter
      * Custom overrides:
        * Expand the semantics of the overrides dict to allow overriding of
          other parameters in addition to bit-widths
        * Functions registered in the quantizer's 'replacement_factory' can
          define keyword arguments. Non bit-width entries in the overrides
          dict will be checked against the function signature and passed
        * BREAKING:
          * Changed the name of 'bits_overrides' to simply 'overrides'
          * Bit-width overrides must now be defined using the full parameter
            names - 'bits_activations/weights/bias' instead of the short-hands
            'acts' and 'wts' which were used so far.
        * Added/updated relevant tests
        * Modified all quantization YAMLs under 'examples' to reflect 
          these changes
        * Updated docs
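Checking non bit-width overrides against a replacement function's signature can be sketched with the standard `inspect` module. Everything here is hypothetical and for illustration only: `split_overrides`, `replace_relu`, the `clip_val` argument, and the choice to raise `TypeError` for unknown keys are assumptions, not the Quantizer's actual behavior.

```python
import inspect

BIT_WIDTH_KEYS = {"bits_activations", "bits_weights", "bits_bias"}


def split_overrides(overrides, replace_fn):
    """Separate bit-width entries from extra keyword arguments."""
    bits = {k: v for k, v in overrides.items() if k in BIT_WIDTH_KEYS}
    extra = {k: v for k, v in overrides.items() if k not in BIT_WIDTH_KEYS}
    # Extra entries must match a parameter of the replacement function.
    accepted = inspect.signature(replace_fn).parameters
    unknown = sorted(k for k in extra if k not in accepted)
    if unknown:
        raise TypeError(f"{replace_fn.__name__} does not accept: {unknown}")
    return bits, extra


def replace_relu(module, name, clip_val=6.0):
    """Hypothetical replacement function with one keyword argument."""
    return {"type": "ClippedReLU", "name": name, "clip_val": clip_val}


bits, extra = split_overrides(
    {"bits_activations": 4, "clip_val": 2.0}, replace_relu)
new_module = replace_relu(None, "relu1", **extra)
```

In this sketch an old short-hand key such as 'acts' is neither a bit-width key nor a parameter of the function, so it is rejected rather than silently ignored.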
  12. Mar 29, 2019
  13. Feb 26, 2019
  14. Feb 11, 2019
    • Post-train quant based on stats + additional modules quantized (#136) · 28a8ee18
      Guy Jacob authored
      Summary of changes:
      (1) Post-train quantization based on pre-collected statistics
      (2) Quantized concat, element-wise addition / multiplication and embeddings
      (3) Move post-train quantization command line args out of sample code
      (4) Configure post-train quantization from YAML for more fine-grained control
      (See PR #136 for more detailed change descriptions)
  15. Dec 11, 2018
  16. Dec 09, 2018
  17. Dec 06, 2018
    • Documentation refactoring · 178c8c49
      Neta Zmora authored
      - Moved the Language model and struct pruning tutorials from the Wiki to
      the HTML documentation.  Love the ease of Wiki, but GitHub doesn't let
      Google crawl these pages, and users can't open PRs on Wiki pages.
      - Updated the pruning algorithms documentation
  18. Dec 04, 2018
    • Range-Based Linear Quantization Features (#95) · 907a6f04
      Guy Jacob authored
      * Asymmetric post-training quantization (only symmetric was supported until now)
      * Quantization aware training for range-based (min-max) symmetric and asymmetric quantization
      * Per-channel quantization support in both training and post-training
      * Added tests and examples
      * Updated documentation
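Asymmetric range-based quantization and its per-channel variant can be sketched as below. This is a generic min-max formulation written for illustration. The function names, the unsigned output range, the integer zero-point, and the widening of the range to include zero are assumptions of this sketch, not necessarily what the commit implements.

```python
def asymmetric_params(x_min, x_max, num_bits=8):
    """Scale and integer zero-point mapping [x_min, x_max] to [0, 2^n - 1]."""
    # Widen the range so that 0.0 is always exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_min == x_max:
        raise ValueError("range must have non-zero width")
    scale = (2 ** num_bits - 1) / (x_max - x_min)
    zero_point = round(scale * x_min)
    return scale, zero_point


def quantize_asymmetric(x, scale, zero_point, num_bits=8):
    q = round(scale * x) - zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    return (q + zero_point) / scale


def per_channel_params(weights, num_bits=8):
    """One (scale, zero_point) pair per output channel (row)."""
    return [asymmetric_params(min(row), max(row), num_bits) for row in weights]
```

Per-channel quantization simply repeats the same computation for each output channel, so a channel with a narrow value range is not forced to share the scale of a channel with a wide one.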
  19. Nov 25, 2018
  20. Nov 24, 2018
    • Fix activation stats for Linear layers · 22e3ea8b
      Neta Zmora authored
      Thanks to Dan Alistarh for bringing this issue to my attention.
      The activations of Linear layers have shape (batch_size, output_size) and those
      of Convolution layers have shape (batch_size, num_channels, height, width), and
      this distinction in shape was not correctly handled.
      This commit also fixes sparsity computation for very large activations, as seen
      in VGG16, which led to memory exhaustion.  One solution is to use smaller
      batch sizes, but this commit uses a different solution, which counts zeros “manually”
      and uses less space.
      Also in this commit:
      - Added a “caveats” section to the documentation.
      - Added more tests.
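The commit message does not say how the zeros are counted, so the following is only a sketch of one shape-agnostic approach: walk the activation and keep a running tally, instead of building a full-size intermediate mask. The function names are hypothetical and the activations are modeled as nested lists.

```python
def count_zeros(activation):
    """Return (zeros, total) for a scalar or an arbitrarily nested list."""
    if not isinstance(activation, list):
        return (1 if activation == 0 else 0), 1
    zeros = total = 0
    for item in activation:
        z, t = count_zeros(item)
        zeros += z
        total += t
    return zeros, total


def sparsity(activation):
    """Fraction of elements that are exactly zero."""
    zeros, total = count_zeros(activation)
    return zeros / total if total else 0.0


linear_act = [[0.0, 1.0], [0.0, 0.0]]        # (batch_size, output_size)
conv_act = [[[[0.0, 1.0]], [[2.0, 0.0]]]]    # 4-D convolution output
```

The same function handles both the 2-D Linear output and the 4-D Convolution output, because it never assumes a fixed number of dimensions.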
  21. Nov 21, 2018
  22. Nov 08, 2018
    • Early Exit docs (#75) · 470209b9
      Haim Barad authored
      * Updated stats computation - fixes issues with validation stats
      * Clarification of output (docs)
      * Update
      * Moved validation stats to separate function
  23. Nov 07, 2018
  24. Nov 06, 2018
  25. Nov 04, 2018
  26. Oct 03, 2018
    • documentation: update syntax of launching jupyter notebook · 5902146a
      Neta Zmora authored
      Latest versions of Jupyter notebooks have a different syntax for
      launching the server such that it listens on all network interfaces
      (this is useful if you are running the Jupyter server on one machine,
      and connect to it from a browser on a different machine).
      	jupyter-notebook --ip=* --no-browser
      is replaced by:
      	jupyter-notebook --ip= --no-browser
  27. Sep 16, 2018
    • A temporary fix for issue #36 (#48) · 5d3d6d8d
      Neta Zmora authored
      * A temporary fix for issue 36
      The thinning code assumes that the sgraph it is using
      is not data-parallel, because it (currently) accesses the
      layer-name keys using a "normalized" name ("module." is removed).
      The bug is that in we create a data_parallel=True
      model; and then give it to sgraph.
      But in other places thinning code uses "normalized" keys.  For
      example in
      The temporary fix configures data_parallel=False in
      A long term solution should have SummaryGraph know how to handle
      both parallel and not-parallel models.  This can be done by having
      SummaryGraph convert layer-names it receives in the API to
      data_parallel=False using normalize_layer_name.  When returning
      results, use the de-normalized format.
      * Fix the documentation error from issue 36
      * Move some logs to debug and show in logging.conf how to enable DEBUG logs.
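The long-term idea described above, normalizing names on the way in and de-normalizing on the way out, can be sketched as follows. Both helpers are hypothetical stand-ins written for illustration. The sketch assumes the data-parallel wrapper shows up as a "module" component in the dotted layer name.

```python
def normalize_layer_name(name):
    """Drop every 'module' component added by a data-parallel wrapper.

    Caveat: a layer that is genuinely named 'module' would be dropped too.
    """
    return ".".join(part for part in name.split(".") if part != "module")


def denormalize_layer_name(normalized, model_layer_names):
    """Map a normalized name back to the name used by the actual model."""
    for name in model_layer_names:
        if normalize_layer_name(name) == normalized:
            return name
    return normalized  # model is not data-parallel, nothing to restore


parallel_names = ["module.conv1", "module.fc"]
key = normalize_layer_name("module.conv1")             # lookup key
original = denormalize_layer_name(key, parallel_names)  # name to return
```

Callers can then use either naming form, while all internal lookups use the normalized key.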
  28. Sep 03, 2018
  29. Jul 31, 2018
  30. Jul 22, 2018
    • PACT quantizer (#30) · df9a00ce
      Gal Novik authored
      * Adding PACT quantization method
      * Move logic modifying the optimizer due to changes the quantizer makes into the Quantizer itself
      * Updated documentation and tests
  31. Jul 17, 2018
    • Quantizer tests, fixes and docs update · 6b166cec
      Guy Jacob authored
      * Add Quantizer unit tests
      * Require 'bits_overrides' to be OrderedDict to support overlapping
        patterns in a predictable manner + update documentation to reflect this
      * Quantizer class cleanup
        * Use "public" nn.Module APIs instead of protected attributes
        * Call the builtins set/get/delattr instead of the class special methods
        * Fix issues reported in #24
      * Bug in RangeLinearQuantParamLayerWrapper - add explicit override of
        pre_quantized_forward accepting single input (#15)
      * Add DoReFa test to full_flow_tests
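Why the order of 'bits_overrides' matters for overlapping patterns can be sketched with a small lookup. The helper name, the use of full regular-expression matching, and the first-match-wins rule are assumptions made for this illustration. The commit message only states that an OrderedDict is required for predictable behavior.

```python
import re
from collections import OrderedDict


def find_override(layer_name, bits_overrides):
    """Return the settings of the first pattern matching layer_name."""
    for pattern, settings in bits_overrides.items():
        if re.fullmatch(pattern, layer_name):
            return settings
    return None


# Both patterns match "conv1"; the more specific one is listed first.
bits_overrides = OrderedDict([
    (r"conv1", {"wts": 8}),
    (r"conv.*", {"wts": 4}),
])
```

With a defined iteration order, "conv1" always receives the 8-bit setting and every other convolution the 4-bit one. Swapping the two entries would make the general pattern shadow the specific one.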
  32. Jul 01, 2018
  33. Jun 22, 2018
    • DOC: Fix (#10) · ab35fed7
      Thomas Fan authored
      Reviewed and looking good.  We have to set a convention for naming files.
  34. Jun 21, 2018
  35. Jun 14, 2018