  1. Oct 27, 2020
  2. May 11, 2020
  3. Apr 30, 2020
  4. Apr 20, 2020
  5. Feb 17, 2020
    • Guy Jacob's avatar
      Post-Train Quant LAPQ Refactoring (#473) · 394e3bc6
      Guy Jacob authored
      * Move image classification specific setup code to separate script at
        examples/classifier_compression/ptq_lapq.py
      * Make ptq_coordinate_search function completely independent of
        command line arguments (see the sketch after this list)
      * Change LAPQ command line args function to update a pre-existing
        parser (changed CLAs prefix to 'lapq' for more clarity)
      * Enable LAPQ from compress_classifier.py (trigger with --qe-lapq)
      * Add pointers in documentation
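
        A minimal sketch of the decoupled call described above. Only the
        function name ptq_coordinate_search and the fact that it no longer
        depends on command line arguments come from this commit; the import
        path, argument names and helper objects below are assumptions:

          from distiller.quantization.ptq_coordinate_search import ptq_coordinate_search

          # quantizer:   a configured PostTrainLinearQuantizer (assumed)
          # dummy_input: a tensor matching the model's input shape (assumed)
          # eval_fn:     user callable returning the loss on a calibration set (assumed)
          ptq_coordinate_search(quantizer, dummy_input, eval_fn)  # no argparse.Namespace needed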
      394e3bc6
  6. Feb 06, 2020
    • Guy Jacob's avatar
      Convert Distiller PTQ models to "native" PyTorch PTQ (#458) · cdc1775f
      Guy Jacob authored
      
      * New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
      * Can also be called from PostTrainLinearQuantizer instance:
          quantizer.convert_to_pytorch() (see the usage sketch after this list)
      * Can also trigger from command line in image classification sample
      * Can save/load converted modules via apputils.load/save_checkpoint
      * Added Jupyter notebook tutorial
      
      * Converted modules have only the absolutely necessary quant-dequant
        operations. For a fully quantized model, this means just quantization
        of model input and de-quantization of model output. If a user keeps
        specific internal layers in FP32, quant-dequant operations are added
        as needed
      * Can configure either 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we
        take care of preventing overflows (aka "reduce_range" in the PyTorch
        API)
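
        A hedged usage sketch: only the two API entry points and the backend
        names above are from this commit; prepare_model usage, the dummy
        input shape and the backend keyword are assumptions:

          import torch
          import distiller
          from distiller.quantization import PostTrainLinearQuantizer

          quantizer = PostTrainLinearQuantizer(model)            # configured as usual (assumed)
          quantizer.prepare_model(torch.randn(1, 3, 224, 224))   # dummy input shape assumed

          # Convert via the quantizer instance ('fbgemm' or 'qnnpack'; keyword name assumed)
          native_model = quantizer.convert_to_pytorch(backend='fbgemm')

          # Or via the stand-alone API on an already post-train-quantized Distiller model:
          # native_model = distiller.quantization.convert_distiller_ptq_model_to_pytorch(model)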
      cdc1775f
  7. Feb 03, 2020
  8. Feb 02, 2020
    • Lev Zlotnik's avatar
      Loss Aware Post Train Quantization search (#432) · 0b493fd3
      Lev Zlotnik authored
      "Loss Aware Post-Training Quantization" (Nahshan et al., 2019)
      
      Paper: https://arxiv.org/abs/1911.07190 
      Reference implementation:
        https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
      
      Proper documentation is still TODO, for now see the example YAML file
      at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml'
      
      * Implemented in distiller/quantization/ptq_coordinate_search.py
        (an illustrative sketch of the search idea follows this list)
      * At the moment that file contains both the model-independent algorithm
        implementation and an image-classification specific sample script.
        Still TODO: refactor that
      
      * Post train quantization changes (range_linear):
        * Added getters/setters for quantization parameters (scale/zero_point)
          and clipping values
        * Add option to save backup of FP32 weights to allow re-quantization
          after quantizer was created.
        * Add option to clip weights in addition to activations
        * Fix fusions so they don't occur when activations aren't quantized
        * RangeLinearFakeQuantWrapper:
          * Make inputs quantization optional
          * In case of ReLU + ACIQ, clip according to input stats
      
      * Data loaders:
        * Add option to not load train set at all from disk (to speed up
          loading time in post-training runs)
        * Modified "image_classifier.py" accordingly
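
      An illustrative sketch of the coordinate-search idea from the paper
      (not Distiller's actual implementation; all names below are made up):
      per-layer clipping values are tuned one coordinate at a time so as to
      minimize the task loss on a small calibration set.

        def coordinate_search(clip_values, candidates_per_layer, quantize_and_eval):
            # clip_values: current per-layer clipping values (list)
            # quantize_and_eval: re-quantizes with the given values, returns the loss
            best_loss = quantize_and_eval(clip_values)
            for i in range(len(clip_values)):
                for candidate in candidates_per_layer[i]:
                    trial = list(clip_values)
                    trial[i] = candidate
                    loss = quantize_and_eval(trial)
                    if loss < best_loss:
                        best_loss, clip_values = loss, trial
            return clip_values, best_loss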
      0b493fd3
  9. Jan 15, 2020
    • Guy Jacob's avatar
      Fix scale factor calculation in symmetric quantization (#463) · 78255ee0
      Guy Jacob authored
      (we use 8-bit values below, but this applies to any bit-width)
      * We use the notion of "full" and "restricted" quantized range for
        symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
      * "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
      * Until now, when doing symmetric quantization we assumed a "full"
        range when saturating after quantization, but calculated the scale
        factor as if the range was restricted. This means we weren't making
        full utilization of the quantized range.
      * On the other hand, in some other implementations of quantization (e.g.
        TensorFlow), the "restricted" range is used.
      * So, we make it an option to use either the proper "full" range
        (q_min = -128) or "restricted" range (q_min = -127).
      * LinearQuantMode.SYMMETRIC now means the "full" range is used, and
        added LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
        range.
      * Updated tests and documentation.
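
      A small numeric sketch of the difference (the helper below and the
      exact full-range constant are assumptions; the commit itself only
      states q_min = -128 vs. q_min = -127):

        def symmetric_scale(sat_val, num_bits=8, restricted=False):
            if restricted:                    # SYMMETRIC_RESTRICTED: q in [-127, 127]
                n = 2 ** (num_bits - 1) - 1   # 127
            else:                             # SYMMETRIC ("full"): q in [-128, 127]
                n = (2 ** num_bits - 1) / 2   # 127.5 -- makes use of the full range
            return n / sat_val

        # For sat_val = 1.0: restricted scale = 127.0, full scale = 127.5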
      78255ee0
  10. Dec 30, 2019
  11. Dec 29, 2019
  12. Dec 26, 2019
  13. Dec 09, 2019
  14. Dec 08, 2019
    • Guy Jacob's avatar
      Enable weights/activations-only PTQ for conv/linear modules (#439) · 952028d0
      Guy Jacob authored
      * Weights-only PTQ:
        * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in
          which case it'll act as a simple pass-through during forward
        * In RangeLinearQuantParamLayerWrapper, if bits_activations is None
          and num_bits_params > 0, perform quant and de-quant of the
          parameters instead of just quant.
      * Activations-only PTQ:
        * Enable activations-only quantization for conv/linear modules. When
          PostTrainLinearQuantizer detects # bits != None for activations 
          and # bits == None for weights, a fake-quantization wrapper will
          be used.
      * Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command
        line arguments to invoke weights/activations-only quantization,
        respectively.
      * Minor refactoring for clarity in PostTrainLinearQuantizer's replace_*
        functions
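
      Hypothetical invocations of the image classification sample (only
      --quantize-eval and the --qe-bits-* flags are from this commit; the
      remaining arguments are placeholders):

        # Weights-only PTQ (activations stay in FP32):
        python compress_classifier.py <usual model/dataset args> --quantize-eval --qe-bits-wts 8 --qe-bits-acts 0

        # Activations-only PTQ (weights stay in FP32):
        python compress_classifier.py <usual model/dataset args> --quantize-eval --qe-bits-wts 0 --qe-bits-acts 8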
      952028d0
    • Guy Jacob's avatar
      Update PTQ ResNet-50 command line results · 326d172f
      Guy Jacob authored
      Results changed following commit 9e7ef987 (#402)
      326d172f
  15. Dec 03, 2019
  16. Dec 02, 2019
  17. Nov 28, 2019
  18. Nov 17, 2019
  19. Nov 16, 2019
  20. Nov 13, 2019
    • Bar's avatar
      image_classifier.py: PTQ stats collection and eval in same run (#346) · fb98377e
      Bar authored
      * Previous implementation:
        * Stats collection required a separate run with `--qe-calibration`.
        * Specifying `--quantize-eval` without `--qe-stats-file` triggered
          dynamic quantization.
        * Running with `--quantize-eval --qe-calibration <num>` only ran
          stats collection and ignored --quantize-eval.
      
      * New implementation:
        * Running `--quantize-eval --qe-calibration <num>` will now 
          perform stats collection according to the calibration flag,
          and then quantize the model with the collected stats (and
          run evaluation).
        * Specifying `--quantize-eval` without `--qe-stats-file` will
          trigger the same flow as in the bullet above, as if 
          `--qe-calibration 0.05` was used (i.e. 5% of the test set will
          be used for stats).
        * Added new flag: `--qe-dynamic`. From now on, dynamic
          quantization must be requested explicitly:
          `--quantize-eval --qe-dynamic`
        * As before, can still run `--qe-calibration` without 
          `--quantize-eval` to perform "stand-alone" stats collection
        * The following flags, which all represent different ways to
          control creation of stats or use of existing stats, are now
          mutually exclusive:
          `--qe-calibration`, `--qe-stats-file`, `--qe-dynamic`,
          `--qe-config-file`
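
      Hypothetical command lines for the flows above (the flags are from this
      commit; model/dataset arguments and the stats file name are placeholders):

        # Collect stats on 5% of the test set, then quantize and evaluate:
        python compress_classifier.py <usual args> --quantize-eval --qe-calibration 0.05

        # Quantize and evaluate using a previously saved stats file:
        python compress_classifier.py <usual args> --quantize-eval --qe-stats-file acts_stats.yaml

        # Dynamic quantization must now be requested explicitly:
        python compress_classifier.py <usual args> --quantize-eval --qe-dynamic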
      fb98377e
  21. Nov 11, 2019
    • Neta Zmora's avatar
      Pruning with virtual Batch-norm statistics folding (#415) · c849a25f
      Neta Zmora authored
      * pruning: add an option to virtually fold BN into Conv2D for ranking
      
      PruningPolicy can be configured using a new control argument fold_batchnorm: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv-2D modules (if Conv2D->BN edges exist in the model graph).  Each weight filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods.  We attenuate using the running values of the mean and variance, as is done in quantization (a minimal sketch of the folding follows the example below).
      This control argument is only supported for Conv-2D modules (i.e. other convolution operation variants and Linear operations are not supported).
      e.g.:
      policies:
        - pruner:
            instance_name : low_pruner
            args:
              fold_batchnorm: True
          starting_epoch: 0
          ending_epoch: 30
          frequency: 2
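
      A minimal sketch of the virtual folding used for ranking (illustrative
      only, not the actual Distiller code; beta only shifts the bias, so it
      does not change the folded weights):

        import torch

        def fold_bn_into_conv_weights(conv_w, bn):
            # conv_w: Conv2D weight tensor of shape [out_ch, in_ch, kH, kW]
            # bn: the BatchNorm2d module that follows the convolution
            scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sigma, per filter
            return conv_w * scale.view(-1, 1, 1, 1)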
      
      * AGP: non-functional refactoring
      
      distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity`
      to `_set_param_mask_by_sparsity_target`, which is a more appropriate function
      name as we don’t really prune in this function
      
      * Simplify GEMM weights input-channel ranking logic
      
      Ranking weight-matrices by input channels is similar to ranking 4D
      Conv weights by input channels, so there is no need for duplicate logic.
      
      distiller/pruning/ranked_structures_pruner.py
      -change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`,
      which is a more appropriate function name as we don’t really prune in this
      function
      -remove the code handling ranking of matrix rows
      
      distiller/norms.py – remove rank_cols.
      
      distiller/thresholding.py – in expand_binary_map treat `channels` group_type
      the same as the `cols` group_type when dealing with 2D weights
      
      * AGP: add example of ranking filters with virtual BN-folding
      
      Also update resnet20 AGP examples
      c849a25f
  22. Nov 07, 2019
    • Neta Zmora's avatar
      Fix Early-exit code · fc62caab
      Neta Zmora authored
      Fix the EE code so that it works with the current 'master' branch,
      and add a test for high-level EE regression
      fc62caab
  23. Nov 06, 2019
  24. Oct 31, 2019
  25. Oct 23, 2019
  26. Oct 07, 2019
    • Guy Jacob's avatar
      Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987
      Guy Jacob authored
      
      * Greedy search script for post-training quantization settings
        * Iterates over each layer in the model in order. For each layer,
          checks a user-defined set of quantization settings and chooses
          the best one based on validation accuracy
        * Provided a sample that searches for best activations-clipping
          mode per layer, on image classification models (a sketch of this
          loop follows this list)
      
      * Proper handling of mixed-quantization settings in post-train quant:
        * By default, the quantization settings for each layer apply only
          to output quantization
        * Propagate quantization settings for activations tensors through
          the model during execution
        * For non-quantized inputs to layers that require quantized inputs,
          fall back to quantizing according to the settings used for the
          output
        * In addition, provide mechanism to override inputs quantization
          settings via the YAML configuration file
        * By default all modules are quantized now. For module types that
          don't have a dedicated quantized implementation, "fake"
          quantization is performed
      
      * Misc. Changes
        * Fuse ReLU/ReLU6 to predecessor during post-training quantization
        * Fixes to ACIQ clipping in the half-range case
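
      An illustrative sketch of the greedy per-layer search loop (names and
      structure are made up for clarity; the actual script differs):

        def greedy_search(layers, candidate_settings, quantize_and_validate):
            chosen = {}
            for layer in layers:                    # fixed, in-order traversal
                best_acc, best_setting = -1.0, None
                for setting in candidate_settings:  # e.g. activation-clipping modes
                    acc = quantize_and_validate(chosen, layer, setting)
                    if acc > best_acc:
                        best_acc, best_setting = acc, setting
                chosen[layer] = best_setting        # freeze this layer's choice
            return chosen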
      
      Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
      Co-authored-by: Guy Jacob <guy.jacob@intel.com>
      9e7ef987