  1. Apr 30, 2020
    • Knowledge distillation fixes (#503) · 32a7e4bf
      Guy Jacob authored
      Fixed two long-standing bugs in knowledge distillation:
       * Distillation loss needs to be scaled by T^2 (#122)
       * Use tensor.clone instead of new_tensor when caching student logits (#234)
      Updated example results and uploaded the script to generate them
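The T^2 scaling mentioned above can be sketched as follows. This is a minimal, dependency-free illustration and not Distiller's implementation; the helper names `softmax` and `distillation_loss` are hypothetical. The usual motivation for the factor is that gradients of the temperature-softened loss shrink roughly as 1/T^2, so multiplying by T^2 keeps the distillation term comparable in magnitude to the hard-label loss when T changes.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened outputs, scaled by T^2 (hypothetical helper)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Without this factor the distillation term fades as the temperature grows.
    return (temperature ** 2) * kl


if __name__ == "__main__":
    student = [1.0, 2.0, 3.0]
    teacher = [1.5, 1.0, 4.0]
    for t in (1.0, 2.0, 4.0):
        print(t, distillation_loss(student, teacher, t))
```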
  2. Apr 20, 2020
  3. Feb 17, 2020
  4. Feb 06, 2020
    • Convert Distiller PTQ models to "native" PyTorch PTQ (#458) · cdc1775f
      Guy Jacob authored
      * New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
      * Can also be called from PostTrainLinearQuantizer instance:
          quantizer.convert_to_pytorch()
      * Can also trigger from command line in image classification sample
      * Can save/load converted modules via apputils.load/save_checkpoint
      * Added Jupyter notebook tutorial
      
      * Converted modules have only the absolutely necessary quant-dequant
        operations. For a fully quantized model, this means just quantization
        of model input and de-quantization of model output. If a user keeps
        specific internal layers in FP32, quant-dequant operations are added
        as needed
      * Can configure either 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we
        take care of preventing overflows (aka "reduce_range" in the PyTorch
        API)
  5. Feb 03, 2020
  6. Feb 02, 2020
    • Loss Aware Post Train Quantization search (#432) · 0b493fd3
      Lev Zlotnik authored
      "Loss Aware Post-Training Quantization" (Nahshan et al., 2019)
      
      Paper: https://arxiv.org/abs/1911.07190 
      Reference implementation:
        https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
      
      Proper documentation is still TODO, for now see the example YAML file
      at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml'
      
      * Implemented in distiller/quantization/ptq_coordinate_search.py
      * At the moment that file contains both the model-independent algorithm
        implementation and an image-classification-specific sample script.
        Still TODO: Refactor that
      
      * Post train quantization changes (range_linear):
        * Added getters/setters for quantization parameters (scale/zero_point)
          and clipping values
        * Add option to save backup of FP32 weights to allow re-quantization
          after quantizer was created.
        * Add option to clip weights in addition to activations
        * Fix fusions to not occur only when activations aren't quantized
        * RangeLinearFakeQuantWrapper:
          * Make inputs quantization optional
          * In case of ReLU + ACIQ, clip according to input stats
      
      * Data loaders:
        * Add option to not load train set at all from disk (to speed up
          loading time in post-training runs)
        * Modified "image_classifier.py" accordingly
  7. Jan 15, 2020
    • Fix scale factor calculation in symmetric quantization (#463) · 78255ee0
      Guy Jacob authored
      (we use 8-bit values below, but this applies to any bit-width)
      * We use the notion of "full" and "restricted" quantized range for
        symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
      * "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
      * Until now, when doing symmetric quantization we assumed a "full"
        range when saturating after quantization, but calculated the scale
        factor as if the range was restricted. This means we weren't making
        full utilization of the quantized range.
      * On the other hand, in some other implementations of quantization (e.g.
        TensorFlow), the "restricted" range is used.
      * So, we make it an option to use either the proper "full" range
        (q_min = -128) or "restricted" range (q_min = -127).
      * LinearQuantMode.SYMMETRIC now means the "full" range is used, and
        added LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
        range.
      * Updated tests and documentation.
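The difference between the two ranges can be sketched as below. This is an illustration under stated assumptions, not the code from this commit: the function names are hypothetical, and it assumes the convention that a real value is quantized as q = round(scale * x) followed by clamping to [q_min, q_max].

```python
def symmetric_scale(max_abs, num_bits=8, restricted=False):
    """Scale factor for symmetric quantization (hypothetical helper)."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    if restricted:
        n = 2 ** (num_bits - 1) - 1      # 127 for 8 bits
    else:
        n = (2 ** num_bits - 1) / 2      # 127.5 for 8 bits
    return n / max_abs


def quantize(x, scale, num_bits=8, restricted=False):
    """Round to the nearest integer, then saturate to the chosen range."""
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max if restricted else -q_max - 1
    return max(q_min, min(q_max, round(x * scale)))


if __name__ == "__main__":
    for restricted in (False, True):
        s = symmetric_scale(1.0, restricted=restricted)
        print(restricted, s, quantize(-1.0, s, restricted=restricted))
```

With the "full" scale, the most negative input can reach -128; with the "restricted" scale it stops at -127, which mirrors the mismatch described in the message when a restricted scale was paired with full-range saturation.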
  8. Dec 30, 2019
  9. Dec 29, 2019
  10. Dec 26, 2019
  11. Dec 09, 2019
  12. Dec 08, 2019
    • Enable weights/activations-only PTQ for conv/linear modules (#439) · 952028d0
      Guy Jacob authored
      * Weights-only PTQ:
        * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in
          which case it'll act as a simple pass-through during forward
        * In RangeLinearQuantParamLayerWrapper, if bits_activations is None
          and num_bits_params > 0, perform quant and de-quant of the
          parameters instead of just quant.
      * Activations-only PTQ:
        * Enable activations-only quantization for conv/linear modules. When
          PostTrainLinearQuantizer detects # bits != None for activations 
          and # bits == None for weights, a fake-quantization wrapper will
          be used.
      * Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command
        line arguments to invoke weights/activations-only quantization,
        respectively.
      * Minor refactoring for clarity in PostTrainLinearQuantizer's replace_*
        functions
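The "quant and de-quant of the parameters" step used for weights-only PTQ can be pictured as fake quantization: values are rounded onto the integer grid and immediately mapped back to floating point, so the layer still computes in FP32 but sees quantization error. The sketch below assumes symmetric quantization over the restricted range and a per-tensor scale; `fake_quantize` is a hypothetical helper, not a Distiller function.

```python
def fake_quantize(values, num_bits=8):
    """Quantize then de-quantize a list of floats (hypothetical helper)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    q_max = 2 ** (num_bits - 1) - 1
    scale = q_max / max_abs  # assumption: restricted symmetric range
    result = []
    for v in values:
        q = max(-q_max, min(q_max, round(v * scale)))  # integer grid point
        result.append(q / scale)                       # back to floating point
    return result


if __name__ == "__main__":
    weights = [-0.5, 0.1, 0.25, 0.5]
    print(fake_quantize(weights, num_bits=8))
    print(fake_quantize(weights, num_bits=4))
```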
    • Update PTQ ResNet-50 command line results · 326d172f
      Guy Jacob authored
      Results changed following commit 9e7ef987 (#402)
  13. Dec 03, 2019
  14. Dec 02, 2019
  15. Nov 28, 2019
  16. Nov 17, 2019
  17. Nov 16, 2019
  18. Nov 13, 2019
    • image_classifier.py: PTQ stats collection and eval in same run (#346) · fb98377e
      Bar authored
      * Previous implementation:
        * Stats collection required a separate run with `--qe-calibration`.
        * Specifying `--quantize-eval` without `--qe-stats-file` triggered
          dynamic quantization.
        * Running with `--quantize-eval --qe-calibration <num>` only ran
          stats collection and ignored --quantize-eval.
      
      * New implementation:
        * Running `--quantize-eval --qe-calibration <num>` will now 
          perform stats collection according to the calibration flag,
          and then quantize the model with the collected stats (and
          run evaluation).
        * Specifying `--quantize-eval` without `--qe-stats-file` will
          trigger the same flow as in the bullet above, as if 
          `--qe-calibration 0.05` was used (i.e. 5% of the test set will
          be used for stats).
        * Added new flag: `--qe-dynamic`. From now on, dynamic quantization
          must be requested explicitly with:
          `--quantize-eval --qe-dynamic`
        * As before, can still run `--qe-calibration` without 
          `--quantize-eval` to perform "stand-alone" stats collection
        * The following flags, which all represent different ways to
          control creation of stats or use of existing stats, are now
          mutually exclusive:
          `--qe-calibration`, `--qe-stats-file`, `--qe-dynamic`,
          `--qe-config-file`
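One way to express the flag rules above is with an `argparse` mutually exclusive group. This is a hedged sketch and not the parser from `image_classifier.py`: the flag names come from the commit message, while the argument types, `build_parser`, and `resolve_calibration` are assumptions made for illustration. In particular, the sketch assumes the 5% default applies only when none of the four stats-related flags is given.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the flags named in the message."""
    parser = argparse.ArgumentParser(description="PTQ flag sketch")
    parser.add_argument("--quantize-eval", action="store_true")
    # At most one of these four may be given on the command line.
    stats = parser.add_mutually_exclusive_group()
    stats.add_argument("--qe-calibration", type=float, metavar="PORTION")
    stats.add_argument("--qe-stats-file", metavar="PATH")
    stats.add_argument("--qe-dynamic", action="store_true")
    stats.add_argument("--qe-config-file", metavar="PATH")
    return parser


def resolve_calibration(args, default_portion=0.05):
    """Return the portion of the test set used for stats, or None."""
    if args.qe_calibration is not None:
        return args.qe_calibration
    other_source = args.qe_stats_file or args.qe_dynamic or args.qe_config_file
    if args.quantize_eval and not other_source:
        # Same flow as passing --qe-calibration 0.05 explicitly.
        return default_portion
    return None


if __name__ == "__main__":
    args = build_parser().parse_args(["--quantize-eval"])
    print(resolve_calibration(args))
```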
  19. Nov 11, 2019
    • Pruning with virtual Batch-norm statistics folding (#415) · c849a25f
      Neta Zmora authored
      * pruning: add an option to virtually fold BN into Conv2D for ranking
      
      PruningPolicy can be configured using a new control argument fold_batchnorm: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv-2D modules (if Conv2D->BN edges exist in the model graph).  Each weights filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods.  We attenuate using the running values of the mean and variance, as is done in quantization.
      This control argument is only supported for Conv-2D modules (i.e. other convolution operation variants and Linear operations are not supported).
      e.g.:
      policies:
        - pruner:
            instance_name : low_pruner
            args:
              fold_batchnorm: True
          starting_epoch: 0
          ending_epoch: 30
          frequency: 2
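To see why folding can change a filter ranking, the sketch below applies the standard BatchNorm folding factor gamma / sqrt(running_var + eps) to each filter before computing L1 norms. It is an illustration only: the function names are hypothetical, filters are represented as flat lists of floats, and the bias term of the fold (which involves beta and the running mean) is omitted because it does not affect a weight-norm ranking.

```python
import math


def fold_bn_into_filters(filters, gamma, running_var, eps=1e-5):
    """Scale each filter by its BatchNorm factor (hypothetical helper)."""
    folded = []
    for weights, g, var in zip(filters, gamma, running_var):
        factor = g / math.sqrt(var + eps)
        folded.append([factor * w for w in weights])
    return folded


def rank_filters_l1(filters):
    """Return filter indices ordered from smallest to largest L1 norm."""
    norms = [sum(abs(w) for w in weights) for weights in filters]
    return sorted(range(len(filters)), key=lambda i: norms[i])


if __name__ == "__main__":
    filters = [[1.0, -1.0], [0.5, 0.5]]
    gamma = [0.1, 2.0]
    running_var = [1.0, 1.0]
    print(rank_filters_l1(filters))
    print(rank_filters_l1(fold_bn_into_filters(filters, gamma, running_var)))
```

In this toy case the second filter has the smaller raw norm, but after folding the first filter becomes the weaker one because its gamma is small.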
      
      * AGP: non-functional refactoring
      
      distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity`
      to `_set_param_mask_by_sparsity_target`, which is a more appropriate function
      name as we don’t really prune in this function
      
      * Simplify GEMM weights input-channel ranking logic
      
      Ranking weight-matrices by input channels is similar to ranking 4D
      Conv weights by input channels, so there is no need for duplicate logic.
      
      distiller/pruning/ranked_structures_pruner.py
      -change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`,
      which is a more appropriate function name as we don’t really prune in this
      function
      -remove the code handling ranking of matrix rows
      
      distiller/norms.py – remove rank_cols.
      
      distiller/thresholding.py – in expand_binary_map treat `channels` group_type
      the same as the `cols` group_type when dealing with 2D weights
      
      * AGP: add example of ranking filters with virtual BN-folding
      
      Also update resnet20 AGP examples
  20. Nov 07, 2019
    • Fix Early-exit code · fc62caab
      Neta Zmora authored
      Fix the EE code so that it works with the current 'master' branch,
      and add a test for high-level EE regression
  21. Nov 06, 2019
  22. Oct 31, 2019
  23. Oct 23, 2019
  24. Oct 07, 2019
    • Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987
      Guy Jacob authored
      
      * Greedy search script for post-training quantization settings
        * Iterates over each layer in the model in order. For each layer,
          checks a user-defined set of quantization settings and chooses
          the best one based on validation accuracy
        * Provided sample that searches for best activations-clipping
          mode per layer, on image classification models
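The greedy procedure described above can be sketched generically. This is not the script added in the commit: `greedy_layer_search`, the layer names, the setting labels, and the toy scoring function are all hypothetical, and `evaluate` stands in for whatever measures validation accuracy for a partial per-layer configuration.

```python
def greedy_layer_search(layer_names, candidate_settings, evaluate):
    """Pick the best setting for each layer in order, keeping earlier choices fixed."""
    chosen = {}
    for name in layer_names:
        best_setting, best_score = None, float("-inf")
        for setting in candidate_settings:
            trial = dict(chosen)
            trial[name] = setting
            score = evaluate(trial)  # e.g. validation accuracy of the trial config
            if score > best_score:
                best_setting, best_score = setting, score
        chosen[name] = best_setting
    return chosen


if __name__ == "__main__":
    # Toy stand-in for validation accuracy: a fixed preference per layer.
    prefs = {
        "conv1": {"none": 0.1, "avg": 0.3},
        "fc": {"none": 0.5, "avg": 0.2},
    }

    def toy_evaluate(config):
        return sum(prefs[layer][setting] for layer, setting in config.items())

    print(greedy_layer_search(["conv1", "fc"], ["none", "avg"], toy_evaluate))
```

The search costs one evaluation per (layer, setting) pair, so it scales linearly with model depth, at the price of never revisiting an earlier layer's choice.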
      
      * Proper handling of mixed-quantization settings in post-train quant:
        * By default, the quantization settings for each layer apply only
          to output quantization
        * Propagate quantization settings for activations tensors through
          the model during execution
        * For non-quantized inputs to layers that require quantized inputs,
          fall-back to quantizing according to the settings used for the
          output
        * In addition, provide mechanism to override inputs quantization
          settings via the YAML configuration file
        * By default all modules are quantized now. For module types that
          don't have a dedicated quantized implementation, "fake"
          quantization is performed
      
      * Misc. Changes
        * Fuse ReLU/ReLU6 to predecessor during post-training quantization
        * Fixes to ACIQ clipping in the half-range case
      
      Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
      Co-authored-by: Guy Jacob <guy.jacob@intel.com>
  25. Oct 06, 2019
    • Low-level pruning API refactor (#401) · 05d5592e
      Neta Zmora authored
      Some refactoring of the low-level pruning API
      
      Added distiller/norms.py - for calculating norms of various sub-tensors.
      
      ranked_structures_pruner.py:
      -Removed l1_magnitude, l2_magnitude. Use instead distiller.norms.l1_norm
      -Lots of refactoring
      -replaced LpRankedStructureParameterPruner.ch_binary_map_to_mask with
      distiller.thresholding.expand_binary_map
      -FMReconstructionChannelPruner.rank_and_prune_channels used L2-norm
      by default and now uses L1-norm (i.e. magnitude_fn=l2_magnitude was
      replaced with magnitude_fn=distiller.norms.l1_norm)
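Switching the default from L2 to L1 can change which structures rank lowest. The sketch below illustrates this with plain Python lists; the functions are hypothetical stand-ins written for this example and are not the ones in distiller/norms.py.

```python
import math


def l1_norm(values):
    """Sum of absolute values."""
    return sum(abs(v) for v in values)


def l2_norm(values):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


def rank_by_norm(structures, norm_fn):
    """Indices ordered from smallest to largest norm (pruning candidates first)."""
    return sorted(range(len(structures)), key=lambda i: norm_fn(structures[i]))


if __name__ == "__main__":
    channels = [
        [3.0, 0.0, 0.0, 0.0],  # one large weight
        [1.0, 1.0, 1.0, 1.0],  # several small weights
    ]
    print(rank_by_norm(channels, l1_norm))
    print(rank_by_norm(channels, l2_norm))
```

L1 treats the channel with one large weight as the weaker one, while L2 favors it, so results obtained before this change may not reproduce exactly.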
      
      thresholding.py:
      -Delegated lots of the work to the new norms.py.
      -Removed support for 4D (entire convolution layers) since that has not been
      maintained for a long time. This may break some old scripts that remove entire
      layers.
      -added expand_binary_map() explicitly so others can use it. Might need to
      move to a different file
      -removed threshold_policy()
      
      utils.py:
      -use distiller.norms.xxx for sparsity stats
  26. Sep 27, 2019