  1. Apr 20, 2020
    • Add example code showing schedule specification using code. · 5b01a40c
      Neta Zmora authored
      This script shows how to specify a compression schedule directly
      using Distiller's API instead of through a YAML specification;
      a sketch of the idea follows below.
      
      examples/scheduling_api/direct_api_pruning.py
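For orientation, here is a minimal sketch of what such direct scheduling looks like. The class names (CompressionScheduler, AutomatedGradualPruner, PruningPolicy) are Distiller's, but the exact arguments are assumptions and may differ from the referenced script; model stands for a PyTorch model defined elsewhere.

```python
# Hedged sketch: build a pruning schedule in code instead of YAML.
# Exact arguments are assumptions; the referenced script is authoritative.
import distiller
from distiller.pruning import AutomatedGradualPruner
from distiller.policy import PruningPolicy

scheduler = distiller.CompressionScheduler(model)

# AGP pruner that ramps one layer's sparsity from 5% to 85%
pruner = AutomatedGradualPruner(name="agp_conv1",
                                initial_sparsity=0.05,
                                final_sparsity=0.85,
                                weights=["module.conv1.weight"])

# Apply the pruner once per epoch between epochs 0 and 30
scheduler.add_policy(PruningPolicy(pruner, pruner_args=None),
                     starting_epoch=0, ending_epoch=30, frequency=1)
```

The training loop then invokes the scheduler's hooks (on_epoch_begin, on_minibatch_begin, and so on) exactly as it would with a YAML-built schedule.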
    • small tensor masking API refactoring (#499) · 68514d17
      Neta Zmora authored
      Added masking primitives:
        - mask_tensor
        - create_mask_threshold_criterion
        - create_mask_level_criterion
        - create_mask_sensitivity_criterion
      
       These APIs have clearer names and communicate their
       responsibility better: create a tensor mask, based on
       some criterion.  Previously,
       distiller.pruning.create_mask_threshold_criterion was
       named distiller.threshold_mask, which did not communicate
       well what the function did.
       Masking functionality is no longer hidden
       inside the Pruner instances, so it can be used directly
       by an application or composed into new Pruner classes
       (see the sketch below).
      
      Removed file distiller.pruning.pruner:
        - The base class _ParameterPruner is useless and adds
        needless detail to the implementation.
      
      AGP: Separated the pruning-rate schedule from the
       rest of the logic.  This allows us to mix-and-match different
       pruning-rate schedules (just like LR schedulers).
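To illustrate, a short sketch of using these primitives directly, outside any Pruner. The function names are from this commit; the exact signatures (threshold, desired_sparsity) are assumptions.

```python
# Hedged sketch of the masking primitives named in the commit above.
import torch
from distiller.pruning import (mask_tensor,
                               create_mask_threshold_criterion,
                               create_mask_level_criterion)

weights = torch.randn(64, 32)

# Mask all elements whose magnitude falls below an absolute threshold
mask = create_mask_threshold_criterion(weights, threshold=0.5)

# ...or mask the smallest-magnitude elements to reach 60% sparsity
mask = create_mask_level_criterion(weights, desired_sparsity=0.6)

# Apply the mask to the tensor
mask_tensor(weights, mask)
```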
  2. Feb 17, 2020
    • PyTorch PTQ convert updates/fixes + Raw activations collector · ccd11ddb
      Guy Jacob authored
      * BUGFIX: Fixed wrong attribute name for zero-point in conversion
        of eltwise add/mult and concat
      * Add PyTorch PTQ convert for embedding (converted to FP32
        embedding + quant op)
      * Fix conversion function to work with tuple/list model inputs
    • Post-Train Quant LAPQ Refactoring (#473) · 394e3bc6
      Guy Jacob authored
       * Move image-classification-specific setup code to a separate script
         at examples/classifier_compression/ptq_lapq.py
      * Make ptq_coordinate_search function completely independent of
        command line arguments
       * Change LAPQ command-line args function to update a
         pre-existing parser (changed the CLA prefix to 'lapq' for more
         clarity)
       * Enable LAPQ from compress_classifier.py (trigger with --qe-lapq);
         a usage sketch follows below
      * Add pointers in documentation
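A hypothetical sketch of what the decoupled call might look like. The function ptq_coordinate_search and its module path are from the commit, but the argument names, and the helpers evaluate_loss, calib_loader, dummy_input, and model, are illustrative assumptions.

```python
# Hedged sketch: calling the refactored ptq_coordinate_search
# programmatically, now that it no longer depends on command-line args.
# Argument names are assumptions, not the verified signature.
from distiller.quantization import PostTrainLinearQuantizer
from distiller.quantization.ptq_coordinate_search import ptq_coordinate_search

# 'model', 'dummy_input', 'calib_loader', 'evaluate_loss' are user-supplied
quantizer = PostTrainLinearQuantizer(model)

def eval_fn(model):
    # Task loss on a small calibration set; LAPQ minimizes this value
    return evaluate_loss(model, calib_loader)

ptq_coordinate_search(quantizer, dummy_input, eval_fn)
```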
  3. Feb 06, 2020
    • Convert Distiller PTQ models to "native" PyTorch PTQ (#458) · cdc1775f
      Guy Jacob authored
      * New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
      * Can also be called from PostTrainLinearQuantizer instance:
          quantizer.convert_to_pytorch()
       * Can also be triggered from the command line in the image
         classification sample (a conversion sketch follows below)
      * Can save/load converted modules via apputils.load/save_checkpoint
      * Added Jupyter notebook tutorial
      
      * Converted modules have only the absolutely necessary quant-dequant
        operations. For a fully quantized model, this means just quantization
        of model input and de-quantization of model output. If a user keeps
        specific internal layers in FP32, quant-dequant operations are added
        as needed
      * Can configure either 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we
        take care of preventing overflows (aka "reduce_range" in the PyTorch
        API)
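A sketch of the conversion flow, assuming the API names given in the commit; the exact signatures (e.g. whether dummy_input and backend are passed this way) are assumptions.

```python
# Hedged sketch of converting a Distiller PTQ model to native PyTorch PTQ.
import distiller.quantization as dq

# 'model' (a trained FP32 model) and 'dummy_input' are defined elsewhere
quantizer = dq.PostTrainLinearQuantizer(model)
quantizer.prepare_model(dummy_input)   # Distiller PTQ as usual
# ... collect stats / calibrate / evaluate ...

# Option 1: convert via the quantizer instance
pytorch_model = quantizer.convert_to_pytorch(dummy_input)

# Option 2: the module-level API named in the commit; the 'backend'
# keyword ('fbgemm' or 'qnnpack') is assumed from the commit text
pytorch_model = dq.convert_distiller_ptq_model_to_pytorch(quantizer.model,
                                                          dummy_input,
                                                          backend='fbgemm')
```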
  4. Feb 02, 2020
    • Loss Aware Post Train Quantization search (#432) · 0b493fd3
      Lev Zlotnik authored
      "Loss Aware Post-Training Quantization" (Nahshan et al., 2019)
      
      Paper: https://arxiv.org/abs/1911.07190 
      Reference implementation:
        https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
      
      Proper documentation is still TODO, for now see the example YAML file
      at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml'
      
      * Implemented in distiller/quantization/ptq_coordinate_search.py
       * At the moment that file contains both the model-independent
         algorithm implementation and an image-classification-specific
         sample script. Still TODO: refactor that. (A sketch of the
         underlying coordinate-search idea follows below.)
      
      * Post train quantization changes (range_linear):
        * Added getters/setters for quantization parameters (scale/zero_point)
          and clipping values
         * Add option to save a backup of the FP32 weights to allow
           re-quantization after the quantizer was created.
        * Add option to clip weights in addition to activations
         * Fix fusions so that they do not occur when activations aren't
           quantized
         * RangeLinearFakeQuantWrapper:
           * Make input quantization optional
           * In case of ReLU + ACIQ, clip according to input stats
      
      * Data loaders:
         * Add option to skip loading the train set from disk entirely
           (to speed up loading time in post-training runs)
        * Modified "image_classifier.py" accordingly
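For intuition, here is an illustrative sketch of the paper's core idea: search per-tensor clipping values to minimize a loss, rather than fixing them analytically. This is not Distiller's implementation; all names are illustrative, and a simple L2 proxy stands in for the network's task loss.

```python
# Illustrative sketch of the loss-aware search idea from Nahshan et al.
# (2019). NOT Distiller's implementation; a proxy loss replaces the
# task loss evaluated on calibration data.
import numpy as np
from scipy.optimize import minimize

def fake_quant(x, clip_val, num_bits=8):
    # Symmetric fake-quantization, clipping at +/- clip_val
    clip_val = max(abs(clip_val), 1e-6)  # guard: optimizer is unconstrained
    q_max = 2 ** (num_bits - 1) - 1
    scale = q_max / clip_val
    return np.clip(np.round(x * scale), -q_max, q_max) / scale

def proxy_loss(clip_vals, tensors):
    # Proxy objective: L2 distance to the FP32 tensors. The real method
    # minimizes the network's task loss instead.
    return sum(np.linalg.norm(t - fake_quant(t, c))
               for t, c in zip(tensors, clip_vals))

tensors = [np.random.randn(256) for _ in range(3)]
init = np.array([3.0, 3.0, 3.0])  # e.g. from an Lp-norm initialization
result = minimize(proxy_loss, init, args=(tensors,), method='Powell')
print(result.x)  # loss-aware clipping values
```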
  5. Jan 15, 2020
    • Fix scale factor calculation in symmetric quantization (#463) · 78255ee0
      Guy Jacob authored
      (we use 8-bit values below, but this applies to any bit-width)
      * We use the notion of "full" and "restricted" quantized range for
        symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
      * "Full" quantized range ==> [-128, 127], "restircted" ==> [-127, 127]
      * Until now, when doing symmetric quantization we assumed a "full"
        range when saturating after quantization, but calculated the scale
        factor as if the range was restricted. This means we weren't making
        full utilization of the quantized range.
      * On the other hand, in some other implementations of quantization (e.g.
        TensorFlow), the "restricted" range is used.
      * So, we make it an option to use either the proper "full" range
        (q_min = -128) or "restricted" range (q_min = -127).
       * LinearQuantMode.SYMMETRIC now means the "full" range is used, and
         LinearQuantMode.SYMMETRIC_RESTRICTED was added for the "restricted"
         range. (A numeric illustration of the two options follows below.)
      * Updated tests and documentation.
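A numeric illustration of the two options, following the description above (8 bits, saturation value sat_val):

```python
# Scale factors for symmetric quantization: "full" vs. "restricted" range.
num_bits = 8
sat_val = 6.0  # example: absolute-max of the tensor being quantized

# LinearQuantMode.SYMMETRIC ("full" range, q_min = -128):
scale_full = 2 ** (num_bits - 1) / sat_val              # 128 / 6.0 ~= 21.33

# LinearQuantMode.SYMMETRIC_RESTRICTED (q_min = -127):
scale_restricted = (2 ** (num_bits - 1) - 1) / sat_val  # 127 / 6.0 ~= 21.17

# The pre-fix behavior mixed the two: it saturated to the full range
# [-128, 127] but computed the scale as 127 / sat_val, leaving part of
# the quantized range unused.
```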