  1. Feb 17, 2020
    • Guy Jacob's avatar
      PyTorch PTQ convert updates/fixes + Raw activations collector · ccd11ddb
      Guy Jacob authored
      * BUGFIX: Fixed wrong attribute name for zero-point in conversion
        of eltwise add/mult and concat
      * Add PyTorch PTQ convert for embedding (converted to FP32
        embedding + quant op)
      * Fix conversion function to work with tuple/list model inputs
    • Guy Jacob's avatar
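      A hedged sketch of how the tuple/list input fix above might be exercised.
      The argument order and the dummy-input parameter are assumptions for
      illustration, and `quantized_model` stands for a model already quantized
      with PostTrainLinearQuantizer:

      ```python
      import torch
      import distiller.quantization as quantization

      # Assumed to already exist: a Distiller-quantized model whose forward()
      # takes two inputs (e.g. images plus token IDs feeding an embedding).
      dummy_input = (torch.randn(1, 3, 224, 224),
                     torch.randint(0, 1000, (1, 16)))

      # With this commit the conversion accepts tuple/list model inputs; the
      # exact parameter name/order here is an assumption, not verbatim API.
      pytorch_model = quantization.convert_distiller_ptq_model_to_pytorch(
          quantized_model, dummy_input)
      ```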
      Post-Train Quant LAPQ Refactoring (#473) · 394e3bc6
      Guy Jacob authored
      * Move image classification specific setup code to separate script at
        examples/classifier_compression/ptq_lapq.py
      * Make ptq_coordinate_search function completely independent of
        command line arguments
      * Change LAPQ command line args function to update a pre-existing
        parser (changed the CLAs prefix to 'lapq' for more clarity)
      * Enable LAPQ from compress_classifier.py (trigger with --qe-lapq)
      * Add pointers in documentation
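      A minimal sketch of the command-line wiring described above. The helper
      name `add_lapq_args` and the specific option names are made up for
      illustration; only the 'lapq' prefix and the --qe-lapq trigger come from
      the commit:

      ```python
      import argparse

      def add_lapq_args(parser):
          # Hypothetical helper mirroring the described behavior: extend a
          # pre-existing parser with 'lapq'-prefixed options.
          group = parser.add_argument_group('LAPQ post-training quantization')
          group.add_argument('--lapq-maxiter', type=int, default=None,
                             help='Max iterations of the coordinate search')
          group.add_argument('--lapq-init-mode', default='NONE',
                             help='Initial clipping mode for the search')
          return parser

      parser = argparse.ArgumentParser()
      # The image-classification sample triggers LAPQ with --qe-lapq:
      parser.add_argument('--qe-lapq', action='store_true',
                          help='Run loss-aware post-train quantization search')
      add_lapq_args(parser)
      args = parser.parse_args(['--qe-lapq', '--lapq-maxiter', '3'])
      ```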
  2. Feb 13, 2020
  3. Feb 09, 2020
  4. Feb 06, 2020
      Convert Distiller PTQ models to "native" PyTorch PTQ (#458) · cdc1775f
      Guy Jacob authored
      * New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
      * Can also be called from PostTrainLinearQuantizer instance:
          quantizer.convert_to_pytorch()
      * Can also trigger from command line in image classification sample
      * Can save/load converted modules via apputils.load/save_checkpoint
      * Added Jupyter notebook tutorial
      
      * Converted modules have only the absolutely necessary quant-dequant
        operations. For a fully quantized model, this means just quantization
        of model input and de-quantization of model output. If a user keeps
        specific internal layers in FP32, quant-dequant operations are added
        as needed
      * Can configure either 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we
        take care of preventing overflows (aka "reduce_range" in the PyTorch
        API)
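      A hedged usage sketch based on the two entry points named above. The
      `quantizer` and `quantized_model` objects are assumed to exist already,
      and the argument-free call and `backend` keyword are assumptions about
      the signatures, not verbatim Distiller API:

      ```python
      import distiller.quantization as quantization

      # From a PostTrainLinearQuantizer instance whose model is already quantized:
      pytorch_model = quantizer.convert_to_pytorch()

      # Or via the standalone API on an already-quantized Distiller model,
      # choosing the PyTorch quantization backend ('fbgemm' or 'qnnpack'):
      pytorch_model = quantization.convert_distiller_ptq_model_to_pytorch(
          quantized_model, backend='fbgemm')
      ```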
  5. Feb 03, 2020
  6. Feb 02, 2020
      Loss Aware Post Train Quantization search (#432) · 0b493fd3
      Lev Zlotnik authored
      "Loss Aware Post-Training Quantization" (Nahshan et al., 2019)
      
      Paper: https://arxiv.org/abs/1911.07190 
      Reference implementation:
        https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
      
      Proper documentation is still TODO, for now see the example YAML file
      at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml'
      
      * Implemented in distiller/quantization/ptq_coordinate_search.py
      * At the moment that file contains both the model-independent algorithm
        implementation and an image-classification-specific sample script.
        Still TODO: refactor that
      
      * Post train quantization changes (range_linear):
        * Added getters/setters for quantization parameters (scale/zero_point)
          and clipping values
        * Add option to save backup of FP32 weights to allow re-quantization
          after quantizer was created.
        * Add option to clip weights in addition to activations
        * Fix fusions so they don't occur when activations aren't quantized
        * RangeLinearFakeQuantWrapper:
          * Make inputs quantization optional
          * In case of ReLU + ACIQ, clip according to input stats
      
      * Data loaders:
        * Add option to not load train set at all from disk (to speed up
          loading time in post-training runs)
        * Modified "image_classifier.py" accordingly
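      The file name above is from the commit; the snippet below is only a toy
      coordinate-descent sketch of the general idea (adjust one clipping value
      at a time, keep the change if the loss improves), not the actual
      implementation in ptq_coordinate_search.py:

      ```python
      def coordinate_search(eval_loss, clip_values, num_passes=3, shrink=0.9):
          """Toy illustration: greedily shrink one layer's clipping value at a
          time, keeping the change only if the evaluation loss improves."""
          best = eval_loss(clip_values)
          for _ in range(num_passes):
              for name in list(clip_values):
                  candidate = dict(clip_values, **{name: clip_values[name] * shrink})
                  loss = eval_loss(candidate)
                  if loss < best:
                      best, clip_values = loss, candidate
          return clip_values, best

      # Usage with a stand-in loss (the real search evaluates the quantized model):
      clips, loss = coordinate_search(
          lambda c: sum((v - 1.5) ** 2 for v in c.values()),
          {'conv1': 4.0, 'conv2': 6.0})
      ```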
  7. Jan 19, 2020
  8. Jan 18, 2020
  9. Jan 15, 2020
      Fix scale factor calculation in symmetric quantization (#463) · 78255ee0
      Guy Jacob authored
      (we use 8-bit values below, but this applies to any bit-width)
      * We use the notion of "full" and "restricted" quantized range for
        symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
      * "Full" quantized range ==> [-128, 127], "restircted" ==> [-127, 127]
      * Until now, when doing symmetric quantization we assumed a "full"
        range when saturating after quantization, but calculated the scale
        factor as if the range was restricted. This means we weren't making
        full utilization of the quantized range.
      * On the other hand, in some other implementations of quantization (e.g.
        TensorFlow), the "restricted" range is used.
      * So, we make it an option to use either the proper "full" range
        (q_min = -128) or "restricted" range (q_min = -127).
      * LinearQuantMode.SYMMETRIC now means the "full" range is used, and
        added LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
        range.
      * Updated tests and documentation.
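      An illustrative sketch (not Distiller's actual code) of the two
      conventions for an 8-bit symmetric quantizer:

      ```python
      import torch

      def symmetric_quantize(x, num_bits=8, restricted=False):
          # 'Full' range:       q in [-128, 127], scale = 128 / max|x|
          # 'Restricted' range: q in [-127, 127], scale = 127 / max|x|
          sat_val = x.abs().max()
          n = 2 ** (num_bits - 1) - (1 if restricted else 0)
          scale = n / sat_val
          q = torch.clamp(torch.round(x * scale), -n, 2 ** (num_bits - 1) - 1)
          return q, scale

      x = torch.randn(1000)
      _, scale_full = symmetric_quantize(x)                    # SYMMETRIC
      _, scale_restr = symmetric_quantize(x, restricted=True)  # SYMMETRIC_RESTRICTED
      # scale_full > scale_restr: the full range has one extra negative level,
      # so the quantized range is utilized slightly better.
      ```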
  10. Jan 06, 2020
  11. Dec 30, 2019
  12. Dec 29, 2019
  13. Dec 26, 2019
  14. Dec 18, 2019
      IFM sparsity collector (#443) · cc50035e
      Bar authored
      Add directionality to SummaryActivationStatsCollector to allow collection of statistics on incoming and outgoing activations/feature-maps, instead of just outgoing activations.
      
      Also includes some code refactoring.
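      Not the SummaryActivationStatsCollector API itself, but a plain-PyTorch
      illustration of the direction distinction: a forward hook sees both the
      incoming (IFM) and outgoing (OFM) feature maps of a module:

      ```python
      import torch
      import torch.nn as nn

      def sparsity(t):
          return float((t == 0).sum()) / t.numel()

      stats = {}

      def hook(module, inputs, output):
          # Incoming vs. outgoing statistics for the same module
          stats[module] = {'ifm_sparsity': sparsity(inputs[0]),
                           'ofm_sparsity': sparsity(output)}

      model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
      for m in model:
          if isinstance(m, nn.Conv2d):
              m.register_forward_hook(hook)
      model(torch.randn(1, 3, 32, 32))
      ```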
  15. Dec 12, 2019
  16. Dec 11, 2019
  17. Dec 09, 2019
  18. Dec 08, 2019
      Enable weights/activations-only PTQ for conv/linear modules (#439) · 952028d0
      Guy Jacob authored
      * Weights-only PTQ:
        * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in
          which case it'll act as a simple pass-through during forward
        * In RangeLinearQuantParamLayerWrapper, if bits_activations is None
          and num_bits_params > 0, perform quant and de-quant of the
          parameters instead of just quant.
      * Activations-only PTQ:
        * Enable activations-only quantization for conv/linear modules. When
          PostTrainLinearQuantizer detects # bits != None for activations
          and # bits == None for weights, a fake-quantization wrapper will
          be used.
      * Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command
        line arguments to invoke weights/activations-only quantization,
        respectively.
      * Minor refactoring for clarity in PostTrainLinearQuantizer's replace_*
        functions
    • Guy Jacob's avatar
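      A hedged sketch of the weights-only behavior described above (quantize,
      then immediately de-quantize the parameters, so downstream computation
      stays in FP32). Asymmetric quantization is used here purely for
      illustration; this is not Distiller's exact code:

      ```python
      import torch

      def fake_quant_weights(w, num_bits=8):
          # Quant followed by de-quant: returns FP32 weights carrying the
          # quantization error, as described for the weights-only case.
          w_min, w_max = w.min(), w.max()
          scale = (2 ** num_bits - 1) / (w_max - w_min)
          zero_point = torch.round(-w_min * scale)
          q = torch.clamp(torch.round(w * scale + zero_point), 0, 2 ** num_bits - 1)
          return (q - zero_point) / scale

      w = torch.randn(64, 32)
      w_fq = fake_quant_weights(w)
      print((w - w_fq).abs().max())  # small quantization error only
      ```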
      Update PTQ ResNet-50 command line results · 326d172f
      Guy Jacob authored
      Results changed following commit 9e7ef987 (#402)
  19. Dec 03, 2019
  20. Dec 02, 2019
  21. Nov 28, 2019
  22. Nov 27, 2019
  23. Nov 25, 2019
      Resnet50 early-exit update · 8b341593
      Neta Zmora authored
      Update the definition of the exits using info from Haim.
      
      This is still very unsatisfactory because we don't have working
      examples to show users :-(
  24. Nov 17, 2019
  25. Nov 16, 2019