  1. Oct 27, 2020
  2. Apr 27, 2020
  3. Apr 20, 2020
      small tensor masking API refactoring (#499) · 68514d17
      Neta Zmora authored
      Added masking primitives:
       -mask_tensor
       -create_mask_threshold_criterion
       -create_mask_level_criterion
       -create_mask_sensitivity_criterion
      
       These APIs have a clearer name and communicate their
       responsibility better: create a tensor mask, based on
       some criterion.  Previously,
       distiller.pruning.create_mask_threshold_criterion was
       named distiller.threshold_mask which did not communicate
       well what this function did.
       Masking functionality is no longer hidden
       inside the Pruner instances, so they can be used directly
       by an application, or to compose new Pruner classes.
      
      Removed file distiller.pruning.pruner:
       -The base-class _ParameterPruner is useless and adds
       needless details to the implementation.
      
      AGP: Separated the pruning-rate schedule from the
       rest of the logic.  This allows us to mix-and-match different
       pruning-rate schedules (just like LR schedulers).
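       A minimal usage sketch of the new primitives follows (the argument
       names are assumptions based on this message, not a verified signature):

         import torch
         import distiller

         param = torch.randn(64, 3, 3, 3)   # e.g. a Conv2d weights tensor

         # Build a binary mask that zeros the 50% smallest-magnitude elements...
         mask = distiller.pruning.create_mask_level_criterion(param, level=0.5)
         # ...or mask everything below an absolute-magnitude threshold:
         # mask = distiller.pruning.create_mask_threshold_criterion(param, threshold=1e-2)

         # Apply the mask directly, without going through a Pruner instance.
         pruned = distiller.pruning.mask_tensor(param, mask)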
  4. Feb 06, 2020
      Convert Distiller PTQ models to "native" PyTorch PTQ (#458) · cdc1775f
      Guy Jacob authored
      
      * New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
      * Can also be called from PostTrainLinearQuantizer instance:
          quantizer.convert_to_pytorch()
      * Can also trigger from command line in image classification sample
      * Can save/load converted modules via apputils.load/save_checkpoint
      * Added Jupyter notebook tutorial
      
      * Converted modules have only the absolutely necessary quant-dequant
        operations. For a fully quantized model, this means just quantization
        of model input and de-quantization of model output. If a user keeps
        specific internal layers in FP32, quant-dequant operations are added
        as needed
      * Can configure either 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we
        take care of preventing overflows (aka "reduce_range" in the PyTorch
        API)
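       A hedged sketch of the conversion flow (the dummy_input argument and the
       exact keyword names are assumptions, not a verified signature):

         import torch
         import torchvision
         from distiller.quantization import PostTrainLinearQuantizer

         model = torchvision.models.resnet18()
         quantizer = PostTrainLinearQuantizer(model)
         dummy_input = torch.randn(1, 3, 224, 224)
         quantizer.prepare_model(dummy_input)

         # Convert the Distiller PTQ model to a "native" PyTorch quantized model
         # (could also call distiller.quantization.convert_distiller_ptq_model_to_pytorch()).
         pytorch_model = quantizer.convert_to_pytorch(dummy_input)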
  5. Feb 02, 2020
      Loss Aware Post Train Quantization search (#432) · 0b493fd3
      Lev Zlotnik authored
      "Loss Aware Post-Training Quantization" (Nahshan et al., 2019)
      
      Paper: https://arxiv.org/abs/1911.07190 
      Reference implementation:
        https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
      
      Proper documentation is still TODO, for now see the example YAML file
      at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml'
      
      * Implemented in distiller/quantization/ptq_coordinate_search.py
       * At the moment that file contains both the model-independent algorithm
         implementation and the image-classification-specific sample script.
         Still TODO: refactor that
      
      * Post train quantization changes (range_linear):
        * Added getters/setters for quantization parameters (scale/zero_point)
          and clipping values
        * Add option to save backup of FP32 weights to allow re-quantization
          after quantizer was created.
        * Add option to clip weights in addition to activations
        * Fix fusions to not occur only when activations aren't quantized
        * RangeLinearFakeQuantWrapper:
          * Make inputs quantization optional
          * In case of ReLU + ACIQ, clip according to input stats
      
      * Data loaders:
        * Add option to not load train set at all from disk (to speed up
          loading time in post-training runs)
        * Modified "image_classifier.py" accordingly
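       A schematic of the loss-aware search idea (an illustration only, not the
       code in ptq_coordinate_search.py; eval_loss() is a hypothetical helper
       that quantizes the model with the given clipping values and returns the
       task loss on a small calibration set):

         def coordinate_search(layer_names, init_clip, eval_loss, passes=3,
                               scales=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
             """Greedy coordinate descent over per-layer clipping values."""
             clip = dict(init_clip)
             for _ in range(passes):
                 for name in layer_names:               # one layer at a time
                     candidates = [init_clip[name] * s for s in scales]
                     losses = [eval_loss({**clip, name: c}) for c in candidates]
                     clip[name] = candidates[losses.index(min(losses))]
             return clip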
  6. Jan 15, 2020
      Fix scale factor calculation in symmetric quantization (#463) · 78255ee0
      Guy Jacob authored
      (we use 8-bit values below, but this applies to any bit-width)
      * We use the notion of "full" and "restricted" quantized range for
        symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
      * "Full" quantized range ==> [-128, 127], "restircted" ==> [-127, 127]
      * Until now, when doing symmetric quantization we assumed a "full"
        range when saturating after quantization, but calculated the scale
        factor as if the range was restricted. This means we weren't making
        full utilization of the quantized range.
      * On the other hand, in some other implementations of quantization (e.g.
        TensorFlow), the "restricted" range is used.
      * So, we make it an option to use either the proper "full" range
        (q_min = -128) or "restricted" range (q_min = -127).
      * LinearQuantMode.SYMMETRIC now means the "full" range is used, and
        added LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
        range.
      * Updated tests and documentation.
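       A worked 8-bit example of the difference, assuming the scale factor is
       computed as q_max / sat_val:

         sat_val = 2.5                       # max absolute value in the tensor

         scale_full = 128. / sat_val         # SYMMETRIC ("full", q_min = -128)      -> 51.2
         scale_restricted = 127. / sat_val   # SYMMETRIC_RESTRICTED (q_min = -127)   -> 50.8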
  7. Dec 12, 2019
  8. Dec 08, 2019
      Enable weights/activations-only PTQ for conv/linear modules (#439) · 952028d0
      Guy Jacob authored
      * Weights-only PTQ:
        * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in
          which case it'll act as a simple pass-through during forward
        * In RangeLinearQuantParamLayerWrapper, if bits_activations is None
          and num_bits_params > 0, perform quant and de-quant of the
          parameters instead of just quant.
      * Activations-only PTQ:
        * Enable activations-only quantization for conv/linear modules. When
          PostTrainLinearQuantizer detects # bits != None for activations 
          and # bits == None for weights, a fake-quantization wrapper will
          be used.
      * Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command
        line arguments to invoke weights/activations-only quantization,
        respectively.
      * Minor refactoring for clarity in PostTrainLinearQuantizer's replace_*
        functions
  9. Nov 27, 2019
  10. Nov 14, 2019
      PyTorch 1.3 Support (#426) · b8b4cf32
      Guy Jacob authored
      * summary_graph.py:
        * Change ONNX op.uniqueName() to op.debugName()
        * Removed scope-naming workaround which isn't needed in PyTorch 1.3
      * Tests:
        * Naming of trace entries changed in 1.3. Fixed SummaryGraph unit
          test that checked that
        * Adjusted expected values in full_flow_tests
        * Adjusted tolerance in test_sim_bn_fold
        * Filter some new warnings
  11. Nov 13, 2019
      image_classifier.py: PTQ stats collection and eval in same run (#346) · fb98377e
      Bar authored
      * Previous implementation:
         * Stats collection required a separate run with `--qe-calibration`.
        * Specifying `--quantize-eval` without `--qe-stats-file` triggered
          dynamic quantization.
         * Running with `--quantize-eval --qe-calibration <num>` only ran
           stats collection and ignored `--quantize-eval`.
      
      * New implementation:
        * Running `--quantize-eval --qe-calibration <num>` will now 
          perform stats collection according to the calibration flag,
          and then quantize the model with the collected stats (and
          run evaluation).
        * Specifying `--quantize-eval` without `--qe-stats-file` will
          trigger the same flow as in the bullet above, as if 
          `--qe-calibration 0.05` was used (i.e. 5% of the test set will
          be used for stats).
        * Added new flag: `--qe-dynamic`. From now, to do dynamic 
          quantization, need to explicitly run:
          `--quantize-eval --qe-dynamic`
        * As before, can still run `--qe-calibration` without 
          `--quantize-eval` to perform "stand-alone" stats collection
        * The following flags, which all represent different ways to
          control creation of stats or use of existing stats, are now
          mutually exclusive:
           `--qe-calibration`, `--qe-stats-file`, `--qe-dynamic`,
          `--qe-config-file`
  12. Nov 11, 2019
      Pruning with virtual Batch-norm statistics folding (#415) · c849a25f
      Neta Zmora authored
      * pruning: add an option to virtually fold BN into Conv2D for ranking
      
      PruningPolicy can be configured using a new control argument fold_batchnorm: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv-2D modules (if Conv2D->BN edges exist in the model graph).  Each weights filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods.  We attenuate using the running values of the mean and variance, as is done in quantization.
      This control argument is only supported for Conv-2D modules (i.e. other convolution operation variants and Linear operations are not supported).
      e.g.:
      policies:
        - pruner:
            instance_name : low_pruner
            args:
              fold_batchnorm: True
          starting_epoch: 0
          ending_epoch: 30
          frequency: 2
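       A minimal sketch of the standard folding arithmetic this option relies on
       (illustrative only, using the BN running statistics; not necessarily the
       exact code in this commit):

         import torch

         def fold_bn_into_conv_weights(conv_w, bn):
             # Scale each Conv2d output filter by gamma / sqrt(running_var + eps)
             scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
             return conv_w * scale.reshape(-1, 1, 1, 1)

         conv, bn = torch.nn.Conv2d(3, 64, 3), torch.nn.BatchNorm2d(64)
         folded_w = fold_bn_into_conv_weights(conv.weight.data, bn)  # used only for ranking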
      
      * AGP: non-functional refactoring
      
      distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity`
      to `_set_param_mask_by_sparsity_target`, which is a more appropriate function
      name as we don’t really prune in this function
      
      * Simplify GEMM weights input-channel ranking logic
      
      Ranking weight-matrices by input channels is similar to ranking 4D
      Conv weights by input channels, so there is no need for duplicate logic.
      
      distiller/pruning/ranked_structures_pruner.py
      -change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`,
      which is a more appropriate function name as we don’t really prune in this
      function
      -remove the code handling ranking of matrix rows
      
      distiller/norms.py – remove rank_cols.
      
      distiller/thresholding.py – in expand_binary_map treat `channels` group_type
      the same as the `cols` group_type when dealing with 2D weights
      
      * AGP: add example of ranking filters with virtual BN-folding
      
      Also update resnet20 AGP examples
  13. Nov 10, 2019
  14. Nov 07, 2019
      Fix Early-exit code · fc62caab
      Neta Zmora authored
      Fix the EE code so that it works with the current 'master' branch,
      and add a test for high-level EE regression
  15. Nov 06, 2019
  16. Oct 07, 2019
      Post-Train Quant: Greedy Search + Proper mixed-settings handling (#402) · 9e7ef987
      Guy Jacob authored
      
      * Greedy search script for post-training quantization settings
        * Iterates over each layer in the model in order. For each layer,
          checks a user-defined set of quantization settings and chooses
          the best one based on validation accuracy
        * Provided sample that searches for best activations-clipping
          mode per layer, on image classification models
      
      * Proper handling of mixed-quantization settings in post-train quant:
        * By default, the quantization settings for each layer apply only
          to output quantization
        * Propagate quantization settings for activations tensors through
          the model during execution
        * For non-quantized inputs to layers that require quantized inputs,
           fall back to quantizing according to the settings used for the
          output
        * In addition, provide mechanism to override inputs quantization
          settings via the YAML configuration file
        * By default all modules are quantized now. For module types that
          don't have a dedicated quantized implementation, "fake"
          quantization is performed
      
      * Misc. Changes
        * Fuse ReLU/ReLU6 to predecessor during post-training quantization
        * Fixes to ACIQ clipping in the half-range case
      
       Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
       Co-authored-by: Guy Jacob <guy.jacob@intel.com>
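       A schematic of the greedy per-layer search (quantize_layer() and
       evaluate() are hypothetical helpers standing in for the sample script):

         def greedy_ptq_search(layer_names, candidate_settings, quantize_layer, evaluate):
             chosen = {}
             for name in layer_names:                 # layers in model order
                 best_acc, best_setting = -1.0, None
                 for setting in candidate_settings:   # e.g. different clipping modes
                     quantize_layer(name, setting, chosen)
                     acc = evaluate()                 # accuracy on a validation subset
                     if acc > best_acc:
                         best_acc, best_setting = acc, setting
                 chosen[name] = best_setting
                 quantize_layer(name, best_setting, chosen)
             return chosen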
  17. Oct 06, 2019
      Low-level pruning API refactor (#401) · 05d5592e
      Neta Zmora authored
      Some refactoring of the low-level pruning API
      
      Added distiller/norms.py - for calculating norms of various sub-tensors.
      
      ranked_structures_pruner.py:
      -Removed l1_magnitude, l2_magnitude. Use instead distiller.norms.l1_norm
      -Lots of refactoring
      -replaced LpRankedStructureParameterPruner.ch_binary_map_to_mask with
      distiller.thresholding.expand_binary_map
      -FMReconstructionChannelPruner.rank_and_prune_channels used L2-norm
       by default and now uses L1-norm (i.e. magnitude_fn=l2_magnitude was
      replaced with magnitude_fn=distiller.norms.l1_norm)
      
      thresholding.py:
      -Delegated lots of the work to the new norms.py.
      -Removed support for 4D (entire convolution layers) since that has not been
       maintained for a long time. This may break some old scripts that remove entire
      layers.
      -added expand_binary_map() explicitly so others can use it. Might need to
      move to a different file
      -removed threshold_policy()
      
      utils.py:
      -use distiller.norms.xxx for sparsity stats
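       A generic illustration of the ranking flow (plain PyTorch, not the
       distiller.norms / distiller.thresholding APIs): rank 4D Conv filters by
       L1-norm and expand the per-filter binary map to a full weights mask.

         import torch

         weights = torch.randn(64, 32, 3, 3)                # (filters, channels, kH, kW)
         l1_per_filter = weights.abs().sum(dim=(1, 2, 3))   # one L1-norm per filter

         fraction_to_prune = 0.5
         k = int(fraction_to_prune * weights.size(0))
         threshold = l1_per_filter.kthvalue(k).values
         binary_map = (l1_per_filter > threshold).float()   # 1 = keep, 0 = prune

         mask = binary_map.view(-1, 1, 1, 1).expand_as(weights)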
  18. Sep 10, 2019
  19. Sep 01, 2019
      AMC: add pruning of FC layers · 3f7a9408
      Neta Zmora authored
      FMReconstructionChannelPruner: add support for nn.Linear layers
      utils.py: add non_zero_channels()
      thinning: support removing channels from FC layers preceding Conv layers
      test_pruning.py: add test_row_pruning()
      scheduler: init from a dictionary of Maskers
      coach_if.py – fix imports of Clipped-PPO and TD3
  20. Aug 22, 2019
  21. Aug 21, 2019
  22. Aug 20, 2019
  23. Aug 07, 2019
  24. Aug 06, 2019
      AMC and other refactoring - large merge (#339) · 02054da1
      Neta Zmora authored
       * An implementation of AMC (the previous implementation
        code has moved to a new location under
        /distiller/examples/auto_compression/amc).  AMC is aligned
        with the ‘master’ branch of Coach.
      *compress_classifier.py is refactored.  The base code moved
      to /distiller/apputils/image_classifier.py.  Further refactoring
      will follow.
      We want to provide a simple and small API to the basic features of
      a classifier-compression application.
       This will help applications that want to use the main features of a
       classifier-compression application, without the standard training
       regimen.
      AMC is one example of a stand-alone application that needs to leverage
      the capabilities of a classifier-compression application, but is currently
      coupled to `compress_classifier.py`.
      `multi-finetune.py` is another example.
      * ranked_structures_pruner.py:
      ** Added support for grouping channels/filters
      Sometimes we want to prune a group of structures: e.g. groups of
      8-channels.  This feature does not force the groups to be adjacent,
      so it is more like a set of structures.  E.g. in the case of pruning
      channels from a 64-channels convolution, grouped by 8 channels, we 
      will prune exactly one of 0/8/16/24/32/40/48/56 channels.  I.e. 
      always a multiple of 8-channels, excluding the set of all 64 channels.
      ** Added FMReconstructionChannelPruner – this is channel
      pruning using L1-magnitude to rank and select channels to
      remove, and feature-map reconstruction to improve the
      resilience to the pruning.
      * Added a script to run multiple instances of an 
      experiment, in different processes:
       examples/classifier_compression/multi-run.py
      * Set the seed value even when not specified by the command-line
      arguments, so that we can try and recreate the session.
      * Added pruning ranking noise -
      Ranking noise introduces Gaussian noise when ranking channels/filters
      using Lp-norm.  The noise is introduced using the epsilon-greedy
      methodology, where ranking using exact Lp-norm is considered greedy.
       * Added configurable rounding of pruning level: choose whether to
       round up/down when rounding the number of structures to prune
       (rounding is always to an integer); see the small arithmetic example below.
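       A small arithmetic example of the grouping and rounding described above
       (illustrative only): pruning channels in groups of 8 from a 64-channel
       convolution.

         import math

         n_channels, group_size = 64, 8
         desired_fraction = 0.3                               # want ~30% of the channels pruned

         groups = desired_fraction * n_channels / group_size  # 2.4 groups
         prune_down = math.floor(groups) * group_size         # round down -> 16 channels
         prune_up = math.ceil(groups) * group_size            # round up   -> 24 channels
         # Either way the result is a multiple of 8, and never all 64 channels.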
  25. Jul 29, 2019
      DistillerModuleList conversion: Handle models w. duplicate modules (#338) · db531db8
      Guy Jacob authored
      * By duplicate modules we mean:
         self.relu1 = nn.ReLU()
        self.relu2 = self.relu1
      * The issue:
        The second module ('relu2') will not be returned by
        torch.nn.Module.named_modules/children()
      * When converting to DistillerModuleList, in order to maintain the
        original order of modules and in order to have a correct mapping
        of names before/after the conversion - we need to take the duplicates
        into account
      * Implemented an internal version of named_modules/children that includes
        duplicates
      * Added test case for this + refactored the module list conversion tests
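       A short demonstration of the behavior described above:

         import torch.nn as nn

         class Net(nn.Module):
             def __init__(self):
                 super().__init__()
                 self.relu1 = nn.ReLU()
                 self.relu2 = self.relu1     # same object registered under two names

         print([name for name, _ in Net().named_modules()])
         # ['', 'relu1'] -- 'relu2' is skipped because named_modules() de-duplicates
         # by object identity, which is why the conversion must track duplicates itself.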
  26. Jul 22, 2019
      Fix non 1:1 mapping between model w. ModuleList and SummaryGraph (#328) · b614330c
      Guy Jacob authored
      The PyTorch trace mechanism doesn't "see" torch.nn.ModuleList modules
      (since they don't have a forward function). As a result, the mapping
      from module names at the Python model definition level to the
      scope-names at the trace level is not 1:1. This makes it impossible for
      us to map back from SummaryGraph ops to their respective nn.Modules,
      which is required for flows like BatchNorm folding and stats fusion in
      post-training quantization.
      
      In #313 we handled this issue specifically in DistillerLSTM, but it
      makes much more sense to have a generic and automatic solution for this
      issue, which doesn't require the user to modify the model. This is such
      a solution.
          
      * Implemented DistillerModuleList, a replacement for nn.ModuleList
        which results in full and unique scope-names
      * See documentation for this class in summary_graph.py for extensive
        details on the issue and solution
      * When generating a SummaryGraph, the model is scanned and all instances
         of torch.nn.ModuleList are replaced with DistillerModuleList
      * Add tests for new functionality
      * Partially revert changes made to DistillerLSTM in commit 43548deb:
        Keep the refactored _create_cells_list function, but have it create
         a standard torch.nn.ModuleList (since we're handling the ModuleList
         issue automatically now, and there's no need to confuse users with
         ad-hoc list implementations)
  27. Jul 21, 2019
  28. Jul 10, 2019
      Post-Train Quantization: BN folding and "net-aware quantization" (#313) · 43548deb
      Guy Jacob authored
      * "Net-aware quantization" - using the term coined in
        https://arxiv.org/abs/1811.09886. (section 3.2.2).
        Refers to considering sequences of modules when quantizing. This 
        isn't exactly layer fusion - we modify activation stats prior to
        setting quantization parameters, to make sure that when a module
        is followed by certain activation functions, only the relevant
        ranges are quantized. We do this for:
          * ReLU - Clip all negative values
          * Tanh / Sigmoid - Clip according to the (approximated) saturation
            values for these functions. We use [-4, 4] for tanh and [-6, 6]
            for sigmoid.
      
      * Perform batch-norm folding before post-training quantization.
        Batch-norm parameters are folded into the parameters of the previous
        layer and the BN layer is replaced with an identity module.
      
      * Both BN folding and "net-aware" are now automatically executed
        in PostTrainLinearQuantizer (details of this change below)
      
      * BN folding enabled by new generic mechanism to "fuse" module
        sequences (at the Python API level)
          * First module in sequence is replaced/modified by a user-provided
             function, rest of modules replaced with nn.Identity
      
      * Quantizer changes:
        * Optionally create adjacency map during prepare_model
        * Subclasses may enforce adjacency map creation
         * Refactoring: Replace _prepare_model_impl with pre and post
          override-able "callbacks", so core functionality is always executed
      
      * PostTrainLinearQuantizer Changes:
        * Enforce creation of adjacency map. This means users must now pass a
          dummy input to PostTrainLinearQuantizer.prepare_model
        * Before module replacement - Apply BN folding and stats updates according
          to net-aware quantization
      
      * Updated the language model quantization tutorial to reflect the new
        functionality
      
      * Updated the image classification post-train quantization samples
        (command line and YAML)
      
      * Other changes:
         * Distiller LSTM implementation:
          Replace the ModuleList for cells with a plain list. The PyTorch trace
          mechanism doesn't "see" ModuleList objects, it only sees the 
          contained modules. This means that the "scopeName" of these modules
          isn't complete, which makes it impossible to match op names in 
          SummaryGraph to modules in the Python model.
        * ActivationStatsCollector: Ignore nn.Identity modules
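       A minimal sketch of the "net-aware" stats adjustment (hypothetical helper;
       the actual logic lives inside PostTrainLinearQuantizer):

         def clip_output_stats_for_successor(stats, successor_type):
             """Restrict a layer's recorded output range based on the activation that follows it."""
             lo, hi = stats['min'], stats['max']
             if successor_type == 'relu':
                 lo = max(lo, 0.0)                        # ReLU discards negatives
             elif successor_type == 'tanh':
                 lo, hi = max(lo, -4.0), min(hi, 4.0)     # approximate saturation range
             elif successor_type == 'sigmoid':
                 lo, hi = max(lo, -6.0), min(hi, 6.0)
             return dict(stats, min=lo, max=hi)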
  29. Jul 04, 2019
      Switch to PyTorch 1.1.0 (#306) · 032b1f74
      Guy Jacob authored
      * PyTorch 1.1.0 now required
        - Moved other dependencies to up-to-date versions as well
      * Adapt LR scheduler to PyTorch 1.1 API changes:
        - Change lr_scheduler.step() calls to succeed validate calls,
          during training
        - Pass to lr_scheduler.step() caller both loss and top1
          (Resolves issue #240)
      * Adapt thinning for PyTorch 1.1 semantic changes
        - **KNOWN ISSUE**: When a thinning recipe is applied, in certain
          cases PyTorch displays this warning:
          "UserWarning: non-inplace resize is deprecated".
          To be fixed later
      * SummaryGraph: Workaround for new scope name issue from PyTorch 1.1.0
      * Adapt to updated PyTest version:
        - Stop using deprecated 'message' parameter of pytest.raises(),
          use pytest.fail() instead
        - Make sure only a single test case per pytest.raises context
      * Move PyTorch version check to root __init__.py 
         - This means the version is checked when Distiller is first
          imported. A RuntimeError is raised if the version is wrong.
      * Updates to parameter_histograms notebook:
        - Replace deprecated normed argument with density
        - Add sparsity rate to plot title
        - Load model in CPU
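       A minimal sketch of the PyTorch >= 1.1 ordering this commit adopts
       (train_one_epoch/validate and the surrounding objects are hypothetical
       stand-ins for the Distiller training loop):

         import torch

         for epoch in range(num_epochs):
             train_one_epoch(model, optimizer)        # optimizer.step() happens inside
             val_loss, val_top1 = validate(model)     # validate first...
             # ...then step the LR scheduler; plateau-type schedulers consume the metric.
             if isinstance(lr_scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau):
                 lr_scheduler.step(val_loss)
             else:
                 lr_scheduler.step()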
  30. Jul 03, 2019
  31. Jul 01, 2019
  32. Jun 23, 2019
  33. Jun 03, 2019
      [Breaking] PTQ: Removed special handling of clipping overrides · 3cde6c5e
      Lev Zlotnik authored
      * In PostTrainLinearQuantizer - moved 'clip_acts' and 'clip_n_stds'
        to overrides, removed 'no_clip_layers' parameter from __init__
      * The 'no_clip_layers' command line argument REMAINS, handled in 
        PostTrainLinearQuantizer.from_args()
      * Removed old code from comments, fixed warnings in 
        test_post_train_quant.py
      * Updated tests
      * Update post-train quant sample YAML
  34. May 30, 2019
      MNIST support · f8085cf4
      Neta Zmora authored
      -Added a test for MNIST
      -Added classification_get_dummy_input() to apputils/data_loaders.py
      and wrapped it with get_dummy_input() for (temporary) backward
      compatibility.
      - Changed simplenet_mnist so that it supports thinning