  1. Nov 16, 2019
  2. Nov 11, 2019
    • Pruning with virtual Batch-norm statistics folding (#415) · c849a25f
      Neta Zmora authored
      * pruning: add an option to virtually fold BN into Conv2D for ranking
      
      PruningPolicy can be configured using a new control argument, fold_batchnorm: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv2D modules (if Conv2D->BN edges exist in the model graph).  Each weight filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods.  We attenuate using the running values of the mean and variance, as is done in quantization (a sketch of the folding arithmetic follows the example below).
      This control argument is only supported for Conv2D modules (i.e., other convolution variants and Linear operations are not supported).
      e.g.:
      policies:
        - pruner:
            instance_name : low_pruner
            args:
              fold_batchnorm: True
          starting_epoch: 0
          ending_epoch: 30
          frequency: 2
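
      For reference, a minimal PyTorch sketch of the folding arithmetic used for ranking (the function name and epsilon handling are illustrative, and only the multiplicative part of the fold, which affects the weights, is shown; beta only shifts the bias):

      ```
      import torch

      def fold_bn_into_conv_weights(conv_w, bn_gamma, bn_running_var, bn_eps=1e-5):
          """Scale each Conv2D filter by its BatchNorm per-channel factor.

          conv_w:         [out_channels, in_channels, k, k] Conv2D weights
          bn_gamma:       [out_channels] BatchNorm scale (weight)
          bn_running_var: [out_channels] BatchNorm running variance
          The folded weights are used only for ranking, not for inference.
          """
          scale = bn_gamma / torch.sqrt(bn_running_var + bn_eps)  # [out_channels]
          return conv_w * scale.view(-1, 1, 1, 1)                 # broadcast per filter

      # Example: rank the filters of a folded 16x8x3x3 weight tensor by L1-norm
      w = torch.randn(16, 8, 3, 3)
      gamma, running_var = torch.rand(16), torch.rand(16) + 0.1
      folded = fold_bn_into_conv_weights(w, gamma, running_var)
      l1_per_filter = folded.abs().sum(dim=(1, 2, 3))             # one score per filter
      ```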
      
      * AGP: non-functional refactoring
      
      distiller/pruning/automated_gradual_pruner.py – change `prune_to_target_sparsity`
      to `_set_param_mask_by_sparsity_target`, which is a more appropriate function
      name as we don’t really prune in this function
      
      * Simplify GEMM weights input-channel ranking logic
      
      Ranking weight-matrices by input channels is similar to ranking 4D
      Conv weights by input channels, so there is no need for duplicate logic.
      
      distiller/pruning/ranked_structures_pruner.py
      -change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`,
      which is a more appropriate function name as we don’t really prune in this
      function
      -remove the code handling ranking of matrix rows
      
      distiller/norms.py – remove rank_cols.
      
      distiller/thresholding.py – in expand_binary_map treat `channels` group_type
      the same as the `cols` group_type when dealing with 2D weights
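
      The shared logic boils down to this: for a 2D (GEMM/Linear) weight of shape [out_features, in_features], an input channel is simply a column, and a 4D Conv weight can be viewed the same way. A small numpy sketch (function and variable names are illustrative):

      ```
      import numpy as np

      def rank_input_channels(weight_2d, k):
          """Return the indices of the k lowest-L1 input channels (columns)."""
          l1_per_col = np.abs(weight_2d).sum(axis=0)        # one score per column
          return np.argsort(l1_per_col)[:k]

      # Linear weight: [out_features, in_features] -> columns are input channels
      linear_w = np.random.randn(32, 64)

      # Conv2D weight [out_c, in_c, k, k] viewed as 2D with one column per input channel
      conv_w = np.random.randn(16, 8, 3, 3)
      conv_as_2d = conv_w.transpose(1, 0, 2, 3).reshape(8, -1).T   # [out_c*k*k, in_c]

      print(rank_input_channels(linear_w, k=4))
      print(rank_input_channels(conv_as_2d, k=2))
      ```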
      
      * AGP: add example of ranking filters with virtual BN-folding
      
      Also update resnet20 AGP examples
  3. Oct 23, 2019
  4. Sep 27, 2019
  5. Apr 01, 2019
    • Add ResNeXt101 pruning example (#201) · 81f5c0c7
      Bar authored
    • Load optimizer from checkpoint (BREAKING - see details) (#182) · 992291cf
      Bar authored
      
      * Fixes issues #70, #145 and replaces PR #74
      * checkpoint.py
        * save_checkpoint will now save the optimizer type in addition to
          its state
        * load_checkpoint will now instantiate an optimizer based on the
          saved type and load its state (see the sketch after this list)
      * config.py: file/dict_config now accept the resumed epoch to pass to
        LR schedulers
      * policy.py: LRPolicy now passes the current epoch to the LR scheduler
      * Classifier compression sample
        * New flag '--resume-from' for properly resuming a saved training
          session, inc. optimizer state and epoch #
        * Flag '--reset-optimizer' added to allow discarding of a loaded
          optimizer.
        * BREAKING:
          * Previous flag '--resume' is deprecated and is mapped to
            '--resume-from' + '--reset-optimizer'. 
          * However, the old resuming behavior had an inconsistency where the epoch
            count would continue from the saved epoch, but the LR scheduler
            was set up as if we were starting from epoch 0.
          * Using '--resume-from' + '--reset-optimizer' now will simply
            RESET the epoch count to 0 for the whole environment.
          * This means that scheduling configurations (in YAML or code)
            which assumed use of '--resume' might need to be changed to
            reflect the fact that the epoch count now starts from 0
          * All relevant YAML files under 'examples' modified to reflect
            this change
      * Initial support for ReduceLROnPlateau (#161):
        * Allow passing **kwargs to policies via the scheduler
        * Image classification now passes the validation loss to the
          scheduler, to be used by ReduceLROnPlateau
        * The current implementation is experimental and subject to change
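
      A minimal sketch of the new save/load behavior (the checkpoint keys and the lr placeholder are illustrative, not the exact fields Distiller writes):

      ```
      import torch

      def save_checkpoint(path, epoch, model, optimizer):
          # Save the optimizer type in addition to its state, so that
          # load_checkpoint can re-instantiate the same optimizer class.
          torch.save({
              'epoch': epoch,
              'model_state': model.state_dict(),
              'optimizer_type': type(optimizer).__name__,   # e.g. 'SGD'
              'optimizer_state': optimizer.state_dict(),
          }, path)

      def load_checkpoint(path, model):
          chkpt = torch.load(path)
          model.load_state_dict(chkpt['model_state'])
          # Re-create the optimizer from the saved type; load_state_dict then
          # restores its hyper-parameters and buffers (lr, momentum, etc.).
          optim_cls = getattr(torch.optim, chkpt['optimizer_type'])
          optimizer = optim_cls(model.parameters(), lr=0.1)  # lr is a placeholder
          optimizer.load_state_dict(chkpt['optimizer_state'])
          return model, optimizer, chkpt['epoch']

      model = torch.nn.Linear(10, 2)
      opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
      save_checkpoint('chkpt.pth', epoch=5, model=model, optimizer=opt)
      model, opt, start_epoch = load_checkpoint('chkpt.pth', model)
      ```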
  6. Mar 23, 2019
  7. Feb 10, 2019
  8. Jan 09, 2019
  9. Jan 08, 2019
    • Non-channel/filter block pruning (#119) · b9d53ff8
      Bar authored
      Block pruning: support specifying the block shape from the YAML file
      
      Block pruning refers to pruning 4-D structures of a specific shape.  This
      is why it is sometimes called structure-pruning or group-pruning
      (confusing, I know).
      Specific examples of block pruning are filter and channel pruning, which
      have highly regular block shapes.
      This commit adds support for pruning blocks/groups/structures
      with irregular shapes that accelerate inference on a specific
      hardware platform.  You can read more about the regularity of shapes in
      [Exploring the Regularity of Sparse Structure in
      Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).
      
      When we want to introduce sparsity in order to reduce the compute load
      of a certain layer, we need to understand how the HW and SW perform
      the layer's operation, and how this operation is vectorized.  Then we can
      induce sparsity to match the vector shape.
      
      For example, Intel AVX-512 is a set of SIMD instructions that apply the same
      instruction (Single Instruction) to a vector of inputs (Multiple
      Data).  The following single instruction performs an element-wise
      multiplication of two vectors of 16 32-bit elements each:

           __m512i result = _mm512_mullo_epi32(vec_a, vec_b);
      
      If either vec_a or vec_b is partially sparse, we still need to perform
      the multiplication and the sparsity does not help reduce the
      cost (power, latency) of the computation.  However, if either vec_a or vec_b
      contains only zeros, then we can eliminate the instruction entirely.  In this
      case, we say that we would like to have group sparsity of 16 elements,
      i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements.
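
      A toy numpy sketch of inducing 16-element group sparsity (the threshold and the L1 group score are illustrative): entire 16-element groups are zeroed, so a SIMD multiply either does useful work or can be skipped altogether.

      ```
      import numpy as np

      GROUP = 16   # match the SIMD vector width: 16 x 32-bit elements

      def prune_in_groups(flat_weights, threshold):
          """Zero every 16-element group whose L1-norm falls below `threshold`."""
          assert flat_weights.size % GROUP == 0
          groups = flat_weights.reshape(-1, GROUP)         # [n_groups, 16]
          keep = np.abs(groups).sum(axis=1) >= threshold   # one decision per group
          return (groups * keep[:, None]).reshape(-1)

      w = np.random.randn(4 * GROUP)
      pruned = prune_in_groups(w, threshold=8.0)
      # Every zero run is now aligned to a full 16-element group, so the
      # corresponding vector instruction can be eliminated.
      ```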
      
      Things are a bit more involved because we also need to understand how the
      software maps layer operations to hardware.  For example, a 3x3
      convolution can be computed as a direct-convolution, as a matrix multiply
      operation, or as a Winograd matrix operation (to name a few ways of
      computation).  These low-level operations are then mapped to SIMD
      instructions.
      
      Finally, the low-level SW needs to support a block-sparse storage-format
      for weight tensors (see for example:
      http://www.netlib.org/linalg/html_templates/node90.html)
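
      For illustration, SciPy's BSR (Block Sparse Row) format stores only the non-zero dense blocks plus their block coordinates; here the 1x16 block shape is chosen to match the 16-element group example above:

      ```
      import numpy as np
      from scipy.sparse import bsr_matrix

      dense = np.zeros((4, 64), dtype=np.float32)
      dense[0, 16:32] = 1.0     # one non-zero 1x16 block
      dense[2, 48:64] = 2.0     # another non-zero 1x16 block

      # Only the two non-zero 1x16 blocks are stored, plus their block indices.
      sparse = bsr_matrix(dense, blocksize=(1, 16))
      print(sparse.data.shape)  # (2, 1, 16)
      ```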
  10. Dec 19, 2018
  11. Dec 11, 2018
    • ResNet50 filter-pruning · 2e115f9d
      Neta Zmora authored
      Filter-pruning using L1-norm ranking and AGP for setting the pruning-rate decay.
      
      Best Top1: 74.472 (epoch 89) vs. 76.15 baseline
      No. of Parameters: 12,335,296 (of 25,502,912) = 43.37% dense (51.63% sparse)
      Total MACs: 1,822,031,872 (of 4,089,184,256) = 44.56% compute = 2.24x
  12. Dec 01, 2018
    • Important changes to pruning channels and filters (#93) · a0bf2a8f
      Neta Zmora authored
      This commit contains the main fix for issue #85.  It contains a couple of changes to the YAML structure pruning API, with examples.
      I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).
      
      New syntax for defining Structured AGP.  I tried to make the syntax similar to fine-grained
      (i.e. element-wise) pruning.  All you need to do is add: ```group_type: Filters```.
      ```
        low_pruner:
          class: L1RankedStructureParameterPruner_AGP
          initial_sparsity : 0.10
          final_sparsity: 0.50
          group_type: Filters
          weights: [module.layer3.0.conv2.weight,
                    module.layer3.0.downsample.0.weight,
                    module.layer3.1.conv2.weight,
                    module.layer3.2.conv2.weight]
      ```
      
      If you want to define “leader-based” pruning dependencies, add ```group_dependency: Leader```:
      ```
        low_pruner:
          class: L1RankedStructureParameterPruner_AGP
          initial_sparsity : 0.10
          final_sparsity: 0.50
          group_type: Filters
          group_dependency: Leader
          weights: [module.layer3.0.conv2.weight,
                    module.layer3.0.downsample.0.weight,
                    module.layer3.1.conv2.weight,
                    module.layer3.2.conv2.weight]
      ```
      
      Retired the old ```reg_regims``` API for describing one-shot structured-pruning.
      
      The new YAML API is very similar to AGP structured-pruning, which is much better
      than before.
      The new API also allows us to describe data-dependencies when doing one-shot
      structure pruning, just like AGP structured-pruning.
      
      This commit also includes further code refactoring.
      
      Old API:
      ```
        filter_pruner:
           class: 'L1RankedStructureParameterPruner'
           reg_regims:
             'module.layer1.0.conv1.weight': [0.6, '3D']
             'module.layer1.1.conv1.weight': [0.6, '3D']
      ```
      
      New API:
      ```
       filter_pruner:
          class: 'L1RankedStructureParameterPruner'
          group_type: Filters
          desired_sparsity: 0.6
          weights: [
            module.layer1.0.conv1.weight,
            module.layer1.1.conv1.weight]
      ```
      
      thresholding.py – separate the generation of the binary_map from the pruning_mask so that we
      can cache the binary map and share it between several modules.
      
      pruning/automated_gradual_pruner.py – major refactoring to support “leader-based”
      sub-graph pruning dependencies.  The concept is explained in issue #85.
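
      A minimal sketch of the leader-based idea (names are illustrative): the binary map is computed once, from the first (“leader”) weights tensor in the group, and then expanded into a mask for each dependent tensor so that all of them drop the same filter indices.

      ```
      import torch

      def leader_binary_map(leader_w, fraction_to_prune):
          """Rank the leader's filters by L1-norm; 1 = keep filter, 0 = prune."""
          l1 = leader_w.abs().sum(dim=(1, 2, 3))            # one score per filter
          n_prune = int(fraction_to_prune * l1.numel())
          binary_map = torch.ones_like(l1)
          binary_map[torch.argsort(l1)[:n_prune]] = 0
          return binary_map

      def expand_to_mask(weight, binary_map):
          """Broadcast the per-filter decision over a 4D weight tensor."""
          return binary_map.view(-1, 1, 1, 1).expand_as(weight)

      leader = torch.randn(64, 64, 3, 3)
      followers = [torch.randn(64, 64, 3, 3), torch.randn(64, 16, 1, 1)]

      bmap = leader_binary_map(leader, fraction_to_prune=0.5)   # computed once, cached
      masks = [expand_to_mask(w, bmap) for w in [leader] + followers]
      ```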
      
      
      agp-pruning/resnet20_filters.schedule_agp.yaml
      agp-pruning/resnet20_filters.schedule_agp_2.yaml
      agp-pruning/resnet20_filters.schedule_agp_3.yaml
      network_trimming/resnet56_cifar_activation_apoz.yaml
      network_trimming/resnet56_cifar_activation_apoz_v2.yaml
  13. Nov 09, 2018
  14. Nov 08, 2018
  15. Oct 31, 2018
  16. Oct 22, 2018
    • Activation statistics collection (#61) · 54a5867e
      Neta Zmora authored
      Activation statistics can be leveraged to make pruning and quantization decisions, and so
      we added support for collecting these data.
      - Two types of activation statistics are supported: summary statistics, and detailed records 
      per activation.
      Currently we support the following summaries: 
      - Average activation sparsity, per layer
      - Average L1-norm for each activation channel, per layer
      - Average sparsity for each activation channel, per layer
      
      For the detailed records, we collect some statistics per activation and store them in a record.
      Using this collection method generates more detailed data, but consumes more time, so
      beware.
      
      * You can collect activation data for the different training phases: training/validation/test.
      * You can access the data directly from each module that you chose to collect stats for.  
      * You can also create an Excel workbook with the stats.
      
      To demonstrate the use of activation collection, we added a sample schedule which prunes
      weight filters by activation APoZ according to:
      "Network Trimming: A Data-Driven Neuron Pruning Approach towards 
      Efficient Deep Architectures",
      Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
      https://arxiv.org/abs/1607.03250
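
      For reference, APoZ (Average Percentage of Zeros) scores each activation channel by how often its post-ReLU outputs are zero, averaged over the collected data; the channels with the highest APoZ are considered the least useful. A toy sketch (assumes NCHW activations; not the actual collector code):

      ```
      import torch

      def apoz_per_channel(activations):
          """activations: [batch, channels, H, W] post-ReLU feature maps.
          Returns the fraction of zeros per channel, averaged over batch and space."""
          return (activations == 0).float().mean(dim=(0, 2, 3))   # [channels]

      acts = torch.relu(torch.randn(32, 64, 14, 14))
      scores = apoz_per_channel(acts)
      # Filters whose output channels have the highest APoZ are pruned first.
      worst = torch.argsort(scores, descending=True)[:8]
      ```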
      
      We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning,
      and specifically we separated the AGP schedule from the filter pruning criterion.  We added
      examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner),
      random (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), 
      and filter L1-norm (L1RankedStructureParameterPruner)
  17. Oct 13, 2018
    • AGP for structs (#58) · e0bfc796
      Neta Zmora authored
      Applying automated gradual pruning (AGP) to structured pruning is very
      simple and produces good results.
      AGP is also implemented in Google's TensorFlow framework.
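
      For reference, the AGP schedule (from “To prune, or not to prune”, Zhu & Gupta, 2017) raises the target sparsity from an initial value to a final value along a cubic curve, so pruning is aggressive early on and tapers off as training converges. A sketch of the schedule:

      ```
      def agp_target_sparsity(epoch, s_initial, s_final, start_epoch, span_epochs):
          """Cubic AGP schedule: s_t = s_f + (s_i - s_f) * (1 - progress)**3."""
          progress = min(max((epoch - start_epoch) / float(span_epochs), 0.0), 1.0)
          return s_final + (s_initial - s_final) * (1.0 - progress) ** 3

      # Sparsity ramps from 10% to 50% over 30 epochs, then stays at 50%.
      for epoch in (0, 10, 20, 30, 40):
          print(epoch, round(agp_target_sparsity(epoch, 0.10, 0.50, 0, 30), 3))
      ```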
  18. Oct 01, 2018
  19. Sep 20, 2018
    • ResNet50 regularization using pruning · 57e6bd00
      Neta Zmora authored
      This schedule demonstrates low-rate pruning (26% sparsity) acting as a
      regularizer to reduce the generalization error of ResNet50 using the
      ImageNet dataset.
      
      We improve the ResNet50 Top1 test error by 0.4% (23.462 vs. 23.85).
      Top5 error is improved as well: 6.82 vs. 7.13 in the baseline.
  20. Jun 15, 2018
  21. Jun 13, 2018
  22. Jun 07, 2018
  23. May 13, 2018
  24. May 07, 2018
  25. Apr 24, 2018