  1. Dec 01, 2018
    • Important changes to pruning channels and filters (#93) · a0bf2a8f
      Neta Zmora authored
      This commit contains the main fix for issue #85, along with a couple of changes to the YAML structured-pruning API, with examples.
      I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).
      
      New syntax for defining Structured AGP.  I tried to make the syntax similar to fine-grained
      (i.e. element-wise) pruning.  All you need to do is add: ```group_type: Filters```.
      ```
        low_pruner:
          class: L1RankedStructureParameterPruner_AGP
          initial_sparsity : 0.10
          final_sparsity: 0.50
          group_type: Filters
          weights: [module.layer3.0.conv2.weight,
                    module.layer3.0.downsample.0.weight,
                    module.layer3.1.conv2.weight,
                    module.layer3.2.conv2.weight]
      ```
      
      If you want to define “leader-based” pruning dependencies, add ```group_dependency: Leader```:
      ```
        low_pruner:
          class: L1RankedStructureParameterPruner_AGP
          initial_sparsity : 0.10
          final_sparsity: 0.50
          group_type: Filters
          group_dependency: Leader
          weights: [module.layer3.0.conv2.weight,
                    module.layer3.0.downsample.0.weight,
                    module.layer3.1.conv2.weight,
                    module.layer3.2.conv2.weight]
      ```
      
      Retired the old ```reg_regims``` API for describing one-shot structured-pruning.
      
      The new YAML API is very similar to the AGP structured-pruning syntax, which is a big
      improvement over the old format.
      The new API also lets us describe data dependencies when doing one-shot
      structured pruning, just as with AGP structured-pruning.
      
      This commit also includes further code refactoring.
      
      Old API:
      ```
        filter_pruner:
           class: 'L1RankedStructureParameterPruner'
           reg_regims:
             'module.layer1.0.conv1.weight': [0.6, '3D']
             'module.layer1.1.conv1.weight': [0.6, '3D']
      ```
      
      New API:
      ```
       filter_pruner:
          class: 'L1RankedStructureParameterPruner'
          group_type: Filters
          desired_sparsity: 0.6
          weights: [
            module.layer1.0.conv1.weight,
            module.layer1.1.conv1.weight]
      ```
      
      thresholding.py – separate the generation of the binary_map from the pruning_mask so that we
      can cache the binary map and share it between several modules.
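      
      To illustrate the split (a minimal sketch, not the actual Distiller code; the helper
      names are hypothetical): the binary map is a small per-filter tensor that can be computed
      once and cached, while the full-sized pruning mask is just a broadcast of that map.
      ```
      import torch

      def l1_filter_binary_map(weight, desired_sparsity):
          # weight: 4-D convolution weight (num_filters, channels, k, k).
          # Returns a 1-D {0, 1} tensor with one entry per filter.
          num_filters = weight.size(0)
          num_to_prune = int(desired_sparsity * num_filters)
          l1_norms = weight.view(num_filters, -1).abs().sum(dim=1)
          binary_map = torch.ones_like(l1_norms)
          if num_to_prune > 0:
              weakest = l1_norms.sort()[1][:num_to_prune]  # indices of the smallest L1 norms
              binary_map[weakest] = 0
          return binary_map

      def binary_map_to_mask(binary_map, weight):
          # Broadcast the per-filter map back to the full weight shape.
          return binary_map.view(-1, 1, 1, 1).expand_as(weight)
      ```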
      
      pruning/automated_gradual_pruner.py – major refactoring to support “leader-based”
      sub-graph pruning dependencies.  The concept is explained in issue #85.
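      
      A sketch of the “leader” idea, reusing the hypothetical helpers above: the first
      parameter in the group computes the binary map, and every other parameter in the
      group reuses it, so all tensors in the sub-graph drop the same filters.
      ```
      def prune_leader_group(params, group, desired_sparsity):
          # params: dict mapping parameter names to weight tensors (illustrative).
          # group: list of parameter names; the first entry acts as the leader.
          leader = group[0]
          binary_map = l1_filter_binary_map(params[leader], desired_sparsity)
          masks = {}
          for name in group:
              # Followers reuse the leader's cached binary map.
              masks[name] = binary_map_to_mask(binary_map, params[name])
          return masks
      ```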
      
      
      Schedule files touched by this commit:
      - agp-pruning/resnet20_filters.schedule_agp.yaml
      - agp-pruning/resnet20_filters.schedule_agp_2.yaml
      - agp-pruning/resnet20_filters.schedule_agp_3.yaml
      - network_trimming/resnet56_cifar_activation_apoz.yaml
      - network_trimming/resnet56_cifar_activation_apoz_v2.yaml
  2. Nov 29, 2018
    • Fix assign_layer_fq_names (#88) · a2f57e6b
      Bar authored
      Add support for models that contain named empty layers (i.e. layers whose type is NoneType).
      This fix catches the AttributeError that is raised when a named NoneType layer is detected, and simply ignores the layer.
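      
      A sketch of the pattern described (hypothetical traversal and attribute name; the real
      assign_layer_fq_names may walk the model differently):
      ```
      def assign_layer_fq_names(model, prefix=''):
          # Walk the module tree directly so that named entries whose value is
          # None (empty layers) are also visited, then skipped gracefully.
          for name, module in model._modules.items():
              fq_name = prefix + name
              try:
                  module.distiller_name = fq_name  # hypothetical attribute name
              except AttributeError:
                  continue  # named NoneType layer - ignore it
              assign_layer_fq_names(module, fq_name + '.')
      ```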
  3. Nov 24, 2018
    • 30812b87
      Guy Jacob authored
    • Fix activation stats for Linear layers · 22e3ea8b
      Neta Zmora authored
      Thanks to Dan Alistarh for bringing this issue to my attention.
      The activations of Linear layers have shape (batch_size, output_size) and those
      of Convolution layers have shape (batch_size, num_channels, width, height), and
      this distinction in shape was not handled correctly.
      
      This commit also fixes sparsity computation for very large activations, as seen
      in VGG16, which led to memory exhaustion.  One solution is to use smaller
      batch sizes, but this commit takes a different approach: it counts zeros “manually”
      and uses less space.
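      
      A sketch of both points (not Distiller's actual collector code): handle 2-D Linear and
      4-D Convolution activations uniformly by flattening, and count zeros a slice at a time
      instead of building one huge mask.
      ```
      import torch

      def activation_sparsity(activation, chunk_size=1 << 20):
          # Works for Linear outputs (batch, features) and Convolution outputs
          # (batch, channels, h, w) alike.
          flat = activation.detach().reshape(-1)
          zeros = 0
          for start in range(0, flat.numel(), chunk_size):
              chunk = flat[start:start + chunk_size]
              zeros += int((chunk == 0).sum())  # small temporary mask per chunk
          return zeros / flat.numel()
      ```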
      
      Also in this commit:
      - Added a “caveats” section to the documentation.
      - Added more tests.
  4. Nov 22, 2018
    • Fix for issue #82 · fe9ffb17
      Neta Zmora authored
      The super() method of the wrong class was used.
      In this case there were no practical implications, but we
      need to move to the less error-prone syntax of Python 3.x,
      which does not require us to specify the class.
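      
      For example (hypothetical class names):
      ```
      class BasePruner:
          def __init__(self, name):
              self.name = name

      class RankedFilterPruner(BasePruner):
          def __init__(self, name):
              # Python 2 style: naming the wrong class here (e.g. a sibling
              # subclass) is easy to do and goes unnoticed:
              #   super(SomeOtherPruner, self).__init__(name)
              # Python 3 style: nothing to get wrong.
              super().__init__(name)
      ```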
      
      I changed the super() invocations in the entire file and ran
      two schedules for ResNet56, and actually got better results than
      previously.  I don't think these results are related to this
      change, and I cannot explain them.  Nonetheless, I am committing
      these new results, because I also fixed a copy-and-paste error in
      the command-line parameters of resnet56_cifar_filter_rank_v2.yaml.
    • Fix Issue 79 (#81) · acbb4b4d
      Neta Zmora authored
      * Fix issue #79
      
      Change the default values so that the following scheduler meta-data keys
      are always defined: 'starting_epoch', 'ending_epoch', 'frequency' (see the
      policy sketch after this list).
      
      * compress_classifier.py: add a new argument
      
      Allow specifying, via command-line arguments, the range of
      pruning levels scanned when doing sensitivity analysis.
      
      * Add regression test for issue #79
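      
      For reference, a minimal policy entry where these keys appear (a sketch; the values
      are illustrative):
      ```
      policies:
        - pruner:
            instance_name: low_pruner
          starting_epoch: 0
          ending_epoch: 30
          frequency: 2
      ```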
  5. Nov 20, 2018
    • Bug fix: value of best_top1 stored in the checkpoint may be wrong (#77) · 6242afed
      Neta Zmora authored
      * Bug fix: value of best_top1 stored in the checkpoint may be wrong
      
      If you invoke compress_classifier.py with --num-best-scores=n
      and n>1, then the value of best_top1 stored in checkpoints is wrong.
    • Bug fix: Resuming from checkpoint ignored the masks stored in the checkpoint (#76) · 78e98a51
      Neta Zmora authored
      When we resume from a checkpoint, we usually want to continue using the checkpoint’s
      masks.  I say “usually” because I can see a situation where we want to prune a model
      and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the
      masks.  This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training
      (https://arxiv.org/abs/1607.04381).  But I didn’t want to add another argument to
      ```compress_classifier.py``` for the time being – so we ignore DSD.
      
      There are two possible situations when we resume a checkpoint that has a serialized
      ```CompressionScheduler``` with pruning masks:
      1. We are planning on using a new ```CompressionScheduler``` that is defined in a
      schedule YAML file.  In this case, we want to copy the masks from the serialized
      ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are
      constructing from the YAML file.  This is one fix.
      2. We are resuming a checkpoint, but without using a YAML schedule file.
      In this case we want to use the ```CompressionScheduler``` that we loaded from the
      checkpoint file.  All this ```CompressionScheduler``` does is keep applying the masks
      as we train, so that we don’t lose them.  This is the second fix.
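      
      A sketch of the two paths (illustrative only; build_scheduler_from_yaml stands in for
      whatever Distiller actually calls, and the mask copy is shown via the scheduler's
      state dict):
      ```
      def setup_scheduler(model, resumed_scheduler, yaml_path):
          if yaml_path is not None:
              # Case 1: build a new scheduler from the YAML schedule, then carry
              # over the masks serialized in the checkpoint.
              scheduler = build_scheduler_from_yaml(model, yaml_path)  # hypothetical helper
              if resumed_scheduler is not None:
                  scheduler.load_state_dict(resumed_scheduler.state_dict())  # copy the masks
              return scheduler
          # Case 2: no YAML file - keep the checkpoint's scheduler, which simply
          # re-applies its masks as training continues, so they are not lost.
          return resumed_scheduler
      ```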
      
      For DSD, we would need a new flag that would override using the ```CompressionScheduler```
      that we load from the checkpoint.