  1. Nov 25, 2018
  2. Nov 24, 2018
    • Guy Jacob · 30812b87
    • Fix activation stats for Linear layers · 22e3ea8b
      Neta Zmora authored
      Thanks to Dan Alistarh for bringing this issue to my attention.
      The activations of Linear layers have shape (batch_size, output_size), while those
      of Convolution layers have shape (batch_size, num_channels, width, height); this
      difference in shape was not handled correctly.
      
      This commit also fixes sparsity computation for very large activations, as seen
      in VGG16, which previously exhausted memory.  One workaround is to use smaller
      batch sizes, but this commit instead counts the zeros “manually”, which uses
      less memory (a small sketch follows this message).
      
      Also in this commit:
      - Added a “caveats” section to the documentation.
      - Added more tests.
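      
      To illustrate the shape handling and the “manual” zero-counting, here is a
      minimal sketch (illustrative only, not the repository's actual code):
      
      ```python
      import torch
      
      def activation_sparsity(act: torch.Tensor) -> float:
          """Fraction of zero elements in an activation tensor."""
          num_zeros = (act == 0).sum().item()   # single boolean reduction
          return num_zeros / act.numel()
      
      def per_channel_sparsity(act: torch.Tensor) -> torch.Tensor:
          """Handle the two activation shapes explicitly."""
          if act.dim() == 4:       # Convolution: (batch_size, num_channels, H, W)
              zeros = (act == 0).sum(dim=(0, 2, 3)).float()
              return zeros / (act.size(0) * act.size(2) * act.size(3))
          if act.dim() == 2:       # Linear: (batch_size, output_size)
              zeros = (act == 0).sum(dim=0).float()
              return zeros / act.size(0)
          raise ValueError("unexpected activation shape: %s" % (tuple(act.shape),))
      ```
      
      Counting zeros with a boolean reduction avoids building large floating-point
      intermediates for very big activations such as VGG16's.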
  3. Nov 22, 2018
    • Fix for issue #82 · fe9ffb17
      Neta Zmora authored
      super() was invoked with the wrong class.
      In this case there were no practical implications, but we
      should move to the less error-prone Python 3.x syntax,
      which does not require us to name the class explicitly
      (a short illustration follows this message).
      
      I changed the super() invocations in the entire file and ran
      two schedules for ResNet56, and actually got better results than
      previously.  I don't think these results are related to this
      change, and I cannot explain them.  Nonetheless, I am committing
      these new results, because I also fixed the command-line parameters
      of resnet56_cifar_filter_rank_v2.yaml, which had a copy & paste
      error in it.
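      
      For illustration only (these class names are hypothetical, not the ones in the
      fixed file), the difference between the two syntaxes:
      
      ```python
      class BasePruner:
          def __init__(self, name):
              self.name = name
      
      class MagnitudePruner(BasePruner):
          def __init__(self, name, threshold):
              # Python 2-style: the class is named explicitly, so copying this line
              # into another class can silently invoke the wrong parent chain.
              super(MagnitudePruner, self).__init__(name)
              self.threshold = threshold
      
      class StructuredPruner(BasePruner):
          def __init__(self, name, group_type):
              # Python 3-style: no class name to get wrong.
              super().__init__(name)
              self.group_type = group_type
      ```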
    • Fix Issue 79 (#81) · acbb4b4d
      Neta Zmora authored
      * Fix issue #79
      
      Change the default values so that the following scheduler meta-data keys
      are always defined: 'starting_epoch', 'ending_epoch', 'frequency'
      (a small sketch of the defaulting idea follows this message).
      
      * compress_classifier.py: add a new argument
      
      Allow specifying, via command-line arguments, the range of pruning levels
      scanned when doing sensitivity analysis.
      
      * Add regression test for issue #79
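      
      A minimal sketch of the defaulting idea for the meta-data keys (function name
      and default values are illustrative, not Distiller's actual code):
      
      ```python
      def apply_policy_defaults(policy_meta: dict, num_epochs: int) -> dict:
          """Ensure the scheduler meta-data keys are always defined."""
          defaults = {
              'starting_epoch': 0,          # assumed default: start immediately
              'ending_epoch': num_epochs,   # assumed default: run to the last epoch
              'frequency': 1,               # assumed default: apply every epoch
          }
          for key, value in defaults.items():
              policy_meta.setdefault(key, value)
          return policy_meta
      
      # e.g. a YAML policy that only specified 'starting_epoch':
      meta = apply_policy_defaults({'starting_epoch': 10}, num_epochs=180)
      assert meta == {'starting_epoch': 10, 'ending_epoch': 180, 'frequency': 1}
      ```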
  4. Nov 21, 2018
  5. Nov 20, 2018
    • Bug fix: value of best_top1 stored in the checkpoint may be wrong (#77) · 6242afed
      Neta Zmora authored
      * Bug fix: value of best_top1 stored in the checkpoint may be wrong
      
      If you invoke compress_classifier.py with --num-best-scores=n, where n>1,
      then the value of best_top1 stored in checkpoints is wrong.
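      
      A sketch of the intended behavior (names are illustrative): even when the n
      best scores are tracked, the checkpoint should record the best of them.
      
      ```python
      def update_best_scores(best_scores, top1, num_best_scores):
          """Keep the num_best_scores highest Top1 results, best first."""
          best_scores.append(top1)
          best_scores.sort(reverse=True)
          del best_scores[num_best_scores:]
          return best_scores
      
      best_scores = []
      for top1 in [71.2, 72.5, 72.1, 73.0]:
          update_best_scores(best_scores, top1, num_best_scores=3)
      
      best_top1 = best_scores[0]            # what the checkpoint should store: 73.0
      assert best_top1 == max(best_scores)
      ```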
    • Bug fix: Resuming from checkpoint ignored the masks stored in the checkpoint (#76) · 78e98a51
      Neta Zmora authored
      When we resume from a checkpoint, we usually want to continue using the checkpoint’s
      masks.  I say “usually” because I can see a situation where we want to prune a model
      and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the
      masks.  This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training
      (https://arxiv.org/abs/1607.04381).  But I didn’t want to add another argument to
      ```compress_classifier.py``` for the time being – so we ignore DSD.
      
      There are two possible situations when we resume a checkpoint that has a serialized
      ```CompressionScheduler``` with pruning masks:
      1. We are planning on using a new ```CompressionScheduler``` that is defined in a
      schedule YAML file.  In this case, we want to copy the masks from the serialized
      ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are
      constructing from the YAML file.  This is one fix.
      2. We are resuming a checkpoint, but without using a YAML schedule file.
      In this case we want to use the ```CompressionScheduler``` that we loaded from the
      checkpoint file.  All this ```CompressionScheduler``` does is keep applying the masks
      as we train, so that we don’t lose them.  This is the second fix
      (a small sketch of both cases follows this message).
      
      For DSD, we would need a new flag that would override using the ```CompressionScheduler```
      that we load from the checkpoint.
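      
      A minimal sketch of the two code paths (the ```masks``` attribute is a
      stand-in name, not Distiller's actual API):
      
      ```python
      def choose_scheduler(loaded_scheduler, new_scheduler=None):
          """Pick the scheduler to use after resuming a checkpoint.
      
          loaded_scheduler: the CompressionScheduler deserialized from the checkpoint
                            (it carries the pruning masks).
          new_scheduler:    a scheduler built from a YAML schedule file, or None when
                            no schedule YAML was given.
          """
          if new_scheduler is not None:
              # Case 1: use the new (YAML-defined) scheduler, but copy the masks from
              # the checkpoint's scheduler so the pruning state is not lost.
              new_scheduler.masks.update(loaded_scheduler.masks)   # 'masks' is a stand-in name
              return new_scheduler
          # Case 2: no YAML schedule - keep the checkpoint's scheduler, whose only
          # remaining job is to re-apply the masks as training continues.
          return loaded_scheduler
      ```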
  6. Nov 09, 2018
  7. Nov 08, 2018
  8. Nov 07, 2018
  9. Nov 06, 2018
  10. Nov 05, 2018
    • Dynamic Network Surgery (#69) · 60a4f44a
      Neta Zmora authored
      Added an implementation of:
      
      Dynamic Network Surgery for Efficient DNNs, Yiwen Guo, Anbang Yao, Yurong Chen.
      NIPS 2016, https://arxiv.org/abs/1608.04493.
      
      - Added SplicingPruner: A pruner that both prunes and splices connections.
      - Included an example schedule on ResNet20 CIFAR.
      - New features for compress_classifier.py:
         1. Added the "--masks-sparsity" argument which, when enabled, logs the sparsity
            of the weight masks during training.
         2. Added a new command-line argument to report the top N
            best accuracy scores, instead of just the highest score.
            This is sometimes useful when pruning a pre-trained model
            that reaches its best Top1 accuracy in the first few pruning epochs.
      - New features for PruningPolicy:
         1. The pruning policy can use two copies of the weights: one is used during
            the forward pass, the other during the backward pass.
            This is controlled by the “mask_on_forward_only” argument.
         2. If we enable “mask_on_forward_only”, we probably want to permanently apply
            the mask at some point (usually once the pruning phase is done).
            This is controlled by the “keep_mask” argument.
         3. We introduce a first implementation of scheduling at the training-iteration
            granularity (i.e. at the mini-batch granularity).  Until now we could only
            schedule pruning at epoch granularity.  This is controlled by the
            “mini_batch_pruning_frequency” argument (disabled by setting it to zero).
            A simplified sketch of these mechanisms follows this message.
      
         Some of the abstractions may have leaked from PruningPolicy to CompressionScheduler.
         Need to reexamine this in the future.
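      
      A simplified sketch of the forward-only masking and mini-batch-frequency
      mechanisms described above (an illustration of the idea, not Distiller's
      PruningPolicy code; the magnitude criterion is a stand-in for SplicingPruner):
      
      ```python
      import torch
      
      def dns_train_step(module, weight_mask, threshold, inputs, targets,
                         criterion, optimizer, batch_id, mini_batch_pruning_frequency):
          """One mini-batch of Dynamic-Network-Surgery-style training."""
          # Mini-batch-granularity scheduling: recompute the mask every
          # mini_batch_pruning_frequency mini-batches (zero disables this).
          if mini_batch_pruning_frequency and batch_id % mini_batch_pruning_frequency == 0:
              weight_mask = (module.weight.detach().abs() > threshold).float()
      
          with torch.no_grad():
              dense_weights = module.weight.detach().clone()  # unmasked copy
              module.weight.mul_(weight_mask)                 # forward pass sees masked weights
      
          loss = criterion(module(inputs), targets)
          optimizer.zero_grad()
          loss.backward()
      
          with torch.no_grad():
              module.weight.copy_(dense_weights)  # the update is applied to the dense copy,
          optimizer.step()                        # so pruned connections can be spliced back;
          return weight_mask                      # "keep_mask" would make the mask permanent
      ```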
  11. Nov 04, 2018
  12. Nov 02, 2018
    • Update README · 1757f213
      Neta Zmora authored
      Changed the description of the feature set.
      Updated the README a little, since a lot has changed since we released six months ago.  There is still a lot to add/remove/change.
  13. Nov 01, 2018
  14. Oct 31, 2018
  15. Oct 29, 2018
  16. Oct 27, 2018
  17. Oct 26, 2018
  18. Oct 25, 2018
    • Pruning: remove unnecessary backward hook used for masking gradients (#63) · 9c701c1c
      Neta Zmora authored
      After commit f396c34a362731c765370d368877c2ca367ad651, we
      now always apply the pruning mask at the end of each mini-batch, because
      we understand that weights may be updated by SGD+momentum even when they
      are masked.  Therefore, there is no need to mask the gradients: we simply
      mask the weights at the end of the mini-batch.
      
      See issue #53 for more details.
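      
      A minimal sketch of the order of operations described above (illustrative,
      not the repository's exact code): no backward hook masks the gradients; the
      masks are simply re-applied to the weights after the optimizer update.
      
      ```python
      import torch
      
      def masked_train_step(model, masks, inputs, targets, criterion, optimizer):
          """masks: dict mapping parameter name -> 0/1 tensor of the same shape."""
          loss = criterion(model(inputs), targets)
          optimizer.zero_grad()
          loss.backward()          # no gradient-masking hook is needed here
          optimizer.step()
      
          # Re-apply the masks at the end of the mini-batch: even masked weights
          # may have been moved by SGD+momentum, so this zeroes them again.
          with torch.no_grad():
              for name, param in model.named_parameters():
                  if name in masks:
                      param.mul_(masks[name])
          return loss.item()
      ```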
  19. Oct 23, 2018