  1. Jan 16, 2019
    • compress_classifier.py refactoring (#126) · cfbc3798
      Bar authored
      * Support for multi-phase activations logging
      
      Enable logging activations during both training and validation in the
      same session.
      
      * Refactoring: Move parser to its own file
      
      * Parser is moved from compress_classifier into its own file.
      * Torch version check is moved to precede main() call.
      * Move main definition to the top of the file.
      * Modify parser choices to be case-insensitive
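
      A brief, hypothetical sketch of one way to make argparse choices case-insensitive
      (not necessarily how the refactored parser does it): normalize the user's input
      before argparse matches it against lowercase choices.

```python
# Hypothetical sketch: case-insensitive argparse choices via input normalization.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--dataset', type=lambda s: s.lower(),
                    choices=['cifar10', 'imagenet'],
                    help='dataset name (matched case-insensitively)')

args = parser.parse_args(['--dataset', 'CIFAR10'])
assert args.dataset == 'cifar10'
```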
  2. Jan 15, 2019
  3. Jan 13, 2019
  4. Jan 10, 2019
    • Enable compute (training/inference) on the CPU · 007b6903
      Gal Novik authored
      In compress_classifier.py we added a new application argument, --cpu, which
      forces compute (training/inference) to run on the CPU even when you invoke
      compress_classifier.py on a machine that has Nvidia GPUs.
      
      If your machine lacks Nvidia GPUs, compute now runs on the CPU automatically
      (and you do not need the new flag).
      
      Caveat: we did not fully test the CPU support for the code in the Jupyter 
      notebooks.  If you find a bug, we apologize and appreciate your feedback.
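
      Below is a minimal sketch (not Distiller's actual implementation) of how a --cpu
      flag can drive device selection in a PyTorch script, falling back to the CPU
      automatically when no CUDA device is available.

```python
# Illustrative sketch: a --cpu flag that forces computation onto the CPU.
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--cpu', action='store_true',
                    help='force training/inference to run on the CPU')
args = parser.parse_args()

# Use the GPU only when CUDA is available and the user did not request the CPU.
use_cuda = torch.cuda.is_available() and not args.cpu
device = torch.device('cuda' if use_cuda else 'cpu')

model = torch.nn.Linear(10, 2).to(device)
inputs = torch.randn(4, 10, device=device)
outputs = model(inputs)
```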
  5. Dec 19, 2018
  6. Dec 16, 2018
  7. Dec 14, 2018
    • AMC: more refactoring · 1ab288ae
      Neta Zmora authored
      Added a notebook for visualizing the discovery of compressed networks.
      Added one-epoch fine-tuning at the end of every episode, which is
      required for very sensitive models like Plain20.
  8. Dec 11, 2018
  9. Dec 06, 2018
  10. Dec 04, 2018
    • Range-Based Linear Quantization Features (#95) · 907a6f04
      Guy Jacob authored
      * Asymmetric post-training quantization (only symmetric was supported until now)
      * Quantization aware training for range-based (min-max) symmetric and asymmetric quantization
      * Per-channel quantization support in both training and post-training
      * Added tests and examples
      * Updated documentation
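
      A minimal sketch of asymmetric range-based (min-max) linear quantization, to
      illustrate the idea behind the feature; it is not Distiller's quantization API.

```python
# Illustrative sketch of asymmetric range-based (min-max) linear quantization.
import torch

def asymmetric_quantize(t, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include zero in the range so exact zeros stay exact after quantization.
    t_min = min(t.min().item(), 0.0)
    t_max = max(t.max().item(), 0.0)
    scale = max((t_max - t_min) / (qmax - qmin), 1e-8)  # guard against a zero range
    zero_point = round(qmin - t_min / scale)
    q = torch.clamp(torch.round(t / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

x = torch.randn(3, 4)
q, scale, zp = asymmetric_quantize(x)
x_hat = dequantize(q, scale, zp)  # x_hat approximates x to within one scale step
```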
  11. Nov 24, 2018
  12. Nov 22, 2018
    • Fix Issue 79 (#81) · acbb4b4d
      Neta Zmora authored
      * Fix issue #79
      
      Change the default values so that the following scheduler meta-data keys
      are always defined: 'starting_epoch', 'ending_epoch', 'frequency'
      
      * compress_classifier.py: add a new argument
      
      Allow specifying, from the command-line arguments, the range of pruning levels
      scanned when doing sensitivity analysis (a sketch of such an argument follows
      below).
      
      * Add regression test for issue #79
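
      A minimal sketch of exposing the sensitivity-analysis sparsity range on the
      command line; the flag name and defaults here are illustrative, not necessarily
      Distiller's exact interface.

```python
# Illustrative sketch: a command-line argument for the sensitivity-analysis range.
import argparse
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument('--sense-range', type=float, nargs=3,
                    default=[0.0, 0.95, 0.05],
                    metavar=('START', 'STOP', 'STEP'),
                    help='pruning levels to scan during sensitivity analysis')
args = parser.parse_args([])

start, stop, step = args.sense_range
sparsity_levels = np.arange(start, stop, step)  # e.g. 0.0, 0.05, ..., 0.90
```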
  13. Nov 21, 2018
  14. Nov 20, 2018
    • Bug fix: value of best_top1 stored in the checkpoint may be wrong (#77) · 6242afed
      Neta Zmora authored
      * Bug fix: value of best_top1 stored in the checkpoint may be wrong
      
      If you invoke compress_classifier.py with --num-best-scores=n, where n>1,
      then the value of best_top1 stored in checkpoints is wrong.
    • Bug fix: Resuming from checkpoint ignored the masks stored in the checkpoint (#76) · 78e98a51
      Neta Zmora authored
      When we resume from a checkpoint, we usually want to continue using the checkpoint’s
      masks.  I say “usually” because I can see a situation where we want to prune a model
      and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the
      masks.  This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training
      (https://arxiv.org/abs/1607.04381).  But I didn’t want to add another argument to
      ```compress_classifier.py``` for the time being – so we ignore DSD.
      
      There are two possible situations when we resume a checkpoint that has a serialized
      ```CompressionScheduler``` with pruning masks:
      1. We are planning on using a new ```CompressionScheduler``` that is defined in a
      schedule YAML file.  In this case, we want to copy the masks from the serialized
      ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are
      constructing from the YAML file.  This is one fix.
      2. We are resuming a checkpoint, but without using a YAML schedule file.
      In this case we want to use the ```CompressionScheduler``` that we loaded from the
      checkpoint file.  All this ```CompressionScheduler``` does is keep applying the masks
      as we train, so that we don’t lose them.  This is the second fix.
      
      For DSD, we would need a new flag that would override using the ```CompressionScheduler```
      that we load from the checkpoint.
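
      A minimal, self-contained sketch of the two resume paths described above; the
      names below (TinyScheduler, zeros_mask_dict, build_scheduler) are illustrative
      stand-ins, not Distiller's API.

```python
# Illustrative sketch of the two ways to handle masks when resuming a checkpoint.
class TinyScheduler:
    """Stand-in for CompressionScheduler: it just holds per-parameter masks."""
    def __init__(self, masks=None):
        self.zeros_mask_dict = dict(masks or {})

def build_scheduler(resumed_scheduler, yaml_scheduler=None):
    if yaml_scheduler is not None:
        # Case 1: a new scheduler was built from a YAML schedule file; copy the
        # masks loaded from the checkpoint into it.
        if resumed_scheduler is not None:
            yaml_scheduler.zeros_mask_dict.update(resumed_scheduler.zeros_mask_dict)
        return yaml_scheduler
    # Case 2: no YAML schedule; keep using the checkpoint's scheduler so its
    # masks keep being applied while training continues.
    return resumed_scheduler

ckpt_sched = TinyScheduler({'conv1.weight': 'mask-tensor'})
sched = build_scheduler(ckpt_sched, yaml_scheduler=TinyScheduler())
assert sched.zeros_mask_dict == {'conv1.weight': 'mask-tensor'}
```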
  15. Nov 08, 2018
  16. Nov 06, 2018
  17. Nov 05, 2018
    • Dynamic Network Surgery (#69) · 60a4f44a
      Neta Zmora authored
      Added an implementation of:
      
      Dynamic Network Surgery for Efficient DNNs, Yiwen Guo, Anbang Yao, Yurong Chen.
      NIPS 2016, https://arxiv.org/abs/1608.04493.
      
      - Added SplicingPruner: A pruner that both prunes and splices connections.
      - Included an example schedule on ResNet20 CIFAR.
      - New features for compress_classifier.py:
         1. Added the "--masks-sparsity" argument which, when enabled, logs the
            sparsity of the weight masks during training.
         2. Added a new command-line argument to report the top N best accuracy
            scores, instead of just the highest score. This is sometimes useful
            when pruning a pre-trained model that has its best Top1 accuracy in
            the first few pruning epochs.
      - New features for PruningPolicy:
         1. The pruning policy can use two copies of the weights: one is used during
            the forward pass, the other during the backward pass (see the sketch
            after this message). This is controlled by the “mask_on_forward_only”
            argument.
         2. If we enable “mask_on_forward_only”, we probably want to permanently apply
            the mask at some point (usually once the pruning phase is done).
            This is controlled by the “keep_mask” argument.
         3. We introduce a first implementation of scheduling at the training-iteration
            granularity (i.e. at the mini-batch granularity). Until now we could only
            schedule pruning at the epoch granularity. This is controlled by the
            “mini_batch_pruning_frequency” argument (disabled by setting it to zero).
      
         Some of the abstractions may have leaked from PruningPolicy into
         CompressionScheduler; we need to re-examine this in the future.
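
      A minimal, illustrative sketch of the “mask_on_forward_only” idea used by
      splicing-style pruning: the forward pass sees masked weights, while gradients
      still flow to the dense weight tensor, so pruned connections keep being updated
      and can later be spliced back in. This is not Distiller's actual implementation.

```python
# Illustrative sketch: mask weights on the forward pass only (straight-through).
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(8, 4)
mask = (torch.rand_like(layer.weight) > 0.5).float()  # 1 = keep, 0 = prune

def forward_masked(x):
    # Forward value equals weight * mask, but the detached term leaves the
    # gradient w.r.t. the dense weight tensor unchanged.
    w_eff = layer.weight - (layer.weight * (1.0 - mask)).detach()
    return F.linear(x, w_eff, layer.bias)

loss = forward_masked(torch.randn(2, 8)).sum()
loss.backward()
assert layer.weight.grad is not None  # the dense weights still receive gradients
```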
  18. Nov 01, 2018
  19. Oct 22, 2018
    • Activation statistics collection (#61) · 54a5867e
      Neta Zmora authored
      Activation statistics can be leveraged to make pruning and quantization decisions, so
      we added support for collecting these data.
      - Two types of activation statistics are supported: summary statistics, and detailed
      records per activation.
      Currently we support the following summaries:
      - Average activation sparsity, per layer
      - Average L1-norm for each activation channel, per layer
      - Average sparsity for each activation channel, per layer
      
      For the detailed records we collect some statistics per activation and store them in a
      record. This collection method generates more detailed data, but consumes more time,
      so beware.
      
      * You can collect activation data for the different training phases: training/validation/test.
      * You can access the data directly from each module that you chose to collect stats for.  
      * You can also create an Excel workbook with the stats.
      
      To demonstrate use of activation collection we added a sample schedule which prunes 
      weight filters by the activation APoZ according to:
      "Network Trimming: A Data-Driven Neuron Pruning Approach towards 
      Efficient Deep Architectures",
      Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
      https://arxiv.org/abs/1607.03250
      
      We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning,
      and specifically we separated the AGP schedule from the filter pruning criterion.  We added
      examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner),
      random (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), 
      and filter L1-norm (L1RankedStructureParameterPruner).
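
      A minimal sketch of collecting an APoZ-style summary (average fraction of zero
      activations per channel) with a PyTorch forward hook; Distiller's collectors are
      richer, this only illustrates the mechanism.

```python
# Illustrative sketch: per-channel zero-activation statistics via a forward hook.
import torch
import torch.nn as nn

apoz_sums, batch_counts = {}, {}

def apoz_hook(module, inputs, output):
    # Fraction of zero activations per output channel, averaged over N, H, W.
    zeros = (output == 0).float().mean(dim=(0, 2, 3))
    name = module_names[module]
    apoz_sums[name] = apoz_sums.get(name, 0) + zeros
    batch_counts[name] = batch_counts.get(name, 0) + 1

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
module_names = {m: n for n, m in model.named_modules()}
model[1].register_forward_hook(apoz_hook)   # collect stats after the ReLU

model(torch.randn(8, 3, 32, 32))
apoz = {n: s / batch_counts[n] for n, s in apoz_sums.items()}
```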
  20. Sep 20, 2018
  21. Sep 16, 2018
  22. Sep 03, 2018
  23. Aug 09, 2018
    • Generalize the loss value returned from before_backward_pass callbacks (#38) · a43b9f10
      Guy Jacob authored
      * Instead of a single additive value (which so far represented only the
        regularizer loss), callbacks return a new overall loss
      * Policy callbacks also return the individual loss components used to
        calculate the new overall loss.
      * Add a boolean flag to the Scheduler's callback so applications can choose
        whether they want the individual loss components, or just the new overall
        loss
      * In compress_classifier.py, log the individual loss components
      * Add test for the loss-from-callback flow
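
      A minimal, illustrative sketch of a before_backward_pass-style callback that
      returns a new overall loss together with its components; the names below are
      not Distiller's exact API.

```python
# Illustrative sketch: a callback returning an overall loss plus its components.
from collections import namedtuple
import torch

LossComponent = namedtuple('LossComponent', ['name', 'value'])

def before_backward_pass(base_loss, regularizer_loss, return_loss_components=False):
    overall_loss = base_loss + regularizer_loss
    if return_loss_components:
        return overall_loss, [LossComponent('regularizer', regularizer_loss)]
    return overall_loss

overall, parts = before_backward_pass(torch.tensor(1.25), torch.tensor(0.05), True)
```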
  24. Jul 31, 2018
  25. Jul 25, 2018
  26. Jul 22, 2018
    • PACT quantizer (#30) · df9a00ce
      Gal Novik authored
      * Adding PACT quantization method
      * Move the logic that modifies the optimizer (due to changes the quantizer makes) into the Quantizer itself
      * Updated documentation and tests
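
      A minimal sketch of the core PACT idea: clip activations to a learnable upper
      bound alpha and fake-quantize the clipped range. This is illustrative only, not
      Distiller's PACT quantizer.

```python
# Illustrative sketch of PACT-style activation clipping with a learnable alpha.
import torch
import torch.nn as nn

class PACTActivation(nn.Module):
    def __init__(self, num_bits=4, init_alpha=6.0):
        super().__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(init_alpha))  # learnable clip value

    def forward(self, x):
        # Clip to [0, alpha]; alpha receives gradients where activations are clipped.
        y = torch.minimum(torch.clamp(x, min=0.0), self.alpha)
        # Straight-through fake quantization of the clipped range.
        scale = (2 ** self.num_bits - 1) / self.alpha.detach()
        return y + (torch.round(y * scale) / scale - y).detach()

act = PACTActivation()
out = act(torch.randn(2, 8) * 3)
out.sum().backward()
print(act.alpha.grad)  # alpha is part of the graph, so it can be learned
```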
  27. Jul 17, 2018
  28. Jul 13, 2018
    • ADC (Automatic Deep Compression) example + features, tests, bug fixes (#28) · 718f777b
      Neta Zmora authored
      This is a merge of the ADC branch and master.
      ADC (using a DDPG RL agent to compress image classifiers) is still WiP and requires
      an unreleased version of Coach (https://github.com/NervanaSystems/coach).
      
      Small features in this commit:
      - Added model_find_module() - find a module object given its name (see the
        sketch after this message)
      - Add channel ranking and pruning: pruning/ranked_structures_pruner.py
      - Add a CIFAR10 VGG16 model: models/cifar10/vgg_cifar.py
      - Thinning: change the level of some log messages - some of the messages were
        moved to 'debug' level because they are not usually interesting.
      - Add a function to print nicely formatted integers - distiller/utils.py
      - Sensitivity analysis for channel removal
      - compress_classifier.py: handle keyboard interrupts
      - compress_classifier.py: fix re-raise of exceptions, so they maintain the call stack
      
      - Added tests:
      -- test_summarygraph.py: test_simplenet() - added a regression test to target a bug
         that occurs when taking the predecessor of the first node in a graph
      -- test_ranking.py - test_ch_ranking, test_ranked_channel_pruning
      -- test_model_summary.py - test_png_generation, test_summary (sparsity/compute/model/modules)
      
      - Bug fixes in this commit:
      -- Thinning bug fix: handle a zero-sized 'indices' tensor.
         During the thinning process, the 'indices' tensor can become zero-sized
         and will have an undefined length, so we need to check for this situation
         when assessing the number of elements in 'indices'.
      -- Language model: adjust main.py to the new distiller.model_summary API
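
      A minimal sketch of a "find module by name" helper in the spirit of
      model_find_module(); it is not Distiller's exact implementation.

```python
# Illustrative sketch: look up a sub-module by its fully-qualified name.
import torch.nn as nn

def find_module(model, name):
    for module_name, module in model.named_modules():
        if module_name == name:
            return module
    return None

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
assert find_module(model, '0') is model[0]
```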
  29. Jul 11, 2018
    • More robust handling of data-parallel/serial graphs (#27) · b64be690
      Neta Zmora authored
      Remove the complicated logic trying to handle data-parallel models as
      serially-processed models, and vice versa.
      
      * Function distiller.utils.make_non_parallel_copy() does the heavy lifting of
      replacing all instances of nn.DataParallel in a model with instances of
      DoNothingModuleWrapper (sketched after this message).
      The DoNothingModuleWrapper wrapper does nothing but forward to the
      wrapped module. This is a trick we use to transform a data-parallel model
      into a serially-processed model.
      
      * SummaryGraph uses a copy of the model after the model is processed by
      distiller.make_non_parallel_copy(), which renders the model non-data-parallel.
      
      * The same goes for model_performance_summary().
      
      * Model inputs are explicitly placed on the Cuda device, since now all models are
      executed on the CPU. Previously, if a model was not created using
      nn.DataParallel, then the model was not explicitly placed on the Cuda device.
      
      * The logic in distiller.CompressionScheduler that attempted to load a
      data-parallel model and process it serially, or load a serial model and
      process it data-parallel, was removed. This removes a lot of fuzziness and makes
      the code more robust: we do not needlessly try to be heroes.
      
      * model summaries - remove pytorch 0.4 warning
      
      * create_model: remove redundant .cuda() call
      
      * Tests: support both parallel and serial tests
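
      A minimal sketch of the idea behind make_non_parallel_copy(): deep-copy the model
      and replace every nn.DataParallel child with a pass-through wrapper. The code
      below is illustrative, not Distiller's exact implementation.

```python
# Illustrative sketch: replace nn.DataParallel instances with a pass-through wrapper.
from copy import deepcopy
import torch.nn as nn

class DoNothingModuleWrapper(nn.Module):
    """Do nothing except forward to the wrapped module."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

def make_non_parallel_copy(model):
    def replace(container):
        for name, child in container.named_children():
            if isinstance(child, nn.DataParallel):
                setattr(container, name, DoNothingModuleWrapper(child.module))
            else:
                replace(child)
    model_copy = deepcopy(model)
    replace(model_copy)
    return model_copy

model = nn.Sequential(nn.DataParallel(nn.Linear(4, 2)), nn.ReLU())
serial = make_non_parallel_copy(model)
assert isinstance(serial[0], DoNothingModuleWrapper)
```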
  30. Jun 30, 2018
    • Bug fix: add support for thinning the optimizer · b21f449b
      Neta Zmora authored
      You no longer need to use --momentum=0 when removing structures
      dynamically.
      The SGD momentum update (velocity) is dependent on the weights, which
      PyTorch optimizers cache internally. This caching is not a problem for
      filter/channel removal (thinning) because although we dynamically
      change the shapes of the weights tensors, we don't change the weights
      tensors themselves.
      PyTorch's SGD creates tensors to store the momentum updates, and these
      tensors have the same shape as the weights tensors. When we change the
      weights tensors, we need to make the appropriate changes in the Optimizer,
      or disable the momentum.
      We added a new function - thinning.optimizer_thinning() - to do this
      (a sketch of the idea follows at the end of this message).
      This function is brittle, as it is tested only on optim.SGD and relies on the
      internal representation of the SGD optimizer, which can change without notice.
      For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'],
      which also depend on the shape of the weight tensors.
      We needed to pass the Optimizer instance to the Thinning policies
      (ChannelRemover, FilterRemover) via the callbacks, which required us
      to change the callback interface.
      In the future we plan a bigger change to the callback API, to allow
      passing of arbitrary context from the training environment to Distiller.
      
      Also in this commit:
      * compress_classifier.py had special handling for resnet layer-removal, which
      is used in examples/ssl/ssl_4D-removal_training.yaml.
      This is a brittle and ugly hack. Until we have a more elegant solution, I'm
      removing support for layer-removal.
      * Added to the tests invocation of forward and backward passes over a model.
      This tests more of the real flows, which use the optimizer and construct
      gradient tensors.
      * Added a test of a special case of convolution filter-pruning which occurs
      when the next layer is fully-connected (linear)
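
      A minimal sketch of the idea behind thinning.optimizer_thinning(): when a weight
      tensor shrinks (filters or channels are removed), the SGD momentum buffer cached
      for that parameter must be shrunk along the same dimension. The helper below is
      illustrative; Distiller's function handles more cases.

```python
# Illustrative sketch: shrink the cached SGD momentum buffer to match thinned weights.
import torch

def thin_sgd_momentum(optimizer, param, keep_indices, dim=0):
    state = optimizer.state.get(param, {})
    buf = state.get('momentum_buffer')
    if buf is not None:
        state['momentum_buffer'] = torch.index_select(buf, dim, keep_indices)

w = torch.nn.Parameter(torch.randn(8, 4))
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)
w.sum().backward()
opt.step()                                    # creates the momentum buffer
keep = torch.tensor([0, 2, 4, 6])             # filters that survive thinning
w.data = torch.index_select(w.data, 0, keep)  # thin the weights...
thin_sgd_momentum(opt, w, keep, dim=0)        # ...and the cached momentum
```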
  31. Jun 21, 2018
  32. Jun 19, 2018
    • Make PNG summary compatible with latest SummaryGraph class changes (#7) · 9e57219e
      Guy Jacob authored
      * Modify 'create_png' to use the correct data structures (dicts instead of
        lists, etc.)
      * Handle the case where an op was called not from a module. This relates to:
        * ONNX->"User-Friendly" name conversion, to account for such cases
        * Detection of an existing op with the same name
        In both cases, use the ONNX op type in addition to the op name
      * Return an "empty" shape instead of None when ONNX couldn't infer
        a parameter's shape
      * Expose option of PNG summary with parameters to user
  33. May 17, 2018
    • Fix system tests failure · a7ed8cad
      Neta Zmora authored
      The latest changes to the logger caused the CI tests to fail,
      because the tests assume that the logging.conf file is present in the
      same directory as the sample application script.
      The sample application looked in the current working directory (cwd) instead,
      and therefore did not find the log configuration file.
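
      A minimal sketch of the intent of the fix: resolve logging.conf relative to the
      application script's directory rather than the current working directory. The
      file name and layout are illustrative.

```python
# Illustrative sketch: load logging.conf from the script's own directory.
import logging.config
import os

script_dir = os.path.dirname(os.path.abspath(__file__))
log_conf_path = os.path.join(script_dir, 'logging.conf')
if os.path.isfile(log_conf_path):
    logging.config.fileConfig(log_conf_path)
```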
  34. May 16, 2018