  1. Nov 08, 2018
  2. Nov 06, 2018
  3. Nov 05, 2018
    • Dynamic Network Surgery (#69) · 60a4f44a
      Neta Zmora authored
      Added an implementation of:
      
      Dynamic Network Surgery for Efficient DNNs, Yiwen Guo, Anbang Yao, Yurong Chen.
      NIPS 2016, https://arxiv.org/abs/1608.04493.
      
       - Added SplicingPruner: a pruner that both prunes and splices connections (a sketch of the prune-and-splice idea appears below).
      - Included an example schedule on ResNet20 CIFAR.
      - New features for compress_classifier.py:
         1. Added the "--masks-sparsity" which, when enabled, logs the sparsity
            of the weight masks during training.
        2. Added a new command-line argument to report the top N
            best accuracy scores, instead of just the highest score.
            This is sometimes useful when pruning a pre-trained model,
            that has the best Top1 accuracy in the first few pruning epochs.
      - New features for PruningPolicy:
          1. The pruning policy can use two copies of the weights: one is used during
             the forward pass, the other during the backward pass.
             This is controlled by the "mask_on_forward_only" argument.
          2. If we enable "mask_on_forward_only", we probably want to permanently apply
             the mask at some point (usually once the pruning phase is done).
             This is controlled by the "keep_mask" argument.
          3. We introduce a first implementation of scheduling at the training-iteration
             granularity (i.e. at the mini-batch granularity). Until now, pruning could only be
             scheduled at epoch granularity. This is controlled by the "mini_batch_pruning_frequency"
             argument (disable it by setting it to zero).
      
         Some of the abstractions may have leaked from PruningPolicy to CompressionScheduler.
         Need to reexamine this in the future.
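
       Below is a minimal sketch of the prune-and-splice mask update behind SplicingPruner. It is illustrative only: the function name and threshold arguments are hypothetical, not Distiller's API, and the actual criterion is defined in the paper and in the pruner's implementation.

       ```python
       import torch

       def splice_mask(weights, mask, low_thresh, hi_thresh):
           # Illustrative only: `low_thresh`/`hi_thresh` are hypothetical arguments.
           # Weights whose magnitude drops below `low_thresh` are pruned (mask=0);
           # weights whose magnitude grows past `hi_thresh` are spliced back in
           # (mask=1); everything in between keeps its current mask value.
           abs_w = weights.abs()
           new_mask = mask.clone()
           new_mask[abs_w < low_thresh] = 0.
           new_mask[abs_w > hi_thresh] = 1.
           return new_mask
       ```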
  4. Nov 01, 2018
  5. Oct 22, 2018
    • Activation statistics collection (#61) · 54a5867e
      Neta Zmora authored
       Activation statistics can be leveraged to make pruning and quantization decisions, so we
       added support for collecting this data.
       - Two types of activation statistics are supported: summary statistics, and detailed records
         per activation.
       Currently we support the following summaries:
       - Average activation sparsity, per layer
       - Average L1-norm for each activation channel, per layer
       - Average sparsity for each activation channel, per layer

       For the detailed records we collect some statistics per activation and store them in a record.
       Using this collection method generates more detailed data, but consumes more time, so beware.
      
      * You can collect activation data for the different training phases: training/validation/test.
      * You can access the data directly from each module that you chose to collect stats for.  
      * You can also create an Excel workbook with the stats.
      
       To demonstrate the use of activation collection, we added a sample schedule which prunes
       weight filters by their activation APoZ, according to:
       "Network Trimming: A Data-Driven Neuron Pruning Approach towards
       Efficient Deep Architectures",
       Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
       https://arxiv.org/abs/1607.03250
      
       We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning;
       specifically, we separated the AGP schedule from the filter-pruning criterion.  We added
       examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner),
       random (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner),
       and filter L1-norm (L1RankedStructureParameterPruner).
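
       As a rough illustration of the APoZ criterion used in that example (not Distiller's actual collector or pruner code; the function name and hook usage below are hypothetical), a per-channel APoZ can be computed from a post-ReLU feature map like this:

       ```python
       import torch

       def apoz_per_channel(activations):
           # `activations` is a post-ReLU feature map of shape (N, C, H, W).
           # The APoZ of a channel is the average fraction of its activations that
           # are exactly zero; channels with a high APoZ are pruning candidates.
           n, c = activations.shape[0], activations.shape[1]
           zeros = (activations == 0).float().view(n, c, -1)
           return zeros.mean(dim=2).mean(dim=0)  # one APoZ value per channel

       # Hypothetical usage: attach as a forward hook on a ReLU module and
       # average the returned values over a validation epoch.
       ```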
  6. Sep 20, 2018
  7. Sep 16, 2018
  8. Sep 03, 2018
  9. Aug 09, 2018
    • Generalize the loss value returned from before_backward_pass callbacks (#38) · a43b9f10
      Guy Jacob authored
      * Instead of a single additive value (which so far represented only the
        regularizer loss), callbacks return a new overall loss
      * Policy callbacks also return the individual loss components used to
        calculate the new overall loss.
      * Add boolean flag to the Scheduler's callback so applications can choose
        if they want to get individual loss components, or just the new overall
        loss
      * In compress_classifier.py, log the individual loss components
      * Add test for the loss-from-callback flow
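
       A minimal sketch of the callback contract described above; the class name, signature, and argument names are simplified assumptions, not Distiller's exact API:

       ```python
       class RegularizationPolicySketch:
           # Illustrative policy: its before_backward_pass callback returns a new
           # overall loss instead of a single additive value, optionally together
           # with the individual components that built it.
           def __init__(self, regularizer, strength=1e-4):
               self.regularizer = regularizer  # callable: model -> scalar loss tensor
               self.strength = strength

           def before_backward_pass(self, model, loss, return_loss_components=True):
               reg_loss = self.strength * self.regularizer(model)
               overall_loss = loss + reg_loss
               if return_loss_components:
                   # the application can log each component separately
                   return overall_loss, {'objective': loss, 'regularization': reg_loss}
               return overall_loss
       ```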
  10. Jul 31, 2018
  11. Jul 25, 2018
  12. Jul 22, 2018
    • PACT quantizer (#30) · df9a00ce
      Gal Novik authored
       * Added the PACT quantization method
       * Moved the logic that modifies the optimizer (due to changes the quantizer makes) into the Quantizer itself
       * Updated documentation and tests
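
       For context, here is a minimal sketch of the PACT idea: activations are clipped to [0, alpha] with a learnable alpha, then linearly quantized. This is not Distiller's PACT quantizer; the class name, default values, and the omission of straight-through rounding are assumptions made for brevity.

       ```python
       import torch
       import torch.nn as nn

       class PactClipSketch(nn.Module):
           # Sketch of PACT activation clipping + linear quantization.
           def __init__(self, alpha=6.0, num_bits=4):
               super().__init__()
               self.alpha = nn.Parameter(torch.tensor(alpha))  # learnable clipping level
               self.num_bits = num_bits

           def forward(self, x):
               # clip to [0, alpha]; written so that alpha receives a gradient
               y = torch.clamp(x, min=0.0) - torch.clamp(x - self.alpha, min=0.0)
               scale = (2 ** self.num_bits - 1) / self.alpha
               # straight-through rounding omitted for brevity
               return torch.round(y * scale) / scale
       ```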
  13. Jul 17, 2018
  14. Jul 13, 2018
    • ADC (Automatic Deep Compression) example + features, tests, bug fixes (#28) · 718f777b
      Neta Zmora authored
       This is a merge of the ADC branch and master.
       ADC (using a DDPG RL agent to compress image classifiers) is still WIP and requires
       an unreleased version of Coach (https://github.com/NervanaSystems/coach).
      
       Small features in this commit:
       - Added model_find_module(): find a module object given its name
       - Added channel ranking and pruning: pruning/ranked_structures_pruner.py
       - Added a CIFAR10 VGG16 model: models/cifar10/vgg_cifar.py
       - Thinning: changed the level of some log messages; some messages were
         moved to 'debug' level because they are not usually interesting.
       - Added a function to print nicely formatted integers: distiller/utils.py
       - Sensitivity analysis for channel removal
       - compress_classifier.py: handle keyboard interrupts
       - compress_classifier.py: fix re-raising of exceptions, so they maintain their call stack
      
       - Added tests:
         -- test_summarygraph.py: test_simplenet() - a regression test targeting a bug that occurs when taking the predecessor of the first node in a graph
         -- test_ranking.py: test_ch_ranking, test_ranked_channel_pruning
         -- test_model_summary.py: test_png_generation, test_summary (sparsity/compute/model/modules)
      
       - Bug fixes in this commit:
         -- Thinning bug fix: handle a zero-sized 'indices' tensor.
            During the thinning process, the 'indices' tensor can become zero-sized,
            and will have an undefined length. Therefore, we need to check for this
            situation when assessing the number of elements in 'indices' (see the sketch below).
         -- Language model: adjust main.py to the new distiller.model_summary API
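
       To illustrate the kind of guard this bug fix calls for (a hypothetical helper, not the actual code in thinning.py): len() is not defined for a 0-dim tensor, so the element count has to be obtained another way.

       ```python
       import torch

       def num_indices(indices):
           # Hypothetical helper: a 0-dim 'indices' tensor has no len(), so fall
           # back to nelement() when assessing how many indices are present.
           return indices.nelement() if indices.dim() == 0 else len(indices)
       ```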
  15. Jul 11, 2018
    • More robust handling of data-parallel/serial graphs (#27) · b64be690
      Neta Zmora authored
      Remove the complicated logic trying to handle data-parallel models as
      serially-processed models, and vice versa.
      
       *Function distiller.utils.make_non_parallel_copy() does the heavy lifting of
       replacing all instances of nn.DataParallel in a model with instances of
       DoNothingModuleWrapper.
       The DoNothingModuleWrapper wrapper does nothing but forward to the
       wrapped module.  This is a trick we use to transform a data-parallel model
       into a serially-processed model (a sketch of the idea appears after this list).
      
      *SummaryGraph uses a copy of the model after the model is processed by
      distiller.make_non_parallel_copy() which renders the model non-data-parallel.
      
      *The same goes for model_performance_summary()
      
       *Model inputs are explicitly placed on the Cuda device, since now all models are
       executed on the CPU.  Previously, if a model was not created using
       nn.DataParallel, then the model was not explicitly placed on the Cuda device.
      
       *The logic in distiller.CompressionScheduler that attempted to load a
       data-parallel model and process it serially, or load a serial model and
       process it data-parallel, was removed.  This removes a lot of fuzziness and makes
       the code more robust: we do not needlessly try to be heroes.
      
      * model summaries - remove pytorch 0.4 warning
      
      * create_model: remove redundant .cuda() call
      
      * Tests: support both parallel and serial tests
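
       A sketch of the DoNothingModuleWrapper idea mentioned above (the real class ships with Distiller; this version is simplified and the step that swaps it in for nn.DataParallel is only hinted at):

       ```python
       import torch.nn as nn

       class DoNothingModuleWrapperSketch(nn.Module):
           # Simplified stand-in for the wrapper described above: it does nothing
           # but forward to the wrapped module, so replacing every nn.DataParallel
           # instance with it yields a serially-processed copy of the model.
           def __init__(self, module):
               super().__init__()
               self.module = module

           def forward(self, *args, **kwargs):
               return self.module(*args, **kwargs)
       ```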
  16. Jun 30, 2018
    • Bug fix: add support for thinning the optimizer · b21f449b
      Neta Zmora authored
       You no longer need to use --momentum=0 when removing structures
       dynamically.
      The SGD momentum update (velocity) is dependent on the weights, which
      PyTorch optimizers cache internally.  This caching is not a problem for
      filter/channel removal (thinning) because although we dynamically
      change the shapes of the weights tensors, we don’t change the weights
      tensors themselves.
       PyTorch's SGD creates tensors to store the momentum updates, and these
       tensors have the same shape as the weights tensors.  When we change the
       weights tensors, we need to make the appropriate changes in the Optimizer,
       or disable the momentum.
       We added a new function - thinning.optimizer_thinning() - to do this.
       This function is brittle, as it is tested only on optim.SGD and relies on the
       internal representation of the SGD optimizer, which can change without notice.
       For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'],
       which also depend on the shape of the weight tensors.
      We needed to pass the Optimizer instance to Thinning policies
      (ChannelRemover, FilterRemover) via the callbacks, which required us
      to change the callback interface.
      In the future we plan a bigger change to the callback API, to allow
      passing of arbitrary context from the training environment to Distiller.
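
       A rough sketch of what adjusting the SGD momentum buffer after thinning looks like. It is simplified and hypothetical (function name and arguments are made up for illustration); the real logic lives in thinning.optimizer_thinning() and handles more cases.

       ```python
       import torch

       def thin_sgd_momentum(optimizer, param, keep_indices, dim):
           # Hypothetical helper: after `param` has been thinned along `dim`,
           # shrink the cached SGD momentum buffer to match the new shape.
           # `keep_indices` is a LongTensor of the surviving structure indices.
           state = optimizer.state.get(param, {})
           buf = state.get('momentum_buffer')
           if buf is not None and buf.shape != param.shape:
               state['momentum_buffer'] = torch.index_select(buf, dim, keep_indices)
       ```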
      
      Also in this commit:
       * compress_classifier.py had special handling for resnet layer-removal, which
       is used in examples/ssl/ssl_4D-removal_training.yaml.
       This is a brittle and ugly hack.  Until we have a more elegant solution, I'm
       removing support for layer-removal.
       * Added to the tests an invocation of forward and backward passes over a model.
       This tests more of the real flows, which use the optimizer and construct
       gradient tensors.
       * Added a test of a special case of convolution filter-pruning which occurs
       when the next layer is fully-connected (linear).
  17. Jun 21, 2018
  18. Jun 19, 2018
    • Make PNG summary compatible with latest SummaryGraph class changes (#7) · 9e57219e
      Guy Jacob authored
       * Modify 'create_png' to use the correct data structures (dicts instead of
         lists, etc.)
       * Handle the case where an op was called not from a module. This relates to:
        * ONNX->"User-Friendly" name conversion to account for cases where
        * Detection of existing op with same name
        In both cases use the ONNX op type in addition to the op name
      * Return an "empty" shape instead of None when ONNX couldn't infer
        a parameter's shape
      * Expose option of PNG summary with parameters to user
  19. May 17, 2018
    • Fix system tests failure · a7ed8cad
      Neta Zmora authored
      The latest changes to the logger caused the CI tests to fail,
       because the test assumes that the logging.conf file is present in the
      same directory as the sample application script.
      The sample application used cwd() instead, and did not find the
      log configuration file.
  20. May 16, 2018
    • refactoring: move config_pylogger out of the sample app · 792e9e39
      Neta Zmora authored
      Soon we will be reusing this function in other sample apps, so let's
      move it to app_utils.
    • Check if correct version of PyTorch is installed. · ba653d9a
      Neta Zmora authored
      The 'master' branch now uses PyTorch 0.4, which has API changes that
      are not backward compatible with PyTorch 0.3.
      
      After we've upgraded Distiller's internal implementation to be
      compatible with PyTorch 0.4, we've added a check that you are using
      the correct PyTorch version.
      
      Note that we only perform this check in the sample image classifier
      compression application.
    • refactoring: move the message logger setup out of main() · 6e8b0fd6
      Neta Zmora authored
      Eventually we will want to use this code in other sample applications,
      so let's move the logger configuration code to a separate function.
      
       There's a bit of ugly hacking in this current implementation because
       I've added variable members to logging.logger.  These are actually
       config-once variables that convey the logging directory and filename.
       I did not want to add more names to the global namespace, so I hacked
       a temporary solution in which logging.logger acts as a conveyor and
       private namespace.  We'll get that cleaned up as we do more refactoring.
    • New summary option: print module names · 6a940466
      Neta Zmora authored
      This is a niche feature, which lets you print the names of the modules
      in a model, from the command-line.
      Non-leaf nodes are excluded from this list.  Other caveats are documented
      in the code.
    • PNG summary: default to non-parallel graphs · 53b74ca6
      Neta Zmora authored
      Data parallel models may execute faster on multiple GPUs, but rendering
      them creates visually complex and illegible graphs.
      Therefore, when creating models for a PNG summary, we opt to use
      non-parallel models.
    • 7ce11aee
  21. May 14, 2018
  22. Apr 24, 2018