  1. Aug 06, 2019
      AMC and other refactoring - large merge (#339) · 02054da1
      Neta Zmora authored
      *An implementation of AMC (the previous implementation's code has moved
       to a new location under /distiller/examples/auto_compression/amc).
       AMC is aligned with the ‘master’ branch of Coach.
      *compress_classifier.py is refactored.  The base code moved
      to /distiller/apputils/image_classifier.py.  Further refactoring
      will follow.
      We want to provide a simple and small API to the basic features of
      a classifier-compression application.
      This will help applications that want to use the main features of a
      classifier-compression application, without the standard training
      regimen.
      AMC is one example of a stand-alone application that needs to leverage
      the capabilities of a classifier-compression application, but is currently
      coupled to `compress_classifier.py`.
      `multi-finetune.py` is another example.
      * ranked_structures_pruner.py:
      ** Added support for grouping channels/filters
      Sometimes we want to prune a group of structures, e.g. groups of
      8 channels.  This feature does not force the groups to be adjacent,
      so it is more like a set of structures.  For example, when pruning
      channels from a 64-channel convolution, grouped by 8 channels, we
      will prune exactly one of 0/8/16/24/32/40/48/56 channels, i.e.
      always a multiple of 8 channels, excluding the set of all 64 channels.
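      A minimal sketch of how a pruning fraction can be mapped to such a
      group-aligned structure count (the helper name and arguments below are
      made up for illustration; this is not Distiller's API):

          import math

          def structures_to_prune(total, fraction, group_size=8, round_up=False):
              # Map a desired pruning fraction to a count that is a whole
              # multiple of group_size.  With total=64 and group_size=8 the
              # result is one of 0/8/16/24/32/40/48/56 -- never all 64.
              raw = fraction * total
              groups = math.ceil(raw / group_size) if round_up else math.floor(raw / group_size)
              return min(groups * group_size, total - group_size)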
      ** Added FMReconstructionChannelPruner – this is channel
      pruning that uses L1-magnitude to rank and select the channels to
      remove, and feature-map reconstruction to improve the
      resilience to the pruning.
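      A toy sketch of those two ingredients -- ranking input channels by L1
      magnitude, then re-fitting the surviving weights by least squares so the
      layer's feature maps are approximately reconstructed.  The helper and its
      (C_in, C_out) weight layout (a 1x1 conv treated as a matrix multiply) are
      assumptions for illustration, not the FMReconstructionChannelPruner code:

          import torch

          def rank_and_reconstruct(X, W, keep_ratio=0.5):
              # X: sampled input activations, shape (N, C_in)
              # W: layer weights viewed as a (C_in, C_out) matrix
              n_keep = max(1, int(keep_ratio * W.shape[0]))
              scores = W.abs().sum(dim=1)                  # L1 magnitude per input channel
              keep = torch.topk(scores, n_keep).indices    # channels that survive
              Y = X @ W                                    # original feature maps
              W_new = torch.linalg.lstsq(X[:, keep], Y).solution  # least-squares re-fit
              return keep, W_new                           # X[:, keep] @ W_new ~= Y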
      * Added a script to run multiple instances of an 
      experiment, in different processes:
       examples/classifier_compression/multi-run.py
      * Set the seed value even when not specified by the command-line
      arguments, so that we can try to recreate the session.
      * Added pruning ranking noise -
      Ranking noise introduces Gaussian noise when ranking channels/filters
      using an Lp-norm.  The noise is introduced using the epsilon-greedy
      methodology, where ranking by the exact Lp-norm is considered the greedy
      action (see the sketch after the next item).
      * Added configurable rounding of the pruning level: choose whether to
      round up or down when rounding the number of structures to prune
      (rounding is always to an integer).
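      A minimal sketch of one way to realize the epsilon-greedy ranking noise
      mentioned above (the function name, noise scale and probability handling
      are illustrative assumptions, not Distiller's implementation):

          import torch

          def ranking_scores(weights, p=1, epsilon=0.1, sigma=0.05):
              # weights: (num_structures, ...) -- e.g. one row per filter.
              # With probability (1 - epsilon) return the exact Lp-norms (the
              # "greedy" ranking); with probability epsilon perturb the norms
              # with Gaussian noise so that slightly lower-ranked structures
              # may occasionally be selected instead.
              scores = weights.flatten(1).norm(p=p, dim=1)
              if torch.rand(1).item() < epsilon:
                  scores = scores + sigma * scores.mean() * torch.randn_like(scores)
              return scores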
  2. Apr 01, 2019
      Load optimizer from checkpoint (BREAKING - see details) (#182) · 992291cf
      Bar authored
      
      * Fixes issues #70, #145 and replaces PR #74
      * checkpoint.py
        * save_checkpoint will now save the optimizer type in addition to
          its state
        * load_checkpoint will now instantiate an optimizer based on the
          saved type and load its state
      * config.py: file/dict_config now accept the resumed epoch to pass to
        LR schedulers
      * policy.py: LRPolicy now passes the current epoch to the LR scheduler
      * Classifier compression sample
        * New flag '--resume-from' for properly resuming a saved training
          session, inc. optimizer state and epoch #
        * Flag '--reset-optimizer' added to allow discarding of a loaded
          optimizer.
        * BREAKING:
          * Previous flag '--resume' is deprecated and is mapped to
            '--resume-from' + '--reset-optimizer'. 
          * However, the old resuming behavior had an inconsistency: the epoch
            count would continue from the saved epoch, but the LR scheduler
            was set up as if we were starting from epoch 0.
          * Using '--resume-from' + '--reset-optimizer' now will simply
            RESET the epoch count to 0 for the whole environment.
          * This means that scheduling configurations (in YAML or code)
            which assumed use of '--resume' might need to be changed to
            reflect the fact that the epoch count now starts from 0.
          * All relevant YAML files under 'examples' modified to reflect
            this change
      * Initial support for ReduceLROnPlateau (#161):
        * Allow passing **kwargs to policies via the scheduler
        * Image classification now passes the validation loss to the
          scheduler, to be used by ReduceLROnPlateau
        * The current implementation is experimental and subject to change
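      A minimal sketch of the save/resume pattern described above (the helper
      names and dictionary keys are placeholders, not the actual checkpoint.py
      code):

          import torch

          def save_checkpoint(path, epoch, model, optimizer):
              torch.save({'epoch': epoch,
                          'state_dict': model.state_dict(),
                          'optimizer_type': type(optimizer).__name__,        # e.g. 'SGD'
                          'optimizer_state_dict': optimizer.state_dict()}, path)

          def load_checkpoint(path, model):
              chkpt = torch.load(path)
              model.load_state_dict(chkpt['state_dict'])
              optimizer_cls = getattr(torch.optim, chkpt['optimizer_type'])
              # placeholder lr; the real hyper-parameters are restored by load_state_dict()
              optimizer = optimizer_cls(model.parameters(), lr=0.0)
              optimizer.load_state_dict(chkpt['optimizer_state_dict'])
              return model, optimizer, chkpt['epoch']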
  3. Sep 26, 2018
      Attention-Based Guided Structured Sparsity (GSS) (#51) · 65919dc0
      Neta Zmora authored
      * Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example for ResNet20 channel pruning.
          - The idea is to regularize the variance of the distribution of the parameter structures. Some structures will be zeroed completely, while the rest should keep high values, leading to high variance.
          - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups); a sketch of such a term appears at the end of this entry.
          - When tested on ResNet20, GSS did not show any improvement over SSL.
      
      * Added sample of filter pruning for ResNet20 CIFAR using SSL (Learning Structured Sparsity in Deep Neural Networks)
      
      * Added an example of pruning 45% of the compute (1.8x MAC reduction), while suffering 0.8% accuracy loss on ResNet20 CIFAR
      
      * Added a ResNet50 ImageNet example of L1-Magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase. This is an example of using pruning as a regularizer.
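      A rough sketch of the kind of group-variance regularization term described
      above (grouping by filters for simplicity; the function name and strength
      are illustrative, and this is not the GroupVarianceRegularizer code):

          import torch

          def group_variance_term(weight, strength=1e-4):
              # weight: conv weight of shape (num_filters, C, K, K); each filter
              # is treated as one group.  Rewarding high variance of the per-group
              # L2 norms (i.e. adding it to the loss with a negative sign) pushes
              # some groups toward zero while the rest keep large norms.
              group_norms = weight.flatten(1).norm(p=2, dim=1)
              return -strength * group_norms.var()

          # usage: loss = criterion(output, target) + group_variance_term(module.weight)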
  4. Jul 09, 2018
      Fix issue #26 · 51a7df35
      Neta Zmora authored
      The checkpoint file:
      examples/ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar
      did not contain the "thinning recipe", while the weight tensors stored within the
      checkpoint file had already been shrunk/thinned, and this caused a mismatch.
      
      PyTorch models are defined in code.  This includes the network architecture and
      connectivity (which layers are used and what the forward path is), but also
      the sizes of the parameter tensors and inputs/outputs.
      When the model is created, the parameter tensors are also created, as defined
      or inferred from the code.
      When a checkpoint is loaded, the parameter tensors are read from the checkpoint and
      copied to the model's tensors.  Therefore, the tensors in the checkpoint and
      in the model must have the same shape.  If a model has been "thinned" and saved to
      a checkpoint, then the checkpoint tensors are "smaller" than the ones defined by
      the model.  A "thinning recipe" is used to make changes to the model before copying
      the tensors from the checkpoint.
      In this case, the "thinning recipe" was missing.
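      A toy illustration of the mismatch and of what the recipe enables (shown
      here by hand; this is not the actual recipe mechanism or format):

          import torch
          import torch.nn as nn

          # The code defines a 16-filter convolution, but the thinned checkpoint
          # holds only 12 filters, so copying the tensors directly fails:
          conv = nn.Conv2d(3, 16, kernel_size=3)
          thinned_state = {'weight': torch.zeros(12, 3, 3, 3), 'bias': torch.zeros(12)}
          try:
              conv.load_state_dict(thinned_state)
          except RuntimeError as err:
              print(err)                            # size mismatch for weight / bias

          # A "thinning recipe" records which structures were removed, so the
          # model can be shrunk *before* the copy:
          conv.out_channels = 12
          conv.weight = nn.Parameter(torch.zeros(12, 3, 3, 3))
          conv.bias = nn.Parameter(torch.zeros(12))
          conv.load_state_dict(thinned_state)       # shapes now match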
  5. Jun 30, 2018
      Bug fix: add support for thinning the optimizer · b21f449b
      Neta Zmora authored
      You no longer need to use --momentum=0 when removing structures
      dynamically.
      The SGD momentum update (velocity) depends on the weights, which
      PyTorch optimizers cache internally.  This caching is not a problem for
      filter/channel removal (thinning) because, although we dynamically
      change the shapes of the weight tensors, we don't change the weight
      tensors themselves.
      PyTorch's SGD creates tensors to store the momentum updates, and these
      tensors have the same shape as the weight tensors.  When we change the
      weight tensors, we need to make the appropriate changes in the Optimizer,
      or disable the momentum.
      We added a new function - thinning.optimizer_thinning() - to do this
      (a sketch of the idea appears below).
      This function is brittle, as it is tested only on optim.SGD and relies on the
      internal representation of the SGD optimizer, which can change without notice.
      For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'],
      which also depend on the shape of the weight tensors.
      We needed to pass the Optimizer instance to Thinning policies
      (ChannelRemover, FilterRemover) via the callbacks, which required us
      to change the callback interface.
      In the future we plan a bigger change to the callback API, to allow
      passing of arbitrary context from the training environment to Distiller.
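      A sketch of the idea behind optimizer thinning (the helper below is
      illustrative, not the thinning.optimizer_thinning() code, and is brittle
      by nature for the reasons given above):

          import torch

          def thin_sgd_momentum(optimizer, param, keep_indices, dim=0):
              # Shrink SGD's cached momentum buffer so that it matches a weight
              # tensor whose structures were just removed along 'dim'.  This
              # deliberately relies on optim.SGD's internal state layout
              # ('momentum_buffer'); other optimizers (e.g. Adam's 'exp_avg' and
              # 'exp_avg_sq') would need equivalent handling.
              state = optimizer.state.get(param, {})
              if state.get('momentum_buffer') is not None:
                  state['momentum_buffer'] = torch.index_select(
                      state['momentum_buffer'], dim, keep_indices)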
      
      Also in this commit:
      * compress_classifier.py had special handling for resnet layer-removal, which
      is used in examples/ssl/ssl_4D-removal_training.yaml.
      This is a brittle and ugly hack.  Until we have a more elegant solution, I'm
      removing support for layer-removal.
      * Added invocation of forward and backward passes over a model to the tests.
      This tests more of the real flows, which use the optimizer and construct
      gradient tensors.
      * Added a test of a special case of convolution filter-pruning which occurs
      when the next layer is fully-connected (linear).
  6. Jun 10, 2018
      Thinning (#6) · 42650340
      Neta Zmora authored
      * Large update containing a new thinning algorithm.
      
      Thinning a model is the process of taking a dense network architecture whose convolution
      weight tensors have structure sparsity (filters or channels), and changing the network architecture and parameters in order to completely remove those structures.
      The new architecture is smaller (condensed), with fewer channels and filters in some of the convolution layers.  Linear and BatchNormalization layers are also adjusted as required.
      
      To perform thinning, we create a SummaryGraph (‘sgraph’) of our model.  We use the ‘sgraph’ to infer the
      data dependencies between the modules in the PyTorch network.  This entire process is not trivial and will be
      documented elsewhere.
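      A toy illustration of the mechanics (not the SummaryGraph-driven algorithm
      itself): remove two filters from conv1, then remove the matching input
      channels from the layers that consume conv1's output -- the data dependency
      that the SummaryGraph is used to discover automatically:

          import torch
          import torch.nn as nn

          conv1, bn1, conv2 = nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.Conv2d(4, 8, 3)
          keep = torch.tensor([1, 3])                              # surviving filters of conv1

          conv1.weight = nn.Parameter(conv1.weight.data[keep])     # (2, 3, 3, 3)
          conv1.bias = nn.Parameter(conv1.bias.data[keep])
          conv1.out_channels = 2

          bn1.weight = nn.Parameter(bn1.weight.data[keep])         # BatchNorm follows along
          bn1.bias = nn.Parameter(bn1.bias.data[keep])
          bn1.running_mean = bn1.running_mean[keep]
          bn1.running_var = bn1.running_var[keep]
          bn1.num_features = 2

          conv2.weight = nn.Parameter(conv2.weight.data[:, keep])  # (8, 2, 3, 3)
          conv2.in_channels = 2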
      
      Large refactoring of SummaryGraph to support the new thinning requirement of traversing successors and predecessors.
      - Operations (ops) are now stored in a dictionary, so that they can be accessed quickly by name.
      - Refactored the Operation construction code.
      - Added support for searching a node's predecessors and successors.  You can search for all predecessors/successors by depth, or by type.
      - create_png now supports an option to display the parameter nodes.
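      The predecessor/successor search can be pictured as a small graph traversal
      like the one below (the 'ops' dictionary layout here is an assumption for
      illustration and is not the real SummaryGraph structure):

          def successors(ops, name, depth=1, op_type=None):
              # ops: dict mapping an op name to
              #      {'type': 'Conv'/'BatchNormalization'/..., 'outputs': [op names]}
              # Collect successors up to 'depth' hops away, optionally filtered
              # by op type.
              frontier, found = {name}, []
              for _ in range(depth):
                  nxt = set()
                  for node in frontier:
                      for succ in ops.get(node, {}).get('outputs', []):
                          nxt.add(succ)
                          if op_type is None or ops.get(succ, {}).get('type') == op_type:
                              found.append(succ)
                  frontier = nxt
              return found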
      
      Updated schedules with new thinning syntax.
      
      * Thinning: support iterative thinning of models
      
      There's a caveat with this commit: when using this code you will
      need to train with SGD momentum=0.
      The momentum update depends on the weights, and because we
      dynamically change the weight shapes, we need to either make the
      appropriate changes in the Optimizer, or disable the momentum.
      For now, we disable the momentum.
      
      * Thinning: move the application of FilterRemover to on_minibatch_begin
      
      * Thinning: fix syntax error
      
      * Word-level language model compression
      
      Added an implementation of Baidu’s RNN pruning scheme:
      Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017).
          Exploring Sparsity in Recurrent Neural Networks.
          (https://arxiv.org/abs/1704.05119)
      
      Added an example of word-level language model compression.
      The language model is based on PyTorch’s example:
      https://github.com/pytorch/examples/tree/master/word_language_model
      
      Added an AGP pruning schedule and RNN pruning schedule to demonstrate
      compression of the language model.
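      For reference, the AGP schedule mentioned above follows the cubic sparsity
      ramp of Zhu & Gupta (2017, "To prune, or not to prune"); a sketch of the
      target-sparsity computation (function name and arguments are illustrative):

          def agp_target_sparsity(epoch, start_epoch, end_epoch,
                                  initial_sparsity, final_sparsity):
              # Sparsity ramps from initial_sparsity to final_sparsity along a
              # cubic curve, then stays flat.
              if epoch <= start_epoch:
                  return initial_sparsity
              if epoch >= end_epoch:
                  return final_sparsity
              progress = (epoch - start_epoch) / (end_epoch - start_epoch)
              return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3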
      
      * thinning: remove dead code
      
      * remove resnet18 filter pruning since the scheduler script is incomplete
      
      * thinning: fix indentation error
      
      * thinning: remove dead code
      
      * thinning: updated resnet20-CIFAR filter-removal reference checkpoints
      
      * thinning: updated resnet20-CIFAR filter-removal reference schedules
      
      These are for use with the new thinning schedule algorithm.