  1. Nov 02, 2018
    • Update README · 1757f213
      Neta Zmora authored
      Changed the description of the feature-set.
      Updated the README a little, since a lot has changed since we released six months ago.  Still a lot to add/remove/change.
  2. Nov 01, 2018
  3. Oct 31, 2018
  4. Oct 29, 2018
  5. Oct 27, 2018
  6. Oct 26, 2018
  7. Oct 25, 2018
    • Pruning: remove unnecessary backward hook used for masking gradients (#63) · 9c701c1c
      Neta Zmora authored
      After commit f396c34a362731c765370d368877c2ca367ad651, we now always
      apply the pruning mask at the end of each mini-batch, because weights
      may be updated by SGD+momentum even when they are masked.  Therefore,
      there is no need to mask the gradients: we simply mask the weights at
      the end of the mini-batch.
      
      See issue #53 for more details.
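      To illustrate the idea (a minimal sketch with hypothetical names, not
      Distiller's actual API): keep a binary mask per pruned parameter and
      re-apply it right after the optimizer step, so that momentum updates
      cannot revive masked weights.

          import torch
          import torch.nn as nn

          # Hypothetical helper: masks is a {parameter name: binary mask} dict.
          def apply_masks(model: nn.Module, masks: dict) -> None:
              with torch.no_grad():
                  for name, param in model.named_parameters():
                      if name in masks:
                          param.mul_(masks[name])   # zero-out pruned weights in place

          model = nn.Linear(8, 4)
          masks = {"weight": (torch.rand_like(model.weight) > 0.5).float()}
          optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
          criterion = nn.MSELoss()

          for _ in range(3):                # toy mini-batches
              x, y = torch.randn(16, 8), torch.randn(16, 4)
              optimizer.zero_grad()
              criterion(model(x), y).backward()
              optimizer.step()              # momentum may update masked weights...
              apply_masks(model, masks)     # ...so re-apply the mask at the end of the mini-batch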
  8. Oct 23, 2018
  9. Oct 22, 2018
    • Activation statistics collection (#61) · 54a5867e
      Neta Zmora authored
      Activation statistics can be leveraged to make pruning and quantization decisions, so we
      added support for collecting these data.
      - Two types of activation statistics are supported: summary statistics, and detailed records per activation.
      Currently we support the following summaries:
      - Average activation sparsity, per layer
      - Average L1-norm for each activation channel, per layer
      - Average sparsity for each activation channel, per layer
      
      For the detailed records, we collect some statistics per activation and store them in a record.
      This collection method generates more detailed data, but it consumes more time, so beware.
      
      * You can collect activation data for the different training phases: training/validation/test.
      * You can access the data directly from each module that you choose to collect stats for (a minimal hook-based sketch of this kind of collection follows the list).
      * You can also create an Excel workbook with the stats.
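      As a rough illustration of the summary collection (a hedged sketch with
      hypothetical names; Distiller's real collectors are richer), forward
      hooks can accumulate the average activation sparsity per layer:

          import torch
          import torch.nn as nn

          class SparsityCollector:
              """Collect average activation sparsity per layer via forward hooks."""
              def __init__(self, model: nn.Module):
                  self.stats = {}      # layer name -> (sparsity sum, batch count)
                  self.handles = []
                  for name, module in model.named_modules():
                      if isinstance(module, (nn.Conv2d, nn.ReLU, nn.Linear)):
                          self.handles.append(module.register_forward_hook(self._hook(name)))

              def _hook(self, name):
                  def hook(module, inputs, output):
                      sparsity = (output == 0).float().mean().item()
                      s, n = self.stats.get(name, (0.0, 0))
                      self.stats[name] = (s + sparsity, n + 1)
                  return hook

              def averages(self):
                  return {name: s / n for name, (s, n) in self.stats.items()}

              def remove(self):
                  for h in self.handles:
                      h.remove()

          model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                                nn.Linear(8 * 30 * 30, 10))
          collector = SparsityCollector(model)
          with torch.no_grad():
              model(torch.randn(2, 3, 32, 32))   # e.g. one validation mini-batch
          print(collector.averages())            # average sparsity per hooked layer
          collector.remove()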
      
      To demonstrate the use of activation collection, we added a sample schedule which prunes
      weight filters by their activation APoZ, according to:
      "Network Trimming: A Data-Driven Neuron Pruning Approach towards 
      Efficient Deep Architectures",
      Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
      https://arxiv.org/abs/1607.03250
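      For reference, APoZ in that paper is the Average Percentage of Zeros in
      each output channel's post-ReLU activations; a minimal sketch of
      computing it per filter (hypothetical helper, not Distiller's pruner):

          import torch

          def apoz_per_channel(activations: torch.Tensor) -> torch.Tensor:
              """APoZ per output channel for a (N, C, H, W) post-ReLU feature map."""
              return (activations == 0).float().mean(dim=(0, 2, 3))

          acts = torch.relu(torch.randn(8, 16, 14, 14))
          print(apoz_per_channel(acts))   # filters with high APoZ are pruning candidates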
      
      We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning,
      and specifically we separated the AGP schedule from the filter-pruning criterion.  We added
      examples of ranking filter importance by activation APoZ (ActivationAPoZRankedFilterPruner),
      randomly (RandomRankedFilterPruner), by filter gradients (GradientRankedFilterPruner),
      and by filter L1-norm (L1RankedStructureParameterPruner).
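      The L1-norm criterion, for example, can be sketched roughly as follows
      (hypothetical helper, not the L1RankedStructureParameterPruner code):

          import torch
          import torch.nn as nn

          def lowest_l1_filters(conv: nn.Conv2d, num_to_prune: int) -> torch.Tensor:
              # Each filter is one (in_channels, kH, kW) slice of the weight tensor.
              l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
              return torch.argsort(l1)[:num_to_prune]   # indices of the weakest filters

          conv = nn.Conv2d(16, 32, 3)
          print(lowest_l1_filters(conv, num_to_prune=8))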
  10. Oct 18, 2018
    • Bug fix: exporting Alexnet and VGG models to ONNX · b40dff5e
      Neta Zmora authored
      ONNX export in PyTorch doesn't know how to handle DataParallel
      layers, so we need to make sure that we remove all instances
      of nn.DataParallel from the model before exporting it.
      
      The previous ONNX implementation forgot to deal with the case
      of DataParallel layers that do not wrap the entire model (as
      in VGG, where only the feature-extractor layers are data-parallel).
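      A minimal sketch of the unwrapping step (hypothetical helper, assuming
      nn.DataParallel may wrap either the whole model or a sub-module such
      as VGG's feature extractor):

          import copy
          import torch
          import torch.nn as nn

          def strip_data_parallel(module: nn.Module) -> nn.Module:
              """Recursively replace nn.DataParallel wrappers with their wrapped module."""
              if isinstance(module, nn.DataParallel):
                  module = module.module
              for name, child in module.named_children():
                  setattr(module, name, strip_data_parallel(child))
              return module

          wrapped = nn.Sequential(nn.DataParallel(nn.Linear(10, 10)), nn.ReLU())
          exportable = strip_data_parallel(copy.deepcopy(wrapped))   # keep the original intact
          torch.onnx.export(exportable, torch.randn(1, 10), "model.onnx")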
    • Bug fix: remove softmax layer from model loading code · 0a8a3b31
      Neta Zmora authored
      We should only add softmax when we explicitly require it (as when
      exporting to ONNX), because CrossEntropyLoss implicitly computes
      softmax on the logits it receives as input.
      
      This code was left there by mistake and should never have been
      pushed to git.
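      To illustrate why the extra layer is redundant during training (a small
      sketch, not Distiller code):

          import torch
          import torch.nn as nn
          import torch.nn.functional as F

          logits = torch.randn(4, 10)                 # raw model outputs, no softmax layer
          targets = torch.randint(0, 10, (4,))

          ce = nn.CrossEntropyLoss()(logits, targets)
          # CrossEntropyLoss applies log-softmax internally, so this is equivalent:
          manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
          print(torch.allclose(ce, manual))           # True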
  11. Oct 13, 2018
  12. Oct 11, 2018
    • Fix Issue #53 (#55) · 608af2b4
      Neta Zmora authored
      When using a schedule with epochs that have nothing scheduled for them, apply_mask() is not invoked at the end of mini-batches, and pruned weights might be unmasked by the optimizer weight updates.
      
      See the explanation in the issue #53 discussion.
  13. Oct 04, 2018
  14. Oct 03, 2018
  15. Oct 01, 2018
  16. Sep 29, 2018
  17. Sep 26, 2018
    • Attention-Based Guided Structured Sparsity (GSS) (#51) · 65919dc0
      Neta Zmora authored
      * Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example for ResNet20 channel pruning.
          - The idea is to regularize the variance of the distribution of the parameter structures: some structures will go to zero completely, while the rest should keep high values, leading to high variance between the groups (a rough sketch of this idea follows the list).
          - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups).
          - When tested on ResNet20, GSS did not show any improvement over SSL.
      
      * Added sample of filter pruning for ResNet20 CIFAR using SSL (Learning Structured Sparsity in Deep Neural Networks)
      
      * Added an example of pruning 45% of the compute (1.8x MAC reduction), while suffering 0.8% accuracy loss on ResNet20 CIFAR
      
      * Added a ResNet50 ImageNet example of L1-magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase.  This is an example of using pruning as a regularizer.
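      A rough sketch of the group-variance idea (hypothetical code, not the
      GroupVarianceRegularizer implementation): compute one norm per filter
      group and reward high variance between those norms by subtracting a
      scaled variance term from the loss.

          import torch
          import torch.nn as nn

          def group_variance_penalty(conv: nn.Conv2d) -> torch.Tensor:
              group_norms = conv.weight.pow(2).sum(dim=(1, 2, 3)).sqrt()  # one L2 norm per filter
              return -group_norms.var()    # negative variance: minimizing the loss raises variance

          conv = nn.Conv2d(16, 32, 3)
          strength = 1e-4                  # assumed regularization strength
          data_loss = torch.tensor(0.0)    # placeholder for the task loss
          total_loss = data_loss + strength * group_variance_penalty(conv)
          total_loss.backward()            # gradients flow into conv.weight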
  18. Sep 21, 2018
  19. Sep 20, 2018
  20. Sep 16, 2018
    • Clean up PyTorch 0.3 compatibility code (#46) · e749ea62
      Neta Zmora authored
      * Clean up PyTorch 0.3 compatibility code
      We don't need this anymore and PyTorch 1.0 is just around the corner.
      
      * Explicitly place the inputs tensor on the GPU(s)
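      The second item amounts to something like this (illustrative only):

          import torch

          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          inputs = torch.randn(32, 3, 224, 224)
          inputs = inputs.to(device, non_blocking=True)   # explicitly place the batch on the GPU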
    • A temporary fix for issue #36 (#48) · 5d3d6d8d
      Neta Zmora authored
      * A temporary fix for issue 36
      
      The thinning code assumes that the sgraph it is using
      is not data-parallel, because it (currently) accesses the
      layer-name keys using a "normalized" name ("module." is removed).
      
      The bug is that in thinning.py#L73 we create a data_parallel=True
      model and then give it to sgraph, while in other places the thinning
      code uses "normalized" keys (for example in thinning.py#L264).
      
      The temporary fix configures data_parallel=False in thinning.py#L73.
      
      A long-term solution should have SummaryGraph know how to handle
      both data-parallel and non-parallel models.  This can be done by having
      SummaryGraph convert the layer names it receives in the API to the
      data_parallel=False form using normalize_layer_name, and use the
      de-normalized format when returning results (see the sketch after
      this commit message).
      
      * Fix the documentation error from issue 36
      * Move some logs to debug and show in logging.conf how to enable DEBUG logs.
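      The normalization mentioned above can be sketched roughly like this
      (assumed behavior: drop the "module." prefixes that nn.DataParallel
      adds to layer names; the de-normalizing helper is hypothetical):

          def normalize_layer_name(name: str) -> str:
              """Strip the "module." prefixes added by nn.DataParallel wrappers."""
              return name.replace("module.", "")

          def denormalize_layer_name(model_keys, normalized_name: str) -> str:
              """Map a normalized name back to the form the model actually uses."""
              for key in model_keys:
                  if normalize_layer_name(key) == normalized_name:
                      return key
              return normalized_name

          print(normalize_layer_name("module.features.module.0.weight"))  # features.0.weight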