  1. Oct 22, 2018
    • Neta Zmora's avatar
      Activation statistics collection (#61) · 54a5867e
      Neta Zmora authored
      Activation statistics can be leveraged to make pruning and quantization decisions, so
      we added support for collecting this data.
      - Two types of activation statistics are supported: summary statistics and detailed records
      per activation.
      Currently we support the following summaries: 
      - Average activation sparsity, per layer
      - Average L1-norm for each activation channel, per layer
      - Average sparsity for each activation channel, per layer
      
      For the detailed records, we collect statistics per activation and store them in a record.
      This collection method generates more detailed data, but it consumes more time, so
      beware.
      
      * You can collect activation data for the different training phases: training/validation/test.
      * You can access the data directly from each module that you chose to collect stats for.  
      * You can also create an Excel workbook with the stats.
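
      A minimal sketch of the underlying idea using plain PyTorch forward hooks; the helper below and the
      summary it computes (average activation sparsity per layer) are illustrative, not the collector API
      added in this commit:

      ```python
      import torch
      import torch.nn as nn

      def attach_sparsity_hooks(model, stats):
          """Record per-layer average activation sparsity via forward hooks.
          `stats` maps layer name -> (sum of per-batch sparsity, batch count)."""
          handles = []
          for name, module in model.named_modules():
              if isinstance(module, (nn.Conv2d, nn.Linear, nn.ReLU)):
                  def hook(mod, inputs, output, name=name):
                      sparsity = (output == 0).float().mean().item()
                      total, count = stats.get(name, (0.0, 0))
                      stats[name] = (total + sparsity, count + 1)
                  handles.append(module.register_forward_hook(hook))
          return handles

      # Usage: run forward passes, then average the per-batch records.
      model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(),
                            nn.Linear(8 * 30 * 30, 10))
      stats = {}
      handles = attach_sparsity_hooks(model, stats)
      model(torch.randn(4, 3, 32, 32))
      for name, (total, count) in stats.items():
          print(name, total / count)
      for h in handles:
          h.remove()
      ```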
      
      To demonstrate the use of activation collection, we added a sample schedule that prunes
      weight filters by their activation APoZ (Average Percentage of Zeros), as described in:
      "Network Trimming: A Data-Driven Neuron Pruning Approach towards 
      Efficient Deep Architectures",
      Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang, ICLR 2016
      https://arxiv.org/abs/1607.03250
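
      For reference, APoZ can be computed per output channel as the fraction of zero (post-ReLU)
      activations, averaged over the batch and spatial dimensions. A minimal sketch, not the pruner
      class added in this commit:

      ```python
      import torch

      def apoz_per_channel(activations):
          """Average Percentage of Zeros (APoZ) per output channel.
          `activations` is a post-ReLU feature map of shape (N, C, H, W);
          a higher APoZ suggests the corresponding filter is less important."""
          zeros = (activations == 0).float()      # 1.0 wherever the unit is inactive
          return zeros.mean(dim=(0, 2, 3))        # average over batch and spatial dims -> (C,)

      # Filters with the highest APoZ would be the first pruning candidates.
      acts = torch.relu(torch.randn(8, 16, 14, 14))
      candidates = torch.argsort(apoz_per_channel(acts), descending=True)[:4]
      ```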
      
      We also refactored the AGP code (AutomatedGradualPruner) to support structure pruning,
      and specifically we separated the AGP schedule from the filter pruning criterion.  We added
      examples of ranking filter importance based on activation APoZ (ActivationAPoZRankedFilterPruner),
      random (RandomRankedFilterPruner), filter gradients (GradientRankedFilterPruner), 
      and filter L1-norm (L1RankedStructureParameterPruner).
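
      As an illustration of the simplest of these criteria, ranking filters by the L1-norm of their
      weights might look like the sketch below (the helper and its pruning fraction are assumptions,
      not the pruner's actual interface):

      ```python
      import torch

      def l1_rank_filters(conv_weight, fraction_to_prune=0.25):
          """Rank Conv2d filters by the L1-norm of their weights and return the
          indices of the weakest `fraction_to_prune` fraction of them."""
          # conv_weight shape: (out_channels, in_channels, kH, kW)
          l1_norms = conv_weight.abs().sum(dim=(1, 2, 3))
          n_prune = int(fraction_to_prune * conv_weight.size(0))
          return torch.argsort(l1_norms)[:n_prune]

      weak_filters = l1_rank_filters(torch.randn(64, 3, 3, 3), fraction_to_prune=0.5)
      ```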
  2. Oct 18, 2018
    • Neta Zmora's avatar
      Bug fix: exporting Alexnet and VGG models to ONNX · b40dff5e
      Neta Zmora authored
      ONNX export in PyTorch doesn't know how to handle DataParallel
      layers, so we need to make sure that we remove all instances
      of nn.DataParallel from the model before exporting it.
      
      The previous ONNX implementation forgot to deal with the case
      of DataParallel layers that do not wrap the entire model (as
      in VGG, where only the feature-extractor layers are data-parallel).
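
      A recursive sketch of the idea (an illustrative helper, not the repository's own function): walk
      the module tree and replace every nn.DataParallel wrapper with the module it wraps, which also
      covers the VGG case where only a sub-module is wrapped.

      ```python
      import torch.nn as nn

      def strip_data_parallel(module):
          """Recursively unwrap nn.DataParallel so the model can be ONNX-exported."""
          if isinstance(module, nn.DataParallel):
              module = module.module
          for name, child in module.named_children():
              setattr(module, name, strip_data_parallel(child))
          return module

      # Typically applied to a copy of the model right before torch.onnx.export().
      ```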
    • Neta Zmora's avatar
      Bug fix: remove softmax layer from model loading code · 0a8a3b31
      Neta Zmora authored
      We should only add softmax when we explicitly require it (as when
      exporting to ONNX), because CrossEntropyLoss implicitly computes
      softmax on the logits it receives as input.
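
      A quick check of that equivalence (generic PyTorch, not code from this repository):

      ```python
      import torch
      import torch.nn as nn

      logits = torch.randn(4, 10)
      targets = torch.randint(0, 10, (4,))

      # CrossEntropyLoss applies log-softmax to the raw logits itself ...
      ce = nn.CrossEntropyLoss()(logits, targets)
      # ... so it is equivalent to LogSoftmax followed by NLLLoss.
      nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
      assert torch.allclose(ce, nll)
      ```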
      
      This code was left there by mistake and should never have been
      pushed to git.
  3. Oct 13, 2018
  4. Oct 11, 2018
    • Neta Zmora's avatar
      Fix Issue #53 (#55) · 608af2b4
      Neta Zmora authored
      When using a schedule with epochs that have nothing scheduled for them, apply_mask() is not invoked at the end of mini-batches, and pruned weights might be unmasked by the optimizer weight updates.
      
      See the explanation in the issue #53 discussion.
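
      A minimal sketch of the invariant the fix restores: pruned weights must be re-masked after every
      optimizer update, so that momentum and weight-decay terms cannot resurrect them (the names below
      are illustrative, not the scheduler's API).

      ```python
      import torch

      def reapply_masks(model, masks):
          """Re-zero pruned weights; `masks` maps parameter name -> binary tensor."""
          with torch.no_grad():
              for name, param in model.named_parameters():
                  if name in masks:
                      param.mul_(masks[name])

      # In the training loop:
      #   loss.backward()
      #   optimizer.step()
      #   reapply_masks(model, masks)   # keep pruned weights at zero every mini-batch
      ```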
  5. Oct 04, 2018
  6. Oct 03, 2018
  7. Oct 01, 2018
  8. Sep 29, 2018
  9. Sep 26, 2018
    • Neta Zmora's avatar
      Attention-Based Guided Structured Sparsity (GSS) (#51) · 65919dc0
      Neta Zmora authored
      * Added GSS ("Attention-Based Guided Structured Sparsity of Deep Neural Networks") and an example for ResNet20 channel pruning.
          - The idea is to regularize the variance of the distribution of the parameter structures: some structures will be zeroed out completely, while the rest should keep high values, leading to high variance.
          - A new regularizer class, GroupVarianceRegularizer, is used to regularize the group variance (effectively rewarding the loss function for high variance between the groups); a sketch of this idea follows after this list.
          - When tested on ResNet20, GSS did not show any improvement over SSL.
      
      * Added sample of filter pruning for ResNet20 CIFAR using SSL (Learning Structured Sparsity in Deep Neural Networks)
      
      * Added an example of pruning 45% of the compute (1.8x MAC reduction), while suffering 0.8% accuracy loss on ResNet20 CIFAR
      
      * Added a ResNet50 ImageNet example of L1-magnitude fine-grained pruning, using an AGP schedule: 46% sparsity with a 0.6% accuracy increase. This is an example of using pruning as a regularizer.
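
      A minimal sketch of the group-variance idea, under the assumption that each filter is one group
      (the function name and strength value are illustrative, not the GroupVarianceRegularizer class
      itself):

      ```python
      import torch

      def group_variance_reward(conv_weight, strength=1e-4):
          """Return a negative penalty that rewards high variance between the
          L2 magnitudes of the parameter groups (here: one group per filter)."""
          # conv_weight shape: (out_channels, in_channels, kH, kW)
          group_mags = conv_weight.pow(2).sum(dim=(1, 2, 3)).sqrt()
          return -strength * group_mags.var()

      # loss = criterion(output, target) + group_variance_reward(model.conv1.weight)
      ```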
  10. Sep 21, 2018
  11. Sep 20, 2018
  12. Sep 16, 2018
    • Neta Zmora's avatar
      Clean up PyTorch 0.3 compatibility code (#46) · e749ea62
      Neta Zmora authored
      * Clean up PyTorch 0.3 compatibility code
      We don't need this anymore, and PyTorch 1.0 is just around the corner.
      
      * Explicitly place the inputs tensor on the GPU(s)
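
      With the 0.3-era compatibility shims gone, device placement becomes the explicit PyTorch 0.4+
      idiom (a generic example, not the repository's exact code):

      ```python
      import torch
      import torch.nn as nn

      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
      model = nn.Linear(10, 2).to(device)
      inputs = torch.randn(4, 10).to(device)   # explicitly place the inputs on the GPU(s)
      outputs = model(inputs)
      ```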
    • Neta Zmora's avatar
      A temporary fix for issue #36 (#48) · 5d3d6d8d
      Neta Zmora authored
      * A temporary fix for issue 36
      
      The thinning code assumes that the sgraph it is using
      is not data-parallel, because it (currently) accesses the
      layer-name keys using a "normalized" name ("module." is removed).
      
      The bug is that in thinning.py#L73 we create a data_parallel=True
      model and then give it to sgraph, while in other places the thinning
      code uses "normalized" keys (for example in thinning.py#L264).
      
      The temporary fix configures data_parallel=False in thinning.py#L73.
      
      A long-term solution should have SummaryGraph know how to handle
      both parallel and non-parallel models.  This can be done by having
      SummaryGraph convert the layer names it receives through the API to
      the data_parallel=False form using normalize_layer_name, and return
      results in the de-normalized format.
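
      A sketch of the name mapping described above (illustrative helpers; the repository's
      normalize_layer_name may differ in detail):

      ```python
      def normalize_layer_name(name):
          """Strip the 'module.' prefixes that nn.DataParallel adds to layer names."""
          return name.replace("module.", "")

      def denormalize_layer_name(model_keys, normalized_name):
          """Map a normalized name back to the key actually present in the model."""
          for key in model_keys:
              if normalize_layer_name(key) == normalized_name:
                  return key
          return normalized_name

      assert normalize_layer_name("module.features.module.0.weight") == "features.0.weight"
      ```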
      
      * Fix the documentation error from issue 36
      * Move some logs to debug and show in logging.conf how to enable DEBUG logs.
  13. Sep 03, 2018
  14. Aug 29, 2018
  15. Aug 27, 2018
  16. Aug 09, 2018
    • Guy Jacob's avatar
      Generalize the loss value returned from before_backward_pass callbacks (#38) · a43b9f10
      Guy Jacob authored
      * Instead of a single additive value (which so far represented only the
        regularizer loss), callbacks return a new overall loss
      * Policy callbacks also return the individual loss components used to
        calculate the new overall loss.
      * Add boolean flag to the Scheduler's callback so applications can choose
        if they want to get individual loss components, or just the new overall
        loss
      * In compress_classifier.py, log the individual loss components
      * Add test for the loss-from-callback flow
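
      A self-contained sketch of the callback flow; the PolicyLoss/LossComponent names and the toy
      policy are assumptions for illustration, not necessarily the exact classes in the repository:

      ```python
      from collections import namedtuple

      LossComponent = namedtuple('LossComponent', ['name', 'value'])
      PolicyLoss = namedtuple('PolicyLoss', ['overall_loss', 'loss_components'])

      class ToyRegularizerPolicy:
          """Toy policy: adds a fixed regularization term to the loss."""
          def before_backward_pass(self, loss):
              reg = 0.1
              return PolicyLoss(loss + reg, [LossComponent('toy_regularizer', reg)])

      def before_backward_pass(original_loss, policies, return_loss_components=True):
          """Each policy may rewrite the overall loss; the individual components
          are accumulated so the application can log them."""
          overall_loss, components = original_loss, []
          for policy in policies:
              ret = policy.before_backward_pass(overall_loss)
              overall_loss = ret.overall_loss
              components += ret.loss_components
          return PolicyLoss(overall_loss, components) if return_loss_components else overall_loss

      result = before_backward_pass(2.5, [ToyRegularizerPolicy()])
      print(result.overall_loss, result.loss_components)
      ```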
  17. Aug 07, 2018
  18. Jul 31, 2018
  19. Jul 25, 2018
  20. Jul 22, 2018