  1. Aug 27, 2018
  2. Aug 09, 2018
    • Generalize the loss value returned from before_backward_pass callbacks (#38) · a43b9f10
      Guy Jacob authored
      * Instead of a single additive value (which so far represented only the
        regularizer loss), callbacks return a new overall loss
      * Policy callbacks also return the individual loss components used to
        calculate the new overall loss.
      * Add a boolean flag to the Scheduler's callback so applications can
        choose whether to get the individual loss components or just the
        new overall loss
      * In compress_classifier.py, log the individual loss components
      * Add test for the loss-from-callback flow
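      The new callback contract can be sketched as follows. This is a hypothetical, simplified model of the flow described above; the class names (`L1RegularizerPolicy`, `Scheduler`) and signatures are illustrative stand-ins, not Distiller's actual API.

      ```python
      from collections import namedtuple

      # Illustrative stand-in for an individual loss component.
      LossComponent = namedtuple('LossComponent', ['name', 'value'])

      class L1RegularizerPolicy:
          """A policy whose before_backward_pass callback returns a new
          overall loss plus the components used to compute it."""
          def __init__(self, strength):
              self.strength = strength

          def before_backward_pass(self, overall_loss, weights):
              reg_loss = self.strength * sum(abs(w) for w in weights)
              new_overall = overall_loss + reg_loss
              return new_overall, [LossComponent('l1_regularizer', reg_loss)]

      class Scheduler:
          """Chains policy callbacks; a boolean flag lets the application
          choose the overall loss only, or the loss plus its components."""
          def __init__(self, policies):
              self.policies = policies

          def before_backward_pass(self, loss, weights, return_loss_components=False):
              components = []
              for policy in self.policies:
                  loss, parts = policy.before_backward_pass(loss, weights)
                  components.extend(parts)
              return (loss, components) if return_loss_components else loss
      ```

      An application that only wants the new overall loss leaves the flag at its default; a logger like compress_classifier.py passes `return_loss_components=True` to record each component.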
      a43b9f10
  3. Aug 07, 2018
  4. Jul 31, 2018
  5. Jul 25, 2018
  6. Jul 22, 2018
  7. Jul 21, 2018
  8. Jul 19, 2018
  9. Jul 17, 2018
    • Quantizer tests, fixes and docs update · 6b166cec
      Guy Jacob authored
      * Add Quantizer unit tests
      * Require 'bits_overrides' to be OrderedDict to support overlapping
        patterns in a predictable manner + update documentation to reflect this
      * Quantizer class cleanup
        * Use "public" nn.Module APIs instead of protected attributes
        * Call the builtins set/get/delattr instead of the class special methods
          (__***__)
        * Fix issues reported in #24
      * Fix bug in RangeLinearQuantParamLayerWrapper - add explicit override of
        pre_quantized_forward accepting a single input (#15)
      * Add DoReFa test to full_flow_tests
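      The ordering requirement on 'bits_overrides' can be illustrated with a small sketch. This is a stand-in for the idea, not the Quantizer's actual lookup code: with overlapping patterns, insertion order decides which override wins, which is only predictable with an OrderedDict.

      ```python
      import re
      from collections import OrderedDict

      # Overlapping patterns: the first matching entry wins.
      bits_overrides = OrderedDict([
          ('conv1$', {'acts': 8, 'wts': 8}),   # keep the first conv at 8 bits
          ('conv.*', {'acts': 4, 'wts': 4}),   # every other conv goes to 4 bits
      ])

      def bits_for(layer_name, overrides, default=None):
          """Return the first override whose pattern matches the layer name."""
          for pattern, bits in overrides.items():
              if re.match(pattern, layer_name):
                  return bits
          return default
      ```

      Had the two patterns been stored in a plain (pre-3.7) dict, 'conv1' could match either entry depending on iteration order.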
      6b166cec
  10. Jul 15, 2018
    • model_summaries.py: remove "experimental" warning · e7dd1a56
      Neta Zmora authored
      This is now tested and supported when using CNNs and PyTorch 0.4
      e7dd1a56
    • Thinning: bug fixes · b48908c3
      Neta Zmora authored
      There are two different “namespaces” referring to module names:
      normalized and de-normalized.
      Normalized module names are module names that have the same
      format for both data-parallel and data-serial models.
      De-normalized module names are the “raw” PyTorch module names
      that reflect the full model graph.  So if there is a container module
      such as nn.DataParallel in the model, then a sub-module’s name
      will have the “module” substring somewhere in it.
      
      SummaryGraph operates by converting the PyTorch model to ONNX, and
      I’ve had issues handling nn.DataParallel in this process.
      Therefore, SummaryGraph uses only normalized names internally.
      
      PruningRecipe, on the other hand, uses de-normalized names
      because it needs to operate on the model itself.
      
      This is a sticky situation that can create really annoying bugs and makes
      for some ugly code.  Nonetheless, this is the best I can do right now,
      and I’ll probably revisit this soon to make it nicer.
      For now, I’m pushing this commit that fixes the distinction between the
      two namespaces, and fixes related bugs – in the hope that it is not too
      brittle.
      
      append_module_directive – now uses denormalize_module_name to
      ensure recipe module names are denormalized.
      
      append_param_directive – because we are dealing with parameters,
      I can’t use denormalize_module_name as easily as in append_module_directive.
      The clean solution is kept for later :-(
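      The two "namespaces" can be sketched as a pair of helpers. Distiller's real functions are more involved; this only illustrates the mapping between normalized and de-normalized names.

      ```python
      def normalize_module_name(name):
          """Strip the 'module' component that nn.DataParallel inserts, so
          data-parallel and data-serial models share one naming scheme."""
          return '.'.join(part for part in name.split('.') if part != 'module')

      def denormalize_module_name(raw_names, normalized):
          """Map a normalized name back to the raw ("de-normalized") PyTorch
          name, given the module names actually present in the model."""
          for raw in raw_names:
              if normalize_module_name(raw) == normalized:
                  return raw
          return normalized
      ```

      For a serial model the two forms coincide; for a data-parallel model only the de-normalized form can be used to address the model's actual attributes.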
      b48908c3
    • f5791422
    • apputils/model_summaries.py: cleanup PEP8 warnings · 7b3ab5ef
      Neta Zmora authored
      Also add a warning when we can't find a node whose
      predecessors we're looking for.
      7b3ab5ef
  11. Jul 13, 2018
    • ADC (Automatic Deep Compression) example + features, tests, bug fixes (#28) · 718f777b
      Neta Zmora authored
      This is a merge of the ADC branch and master.
      ADC (using a DDPG RL agent to compress image classifiers) is still WIP and
      requires an unreleased version of Coach (https://github.com/NervanaSystems/coach).
      
      Small features in this commit:
      - Add model_find_module() - find a module object given its name
      - Add channel ranking and pruning: pruning/ranked_structures_pruner.py
      - Add a CIFAR10 VGG16 model: models/cifar10/vgg_cifar.py
      - Thinning: change the level of some log messages – some messages were
        moved to ‘debug’ level because they are not usually interesting.
      - Add a function to print nicely formatted integers - distiller/utils.py
      - Sensitivity analysis for channels-removal
      - compress_classifier.py – handle keyboard interrupts
      - compress_classifier.py – fix re-raise of exceptions, so they maintain the call stack
      
      - Added tests:
      -- test_summarygraph.py: test_simplenet() - Added a regression test to target a bug that occurs when taking the predecessor of the first node in a graph
      -- test_ranking.py - test_ch_ranking, test_ranked_channel_pruning
      -- test_model_summary.py - test_png_generation, test_summary (sparsity/ compute/model/modules)
      
      - Bug fixes in this commit:
      -- Thinning bug fix: handle zero-sized 'indices' tensor
      During the thinning process, the 'indices' tensor can become zero-sized
      and will have an undefined length. Therefore, we need to check for this
      situation when assessing the number of elements in 'indices'.
      -- Language model: adjust main.py to new distiller.model_summary API
      718f777b
  12. Jul 11, 2018
    • load_checkpoint: replace exit() with exception and add test · 2bb90a9a
      Neta Zmora authored
      - Raise IOError instead of a crude exit() when the file is not found in the file-system
      - Test that the correct exception is raised when opening a non-existent
      checkpoint file
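      A minimal sketch of the change, with an illustrative message and signature (not load_checkpoint's exact ones): raising an exception gives the caller, and a test, something to catch, where exit() simply kills the process.

      ```python
      import os

      def load_checkpoint(model, chkpt_file):
          """Raise IOError for a missing checkpoint instead of calling exit()."""
          if not os.path.isfile(chkpt_file):
              raise IOError("Could not find a checkpoint file at %s" % chkpt_file)
          # ... actual deserialization elided ...
          return model
      ```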
      2bb90a9a
    • Extend pruning tests to parallel models · e3e41ba6
      Neta Zmora authored
      e3e41ba6
    • More robust handling of data-parallel/serial graphs (#27) · b64be690
      Neta Zmora authored
      Remove the complicated logic trying to handle data-parallel models as
      serially-processed models, and vice versa.
      
      *Function distiller.utils.make_non_parallel_copy() does the heavy lifting of
      replacing all instances of nn.DataParallel in a model with instances of
      DoNothingModuleWrapper.
      The DoNothingModuleWrapper does nothing but forward to the
      wrapped module.  This is a trick we use to transform a data-parallel model
      into a serially-processed model.
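      The idea can be sketched without torch, using stand-in classes (`DataParallelStub`, `Pipeline`) in place of nn.DataParallel and a container module; this is an illustration of the technique, not Distiller's implementation.

      ```python
      import copy

      class DataParallelStub:
          """Stand-in for nn.DataParallel: wraps a module."""
          def __init__(self, module):
              self.module = module
          def __call__(self, x):
              return self.module(x)

      class DoNothingModuleWrapper:
          """Does nothing but forward to the wrapped module."""
          def __init__(self, module):
              self.module = module
          def __call__(self, x):
              return self.module(x)

      class Pipeline:
          """Stand-in for a container module with child modules."""
          def __init__(self, *stages):
              self.stages = list(stages)
          def __call__(self, x):
              for stage in self.stages:
                  x = stage(x)
              return x

      def make_non_parallel_copy(module):
          """Return a copy of the model tree with every DataParallel-like
          wrapper replaced by a pass-through wrapper."""
          if isinstance(module, DataParallelStub):
              return DoNothingModuleWrapper(make_non_parallel_copy(module.module))
          if isinstance(module, Pipeline):
              return Pipeline(*[make_non_parallel_copy(s) for s in module.stages])
          return copy.deepcopy(module)
      ```

      The copy behaves identically on the forward path, but no data-parallel wrapper remains, so it can be traced/exported serially.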
      
      *SummaryGraph uses a copy of the model after the model is processed by
      distiller.make_non_parallel_copy() which renders the model non-data-parallel.
      
      *The same goes for model_performance_summary()
      
      *Model inputs are explicitly placed on the Cuda device, since now all models
      are executed on the CPU.  Previously, if a model was not created using
      nn.DataParallel, then the model was not explicitly placed on the Cuda device.
      
      *The logic in distiller.CompressionScheduler that attempted to load a
      model parallel model and process it serially, or load a serial model and
      process it data-parallel, was removed.  This removes a lot of fuzziness and makes
      the code more robust: we do not needlessly try to be heroes.
      
      * model summaries - remove pytorch 0.4 warning
      
      * create_model: remove redundant .cuda() call
      
      * Tests: support both parallel and serial tests
      b64be690
  13. Jul 09, 2018
    • Fix issue #26 · 51a7df35
      Neta Zmora authored
      The checkpoint file:
      examples/ssl/checkpoints/checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar
      did not contain the "thinning recipe", while the weight tensors stored within the
      checkpoint file had already been shrunk/thinned, and this caused a mismatch.
      
      PyTorch models are defined in code.  This includes the network architecture and
      connectivity (which layers are used and what is the forward path), but also
      the sizes for the parameter tensors and input/outputs.
      When the model is created the parameter tensors are also created, as defined
      or inferred from the code.
      When a checkpoint is loaded, the parameter tensors are read from the checkpoint and
      copied to the model's tensors.  Therefore, the tensors in the checkpoint and
      in the model must have the same shape.  If a model has been "thinned" and saved to
      a checkpoint, then the checkpoint tensors are "smaller" than the ones defined by
      the model.  A "thinning recipe" is used to make changes to the model before copying
      the tensors from the checkpoint.
      In this case, the "thinning recipe" was missing.
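      A hedged sketch of the mismatch, with shapes standing in for tensors and hypothetical helper names: a checkpoint saved after thinning holds "smaller" tensors than the model defined in code, so a thinning recipe must resize the model's tensors before the copy.

      ```python
      def apply_thinning_recipe(model_shapes, recipe):
          """Resize the model's tensors as the recipe dictates."""
          model_shapes.update(recipe)

      def load_state(model_shapes, checkpoint_shapes, thinning_recipe=None):
          """Copy checkpoint tensors into the model; shapes must agree."""
          if thinning_recipe is not None:
              apply_thinning_recipe(model_shapes, thinning_recipe)
          for name, shape in checkpoint_shapes.items():
              if model_shapes[name] != shape:
                  raise ValueError('size mismatch for %s: model %s vs checkpoint %s'
                                   % (name, model_shapes[name], shape))
              model_shapes[name] = shape
      ```

      With the recipe missing, as in this checkpoint file, the load fails exactly this way: the model defines the full-size tensor, while the checkpoint holds the thinned one.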
      51a7df35
  14. Jul 08, 2018
    • Bug fix in connectivity_summary; extend the API of create_png() · af4bf3dc
      Neta Zmora authored
      *connectivity_summary() did not use SummaryGraph correctly:
       Recently we changed the internal representation of SummaryGraph.ops, but
       connectivity_summary() and connectivity_summary_verbose() were not updated.
       Fixed that.
      
      *Extend the API of create_png():
       Add rankdir and external styles to the signatures of create_png() and
       create_pydot_graph().  These are explained in the docstrings.
      
      *Added documentation to the PNG drawing functions
      
      *Added tests to catch trivial connectivity_summary() bugs
      af4bf3dc
    • c14efaa9
  15. Jul 05, 2018
  16. Jul 03, 2018
  17. Jul 01, 2018
  18. Jun 30, 2018
    • Bug fix: add support for thinning the optimizer · b21f449b
      Neta Zmora authored
      You no longer need to use --momentum=0 when removing structures
      dynamically.
      The SGD momentum update (velocity) is dependent on the weights, which
      PyTorch optimizers cache internally.  This caching is not a problem for
      filter/channel removal (thinning) because although we dynamically
      change the shapes of the weights tensors, we don’t change the weights
      tensors themselves.
      PyTorch’s SGD creates tensors to store the momentum updates, and these
      tensors have the same shape as the weights tensors.  When we change the
      weights tensors, we need to make the appropriate changes in the Optimizer,
      or disable the momentum.
      We added a new function - thinning.optimizer_thinning() - to do this.
      This function is brittle as it is tested only on optim.SGD and relies on the
      internal representation of the SGD optimizer, which can change w/o notice.
      For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'],
      which also depend on the shape of the weight tensors.
      We needed to pass the Optimizer instance to Thinning policies
      (ChannelRemover, FilterRemover) via the callbacks, which required us
      to change the callback interface.
      In the future we plan a bigger change to the callback API, to allow
      passing of arbitrary context from the training environment to Distiller.
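      The core of the idea can be sketched as follows. This is an illustration of the technique (not Distiller's actual thinning.optimizer_thinning()), and it deliberately mirrors the brittleness described above by assuming SGD's 'momentum_buffer' key.

      ```python
      def optimizer_thinning(optimizer_state, param_name, keep_indices):
          """When filters are removed from a weight tensor, shrink the
          cached momentum buffer along the same indices so its shape
          matches the thinned weights on the next update."""
          state = optimizer_state[param_name]
          if 'momentum_buffer' in state:
              buf = state['momentum_buffer']
              state['momentum_buffer'] = [buf[i] for i in keep_indices]
          return optimizer_state
      ```

      An Adam version would have to thin state['exp_avg'] and state['exp_avg_sq'] the same way, which is exactly why relying on each optimizer's internal state keys is fragile.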
      
      Also in this commit:
      * compress_classifier.py had special handling for resnet layer-removal, which
      is used in examples/ssl/ssl_4D-removal_training.yaml.
      This is a brittle and ugly hack.  Until we have a more elegant solution, I’m
      removing support for layer-removal.
      * Added to the tests invocation of forward and backward passes over a model.
      This tests more of the real flows, which use the optimizer and construct
      gradient tensors.
      * Added a test of a special case of convolution filter-pruning which occurs
      when the next layer is fully-connected (linear)
      b21f449b
  19. Jun 29, 2018