  1. Feb 12, 2019
      CPU support: fix the case of loading a thinned GPU-model on the CPU · ba05f6cf
      Neta Zmora authored
      This commit fixes (and adds a test for) the case where we wish to load
      a thinned GPU checkpoint onto the CPU (a sketch of one way to load such
      a checkpoint follows this entry).
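      A minimal sketch of the loading pattern this commit targets, assuming
      PyTorch's standard torch.load/map_location API; the checkpoint file name
      is hypothetical:

          import torch

          # Load a checkpoint that was saved on a GPU machine onto a CPU-only
          # machine. map_location='cpu' remaps all CUDA tensors stored in the
          # checkpoint to CPU memory, so no CUDA device is needed to deserialize.
          checkpoint = torch.load('thinned_checkpoint.pth.tar', map_location='cpu')

          # A thinned model's state_dict reflects the pruned (smaller) layer
          # shapes, so the model must be thinned to match before loading:
          # model.load_state_dict(checkpoint['state_dict'])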
      Fix issue #148 + refactor load_checkpoint.py (#153) · 1210f412
      Neta Zmora authored
      The root cause of issue #148 is that DataParallel modules cannot execute on the CPU,
      on machines that have both CPUs and GPUs.
      Therefore, we don't use DataParallel for models loaded onto the CPU, but we do wrap
      the models with DataParallel when they are loaded onto the GPUs (to make them run faster).
      The names of the module keys saved in a checkpoint file depend on whether the modules
      are wrapped by a DataParallel module or not.  So loading a checkpoint that was generated
      on the GPU onto a CPU-model (and vice versa) will fail on the keys.
      This is all PyTorch behavior and, despite the community asking for a fix -
      e.g. https://github.com/pytorch/pytorch/issues/7457 - it is still pending.

      This commit contains code to catch key errors when loading a GPU-generated model
      (i.e. one wrapped with DataParallel) onto a CPU, and to convert the names of the keys
      (a sketch of this key-name conversion follows this entry).

      This PR also merges refactoring of load_checkpoint.py done by @barrh, who also added
      a test to further exercise loading checkpoints.
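      A minimal sketch of the key-name conversion described above, assuming a
      plain PyTorch state_dict whose keys carry the 'module.' prefix added by
      DataParallel; the helper name is hypothetical and this is not Distiller's
      actual load_checkpoint.py implementation:

          from collections import OrderedDict

          def strip_dataparallel_prefix(state_dict):
              """Map keys saved from a DataParallel-wrapped model (e.g.
              'module.conv1.weight') to the names an unwrapped CPU model
              expects (e.g. 'conv1.weight')."""
              converted = OrderedDict()
              for key, value in state_dict.items():
                  new_key = key[len('module.'):] if key.startswith('module.') else key
                  converted[new_key] = value
              return converted

          # Usage (hypothetical file name):
          #   checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')
          #   model.load_state_dict(strip_dataparallel_prefix(checkpoint['state_dict']))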
  2. Jan 16, 2019
      compress_classifier.py refactoring (#126) · cfbc3798
      Bar authored
      * Support for multi-phase activations logging

      Enables logging activations during both training and validation in
      the same session.

      * Refactoring: Move parser to its own file

      * The parser is moved from compress_classifier into its own file.
      * The Torch version check is moved to precede the main() call.
      * The main definition is moved to the top of the file.
      * Parser choices are made case-insensitive (a sketch of one way to do this
        follows this entry).
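      A minimal sketch of case-insensitive argparse choices, a common pattern for
      this kind of change; the argument name and its choices are assumptions, not
      necessarily the ones used in compress_classifier.py:

          import argparse

          parser = argparse.ArgumentParser()
          # argparse applies 'type' before validating against 'choices', so
          # lower-casing the input makes '--optimizer SGD' and '--optimizer sgd'
          # both acceptable.
          parser.add_argument('--optimizer', type=lambda s: s.lower(),
                              choices=['sgd', 'adam'], default='sgd')

          args = parser.parse_args(['--optimizer', 'SGD'])
          print(args.optimizer)  # prints: sgd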
      Fix for CPU support · 4cc0e7d6
      Neta Zmora authored
  3. Jan 10, 2019
      Enable compute (training/inference) on the CPU · 007b6903
      Gal Novik authored
      In compress_classifier.py we added a new application argument, --cpu,
      which you can use to force compute (training/inference) to run on the CPU
      when you invoke compress_classifier.py on a machine that has Nvidia GPUs
      (a sketch of the device-selection logic follows this entry).

      If your machine lacks Nvidia GPUs, then the compute will now run on the CPU
      (and you do not need the new flag).

      Caveat: we did not fully test the CPU support for the code in the Jupyter
      notebooks.  If you find a bug, we apologize and appreciate your feedback.
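      A minimal sketch of how such a --cpu flag typically feeds device selection in
      PyTorch; the flag name matches the commit, but the surrounding code is an
      assumption, not the actual compress_classifier.py implementation:

          import argparse
          import torch

          parser = argparse.ArgumentParser()
          # Force CPU execution even when CUDA-capable GPUs are present.
          parser.add_argument('--cpu', action='store_true',
                              help='force training/inference to run on the CPU')
          args = parser.parse_args()

          # Fall back to the CPU automatically when no GPU is available.
          use_cuda = torch.cuda.is_available() and not args.cpu
          device = torch.device('cuda' if use_cuda else 'cpu')

          # model = model.to(device)  # move the (hypothetical) model to the device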
  4. Jan 08, 2019
      Non-channel/filter block pruning (#119) · b9d53ff8
      Bar authored
      Block pruning: support specifying the block shape from the YAML file
      
      Block pruning refers to pruning 4-D structures of a specific shape.  This
      is why it is sometimes called structure-pruning or group-pruning
      (confusing, I know).
      Specific examples of block pruning are filter and channel pruning, which
      have highly-regular block shapes.
      This commit adds support for pruning blocks/groups/structures with
      irregular shapes, chosen so that they accelerate inference on a specific
      hardware platform.  You can read more about the regularity of shapes in
      [Exploring the Regularity of Sparse Structure in
      Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).
      
      When we want to introduce sparsity in order to reduce the compute load
      of a certain layer, we need to understand how the HW and SW perform
      the layer's operation, and how this operation is vectorized.  Then we can
      induce sparsity to match the vector shape.
      
      For example, Intel AVX-512 provides SIMD instructions that apply the same
      instruction (Single Instruction) to a vector of inputs (Multiple
      Data).  The following single instruction performs an element-wise
      multiplication of two vectors of sixteen 32-bit elements:

           __m512i result = _mm512_mullo_epi32(vec_a, vec_b);

      If either vec_a or vec_b is only partially sparse, we still need to perform
      the multiplication operation and the sparsity does not help reduce the
      cost (power, latency) of the computation.  However, if either vec_a or vec_b
      contains only zeros, then we can eliminate the instruction entirely.  In this
      case, we say that we would like group sparsity of 16 elements,
      i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements
      (a sketch of block-wise group sparsity follows this entry).
      
      Things are a bit more involved because we also need to understand how the
      software maps layer operations to hardware.  For example, a 3x3
      convolution can be computed as a direct-convolution, as a matrix multiply
      operation, or as a Winograd matrix operation (to name a few ways of
      computation).  These low-level operations are then mapped to SIMD
      instructions.
      
      Finally, the low-level SW needs to support a block-sparse storage-format
      for weight tensors (see for example:
      http://www.netlib.org/linalg/html_templates/node90.html)
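      A minimal Python sketch of the block-wise group sparsity idea described above,
      assuming plain PyTorch tensors; it is illustrative only and is not Distiller's
      block-pruning implementation (the block size of 16 matches the AVX-512 example):

          import torch

          def prune_in_blocks(weight, block_size=16, fraction_to_prune=0.5):
              """Zero whole contiguous blocks of 'block_size' elements, ranked by
              each block's L1 norm, so the resulting sparsity is block-structured."""
              # Assumes weight.numel() is a multiple of block_size.
              blocks = weight.reshape(-1, block_size).clone()
              block_norms = blocks.abs().sum(dim=1)          # one L1 norm per block
              num_to_prune = int(fraction_to_prune * blocks.shape[0])
              # Blocks with the smallest L1 norms are zeroed in their entirety, so a
              # SIMD unit operating on 16-element vectors can skip them altogether.
              prune_idx = torch.argsort(block_norms)[:num_to_prune]
              blocks[prune_idx, :] = 0
              return blocks.reshape(weight.shape)

          # Example: a weight tensor whose element count is a multiple of 16.
          w = torch.randn(64, 32)
          w_pruned = prune_in_blocks(w, block_size=16, fraction_to_prune=0.5)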