- Feb 13, 2019
-
-
Neta Zmora authored
Merging the 'amc' branch with 'master'. This updates the automated compression code in 'master', and adds a greedy filter-pruning algorithm.
-
- Feb 12, 2019
-
-
Neta Zmora authored
This commit fixes (and adds a test for) the case where we wish to load a thinned GPU checkpoint onto the CPU.
-
Guy Jacob authored
-
Neta Zmora authored
The root cause of issue #148 is that DataParallel modules cannot execute on the CPU on machines that have both CPUs and GPUs. Therefore, we don't wrap models loaded for the CPU in DataParallel, but we do wrap models loaded on the GPUs with DataParallel (to make them run faster). The names of the module keys saved in a checkpoint file depend on whether the modules are wrapped by a DataParallel module or not, so loading a checkpoint that was created on the GPU onto a CPU model (and vice versa) fails on the keys. This is PyTorch behavior, and despite the community asking for a fix - e.g. https://github.com/pytorch/pytorch/issues/7457 - it is still pending. This commit adds code to catch key errors when loading a GPU-generated model (i.e. wrapped with DataParallel) onto a CPU, and to convert the names of the keys. This PR also merges refactoring of load_checkpoint.py done by @barrh, who also added a test to further exercise checkpoint loading.
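A minimal sketch of this kind of key conversion, assuming a checkpoint dictionary with a 'state_dict' entry; the function name and structure are illustrative, not Distiller's actual load_checkpoint.py code:

```python
import torch
from collections import OrderedDict

def load_state_dict_any_device(model, checkpoint_path):
    """Load a checkpoint onto `model`, tolerating a DataParallel 'module.' prefix mismatch."""
    # map_location='cpu' lets a GPU-generated checkpoint be deserialized on a CPU-only machine
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    state_dict = checkpoint.get('state_dict', checkpoint)
    try:
        model.load_state_dict(state_dict)
    except RuntimeError:
        # Key mismatch: add or strip the 'module.' prefix that DataParallel introduces
        model_is_parallel = isinstance(model, torch.nn.DataParallel)
        converted = OrderedDict()
        for name, tensor in state_dict.items():
            if name.startswith('module.') and not model_is_parallel:
                name = name[len('module.'):]
            elif not name.startswith('module.') and model_is_parallel:
                name = 'module.' + name
            converted[name] = tensor
        model.load_state_dict(converted)
    return model
```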
-
- Feb 11, 2019
-
-
Neta Zmora authored
Added recent Distiller citations
-
Guy Jacob authored
-
Guy Jacob authored
Summary of changes:
1. Post-train quantization based on pre-collected statistics
2. Quantized concat, element-wise addition / multiplication and embeddings
3. Move post-train quantization command line args out of sample code
4. Configure post-train quantization from YAML for more fine-grained control
(See PR #136 for more detailed change descriptions)
-
- Feb 10, 2019
-
-
Guy Jacob authored
* For CIFAR-10 / ImageNet only
* Refactor data_loaders.py and reduce code duplication
* Implemented a custom sampler (sketched below)
* Integrated into the image classification sample
* Since we now shuffle the test set, the expected results in 2 full_flow_tests that do evaluation had to be updated
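A rough illustration of the sampler-based split that such a refactoring enables; the function name, dataset choice and seed are hypothetical, not the actual data_loaders.py implementation:

```python
import numpy as np
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

def cifar10_loaders(data_dir, batch_size=128, valid_fraction=0.1, seed=2019):
    """Build train/validation loaders over one CIFAR-10 dataset using index samplers."""
    dataset = datasets.CIFAR10(data_dir, train=True, download=True,
                               transform=transforms.ToTensor())
    indices = np.arange(len(dataset))
    np.random.RandomState(seed).shuffle(indices)   # deterministic shuffle of the indices
    split = int(len(dataset) * valid_fraction)
    valid_idx = indices[:split].tolist()
    train_idx = indices[split:].tolist()
    train_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader
```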
-
- Feb 06, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
A parameter was missing from one of the function calls.
-
Neta Zmora authored
-
Neta Zmora authored
Expand the command line arguments to recreate the original command line invocation.
-
Neta Zmora authored
Using DataParallel in conjunction with SummaryGraph causes various small problems. The best solution is to force SummaryGraph to use a non-data-parallel version of the model, and to always normalize node names when accessing SummaryGraph operations.
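A sketch of the two pieces described above, unwrapping DataParallel and normalizing node names; the helper names are illustrative, not Distiller's actual API:

```python
import torch.nn as nn

def unwrap_data_parallel(model):
    """Return the wrapped module if `model` is a DataParallel instance, else the model itself."""
    return model.module if isinstance(model, nn.DataParallel) else model

def normalize_node_name(name):
    """Strip the leading 'module.' prefix that DataParallel adds, so node names stay stable."""
    return name[len('module.'):] if name.startswith('module.') else name

# Used along the lines of: sgraph = SummaryGraph(unwrap_data_parallel(model), dummy_input)
```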
-
- Jan 31, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
Specifically, gracefully handle a missing 'epoch' key in a loaded checkpoint file.
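For example, instead of indexing the checkpoint dictionary directly, the loader can fall back to a default when the key is missing (a hypothetical snippet, with a made-up checkpoint path):

```python
import torch

checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')   # hypothetical path
# Gracefully handle checkpoints that were saved without an 'epoch' entry
start_epoch = checkpoint.get('epoch', 0)   # assume epoch 0 when the key is absent
```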
-
- Jan 27, 2019
-
-
JoyFreemanYan authored
-
- Jan 24, 2019
- Jan 23, 2019
- Jan 22, 2019
-
-
inner authored
-
- Jan 21, 2019
- Jan 16, 2019
-
-
Bar authored
* Support for multi-phase activation logging: enable logging activations during both training and validation in the same session.
* Refactoring: move the parser to its own file
  * The parser is moved from compress_classifier into its own file.
  * The Torch version check is moved to precede the main() call.
  * The main definition is moved to the top of the file.
* Make parser choices case-insensitive
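One common way to implement case-insensitive choices is to normalize the argument before argparse validates it; a small illustrative sketch (the --optimizer option is hypothetical, used only to show the technique):

```python
import argparse

parser = argparse.ArgumentParser(description='Image classification compression sample')
# `type=str.lower` normalizes user input before it is checked against `choices`,
# so "SGD", "sgd" and "Sgd" are all accepted.
parser.add_argument('--optimizer', type=str.lower, choices=['sgd', 'adam'], default='sgd')

args = parser.parse_args(['--optimizer', 'SGD'])
assert args.optimizer == 'sgd'
```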
-
Neta Zmora authored
-
- Jan 15, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
Fix a mismatch between the device on which the model resides and the device on which the computation runs.
-
- Jan 13, 2019
-
-
Neta Zmora authored
When masks are loaded from a checkpoint file, they should use the same device as the model.
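In other words, roughly the following, assuming the loaded masks are keyed by parameter name (illustrative, not the actual Distiller code):

```python
def masks_to_model_device(model, loaded_masks):
    """Move each loaded mask onto the device of the parameter it masks."""
    params = dict(model.named_parameters())
    for name, mask in loaded_masks.items():
        loaded_masks[name] = mask.to(params[name].device)
    return loaded_masks
```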
-
Neta Zmora authored
-
- Jan 10, 2019
-
-
Gal Novik authored
In compress_classifier.py we added a new application argument, --cpu, which you can use to force compute (training/inference) to run on the CPU when you invoke compress_classifier.py on a machine that has Nvidia GPUs. If your machine lacks Nvidia GPUs, the compute will now run on the CPU (and you do not need the new flag). Caveat: we did not fully test CPU support for the code in the Jupyter notebooks. If you find a bug, we apologize and appreciate your feedback.
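A minimal sketch of how such a flag typically interacts with device selection; the --cpu name matches the commit, while the surrounding code and placeholder model are illustrative:

```python
import argparse
import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument('--cpu', action='store_true',
                    help='force training/inference to run on the CPU even if GPUs are present')
args = parser.parse_args()

# Use the CPU if it was explicitly requested, or if no CUDA device is available
device = torch.device('cpu' if args.cpu or not torch.cuda.is_available() else 'cuda')
model = nn.Linear(10, 2).to(device)   # placeholder model, just to show device placement
```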
-
- Jan 09, 2019
-
-
Neta Zmora authored
These show a couple of the networks on the top1/sparsity curve.
-
- Jan 08, 2019
-
-
Bar authored
Block pruning: support specifying the block shape from the YAML file.

Block pruning refers to pruning 4-D structures of a specific shape. This is why it is sometimes called structure-pruning or group-pruning (confusing, I know). A specific example of block pruning is filter or channel pruning, which have a highly-regular block shape. This commit adds support for pruning blocks/groups/structures with irregular shapes that accelerate inference on a specific hardware platform. You can read more about the regularity of shapes in [Exploring the Regularity of Sparse Structure in Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).

When we want to introduce sparsity in order to reduce the compute load of a certain layer, we need to understand how the HW and SW perform the layer's operation, and how that operation is vectorized. Then we can induce sparsity to match the vector shape. For example, Intel AVX-512 provides SIMD instructions that apply the same instruction (Single Instruction) to a vector of inputs (Multiple Data). The following single instruction performs an element-wise multiplication of two vectors of 16 32-bit elements:

    __m512i result = _mm512_mullo_epi32(vec_a, vec_b);

If either vec_a or vec_b is partially sparse, we still need to perform the multiplication, and the sparsity does not help reduce the cost (power, latency) of the computation. However, if either vec_a or vec_b contains only zeros, then we can eliminate the instruction entirely. In this case, we say that we would like to have group sparsity of 16 elements, i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements.

Things are a bit more involved, because we also need to understand how the software maps layer operations to the hardware. For example, a 3x3 convolution can be computed as a direct convolution, as a matrix multiplication, or as a Winograd matrix operation (to name a few ways of computation). These low-level operations are then mapped to SIMD instructions. Finally, the low-level SW needs to support a block-sparse storage format for weight tensors (see, for example: http://www.netlib.org/linalg/html_templates/node90.html).
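To make the "group sparsity of 16 elements" idea concrete, here is a hedged sketch that ranks and zeroes fixed-size blocks of a weight tensor by their L1 magnitude; the function name and heuristics are mine, and Distiller's YAML-configured block pruner is more general:

```python
import torch

def prune_blocks_by_magnitude(weights, block_size=16, fraction_to_prune=0.5):
    """Return a copy of `weights` with the lowest-L1-magnitude blocks of
    `block_size` contiguous elements zeroed out."""
    flat = weights.reshape(-1, block_size)                 # rows of SIMD-width blocks
    block_mags = flat.abs().sum(dim=1)                     # L1 magnitude of each block
    num_to_prune = max(1, int(fraction_to_prune * block_mags.numel()))
    threshold = block_mags.kthvalue(num_to_prune)[0]       # k-th smallest block magnitude
    keep = (block_mags > threshold).float().unsqueeze(1)   # 1 = keep block, 0 = zero it
    return (flat * keep).reshape(weights.shape)

# Example: a weight tensor whose size is a multiple of the assumed 16-element vector width
w = torch.randn(64, 128)
w_block_sparse = prune_blocks_by_magnitude(w, block_size=16, fraction_to_prune=0.5)
```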
-
- Dec 26, 2018
-
-
Gal Novik authored
Minor command line fix in the post training example
-
- Dec 23, 2018
-
-
Neta Zmora authored
-
- Dec 19, 2018
-
-
Neta Zmora authored
If compression_scheduler==None, then we need to set the value of losses[OVERALL_LOSS_KEY] (so it is the same as losses[OBJECTIVE_LOSS_KEY]). This was overlooked.
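In other words, roughly the following; the key names come from the commit message, while the surrounding variables are hypothetical:

```python
if compression_scheduler is None:
    # No compression/regularization losses were added this step, so the overall
    # loss is identical to the objective (task) loss.
    losses[OVERALL_LOSS_KEY] = losses[OBJECTIVE_LOSS_KEY]
```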
-
Neta Zmora authored
-
Neta Zmora authored
-