- Dec 06, 2018
-
-
Neta Zmora authored
- Moved the language model and structure pruning tutorials from the Wiki to the HTML documentation. We love the ease of the Wiki, but GitHub doesn't let Google crawl Wiki pages, and users can't open PRs against them. - Updated the pruning algorithms documentation
-
- Dec 04, 2018
-
-
Neta Zmora authored
-
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training
* Added tests and examples
* Updated documentation
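As a rough illustration of the range-based (min-max) asymmetric scheme mentioned above, here is a minimal PyTorch sketch; the function names and the 8-bit default are assumptions for illustration, not Distiller's actual API:
```
import torch

def asymmetric_quantize(x, num_bits=8):
    """Range-based (min-max) asymmetric quantization of a float tensor.
    The observed range [min, max] is mapped onto [0, 2**num_bits - 1],
    so a zero-point is needed to represent the float value 0.0 exactly."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min = min(float(x.min()), 0.0)   # make sure 0.0 is inside the range
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    scale = scale if scale > 0 else 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int64)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.float() - zero_point) * scale
```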
-
Neta Zmora authored
As @cattpku notes in issue #89, we should follow the [PyTorch ONNX limitations](https://pytorch.org/docs/stable/onnx.html#supported-operators) specification. For image classifiers, this change is purely syntactic.
-
Neta Zmora authored
- Dec 03, 2018
-
-
Neta Zmora authored
Top1: 75.52% (-0.63% from the TorchVision dense ResNet50). Total sparsity: 82.6%
-
- Dec 02, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
In a previous commit, we chose to override the TorchVision ResNet models with our own modified versions, because collecting activation statistics after ReLU layers requires that each ReLU instance is used only once in the graph. However, overriding the TorchVision models means that someone who accesses the TorchVision model dictionary directly won't find the ResNet models among its keys. That is not very friendly, and we should not break imported modules. This commit overrides the TorchVision ResNet models only when they are accessed via the distiller.models.create_model API, w/o changing the TorchVision dictionary.
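A minimal sketch of the dispatch pattern this describes; the registry and function below are hypothetical stand-ins for distiller.models.create_model, not its real signature:
```
import torchvision.models as torchvision_models

# Hypothetical registry of patched ResNet constructors (each ReLU used once).
# In Distiller these live under distiller.models; the exact path is assumed here.
PATCHED_RESNETS = {}  # e.g. {'resnet50': patched_resnet50, ...}

def create_model(arch, pretrained=False):
    """Return a patched ResNet when we have one; otherwise fall back to the
    unmodified TorchVision factory.  torchvision.models itself is never
    monkey-patched, so direct users of TorchVision are unaffected."""
    if arch in PATCHED_RESNETS:
        return PATCHED_RESNETS[arch](pretrained=pretrained)
    return getattr(torchvision_models, arch)(pretrained=pretrained)
```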
-
Guy Jacob authored
-
Guy Jacob authored
-
Neta Zmora authored
Replace "concat" layers with "element-wise addition" layers, to match the text.
-
- Dec 01, 2018
-
-
Neta Zmora authored
This commit contains the main fix for issue #85. It contains a couple of changes to the YAML structure-pruning API, with examples. I urge you to read the documentation in the Wiki (https://github.com/NervanaSystems/distiller/wiki/Pruning-Filters-&-Channels).

New syntax for defining structured AGP. I tried to make the syntax similar to fine-grained (i.e. element-wise) pruning. All you need to do is add ```group_type: Filters```:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
If you want to define "leader-based" pruning dependencies, add ```group_dependency: Leader```:
```
low_pruner:
  class: L1RankedStructureParameterPruner_AGP
  initial_sparsity : 0.10
  final_sparsity: 0.50
  group_type: Filters
  group_dependency: Leader
  weights: [module.layer3.0.conv2.weight, module.layer3.0.downsample.0.weight,
            module.layer3.1.conv2.weight, module.layer3.2.conv2.weight]
```
Retired the old ```reg_regims``` API for describing one-shot structured pruning. The new YAML API is very similar to AGP structured pruning, which is much better than before. The new API also allows us to describe data dependencies when doing one-shot structure pruning, just like AGP structured pruning. This commit also includes further code refactoring.

Old API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  reg_regims:
    'module.layer1.0.conv1.weight': [0.6, '3D']
    'module.layer1.1.conv1.weight': [0.6, '3D']
```
New API:
```
filter_pruner:
  class: 'L1RankedStructureParameterPruner'
  group_type: Filters
  desired_sparsity: 0.6
  weights: [
    module.layer1.0.conv1.weight,
    module.layer1.1.conv1.weight]
```
thresholding.py – separate the generation of the binary_map from the pruning_mask, so that we can cache the binary map and share it between several modules.

pruning/automated_gradual_pruner.py – major refactoring to support "leader-based" sub-graph pruning dependencies. The concept is explained in issue #85.

agp-pruning/resnet20_filters.schedule_agp.yaml
agp-pruning/resnet20_filters.schedule_agp_2.yaml
agp-pruning/resnet20_filters.schedule_agp_3.yaml
network_trimming/resnet56_cifar_activation_apoz.yaml
network_trimming/resnet56_cifar_activation_apoz_v2.yaml
-
- Nov 29, 2018
-
-
Bar authored
Add support for models that contain named empty layers (i.e. layers whose value is None, so they have type NoneType). The fix catches the AttributeError that is raised when such a named NoneType layer is encountered, and ignores the layer.
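A generic illustration of the catch-and-ignore pattern described above (illustrative names, not the actual fix):
```
def collect_weight_shapes(model):
    """Gather weight shapes for a model summary, skipping any entry whose
    attribute access fails because the "layer" is actually None."""
    shapes = {}
    for name, module in model.named_modules():
        try:
            shapes[name] = tuple(module.weight.shape)
        except AttributeError:
            # Named empty layer (NoneType) or a parameter-less layer: ignore it.
            continue
    return shapes
```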
-
- Nov 25, 2018
-
-
Neta Zmora authored
Thanks to @haim-barad for notifying us about this.
-
Neta Zmora authored
When ranking filters by APoZ, we need to change the sort order to match that of module.apoz_channels.value(). This follows from the previous bug fix.
-
Neta Zmora authored
Instead of returning the average percentage of zeros, we returned the average percentage of non-zeros. So we inverted the results, and also multiplied by 100, because the name of the statistic says "percentage", not "fraction" (not very important, but still...).
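For reference, a minimal sketch of what the corrected statistic computes; the function name and per-channel reduction are illustrative, not the collector's actual code:
```
import torch

def apoz_per_channel(activation):
    """Average Percentage of Zeros (APoZ) per channel of a convolution
    activation with shape (batch, channels, height, width).
    Note: a percentage in [0, 100], and it counts zeros, not non-zeros."""
    zeros = (activation == 0).sum(dim=(0, 2, 3)).float()
    total = activation.size(0) * activation.size(2) * activation.size(3)
    return 100.0 * zeros / total
```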
-
Neta Zmora authored
-
- Nov 24, 2018
-
-
Guy Jacob authored
-
Neta Zmora authored
Thanks to Dan Alistarh for bringing this issue to my attention. The activations of Linear layers have shape (batch_size, output_size), while those of Convolution layers have shape (batch_size, num_channels, width, height), and this distinction in shape was not handled correctly. This commit also fixes sparsity computation for very large activations, as seen in VGG16, which led to memory exhaustion. One solution is to use smaller batch sizes, but this commit takes a different approach: it counts zeros "manually", using less memory. Also in this commit:
- Added a "caveats" section to the documentation.
- Added more tests.
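A minimal sketch of the shape handling and the memory-frugal zero counting described above (function and variable names are illustrative, not the collector's actual code):
```
import torch

def output_sparsity(activation):
    """Per-channel (conv) or per-feature (linear) fraction of zeros.

    Linear outputs have shape (batch, features); convolution outputs have
    shape (batch, channels, height, width) - the two cases must be reduced
    over different dimensions.  Zeros are counted with a boolean sum rather
    than by materializing extra float tensors, which keeps memory usage low
    even for very large activations (e.g. VGG16's early layers)."""
    if activation.dim() == 2:                      # Linear: (batch, features)
        zeros = (activation == 0).sum(dim=0).float()
        return zeros / activation.size(0)
    elif activation.dim() == 4:                    # Conv: (batch, C, H, W)
        zeros = (activation == 0).sum(dim=(0, 2, 3)).float()
        total = activation.size(0) * activation.size(2) * activation.size(3)
        return zeros / total
    raise ValueError("unexpected activation shape: %s" % (tuple(activation.shape),))
```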
-
- Nov 22, 2018
-
-
Neta Zmora authored
The super() method of the wrong subclass was used. In this case there were no practical implications, but we should move to the less error-prone Python 3.x syntax, which does not require us to name the subclass. I changed the super() invocations in the entire file and ran two schedules for ResNet56, and actually got better results than before. I don't think these results are related to this change, and I cannot explain them. Nonetheless, I am committing the new results, because I also fixed the command-line parameters of resnet56_cifar_filter_rank_v2.yaml, which contained a copy & paste error.
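For context, a generic sketch of the two super() styles; the class names are made up for illustration:
```
class RankedStructurePruner(object):
    def __init__(self, name):
        self.name = name

class L1RankedStructurePruner(RankedStructurePruner):
    def __init__(self, name):
        # Error-prone Python 2.x style: the class is named explicitly, so a
        # copy-and-paste from another subclass silently targets the wrong class:
        #   super(SomeOtherPruner, self).__init__(name)
        # Python 3.x style -- no class name to get wrong:
        super().__init__(name)
```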
-
Neta Zmora authored
* Fix issue #79: change the default values so that the following scheduler meta-data keys are always defined: 'starting_epoch', 'ending_epoch', 'frequency'.
* compress_classifier.py: add a new argument allowing the range of pruning levels scanned during sensitivity analysis to be specified from the command line.
* Add a regression test for issue #79.
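Roughly, the idea is that a YAML policy may omit these keys and still get sane values. A minimal sketch of such defaulting logic, with illustrative fallback values (not Distiller's actual defaults):
```
def fill_policy_defaults(policy_meta, num_epochs):
    """Ensure the scheduling keys every policy needs are present.
    The specific fallback values here are illustrative."""
    policy_meta.setdefault('starting_epoch', 0)
    policy_meta.setdefault('ending_epoch', num_epochs)
    policy_meta.setdefault('frequency', 1)
    return policy_meta
```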
-
- Nov 21, 2018
-
-
Neta Zmora authored
In our patched ResNet version, we change TorchVision's code so that each ReLU module instance is used only once in the network.
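The distinction, roughly, in a generic sketch (not TorchVision's actual code): one shared nn.ReLU instance makes per-instance activation statistics ambiguous, so the patched version gives every call site its own module.
```
import torch.nn as nn

class SharedReLUBlock(nn.Module):
    """TorchVision style: one ReLU instance applied twice, so hooks on
    self.relu see a mix of two different activation tensors."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv2(self.relu(self.conv1(x))))

class DistinctReLUBlock(nn.Module):
    """Patched style: each ReLU is used exactly once, so activation
    statistics can be attributed to a specific point in the graph."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu1 = nn.ReLU(inplace=True)
        self.relu2 = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu2(self.conv2(self.relu1(self.conv1(x))))
```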
-
Neta Zmora authored
When detecting a module that is used multiple times, stop execution and print an explanation to the user.
-
Neta Zmora authored
Trying to simplify the code.
-
Neta Zmora authored
-
Neta Zmora authored
Add docs/conditional_computation.md which was accidentally left out of an earlier commit.
-
- Nov 20, 2018
-
-
Neta Zmora authored
* Bug fix: the value of best_top1 stored in the checkpoint may be wrong. If you invoke compress_classifier.py with --num-best-scores=n, with n>1, then the value of best_top1 stored in checkpoints is wrong.
-
Neta Zmora authored
When we resume from a checkpoint, we usually want to continue using the checkpoint's masks. I say "usually" because I can see a situation where we want to prune a model and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the masks. This is what's done in Song Han's Dense-Sparse-Dense (DSD) training (https://arxiv.org/abs/1607.04381). But I didn't want to add another argument to ```compress_classifier.py``` for the time being – so we ignore DSD.

There are two possible situations when we resume a checkpoint that has a serialized ```CompressionScheduler``` with pruning masks:
1. We are planning on using a new ```CompressionScheduler``` that is defined in a schedule YAML file. In this case, we want to copy the masks from the serialized ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are constructing from the YAML file. This is one fix.
2. We are resuming a checkpoint, but without using a YAML schedule file. In this case we want to use the ```CompressionScheduler``` that we loaded from the checkpoint file. All this ```CompressionScheduler``` does is keep applying the masks as we train, so that we don't lose them. This is the second fix.

For DSD, we would need a new flag that would override using the ```CompressionScheduler``` that we load from the checkpoint.
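A rough sketch of the mask hand-off in case 1; the attribute names (e.g. zeros_mask_dict, mask) follow my understanding of Distiller's scheduler and should be treated as assumptions:
```
def transfer_masks(loaded_scheduler, new_scheduler):
    """Case 1: a new scheduler was built from a YAML file, but we want to
    keep applying the masks that were serialized in the checkpoint."""
    for name, masker in loaded_scheduler.zeros_mask_dict.items():
        if name in new_scheduler.zeros_mask_dict:
            new_scheduler.zeros_mask_dict[name].mask = masker.mask
    return new_scheduler
```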
-
- Nov 09, 2018
-
-
Neta Zmora authored
Another schedule for filter-wise pruning of ResNet20, with 64.6% sparsity, 25.4% compute reduction and Top1 of 91.47% (vs. the 91.78% baseline).
-
- Nov 08, 2018
-
-
Neta Zmora authored
Top1 is 75.492 (at epoch 93) vs. the published TorchVision baseline Top1 of 76.15 (-0.66). Total sparsity: 80.05%.
-
Neta Zmora authored
Change the LR from 0.2 to 0.3, which is what was actually used to generate the results mentioned in the remark.
-
Haim Barad authored
* Updated stats computation – fixes issues with validation stats
* Clarification of output (docs)
* Update
* Moved validation stats to a separate function
-
Neta Zmora authored
-
Guy Jacob authored
-
- Nov 07, 2018
-
-
Neta Zmora authored
Add missing files from previous commit
-
Neta Zmora authored
-
- Nov 06, 2018
-
-
Neta Zmora authored
We recently changed the signature of the scheduler's on_minibatch_begin() callback (a 'meta' argument was added), and the callback client in the thinning module was not updated.
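For illustration only, the shape of the change; the exact parameter list is an assumption based on the description above, not the real Distiller signature:
```
class ThinningCallbackClient(object):
    """Hypothetical callback client; the parameter list is illustrative."""
    def on_minibatch_begin(self, model, epoch, minibatch_id,
                           minibatches_per_epoch, meta):
        # 'meta' is the newly added argument; an older client that did not
        # declare it raised a TypeError when the scheduler invoked it.
        pass
```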
-
Neta Zmora authored
By default, when we create a model we wrap it with DataParallel to benefit from data-parallelism across GPUs (mainly for convolution layers). But sometimes we don't want the sample application to do this: for example when we receive a model that was trained serially. This commit adds a new argument to the application to prevent the use of DataParallel.
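Roughly what the new argument controls; the helper and flag names below are illustrative, not the application's actual option:
```
import torch
import torch.nn as nn

def maybe_parallelize(model, use_data_parallel=True):
    """Wrap the model in DataParallel unless the caller opts out
    (e.g. the checkpoint was trained serially)."""
    if use_data_parallel and torch.cuda.device_count() > 1:
        return nn.DataParallel(model)
    return model
```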
-