- Nov 25, 2018
-
-
Neta Zmora authored
-
- Nov 24, 2018
-
-
Guy Jacob authored
-
Neta Zmora authored
Thanks to Dan Alistarh for bringing this issue to my attention. The activations of Linear layers have shape (batch_size, output_size) while those of Convolution layers have shape (batch_size, num_channels, width, height), and this distinction in shape was not handled correctly. This commit also fixes sparsity computation for very large activations, as seen in VGG16, which can lead to memory exhaustion. One solution is to use smaller batch sizes, but this commit takes a different approach: it counts zeros “manually”, using less memory. Also in this commit: - Added a “caveats” section to the documentation. - Added more tests.
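A minimal sketch of what counting zeros “manually” can look like, iterating over the batch so that no large boolean tensor for the whole activation is ever materialized (the helper name and loop structure are illustrative, not Distiller's actual code):

```python
import torch

def activation_sparsity(act: torch.Tensor) -> float:
    """Fraction of zero elements in an activation tensor.

    Handles both Linear activations (batch, features) and Convolution
    activations (batch, channels, h, w) by flattening everything past
    the batch dimension, then accumulates plain Python counters instead
    of keeping large intermediate tensors alive.
    """
    num_zeros, num_elems = 0, 0
    for sample in act.view(act.size(0), -1):   # one sample at a time
        num_zeros += int((sample == 0).sum().item())
        num_elems += sample.numel()
    return num_zeros / num_elems
```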
-
- Nov 22, 2018
-
-
Neta Zmora authored
The super() method of the wrong subclass was used. In this case there were no practical implications, but we need to move to the less error-prone Python 3.x syntax, which does not require us to specify the class. I changed the super() invocations in the entire file and ran two schedules for ResNet56, and actually got better results than previously. I don't think these results are related to this change, and I cannot explain them. Nonetheless, I am committing these new results, because I also fixed the command-line parameters of resnet56_cifar_filter_rank_v2.yaml, which had a copy & paste error in it.
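For illustration, the two super() forms side by side (class names here are hypothetical):

```python
class ParentBlock:
    def __init__(self):
        self.initialized = True

class BasicBlock(ParentBlock):
    def __init__(self, planes):
        # Python 2-style call; naming the wrong class here is exactly the
        # kind of mistake this commit fixes:
        #   super(SomeOtherBlock, self).__init__()
        # Python 3-style call: no class name to keep in sync.
        super().__init__()
        self.planes = planes
```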
-
Neta Zmora authored
* Fix issue #79: change the default values so that the following scheduler meta-data keys are always defined: 'starting_epoch', 'ending_epoch', 'frequency'
* compress_classifier.py: add a new argument that allows specifying, from the command line, the range of pruning levels scanned when doing sensitivity analysis
* Add a regression test for issue #79
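A rough sketch of the defaulting described in the first bullet (the function name and default values are illustrative, not Distiller's exact internals):

```python
DEFAULT_POLICY_META = {'starting_epoch': 0, 'ending_epoch': 1, 'frequency': 1}

def normalize_policy_meta(meta):
    """Ensure the scheduler meta-data keys are always defined."""
    normalized = dict(DEFAULT_POLICY_META)
    normalized.update(meta or {})
    return normalized
```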
-
- Nov 21, 2018
-
-
Neta Zmora authored
In our patched ResNet version, we change TorchVision's code so that ReLU module instances are used only once in a network.
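An illustrative sketch of the pattern (not the actual TorchVision/Distiller diff): every call site gets its own nn.ReLU instance, so each module appears exactly once in the forward graph and per-module statistics remain unambiguous.

```python
import torch.nn as nn

class PatchedBasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # Two distinct ReLU modules instead of reusing a single self.relu.
        self.relu1 = nn.ReLU(inplace=True)
        self.relu2 = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu2(out + x)
```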
-
Neta Zmora authored
When detecting a module that is used multiple times, stop execution and print an explanation to the user.
-
Neta Zmora authored
Trying to simplify the code.
-
Neta Zmora authored
-
Neta Zmora authored
Add docs/conditional_computation.md which was accidentally left out of an earlier commit.
-
- Nov 20, 2018
-
-
Neta Zmora authored
* Bug fix: the value of best_top1 stored in the checkpoint may be wrong. If you invoke compress_classifier.py with --num-best-scores=n, with n>1, then the value of best_top1 stored in checkpoints is wrong.
-
Neta Zmora authored
When we resume from a checkpoint, we usually want to continue using the checkpoint’s masks. I say “usually” because I can see a situation where we want to prune a model and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the masks. This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training (https://arxiv.org/abs/1607.04381). But I didn’t want to add another argument to ```compress_classifier.py``` for the time being, so we ignore DSD.
There are two possible situations when we resume a checkpoint that has a serialized ```CompressionScheduler``` with pruning masks:
1. We are planning on using a new ```CompressionScheduler``` that is defined in a schedule YAML file. In this case, we want to copy the masks from the serialized ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are constructing from the YAML file. This is one fix.
2. We are resuming a checkpoint, but without using a YAML schedule file. In this case, we want to use the ```CompressionScheduler``` that we loaded from the checkpoint file. All this ```CompressionScheduler``` does is keep applying the masks as we train, so that we don’t lose them. This is the second fix.
For DSD, we would need a new flag that would override using the ```CompressionScheduler``` that we load from the checkpoint.
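A hypothetical sketch of the two resume paths described above (the attribute and function names are illustrative, not Distiller's exact API):

```python
def resume_compression_scheduler(loaded_scheduler, yaml_scheduler=None):
    if yaml_scheduler is not None:
        # Case 1: a new scheduler was built from a YAML file -- copy the
        # pruning masks that were serialized with the checkpoint into it.
        for name, masker in loaded_scheduler.zeros_mask_dict.items():
            yaml_scheduler.zeros_mask_dict[name].mask = masker.mask
        return yaml_scheduler
    # Case 2: no YAML schedule -- keep using the loaded scheduler, whose
    # only remaining job is to re-apply the masks after each weight update.
    return loaded_scheduler
```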
-
- Nov 09, 2018
-
-
Neta Zmora authored
Another schedule for ResNet20: filter-wise pruning with 64.6% sparsity, 25.4% compute reduction and Top1 91.47% (vs. 91.78 baseline).
-
- Nov 08, 2018
-
-
Neta Zmora authored
Top1 is 75.492 (at epoch 93) vs. the published TorchVision baseline Top1 of 76.15 (-0.66). Total sparsity: 80.05%.
-
Neta Zmora authored
Change the LR from 0.2 to 0.3, as was actually used to generate the results in the remark.
-
Haim Barad authored
* Updated stats computation - fixes issues with validation stats
* Clarification of output (docs)
* Update
* Moved validation stats to a separate function
-
Neta Zmora authored
-
Guy Jacob authored
-
- Nov 07, 2018
-
-
Neta Zmora authored
Add missing files from previous commit
-
Neta Zmora authored
-
- Nov 06, 2018
-
-
Neta Zmora authored
We recently changed the signature of the on_minibatch_begin() callback from the scheduler (added 'meta') and the callback client in the thinning module was not updated.
-
Neta Zmora authored
By default, when we create a model we wrap it with DataParallel to benefit from data-parallelism across GPUs (mainly for convolution layers). But sometimes we don't want the sample application to do this: for example when we receive a model that was trained serially. This commit adds a new argument to the application to prevent the use of DataParallel.
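A minimal sketch of the conditional wrapping this describes (the flag name and helper are illustrative, not the sample application's actual argument):

```python
import torch
import torch.nn as nn

def create_model(arch_fn, use_data_parallel=True):
    """Build a model and optionally wrap it with DataParallel."""
    model = arch_fn()                       # e.g. torchvision.models.resnet50
    if use_data_parallel and torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.cuda() if torch.cuda.is_available() else model
```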
-
Haim Barad authored
* Fixed validation stats and added new summary stats
* Trimmed some comments
* Improved figure for documentation
* Minor updates
-
- Nov 05, 2018
-
-
Neta Zmora authored
Added an implementation of Dynamic Network Surgery for Efficient DNNs (Yiwen Guo, Anbang Yao, Yurong Chen. NIPS 2016, https://arxiv.org/abs/1608.04493).
- Added SplicingPruner: a pruner that both prunes and splices connections.
- Included an example schedule on ResNet20 CIFAR.
- New features for compress_classifier.py:
  1. Added the “--masks-sparsity” argument which, when enabled, logs the sparsity of the weight masks during training.
  2. Added a new command-line argument to report the top N best accuracy scores, instead of just the highest score. This is sometimes useful when pruning a pre-trained model that has its best Top1 accuracy in the first few pruning epochs.
- New features for PruningPolicy:
  1. The pruning policy can use two copies of the weights: one is used during the forward pass, the other during the backward pass. This is controlled by the “mask_on_forward_only” argument.
  2. If we enable “mask_on_forward_only”, we probably want to permanently apply the mask at some point (usually once the pruning phase is done). This is controlled by the “keep_mask” argument.
  3. We introduce a first implementation of scheduling at the training-iteration (i.e. mini-batch) granularity. Until now we could only schedule pruning at epoch granularity. This is controlled by the “mini_batch_pruning_frequency” argument (disable by setting it to zero).
Some of the abstractions may have leaked from PruningPolicy to CompressionScheduler. We need to reexamine this in the future.
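A hedged sketch of the prune-and-splice mask update at the heart of Dynamic Network Surgery (the hysteresis thresholds and function name are illustrative; this is not SplicingPruner's exact code):

```python
import torch

def splice_mask(weight, mask, low_thresh, hi_thresh):
    """Hysteresis mask update: prune weights whose magnitude falls below
    low_thresh, splice back (re-enable) weights that grow above hi_thresh,
    and leave the rest of the mask unchanged."""
    magnitude = weight.abs()
    new_mask = mask.clone()
    new_mask[magnitude < low_thresh] = 0.0   # prune
    new_mask[magnitude > hi_thresh] = 1.0    # splice
    return new_mask
```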
-
- Nov 04, 2018
-
-
Neta Zmora authored
-
- Nov 02, 2018
-
-
Neta Zmora authored
Changed the description of the feature set. Updated the README a little, since a lot has changed since we released six months ago. There is still a lot to add/remove/change.
-
- Nov 01, 2018
-
-
Neta Zmora authored
-
Guy Jacob authored
* Added command-line arguments for this and other post-training quantization settings in the image classification sample.
-
- Oct 31, 2018
-
-
Neta Zmora authored
Small improvement in the results
-
- Oct 29, 2018
-
-
Neta Zmora authored
Fix issue #65: missing link in the README file
-
Neta Zmora authored
This short notebook performs a forward pass on a single ImageNet image, and saves the intermediate results to a file for later inspection.
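A brief sketch of how such intermediate results can be captured with forward hooks (the model choice and file name are illustrative, not the notebook's actual code):

```python
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()
intermediates = {}

def save_output(name):
    def hook(module, inputs, output):
        intermediates[name] = output.detach().cpu()
    return hook

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        module.register_forward_hook(save_output(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))   # stand-in for a real ImageNet image
torch.save(intermediates, 'intermediate_activations.pt')
```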
-
Neta Zmora authored
A notebook to visualize the data dependencies in non-trivial networks such as ResNet. This is meant to help in planning filter/channel pruning of such networks.
-
- Oct 27, 2018
-
-
Neta Zmora authored
-
- Oct 26, 2018
-
-
Neta Zmora authored
-
- Oct 25, 2018
-
-
Neta Zmora authored
After commit f396c34a362731c765370d368877c2ca367ad651, we now always apply the pruning mask at the end of each mini-batch. This is because weights may be updated by SGD+momentum even when they are masked. Therefore, there is no need to mask the gradients: we always mask the weights at the end of the mini-batch. See issue #53 for more details.
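A minimal sketch of the training-loop ordering this implies, assuming a simple dict of per-parameter masks (the helper structure is hypothetical):

```python
import torch

def train_step(model, loss_fn, optimizer, inputs, targets, masks):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()                # momentum may update masked weights here
    # Re-apply the masks after the weight update, so pruned weights cannot
    # drift back to non-zero values between mini-batches.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    return loss.item()
```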
-
- Oct 23, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
This notebook provides visualizations of what ResNet50 models pay attention to when they classify. In other words, what the models are looking at. This information can be used for localization, but here it is provided merely to build our intuition. Based on: B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning Deep Features for Discriminative Localization. CVPR'16 (arXiv:1512.04150, 2015). https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/
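A rough Class Activation Mapping (CAM) sketch in the spirit of Zhou et al. (the random input is a stand-in for a real image; this is not the notebook's actual code):

```python
import torch
import torchvision.models as models

resnet = models.resnet50(pretrained=True).eval()
features = {}
resnet.layer4.register_forward_hook(
    lambda module, inputs, output: features.update(conv=output))

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)
    class_idx = resnet(image).argmax(dim=1).item()
    # Weight the final conv feature maps (2048 x 7 x 7) by the classifier
    # weights of the predicted class, then normalize to [0, 1].
    fc_weights = resnet.fc.weight[class_idx]                     # (2048,)
    cam = (fc_weights.view(-1, 1, 1) * features['conv'][0]).sum(dim=0)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```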
-
Neta Zmora authored
This is a simple application of Truncated SVD, just to get a feeling of what happens to the accuracy if we use TruncatedSVD w/o fine-tuning. We apply Truncated SVD on the linear layer found at the end of ResNet50, and run a test over the validation dataset to measure the impact on the classification accuracy.
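A hedged sketch of the factorization (the rank is illustrative, and this is not the notebook's actual code): ResNet50's final Linear layer (2048 -> 1000) is replaced by two smaller Linear layers built from a truncated SVD of its weight matrix.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def truncated_svd_linear(fc, rank):
    W = fc.weight.data                       # (out_features, in_features)
    U, S, V = torch.svd(W)                   # W ~= U @ diag(S) @ V.t()
    U, S, V = U[:, :rank], S[:rank], V[:, :rank]

    first = nn.Linear(fc.in_features, rank, bias=False)
    first.weight.data = (V * S).t()          # (rank, in_features)
    second = nn.Linear(rank, fc.out_features, bias=True)
    second.weight.data = U                   # (out_features, rank)
    second.bias.data = fc.bias.data
    return nn.Sequential(first, second)

model = models.resnet50(pretrained=True)
model.fc = truncated_svd_linear(model.fc, rank=250)
```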
-
Neta Zmora authored
The problem reported in issue #60 occurs when a user downloads an archive of Distiller instead of performing "git clone". When downloading an archive there is no ".git" directory and we need to handle this gracefully. For more details, see the issue itself.
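One way to handle this gracefully, assuming GitPython is used to read the commit hash (a sketch, not the repository's actual fix):

```python
import logging
from git import Repo, InvalidGitRepositoryError  # GitPython

def log_git_state(repo_root):
    """Log the current commit hash, or a note when there is no .git
    directory (e.g. the sources were downloaded as an archive)."""
    try:
        logging.info("Git commit: %s", Repo(repo_root).head.commit.hexsha)
    except InvalidGitRepositoryError:
        logging.info("Cannot find a .git directory; skipping git info")
```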
-