- Jun 22, 2018
-
-
Thomas Fan authored
Reviewed and looking good. We have to set a convention for naming files.
-
- Jun 21, 2018
-
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Neta Zmora authored
Fixed a bug in module name normalization for modules with a name ending in ".module" (e.g. "features.module" in the case of VGG). Made the tests more robust, and refactored the common code into distiller/utils.py.
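A minimal sketch of the kind of normalization involved, assuming a helper along the lines of the one refactored into distiller/utils.py (the actual function there may differ); the example name 'features.module.0' mirrors a DataParallel-wrapped VGG 'features' block:

```python
def normalize_module_name(name):
    """Drop the '.module' path components that nn.DataParallel wrapping inserts.
    A hypothetical sketch, not necessarily the helper in distiller/utils.py."""
    return '.'.join(part for part in name.split('.') if part != 'module')

print(normalize_module_name('features.module.0'))  # 'features.0'
print(normalize_module_name('features.module'))    # 'features'
```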
-
- Jun 19, 2018
-
-
Guy Jacob authored
* Modify 'create_png' to use the correct data structures (dicts instead of lists, etc.)
* Handle the case where an op was called not from a module. This relates to:
  * ONNX -> "user-friendly" name conversion, to account for such cases
  * Detection of an existing op with the same name
  In both cases, use the ONNX op type in addition to the op name.
* Return an "empty" shape instead of None when ONNX couldn't infer a parameter's shape.
* Expose the option of a PNG summary with parameters to the user.
-
- Jun 15, 2018
-
-
Neta Zmora authored
-
- Jun 14, 2018
-
-
Neta Zmora authored
When removing channels and thinning, the number of filters of the next layer was not set correctly. When loading a model that has already been thinned (e.g. loading a model, thinning, saving, and loading again), don't crash on wrong tensor sizes. Cache the thinning recipe in the model when loading from a checkpoint. Without this, a loaded thin model loses its recipes when saved to a checkpoint.
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
This reverts commit ecade1b2. This simply does not work, so reverting until we find a correct solution. For example, in the language model the encoder and decoder weights are tied and use the same memory, and yet I can't see how to determine that they are the same parameter.
-
- Jun 13, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
Replace the original "homebrew" optimizer and LR-decay schedule with PyTorch's SGD and ReduceLROnPlateau. SGD with momentum=0 and weight_decay=0, and ReduceLROnPlateau with patience=0 and factor=0.5 will give the same behavior as in the original PyTorch example. Having a standard optimizer and LR-decay schedule gives us the flexibility to experiment with these during the training process.
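For reference, a minimal sketch of the optimizer/scheduler setup described above (the model and the learning-rate value are stand-ins, not the sample application's actual code):

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 2)  # stand-in for the language model

# Plain SGD (momentum=0, weight_decay=0) reproduces the original example's update rule
optimizer = optim.SGD(model.parameters(), lr=20.0, momentum=0.0, weight_decay=0.0)
# patience=0, factor=0.5: halve the LR as soon as the validation loss stops improving
scheduler = ReduceLROnPlateau(optimizer, mode='min', patience=0, factor=0.5)

for epoch in range(3):
    val_loss = torch.rand(1).item()  # stand-in for a real validation loss
    scheduler.step(val_loss)         # decays the LR when val_loss plateaus
```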
-
Neta Zmora authored
-
Neta Zmora authored
In language models, we might use "weight tying", which means that the same weights tensor is used in several different places. If tying is used, we'd like to log the tensor information, but exclude it from the total sparsity calculation.
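To make the tying scenario concrete, here is a toy tied model and one possible way to spot the shared tensor when summarizing a state_dict; this is an illustrative sketch, not Distiller's sparsity-summary code:

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Minimal language model with tied encoder/decoder weights (illustrative only)."""
    def __init__(self, ntokens=100, emsize=32):
        super(TinyLM, self).__init__()
        self.encoder = nn.Embedding(ntokens, emsize)
        self.rnn = nn.LSTM(emsize, emsize)
        self.decoder = nn.Linear(emsize, ntokens)
        self.decoder.weight = self.encoder.weight   # weight tying

model = TinyLM()
# state_dict() lists the tied tensor under both names, so a naive sum over it
# double-counts.  One possible way to detect the duplicate is to compare storage
# pointers (a sketch, not necessarily how Distiller handles it):
seen, total_elems = set(), 0
for name, tensor in model.state_dict().items():
    duplicate = tensor.data_ptr() in seen
    seen.add(tensor.data_ptr())
    print(name, list(tensor.shape), '(tied duplicate)' if duplicate else '')
    if not duplicate:
        total_elems += tensor.numel()
```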
-
- Jun 10, 2018
-
-
Neta Zmora authored
* Large update containing a new thinning algorithm (see the sketch below).
  Thinning a model is the process of taking a dense network architecture with parameters that exhibit structure-sparsity (filters or channels) in the weights tensors of convolution layers, and making changes in the network architecture and parameters in order to completely remove the structures. The new architecture is smaller (condensed), with fewer channels and filters in some of the convolution layers. Linear and BatchNormalization layers are also adjusted as required.
  To perform thinning, we create a SummaryGraph ('sgraph') of our model. We use the 'sgraph' to infer the data-dependency between the modules in the PyTorch network. This entire process is not trivial and will be documented in a different place.
  Large refactoring of SummaryGraph to support the new thinning requirement of traversing successors and predecessors:
  - Operations (ops) are now stored in a dictionary, so that they can be accessed quickly by name.
  - Refactored the Operation construction code.
  - Added support for searching a node's predecessors and successors. You can search for all predecessors/successors by depth, or by type.
  - create_png now supports an option to display the parameter nodes.
  Updated schedules with the new thinning syntax.
* Thinning: support iterative thinning of models.
  There's a caveat with this commit: when using this code you will need to train with SGD momentum=0. The momentum update depends on the weights, and because we dynamically change the weights' shapes, we need to either make the appropriate changes in the Optimizer, or disable the momentum. For now, we disable the momentum.
* Thinning: move the application of FilterRemover to on_minibatch_begin.
* Thinning: fix a syntax error.
* Word-level language model compression.
  Added an implementation of Baidu's RNN pruning scheme:
  Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
  Added an example of word-level language model compression. The language model is based on PyTorch's example: https://github.com/pytorch/examples/tree/master/word_language_model
  Added an AGP pruning schedule and an RNN pruning schedule to demonstrate compression of the language model.
* Thinning: remove dead code.
* Remove resnet18 filter pruning, since the scheduler script is incomplete.
* Thinning: fix an indentation error.
* Thinning: remove dead code.
* Thinning: updated resnet20-CIFAR filter-removal reference checkpoints.
* Thinning: updated resnet20-CIFAR filter-removal reference schedules. These are for use with the new thinning schedule algorithm.
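As a rough illustration of what thinning does to adjacent layers (not the actual Distiller thinning code, which derives the layer dependencies from SummaryGraph), here is a toy example that removes output filters from one convolution and adjusts the following BatchNorm and convolution to match:

```python
import torch
import torch.nn as nn

# Two adjacent convolutions; removing output filters from conv1 means conv2
# must lose the matching input channels.
conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn1   = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)

keep = torch.tensor([0, 2, 3, 5])          # indices of the filters we keep

thin_conv1 = nn.Conv2d(3, len(keep), kernel_size=3, padding=1)
thin_conv1.weight.data = conv1.weight.data.index_select(0, keep).clone()
thin_conv1.bias.data   = conv1.bias.data.index_select(0, keep).clone()

thin_bn1 = nn.BatchNorm2d(len(keep))
thin_bn1.weight.data  = bn1.weight.data.index_select(0, keep).clone()
thin_bn1.bias.data    = bn1.bias.data.index_select(0, keep).clone()
thin_bn1.running_mean = bn1.running_mean.index_select(0, keep).clone()
thin_bn1.running_var  = bn1.running_var.index_select(0, keep).clone()

thin_conv2 = nn.Conv2d(len(keep), 16, kernel_size=3, padding=1)
thin_conv2.weight.data = conv2.weight.data.index_select(1, keep).clone()  # dim 1 = input channels
thin_conv2.bias.data   = conv2.bias.data.clone()

x = torch.randn(1, 3, 32, 32)
y = thin_conv2(thin_bn1(thin_conv1(x)))    # shapes line up after thinning
print(y.shape)                             # torch.Size([1, 16, 32, 32])
```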
-
- Jun 07, 2018
-
-
Neta Zmora authored
Added an implementation of Baidu's RNN pruning scheme:
Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
Added an example of word-level language model compression. The language model is based on PyTorch's example: https://github.com/pytorch/examples/tree/master/word_language_model
Added an AGP pruning schedule and RNN pruning schedule to demonstrate compression of the language model.
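For context, the AGP schedule mentioned here ramps the target sparsity along a cubic curve (Zhu & Gupta, 2017, "To prune, or not to prune"). A small sketch of that ramp, with illustrative parameter names rather than Distiller's schedule-YAML keys:

```python
def agp_target_sparsity(epoch, initial_sparsity, final_sparsity,
                        starting_epoch, ending_epoch):
    """Automated Gradual Pruning schedule: sparsity ramps from initial_sparsity
    to final_sparsity along a cubic curve.  A sketch; argument names are
    illustrative, not Distiller's API."""
    if epoch <= starting_epoch:
        return initial_sparsity
    if epoch >= ending_epoch:
        return final_sparsity
    progress = float(epoch - starting_epoch) / (ending_epoch - starting_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Example: ramp from 0% to 80% sparsity between epochs 2 and 20.
for e in (2, 5, 10, 20):
    print(e, round(agp_target_sparsity(e, 0.0, 0.80, 2, 20), 3))
```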
-
- May 29, 2018
-
-
Neta Zmora authored
This is a temporary implementation that allows filter-removal and network thinning for VGG. The implementation continues the present design for network thinning, which is problematic because parts of the solution are specific to each model. Leveraging some new features in PyTorch 0.4, we are now able to provide a more generic solution to thinning, which we will push to 'master' soon. This commit bridges the feature gap for VGG filter-removal in the meantime.
-
Neta Zmora authored
This is a temporary implementation that allows filter-removal and network thinning for VGG. The implementation continues the present design for network thinning, which is problematic because parts of the solution are specific to each model. Leveraging some new features in PyTorch 0.4, we are now able to provide a more generic solution to thinning, which we will push to 'master' soon. This commit bridges the feature gap for VGG filter-removal in the meantime.
-
- May 22, 2018
-
-
Neta Zmora authored
Two places in the documentation gave the wrong path to the example Alexnet sensitivity pruning schedule.
-
- May 17, 2018
-
-
Neta Zmora authored
The latest changes to the logger caused the CI tests to fail, because the test assumes that the logging.conf file is present in the same directory as the sample application script. The sample application used cwd() instead, and did not find the log configuration file.
-
- May 16, 2018
-
-
Neta Zmora authored
Soon we will be reusing this function in other sample apps, so let's move it to app_utils.
-
Neta Zmora authored
The 'master' branch now uses PyTorch 0.4, which has API changes that are not backward compatible with PyTorch 0.3. After upgrading Distiller's internal implementation to be compatible with PyTorch 0.4, we added a check that you are using the correct PyTorch version. Note that we only perform this check in the sample image classifier compression application.
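A minimal sketch of such a version gate (the wording and exact placement in the sample application may differ):

```python
import sys
import torch

# Simplified version check: require PyTorch 0.4 or later.
major, minor = (int(x) for x in torch.__version__.split('.')[:2])
if (major, minor) < (0, 4):
    print("This code requires PyTorch version 0.4 or later "
          "(detected version {}).".format(torch.__version__))
    sys.exit(1)
```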
-
Neta Zmora authored
Work on the 'master' branch uses pre-release version numbers. After releasing v0.1.0 with PyTorch 0.3, we have upgraded 'master' to support PyTorch 0.4, which contains API changes that are not backward compatible.
-
Guy Jacob authored
-
Neta Zmora authored
Eventually we will want to use this code in other sample applications, so let's move the logger configuration code to a separate function. There's a bit of ugly hacking in the current implementation because I've added variable members to logging.logger. These are actually config-once variables that convey the logging directory and filename. I did not want to add more names to the global namespace, so I hacked a temporary solution in which logging.logger acts as a conveyor and private namespace. We'll get that cleaned up as we do more refactoring.
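A sketch of the approach described above, assuming a logging.conf that accepts a 'logfilename' substitution; the function name, directory layout, and attribute names are illustrative, not Distiller's exact API:

```python
import logging
import logging.config
import os
import time

def config_logger(app_name, log_cfg_file='logging.conf', output_dir='logs'):
    """Configure logging from a config file, then stash the run directory and
    filename on the root logger so other code can find them (the 'hack')."""
    timestr = time.strftime("%Y.%m.%d-%H%M%S")
    logdir = os.path.join(output_dir, app_name + '_' + timestr)
    os.makedirs(logdir)
    log_filename = os.path.join(logdir, app_name + '.log')
    logging.config.fileConfig(log_cfg_file, defaults={'logfilename': log_filename})
    logger = logging.getLogger()
    # Config-once values hung on the logger, used as a private namespace.
    logger.logdir = logdir
    logger.log_filename = log_filename
    return logger
```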
-
Neta Zmora authored
PyTorch 0.4 now fully supports the ONNX export features that are needed in order to create a SummaryGraph, which is sort of a "shadow graph" for PyTorch models. The big advantage of SummaryGraph is that it gives us information about the connectivity of nodes. With connectivity information we can compute per-node MAC (compute) and BW, and better yet, we can remove channels, filters, and layers (more on this in future commits). In this commit we (1) replace the long and overly-verbose ONNX node names, with PyTorch names; and (2) move MAC and BW attributes from the Jupyter notebook to the SummaryGraph.
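As an example of the per-node compute attribute mentioned above, the MAC count of a convolution node can be derived from its shapes alone; the helper below is an illustrative formula, not the SummaryGraph API:

```python
def conv_macs(in_channels, out_channels, kernel_size, out_h, out_w, groups=1):
    """Back-of-the-envelope MACs for a 2D convolution node (illustrative)."""
    k_h, k_w = kernel_size
    return out_h * out_w * out_channels * (in_channels // groups) * k_h * k_w

# Example: a 3->64 channel, 7x7 convolution with a 112x112 output feature map:
print(conv_macs(3, 64, (7, 7), 112, 112))   # 118013952 MACs
```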
-
Neta Zmora authored
This is a niche feature that lets you print the names of the modules in a model from the command line. Non-leaf nodes are excluded from this list. Other caveats are documented in the code.
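The gist of the feature, sketched with a toy model (the real command-line plumbing and the documented caveats live in the sample application):

```python
import torch.nn as nn

def leaf_module_names(model):
    """List only the leaf modules (those with no children), excluding containers."""
    return [name for name, module in model.named_modules()
            if len(list(module.children())) == 0 and name != '']

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Sequential(nn.Linear(16, 10), nn.ReLU()),
)
print(leaf_module_names(model))   # ['0', '1', '2.0', '2.1']
```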
-
Neta Zmora authored
Data parallel models may execute faster on multiple GPUs, but rendering them creates visually complex and illegible graphs. Therefore, when creating models for a PNG summary, we opt to use non-parallel models.
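A minimal sketch of the idea, assuming the DataParallel wrapper sits at the top level (Distiller's own helper may also handle nested wrappers, e.g. a wrapped 'features' block):

```python
import torch.nn as nn

def unwrap_data_parallel(model):
    """Return the underlying module when the model is wrapped in nn.DataParallel.
    Hypothetical helper: handles only a top-level wrapper; nested wrappers
    would need a recursive walk."""
    if isinstance(model, nn.DataParallel):
        return model.module
    return model

net = nn.DataParallel(nn.Linear(10, 2))
print(type(unwrap_data_parallel(net)))  # <class 'torch.nn.modules.linear.Linear'>
```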
-
Neta Zmora authored
When we are traversing the forward path of a graph, by invoking each module's forward_hook callback, we sometimes want to know the full name of the module. Previously, to infer the module name, we looked up the name of the self.weight parameter and used that to get the module name. In PyTorch 0.4 we can directly look up the module name using model_find_module_name.
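A sketch of that kind of lookup using named_modules(); the commit's model_find_module_name helper may be implemented differently:

```python
import torch.nn as nn

def find_module_name(model, module_to_find):
    """Return the fully-qualified name of a module by identity (illustrative sketch)."""
    for name, module in model.named_modules():
        if module is module_to_find:
            return name
    return None

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
print(find_module_name(model, model[0]))  # '0'
```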
-
Neta Zmora authored
Various small changes due to the changes in the semantics and syntax of the PyTorch 0.4 API. Note that currently distiller.model_performance_summary() returns wrong results on graphs containing torch.nn.DataParallel layers.
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
Following https://pytorch.org/2018/04/22/0_4_0-migration-guide.html, we need to be more precise in how we use .type()
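The relevant distinction from the migration guide, in miniature:

```python
import torch

x = torch.DoubleTensor([1, 1, 1])

print(type(x))        # <class 'torch.Tensor'> - no longer reflects the data type
print(x.type())       # 'torch.DoubleTensor' - be explicit when the dtype matters
print(x.dtype)        # torch.float64
print(isinstance(x, torch.DoubleTensor))  # True
```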
-