- Oct 27, 2020
-
-
Guy Jacob authored
-
- Jan 15, 2020
-
-
Guy Jacob authored
(we use 8-bit values below, but this applies to any bit-width)
* We use the notion of "full" and "restricted" quantized range for symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full" range when saturating after quantization, but calculated the scale factor as if the range was restricted. This means we weren't making full use of the quantized range.
* On the other hand, some other quantization implementations (e.g. TensorFlow) use the "restricted" range.
* So, we make it an option to use either the proper "full" range (q_min = -128) or the "restricted" range (q_min = -127); the sketch below illustrates the difference.
* LinearQuantMode.SYMMETRIC now means the "full" range is used, and LinearQuantMode.SYMMETRIC_RESTRICTED was added for using the "restricted" range.
* Updated tests and documentation.
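A minimal sketch of how the scale factor and clamping limits differ between the "full" and "restricted" ranges. The scale convention for the full range shown here is one common choice and is not necessarily the exact formula used in the library:

```python
import torch

def symmetric_qparams(abs_max, num_bits=8, restricted=False):
    # "Restricted" range: [-(2^(b-1) - 1), 2^(b-1) - 1] -> [-127, 127] for 8 bits.
    # "Full" range:       [-2^(b-1),       2^(b-1) - 1] -> [-128, 127] for 8 bits.
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max if restricted else -(q_max + 1)
    # Restricted: map |x|_max to 127. Full: map it using 128, so the extra
    # negative level is actually utilized (a common convention).
    scale = (q_max if restricted else q_max + 1) / abs_max
    return scale, q_min, q_max

def fake_quantize_symmetric(x, abs_max, num_bits=8, restricted=False):
    scale, q_min, q_max = symmetric_qparams(abs_max, num_bits, restricted)
    return torch.clamp(torch.round(x * scale), q_min, q_max) / scale

x = torch.randn(4) * 3
print(fake_quantize_symmetric(x, abs_max=x.abs().max().item(), restricted=False))
print(fake_quantize_symmetric(x, abs_max=x.abs().max().item(), restricted=True))
```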
-
- Aug 08, 2019
-
-
Guy Jacob authored
-
- Aug 04, 2019
-
-
Guy Jacob authored
-
- Jul 10, 2019
-
-
Guy Jacob authored
* "Net-aware quantization" - using the term coined in https://arxiv.org/abs/1811.09886. (section 3.2.2). Refers to considering sequences of modules when quantizing. This isn't exactly layer fusion - we modify activation stats prior to setting quantization parameters, to make sure that when a module is followed by certain activation functions, only the relevant ranges are quantized. We do this for: * ReLU - Clip all negative values * Tanh / Sigmoid - Clip according to the (approximated) saturation values for these functions. We use [-4, 4] for tanh and [-6, 6] for sigmoid. * Perform batch-norm folding before post-training quantization. Batch-norm parameters are folded into the parameters of the previous layer and the BN layer is replaced with an identity module. * Both BN folding and "net-aware" are now automatically executed in PostTrainLinearQuantizer (details of this change below) * BN folding enabled by new generic mechanism to "fuse" module sequences (at the Python API level) * First module in sequence is replaced/modified by a user-provided function, rest of moudles replaced with nn.Identity * Quantizer changes: * Optionally create adjacency map during prepare_model * Subclasses may enforce adjacency map creation * Refatcoring: Replace _prepare_model_impl with pre and post override-able "callbacks", so core functionality is always executed * PostTrainLinearQuantizer Changes: * Enforce creation of adjacency map. This means users must now pass a dummy input to PostTrainLinearQuantizer.prepare_model * Before module replacement - Apply BN folding and stats updates according to net-aware quantization * Updated the language model quantization tutorial to reflect the new functionality * Updated the image classification post-train quantization samples (command line and YAML) * Other changes: * Distller LSTM implementation: Replace the ModuleList for cells with a plain list. The PyTorch trace mechanism doesn't "see" ModuleList objects, it only sees the contained modules. This means that the "scopeName" of these modules isn't complete, which makes it impossible to match op names in SummaryGraph to modules in the Python model. * ActivationStatsCollector: Ignore nn.Identity modules
-
- Jul 08, 2019
-
-
Guy Jacob authored
-
- Jun 10, 2019
-
-
Neta Zmora authored
Some links have changed with the latest version of mkdocs. This closes issues #280 and #65 (reopened).
-
- May 19, 2019
- Apr 14, 2019
-
-
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: Passing clip_acts as a boolean flag (either on the command line or in the function signature) will now fail. An error message listing the valid values is displayed.
* Implemented clipping activations at mean + N * std (N is user-configurable; see the sketch below)
* Additional tests
* Updated docs
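A small sketch of the mean + N * std clipping idea, assuming we clip the observed activation range against statistics of the activation tensor (illustrative only, not the library's exact statistics pipeline):

```python
import torch

def clip_range_mean_n_std(activations: torch.Tensor, n_stds: float = 3.0):
    """Clip the activation range at mean +/- N * std instead of using the
    raw observed min/max (N is user-configurable)."""
    mean, std = activations.mean(), activations.std()
    clip_min = torch.maximum(activations.min(), mean - n_stds * std)
    clip_max = torch.minimum(activations.max(), mean + n_stds * std)
    return clip_min, clip_max

acts = torch.randn(1000) * 2
print(clip_range_mean_n_std(acts, n_stds=2.0))
```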
-
Guy Jacob authored
-
- Apr 08, 2019
-
-
Neta Zmora authored
Unfortunately, we maintain 2 copies of the documentation images (one for the documentation source; another for the generated documentation). We need to solve this, as it makes the repository disproportionately large.
-
Neta Zmora authored
Add finer control over the pruning logic, to accommodate more pruning use-cases. The full description of the new logic is available in the updated [documentation of the CompressionScheduler](https://nervanasystems.github.io/distiller/schedule.html#pruning-fine-control), which is also part of this PR. In this PR:
* Added a new callback to the CompressionScheduler: compression_scheduler.before_parameter_optimization, which is invoked after the gradients are computed, but before the weights are updated by the optimizer.
* We provide an option to mask the gradients before the weights are updated by the optimizer. We register to the parameter backward hook in order to mask the gradients. This gives us finer control over the parameter updates (see the sketch below).
* Added several DropFilter schedules. DropFilter is a method to regularize networks, and it can also be used to "prepare" a network for permanent filter pruning.
* Added documentation of pruning fine-control
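A minimal sketch of gradient masking via a parameter backward hook. The helper name `apply_gradient_mask` is made up for illustration; the mechanism (Tensor.register_hook) is standard PyTorch:

```python
import torch
import torch.nn as nn

def apply_gradient_mask(param: nn.Parameter, mask: torch.Tensor):
    """Zero out gradients of pruned weights so the optimizer cannot 'revive' them."""
    def hook(grad):
        return grad * mask  # the hook's return value replaces the gradient
    return param.register_hook(hook)

# Usage sketch: keep only weights whose magnitude is above a threshold.
layer = nn.Linear(16, 4)
mask = (layer.weight.abs() > 0.1).float()
handle = apply_gradient_mask(layer.weight, mask)

loss = layer(torch.randn(8, 16)).sum()
loss.backward()
assert torch.all(layer.weight.grad[mask == 0] == 0)
handle.remove()
```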
-
- Apr 01, 2019
-
-
Lev Zlotnik authored
* Bias handling:
  * Add 'bits_bias' parameter to explicitly specify # of bits for bias, similar to weights and activations.
  * BREAKING: Remove the now redundant 'quantize_bias' boolean parameter
* Custom overrides:
  * Expand the semantics of the overrides dict to allow overriding of other parameters in addition to bit-widths
  * Functions registered in the quantizer's 'replacement_factory' can define keyword arguments. Non bit-width entries in the overrides dict will be checked against the function signature and passed (see the sketch below)
* BREAKING:
  * Changed the name of 'bits_overrides' to simply 'overrides'
  * Bit-width overrides must now be defined using the full parameter names - 'bits_activations/weights/bias' instead of the short-hands 'acts' and 'wts' which were used so far.
* Added/updated relevant tests
* Modified all quantization YAMLs under 'examples' to reflect these changes
* Updated docs
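A sketch of how non bit-width override entries could be checked against a replacement function's signature. Both `replace_relu` and `split_override_entry` are hypothetical names used only for illustration:

```python
import inspect
import torch.nn as nn

# Hypothetical replacement function, in the spirit of those registered in a
# quantizer's replacement_factory.
def replace_relu(module, name, qbits_map, clip_acts=False):
    return nn.ReLU6() if clip_acts else module

def split_override_entry(replace_fn, override_entry):
    """Separate bit-width settings from extra kwargs that the replacement
    function actually declares in its signature."""
    bit_keys = {'bits_activations', 'bits_weights', 'bits_bias'}
    params = inspect.signature(replace_fn).parameters
    bits = {k: v for k, v in override_entry.items() if k in bit_keys}
    extra = {k: v for k, v in override_entry.items()
             if k not in bit_keys and k in params}
    return bits, extra

bits, extra = split_override_entry(
    replace_relu, {'bits_activations': 8, 'bits_weights': 4, 'clip_acts': True})
print(bits)   # {'bits_activations': 8, 'bits_weights': 4}
print(extra)  # {'clip_acts': True}
```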
-
- Mar 29, 2019
-
-
Songyi Blair Han authored
-
- Feb 26, 2019
-
-
Lev Zlotnik authored
Not backward compatible - re-installation is required
* Fixes for PyTorch==1.0.0
* Refactoring folder structure
* Update installation section in docs
-
- Feb 11, 2019
-
-
Guy Jacob authored
Summary of changes:
(1) Post-train quantization based on pre-collected statistics (a stats-collection sketch follows below)
(2) Quantized concat, element-wise addition / multiplication and embeddings
(3) Move post-train quantization command line args out of sample code
(4) Configure post-train quantization from YAML for more fine-grained control
(See PR #136 for more detailed change descriptions)
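To illustrate item (1), here is a sketch of an offline pass that records per-module activation ranges with forward hooks; Distiller's stats collection records more than plain min/max, and `collect_min_max_stats` plus the `data_loader` yielding `(inputs, targets)` pairs are assumptions for this example:

```python
import torch
import torch.nn as nn

def collect_min_max_stats(model: nn.Module, data_loader):
    """Record per-module activation min/max over a calibration set; the stats
    can later be used to set post-training quantization parameters."""
    stats, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            s = stats.setdefault(name, {'min': float('inf'), 'max': float('-inf')})
            s['min'] = min(s['min'], output.min().item())
            s['max'] = max(s['max'], output.max().item())
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear, nn.ReLU)):
            handles.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for inputs, _ in data_loader:
            model(inputs)

    for h in handles:
        h.remove()
    return stats
```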
-
- Dec 11, 2018
-
-
Guy Jacob authored
-
- Dec 06, 2018
-
-
Neta Zmora authored
Add missing image files [arghhhh.]
-
Neta Zmora authored
Add missing files :-(
-
Neta Zmora authored
- Moved the language model and structured pruning tutorials from the Wiki to the HTML documentation. We love the ease of the Wiki, but GitHub doesn't let Google crawl these pages, and users can't open PRs on Wiki pages.
- Updated the pruning algorithms documentation
-
- Dec 04, 2018
-
-
Guy Jacob authored
* Asymmetric post-training quantization (only symmetric was supported until now)
* Quantization-aware training for range-based (min-max) symmetric and asymmetric quantization
* Per-channel quantization support in both training and post-training (see the sketch below)
* Added tests and examples
* Updated documentation
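A sketch of range-based asymmetric quantization parameters and of per-output-channel weight ranges. The formulas follow a common convention and may differ from the library's exact rounding details:

```python
import torch

def asymmetric_qparams(x_min, x_max, num_bits=8):
    """Range-based (min/max) asymmetric scale and zero-point."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must contain 0
    n_levels = 2 ** num_bits - 1                     # 255 for 8 bits
    scale = n_levels / (x_max - x_min)
    zero_point = round(-x_min * scale)
    return scale, zero_point

def per_channel_weight_ranges(weight: torch.Tensor):
    """Per-output-channel min/max (channel dim 0) of a conv/linear weight,
    as used for per-channel quantization."""
    flat = weight.reshape(weight.shape[0], -1)
    return flat.min(dim=1).values, flat.max(dim=1).values

w = torch.randn(16, 3, 3, 3)
print(asymmetric_qparams(w.min().item(), w.max().item()))
print(per_channel_weight_ranges(w))
```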
-
- Nov 25, 2018
-
-
Neta Zmora authored
-
- Nov 24, 2018
-
-
Neta Zmora authored
Thanks to Dan Alistarh for bringing this issue to my attention. The activations of Linear layers have shape (batch_size, output_size) and those of Convolution layers have shape (batch_size, num_channels, width, height), and this distinction in shape was not correctly handled.

This commit also fixes sparsity computation for very large activations, as seen in VGG16, which leads to memory exhaustion. One solution is to use smaller batch sizes, but this commit uses a different solution, which counts zeros “manually” and uses less space (see the sketch below).

Also in this commit:
- Added a “caveats” section to the documentation.
- Added more tests.
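A sketch of the low-memory zero-counting idea: iterate over batch slices instead of materializing one full-size comparison tensor. This is illustrative only, not Distiller's exact code:

```python
import torch

def count_zeros_lowmem(activations: torch.Tensor, chunk_size: int = 1) -> int:
    """Count zero elements one batch-slice at a time to keep peak memory low
    for very large activation tensors."""
    zeros = 0
    for chunk in torch.split(activations, chunk_size, dim=0):
        zeros += int((chunk == 0).sum().item())
    return zeros

def sparsity(activations: torch.Tensor) -> float:
    return count_zeros_lowmem(activations) / activations.numel()

acts = torch.relu(torch.randn(8, 64, 56, 56))
print(sparsity(acts))
```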
-
- Nov 21, 2018
-
-
Neta Zmora authored
-
- Nov 07, 2018
-
-
Neta Zmora authored
Add missing files from previous commit
-
Neta Zmora authored
-
- Nov 04, 2018
-
-
Neta Zmora authored
-
- Oct 03, 2018
-
-
Neta Zmora authored
Latest versions of Jupyter notebooks have a different syntax for launching the server such that it listens on all network interfaces (this is useful if you are running the Jupyter server on one machine, and connect to it from a browser on a different machine). So `jupyter-notebook --ip=* --no-browser` is replaced by `jupyter-notebook --ip=0.0.0.0 --no-browser`.
-
- Sep 16, 2018
-
-
Neta Zmora authored
* A temporary fix for issue #36
  The thinning code assumes that the sgraph it is using is not data-parallel, because it (currently) accesses the layer-name keys using a "normalized" name ("module." is removed). The bug is that in thinning.py#L73 we create a data_parallel=True model and then give it to sgraph, while in other places the thinning code uses "normalized" keys (for example in thinning.py#L264). The temporary fix configures data_parallel=False in thinning.py#L73.
  A long-term solution should have SummaryGraph know how to handle both parallel and non-parallel models. This can be done by having SummaryGraph convert layer names it receives in the API to data_parallel=False using normalize_layer_name (see the sketch below). When returning results, use the de-normalized format.
* Fix the documentation error from issue #36
* Move some logs to debug level and show in logging.conf how to enable DEBUG logs.
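A sketch of the idea behind normalizing layer names, i.e. stripping the "module." prefixes that nn.DataParallel wrapping introduces so names from a data-parallel model can be matched against the plain model (not the actual normalize_layer_name implementation):

```python
def normalize_layer_name(layer_name: str) -> str:
    """Drop 'module' path components added by nn.DataParallel wrapping."""
    return '.'.join(part for part in layer_name.split('.') if part != 'module')

assert normalize_layer_name('module.features.0.weight') == 'features.0.weight'
assert normalize_layer_name('features.0.weight') == 'features.0.weight'
```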
-
- Sep 03, 2018
-
-
Guy Jacob authored
* Implemented as a Policy
* Integrated in image classification sample
* Updated docs and README
-
- Jul 22, 2018
-
-
Gal Novik authored
* Adding the PACT quantization method (see the sketch below)
* Move the logic that modifies the optimizer due to changes the quantizer makes into the Quantizer itself
* Updated documentation and tests
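A minimal sketch of a PACT-style activation, following the PACT paper (https://arxiv.org/abs/1805.06085): clamp to [0, alpha] with a learnable clipping value alpha, then fake-quantize with a straight-through estimator. This is not Distiller's actual module:

```python
import torch
import torch.nn as nn

class PactClip(nn.Module):
    def __init__(self, num_bits=4, init_alpha=6.0):
        super().__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x):
        # Equivalent to clamp(x, 0, alpha), written so gradients flow to alpha.
        y = 0.5 * (x.abs() - (x - self.alpha).abs() + self.alpha)
        # Fake-quantize the clipped range onto 2^k - 1 levels.
        n_levels = 2 ** self.num_bits - 1
        scale = n_levels / self.alpha
        y_q = torch.round(y * scale) / scale
        # Straight-through estimator: quantized forward, identity backward.
        return y + (y_q - y).detach()

act = PactClip(num_bits=4)
print(act(torch.randn(8) * 4))
```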
-
- Jul 17, 2018
-
-
Guy Jacob authored
* Add Quantizer unit tests
* Require 'bits_overrides' to be an OrderedDict to support overlapping patterns in a predictable manner (see the sketch below) + update documentation to reflect this
* Quantizer class cleanup
  * Use "public" nn.Module APIs instead of protected attributes
  * Call the builtins set/get/delattr instead of the class special methods (__***__)
* Fix issues reported in #24
* Bug in RangeLinearQuantParamLayerWrapper - add explicit override of pre_quantized_forward accepting a single input (#15)
* Add DoReFa test to full_flow_tests
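A sketch of why the ordering matters for overlapping patterns: with a first-match-wins lookup over an OrderedDict, a more specific pattern listed first reliably takes precedence over a broader one. The `lookup_bits` helper and the 'acts'/'wts' keys are used only for illustration:

```python
import re
from collections import OrderedDict

def lookup_bits(layer_name, bits_overrides, default):
    """Return the first override whose pattern matches the layer name."""
    for pattern, bits in bits_overrides.items():
        if re.match(pattern, layer_name):
            return bits
    return default

overrides = OrderedDict([
    ('conv1$', {'acts': 8, 'wts': 8}),   # exact first layer: keep 8 bits
    ('conv.*', {'acts': 4, 'wts': 4}),   # all other convs: 4 bits
])
print(lookup_bits('conv1', overrides, {'acts': 8, 'wts': 8}))  # -> 8-bit entry
print(lookup_bits('conv2', overrides, {'acts': 8, 'wts': 8}))  # -> 4-bit entry
```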
-
- Jul 01, 2018
-
-
Guy Jacob authored
* Scale of bias and parentheses were wrong
-
- Jun 21, 2018
- Jun 14, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
- May 22, 2018
-
-
Neta Zmora authored
Two places in the documentation gave the wrong path to the example Alexnet sensitivity pruning schedule.
-
- May 14, 2018
-
-
Guy Jacob authored
-