Unverified commit b8f34117, authored by Guy Jacob, committed by GitHub

Update Examples Documentation (#441)

* Make it easier to find sample apps for different workload types
* Add READMEs for sample apps that didn't have any
* Update readmes with experiment results where applicable
@@ -52,19 +52,21 @@ If updating from an earlier revision of the code, please make sure to follow the
- [Using venv](#using-venv)
- [Activate the environment](#activate-the-environment)
- [Install the package](#install-the-package)
- [Required PyTorch Version](#required-pytorch-version)
- [Getting Started](#getting-started)
- [Basic Usage Examples](#basic-usage-examples)
- [Training-only](#training-only)
- [Getting parameter statistics of a sparsified model](#getting-parameter-statistics-of-a-sparsified-model)
- [Post-training quantization](#post-training-quantization)
- [Explore the sample Jupyter notebooks](#explore-the-sample-jupyter-notebooks)
- [Running the tests](#running-the-tests)
- [Generating the HTML documentation site](#generating-the-html-documentation-site)
- [Built With](#built-with)
- [Versioning](#versioning)
- [License](#license)
- [Community](#community)
- [Github projects using Distiller:](#github-projects-using-distiller)
- [Research papers citing Distiller:](#research-papers-citing-distiller)
- [Acknowledgments](#acknowledgments)
- [Disclaimer](#disclaimer)
@@ -179,25 +181,30 @@ If you do not use CUDA 10.1 in your environment, please refer to [PyTorch websit
## Getting Started

Distiller comes with sample applications and tutorials covering a range of model types:

| Model Type | Sparsity | Post-train quant | Quant-aware training | Auto Compression (AMC) |
|------------|:--------:|:----------------:|:--------------------:|:----------------------:|
| [Image classification](https://github.com/NervanaSystems/distiller/tree/master/examples/classifier_compression) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| [Word-level language model](https://github.com/NervanaSystems/distiller/tree/master/examples/word_language_model)| :white_check_mark: | :white_check_mark: | | |
| [Translation (GNMT)](https://github.com/NervanaSystems/distiller/tree/master/examples/GNMT) | | :white_check_mark: | | |
| [Recommendation System (NCF)](https://github.com/NervanaSystems/distiller/tree/master/examples/ncf) | | :white_check_mark: | | |
| [Object Detection](https://github.com/NervanaSystems/distiller/tree/master/examples/object_detection_compression) | :white_check_mark: | | | |
Head to the [examples](https://github.com/NervanaSystems/distiller/tree/master/examples) directory for more details.

Other resources to refer to, beyond the examples:
+ [Frequently-asked questions (FAQ)](https://github.com/NervanaSystems/distiller/wiki/Frequently-Asked-Questions-(FAQ))
+ [Model zoo](https://nervanasystems.github.io/distiller/model_zoo.html)
+ [Compression scheduling](https://nervanasystems.github.io/distiller/schedule.html)
+ [Usage](https://nervanasystems.github.io/distiller/usage.html)
+ [Preparing a model for quantization](https://nervanasystems.github.io/distiller/prepare_model_quant.html)
+ [Tutorial: Using Distiller to prune a PyTorch language model](https://nervanasystems.github.io/distiller/tutorial-lang_model.html)
+ [Tutorial: Pruning Filters & Channels](https://nervanasystems.github.io/distiller/tutorial-struct_pruning.html)
+ [Tutorial: Post-Training Quantization of a Language Model](https://nervanasystems.github.io/distiller/tutorial-lang_model_quant.html)
+ [Tutorial: Post-Training Quantization of GNMT (translation model)](https://nervanasystems.github.io/distiller/tutorial-gnmt_quant.html)
+ [Post-training quantization command line examples](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/post_train_quant/command_line.md)
### Basic Usage Examples
The following are simple examples using Distiller's image classification sample, showing some of Distiller's capabilities.
+ [Training-only](#training-only)
+ [Getting parameter statistics of a sparsified model](#getting-parameter-statistics-of-a-sparsified-model)
+ [Post-training quantization](#post-training-quantization)
@@ -217,7 +224,6 @@ $ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p
You can use a TensorBoard backend to view the training progress (in the diagram below we show a couple of training sessions with different LR values). For compression sessions, we've added tracing of activation and parameter sparsity levels, and regularization loss.
<center> <img src="imgs/simplenet_training.png"></center>

#### Getting parameter statistics of a sparsified model

We've included in the git repository a few checkpoints of a ResNet20 model that we've trained with 32-bit floats. Let's load the checkpoint of a model that we've trained with channel-wise Group Lasso regularization.<br>
With the following command-line arguments, the sample application loads the model (```--resume```) and prints statistics about the model weights (```--summary=sparsity```). This is useful if you want to load a previously pruned model, to examine the weights sparsity statistics, for example. Note that when you *resume* a stored checkpoint, you still need to tell the application which network architecture the checkpoint uses (```-a=resnet20_cifar```):
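A minimal sketch of such an invocation (the checkpoint filename below is a placeholder; point ```--resume``` at one of the ResNet20 checkpoints included in the repository, or at your own checkpoint):

```bash
# Load a stored checkpoint and print weight-sparsity statistics.
# The checkpoint path is illustrative - substitute an actual file.
$ python3 compress_classifier.py -a=resnet20_cifar ../../../data.cifar10 \
    --resume=<path/to/checkpoint.pth.tar> --summary=sparsity
```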
@@ -257,34 +263,6 @@ After installing and running the server, take a look at the [notebook](https://g
Sensitivity analysis is a long process and this notebook loads CSV files that are the output of several sessions of sensitivity analysis.
<center> <img src="imgs/resnet18-sensitivity.png"></center>
## Running the tests

We are currently light on tests, and this is an area where contributions would be much appreciated.<br>
There are two types of tests: system tests and unit tests. To invoke the unit tests:
@@ -4,7 +4,18 @@ and show different configurations of quantization to achieve the highest accurac
Note that this folder contains only code required to run evaluation. All training code was removed. A link to a pre-trained model is provided below.

## Summary of Post-Training Quantization Results
| Precision | Mode | Per-Channel | Clip Activations | Bleu Score |
|-----------|------------|-------------|---------------------------------------------------------------|------------|
| FP32 | N/A | N/A | N/A | 22.16 |
| INT8 | Symmetric | No | No | 18.05 |
| INT8 | Asymmetric | No | No | 18.52 |
| INT8 | Asymmetric | Yes | AVG in all layers | 9.63 |
| INT8 | Asymmetric | Yes | AVG in all layers except attention block | 16.94 |
| INT8 | Asymmetric | Yes | AVG in all layers except attention block and final classifier | 21.49 |
For details on how the model is being quantized, see [below](#what-is-quantized).
## Running the Example
@@ -61,17 +72,6 @@ The following operations do not have a quantized implementation. The operations
quant_dequant(y)
```
## Dataset / Environment

### Publication / Attribution
# Compression Examples
Distiller comes with sample applications and tutorials covering a range of model types:
| Model Type | Sparsity | Post-train quant | Quant-aware training | Auto Compression (AMC) | In Directories |
|------------|:--------:|:----------------:|:--------------------:|:----------------------:|----------------|
| **Image classification** | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | [classifier_compression](https://github.com/NervanaSystems/distiller/tree/master/examples/classifier_compression), [auto_compression/amc](https://github.com/NervanaSystems/distiller/tree/master/examples/auto_compression/amc) |
| **Word-level language model** | :white_check_mark: | :white_check_mark: | | |[word_language_model](https://github.com/NervanaSystems/distiller/tree/master/examples/word_language_model) |
| **Translation (GNMT)** | | :white_check_mark: | | | [GNMT](https://github.com/NervanaSystems/distiller/tree/master/examples/GNMT) |
| **Recommendation System (NCF)** | | :white_check_mark: | | | [ncf](https://github.com/NervanaSystems/distiller/tree/master/examples/ncf) |
| **Object Detection** | :white_check_mark: | | | | [object_detection_compression](https://github.com/NervanaSystems/distiller/tree/master/examples/object_detection_compression) |
The directories listed in the table contain the code implementing each of these model types. The rest of the sub-directories in this directory are each dedicated to a specific compression method, and contain YAML schedules and other files that can be used with the sample applications. Most of these files contain details on the results obtained and how to reproduce them.
# Image Classifiers Compression
This is Distiller's main example application for compressing image classification models.
- [Image Classifiers Compression](#image-classifiers-compression)
- [Usage](#usage)
- [Compression Methods](#compression-methods)
- [Sparsity - Pruning and Regularization](#sparsity---pruning-and-regularization)
- [Quantization](#quantization)
- [Knowledge Distillation](#knowledge-distillation)
- [Models Supported](#models-supported)
- [Datasets Supported](#datasets-supported)
- [Re-usable Image Classification Code](#re-usable-image-classification-code)
## Usage
Please see the [docs](https://nervanasystems.github.io/distiller/usage.html) for usage details. In addition, run `compress_classifier.py -h` to show the extensive list of command-line options available.
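As a quick orientation, here is a hedged sketch of the help command and of a plain training-only run on CIFAR-10 (the architecture, dataset path and option values are illustrative; the docs linked above are the authoritative usage reference):

```bash
# List the full set of command-line options
$ python3 compress_classifier.py -h

# Illustrative training-only run on CIFAR-10, with no compression schedule.
# The dataset path assumes the suggested directory layout shown further below.
$ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p 30 -j=1 --lr=0.01
```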
## Compression Methods
**Follow the links for more details on each method and experiment results.**
### Sparsity - Pruning and Regularization
A non-exhaustive list of the methods implemented:
- [AGP](https://github.com/NervanaSystems/distiller/tree/master/examples/agp-pruning)
- [DropFilter](https://github.com/NervanaSystems/distiller/tree/master/examples/drop_filter)
- [Lottery-Ticket Hypothesis](https://github.com/NervanaSystems/distiller/tree/master/examples/lottery_ticket)
- [Network Surgery](https://github.com/NervanaSystems/distiller/tree/master/examples/network_surgery)
- [Network Trimming](https://github.com/NervanaSystems/distiller/tree/master/examples/network_trimming)
- [Hybrids](https://github.com/NervanaSystems/distiller/tree/master/examples/hybrid): These are examples where multiple pruning strategies are combined.
### Quantization
- [Post-training quantization](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/post_train_quant/command_line.md) based on the TensorFlow quantization scheme (originally GEMMLOWP) with additional capabilities; a rough command-line sketch follows this list.
- [Quantization-aware training](https://github.com/NervanaSystems/distiller/tree/master/examples/quantization/quant_aware_train): TensorFlow scheme, DoReFa, PACT
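A rough sketch of what a post-training quantization run looks like with the classifier sample; the flag names here are assumptions, so treat the post-training quantization command-line examples document linked above as the authoritative reference:

```bash
# Sketch: evaluate a pretrained model with 8-bit post-training quantization.
# The --quantize-eval / --evaluate flag names are assumptions - verify them
# against the linked command-line examples document.
$ python3 compress_classifier.py -a resnet18 --pretrained ../../../data.imagenet \
    --evaluate --quantize-eval
```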
### Knowledge Distillation
See details in the [docs](https://nervanasystems.github.io/distiller/schedule.html#knowledge-distillation), and these YAML schedules for training ResNet on CIFAR-10 with knowledge distillation: [FP32](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/fp32_baselines/preact_resnet_cifar_base_fp32.yaml) ; [DoReFa](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/quant_aware_train/preact_resnet_cifar_dorefa.yaml).
## Models Supported
The sample app integrates with [TorchVision](https://pytorch.org/docs/master/torchvision/models.html#classification) and [Cadene's pre-trained models](https://github.com/Cadene/pretrained-models.pytorch). Barring specific issues, any model from these two repositories can be specified from the command line and used.
We've implemented additional models, which can be found [here](https://github.com/NervanaSystems/distiller/tree/master/distiller/models).
## Datasets Supported
The application supports ImageNet, CIFAR-10 and MNIST.
The `compress_classifier.py` application will download the CIFAR-10 and MNIST datasets automatically the first time you try to use them (thanks to TorchVision). The example invocations used throughout Distiller's documentation assume that you have downloaded the images to directory `distiller/../data.cifar10`, but you can place the images anywhere you want (you tell `compress_classifier.py` where the dataset is located - or where you want the application to download the dataset to - using a command-line parameter).
ImageNet needs to be [downloaded](http://image-net.org/download) manually, due to copyright issues. Facebook has created a [set of scripts](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset) to help download and extract the dataset.
Again, the Distiller documentation assumes the following directory structure for the datasets, but this is just a suggestion:
```
distiller
examples
classifier_compression
data.imagenet/
train/
val/
data.cifar10/
cifar-10-batches-py/
batches.meta
data_batch_1
data_batch_2
data_batch_3
data_batch_4
data_batch_5
readme.html
test_batch
data.mnist/
MNIST/
processed/
raw/
```
## Re-usable Image Classification Code
We borrow the main flow code from PyTorch's ImageNet classification training sample application ([see here](https://github.com/pytorch/examples/tree/master/imagenet)). Much of the flow was refactored into a class called `ClassifierCompressor`, which can be re-used to build different scripts that perform compression of image classifiers. Its implementation can be found in [`distiller/apputils/image_classifier.py`](https://github.com/NervanaSystems/distiller/blob/master/distiller/apputils/image_classifier.py).
The [AMC auto-compression](https://github.com/NervanaSystems/distiller/tree/master/examples/auto_compression/amc) sample is another application that uses this building block.
@@ -23,6 +23,16 @@ The sample command lines provided [below](#running-the-sample) focus on **post-t
This task benchmarks recommendation with implicit feedback on the [MovieLens 20 Million (ml-20m) dataset](https://grouplens.org/datasets/movielens/20m/) with a [Neural Collaborative Filtering](http://dl.acm.org/citation.cfm?id=3052569) model.
The model trains on binary information about whether or not a user interacted with a specific item.
## Summary of Post-Training Quantization Results
| Precision | Mode | Per-Channel | Split Final Layer | HR@10 |
|-----------|------------|-------------|-------------------|-------|
| FP32 | N/A | N/A | N/A | 63.55 |
| INT8 | Asymmetric | Yes | No | 49.54 |
| INT8 | Asymmetric | Yes | Yes | 62.78 |
Details on how to run the experiments, including what we mean by "split final layer", are provided [below](#running-the-sample).
## Setup

* Install `unzip` and `curl`
# Word-level language modeling RNN

This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task.
It is based on the PyTorch example found [here](https://github.com/pytorch/examples/tree/master/word_language_model). Note that we're using an earlier version of that example, which doesn't include the Transformer implementation.
- [Word-level language modeling RNN](#word-level-language-modeling-rnn)
- [Running the example](#running-the-example)
- [Compression: Pruning](#compression-pruning)
- [Compression: Post-Training Quantization](#compression-post-training-quantization)
## Running the example
By default, the training script uses the provided Wikitext-2 dataset.
The trained model can then be used by the generate script to generate new text.
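As a minimal sketch of these two scripts (argument values are illustrative; run either script with `-h` for the full list of options):

```bash
# Train on the provided Wikitext-2 data (illustrative settings)
python main.py --cuda --epochs 6

# Sample new text from the trained model
python generate.py
```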
@@ -45,12 +54,61 @@ With these arguments, a variety of models can be tested.
As an example, the following arguments produce slower but better models:

```bash
python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
```
Perplexities on PTB are equal or better than
[Recurrent Neural Network Regularization (Zaremba et al. 2014)](https://arxiv.org/pdf/1409.2329.pdf)
and are similar to [Using the Output Embedding to Improve Language Models (Press & Wolf 2016)](https://arxiv.org/abs/1608.05859) and [Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016)](https://arxiv.org/pdf/1611.01462.pdf), though both of these papers have improved perplexities by using a form of recurrent dropout [(variational dropout)](http://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks).
## Compression: Pruning
[**Tutorial: Using Distiller to prune a PyTorch language model**](https://nervanasystems.github.io/distiller/tutorial-lang_model.html)
We modified the `main.py` script to allow pruning via Distiller's scheduling mechanism. The tutorial linked above provides a step-by-step description of how these modifications were done. It then shows how to use [AGP (automated gradual pruning)](https://arxiv.org/abs/1710.01878) to prune the model to various levels.
The following table summarizes the pruning results obtained in the tutorial. The parameters used here are:
`--emsize 650 --nhid 1500 --dropout 0.65 --tied --wd 1e-6`
| Sparsity | # Non-zero parameters | Validation ppl | Test ppl |
|---------------|:---------------------:|:--------------:|:--------:|
| Baseline - 0% | 85,917,000 | 87.49 | 83.85 |
| 70% | 25,487,550 | 90.67 | 85.96 |
| 70% | 25,487,550 | 90.59 | 85.84 |
| 70% | 25,487,550 | 87.40 | 82.93 |
| **80.4%** | **16,847,550** | **89.31** | **83.64**|
| 90% | 8,591,700 | 90.70 | 85.67 |
| 95% | 4,295,850 | 98.42 | 92.79 |
We can see that we are able to maintain the original perplexity using only ~20% of the parameters.
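A sketch of how such a pruning run might be launched with the parameters above; the `--compress` flag and the schedule filename are assumptions here, and the tutorial linked above gives the exact command lines and YAML schedules:

```bash
# Sketch: train while pruning according to a Distiller YAML schedule.
# The --compress flag and the schedule filename are assumptions - see the
# tutorial for the exact invocations.
python main.py --cuda --emsize 650 --nhid 1500 --dropout 0.65 --tied --wd 1e-6 \
    --compress=<agp_pruning_schedule>.yaml
```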
## Compression: Post-Training Quantization
[**Tutorial: Post-Training Quantization of a Language Model using Distiller** (Jupyter Notebook)](https://github.com/NervanaSystems/distiller/blob/master/examples/word_language_model/quantize_lstm.ipynb)
(Note that post-training quantization is NOT implemented in the `main.py` script - it is only shown in the notebook tutorial)
The tutorial covers the following:
* Converting the model to use Distiller's modular LSTM implementation, which allows flexible quantization of internal LSTM operations.
* Collecting activation statistics prior to quantization
* Creating a `PostTrainLinearQuantizer` and preparing the model for quantization
* "Net-aware quantization" capability of `PostTrainLinearQuantizer`
* Progressively tweaking the quantization settings in order to improve accuracy
The following table summarizes the post-training quantization experiments shown in the tutorial:
| Precision | INT8: Mode | INT8: Per-channel | INT8: Clipping | FP16 Modules | Test ppl |
|-----------------|------------|-------------------|----------------|------------------------------------|----------|
| FP32 | N/A | N/A | N/A | N/A | 86.87 |
| Full FP16 | N/A | N/A | N/A | Entire Model | 86.80 |
| Full INT8 | Symmetric | No | No | None | 104.2 |
| Full INT8 | Asymmetric | Yes | No | None | 100.45 |
| Full INT8 | Asymmetric | Yes | Averaging | None | 88.85 |
| Mixed INT8/FP16 | Asymmetric | Yes | No | Encoder, decoder, Eltwise add/mult | 86.77 |
| Mixed INT8/FP16 | Asymmetric | Yes | Averaging | Encoder, decoder | 88.96 |
For more details see the tutorial itself.