diff --git a/README.md b/README.md index d29e34bdac37cea6c3e4ca16cb7413f908a2b38f..fbac8686e23bd75d6c1af13066facf08a98d7b30 100755 --- a/README.md +++ b/README.md @@ -52,19 +52,21 @@ If updating from an earlier revision of the code, please make sure to follow the - [Using venv](#using-venv) - [Activate the environment](#activate-the-environment) - [Install the package](#install-the-package) + - [Required PyTorch Version](#required-pytorch-version) - [Getting Started](#getting-started) - - [Example invocations of the sample application](#example-invocations-of-the-sample-application) + - [Basic Usage Examples](#basic-usage-examples) - [Training-only](#training-only) - [Getting parameter statistics of a sparsified model](#getting-parameter-statistics-of-a-sparsified-model) - [Post-training quantization](#post-training-quantization) - [Explore the sample Jupyter notebooks](#explore-the-sample-jupyter-notebooks) -- [Set up the classification datasets](#set-up-the-classification-datasets) - [Running the tests](#running-the-tests) - [Generating the HTML documentation site](#generating-the-html-documentation-site) - [Built With](#built-with) - [Versioning](#versioning) - [License](#license) - [Community](#community) + - [Github projects using Distiller:](#github-projects-using-distiller) + - [Research papers citing Distiller:](#research-papers-citing-distiller) - [Acknowledgments](#acknowledgments) - [Disclaimer](#disclaimer) @@ -179,25 +181,30 @@ If you do not use CUDA 10.1 in your environment, please refer to [PyTorch websit ## Getting Started -You can jump head-first into some limited examples of network compression, to get a feeling for the library without too much investment on your part. +Distiller comes with sample applications and tutorials covering a range of model types: -Distiller comes with a sample application for compressing image classification DNNs, ```compress_classifier.py``` located at ```distiller/examples/classifier_compression```. +| Model Type | Sparsity | Post-train quant | Quant-aware training | Auto Compression (AMC) | +|------------|:--------:|:----------------:|:--------------------:|:----------------------:| +| [Image classification](https://github.com/NervanaSystems/distiller/tree/master/examples/classifier_compression) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | +| [Word-level language model](https://github.com/NervanaSystems/distiller/tree/master/examples/word_language_model)| :white_check_mark: | :white_check_mark: | | | +| [Translation (GNMT)](https://github.com/NervanaSystems/distiller/tree/master/examples/GNMT) | | :white_check_mark: | | | +| [Recommendation System (NCF)](https://github.com/NervanaSystems/distiller/tree/master/examples/ncf) | | :white_check_mark: | | | +| [Object Detection](https://github.com/NervanaSystems/distiller/tree/master/examples/object_detection_compression) | :white_check_mark: | | | | -We'll show you how to use it for some simple use-cases, and will point you to some ready-to-go Jupyter notebooks. +Head to the [examples](https://github.com/NervanaSystems/distiller/tree/master/examples) directory for more details. 
-For more details, there are some other resources you can refer to:
+Other resources to refer to, beyond the examples:
+ [Frequently-asked questions (FAQ)](https://github.com/NervanaSystems/distiller/wiki/Frequently-Asked-Questions-(FAQ))
+ [Model zoo](https://nervanasystems.github.io/distiller/model_zoo.html)
+ [Compression scheduling](https://nervanasystems.github.io/distiller/schedule.html)
+ [Usage](https://nervanasystems.github.io/distiller/usage.html)
+ [Preparing a model for quantization](https://nervanasystems.github.io/distiller/prepare_model_quant.html)
-+ [Tutorial: Using Distiller to prune a PyTorch language model](https://nervanasystems.github.io/distiller/tutorial-lang_model.html)
+ [Tutorial: Pruning Filters & Channels](https://nervanasystems.github.io/distiller/tutorial-struct_pruning.html)
-+ [Tutorial: Post-Training Quantization of a Language Model](https://nervanasystems.github.io/distiller/tutorial-lang_model_quant.html)
-+ [Tutorial: Post-Training Quantization of GNMT (translation model)](https://nervanasystems.github.io/distiller/tutorial-gnmt_quant.html)
-+ [Post-training quantization command line examples](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/post_train_quant/command_line.md)
-### Example invocations of the sample application
+### Basic Usage Examples
+
+The following are simple examples using Distiller's image classification sample, showing some of Distiller's capabilities.
+
+ [Training-only](#training-only)
+ [Getting parameter statistics of a sparsified model](#getting-parameter-statistics-of-a-sparsified-model)
+ [Post-training quantization](#post-training-quantization)
@@ -217,7 +224,6 @@ $ python3 compress_classifier.py --arch simplenet_cifar ../../../data.cifar10 -p
You can use a TensorBoard backend to view the training progress (in the diagram below we show a couple of training sessions with different LR values). For compression sessions, we've added tracing of activation and parameter sparsity levels, and regularization loss.
<center> <img src="imgs/simplenet_training.png"></center>
-
#### Getting parameter statistics of a sparsified model
We've included in the git repository a few checkpoints of a ResNet20 model that we've trained with 32-bit floats. Let's load the checkpoint of a model that we've trained with channel-wise Group Lasso regularization.<br>
With the following command-line arguments, the sample application loads the model (```--resume```) and prints statistics about the model weights (```--summary=sparsity```). This is useful if you want to load a previously pruned model, to examine the weight sparsity statistics, for example. Note that when you *resume* a stored checkpoint, you still need to tell the application which network architecture the checkpoint uses (```-a=resnet20_cifar```):
@@ -257,34 +263,6 @@ After installing and running the server, take a look at the [notebook](https://g
Sensitivity analysis is a long process and this notebook loads CSV files that are the output of several sessions of sensitivity analysis.
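+
+For reference, sensitivity data like this can be generated with the sample application's sensitivity-analysis switch. The sketch below is hedged: the flag values and checkpoint name are assumptions, so verify them against `compress_classifier.py -h` before running:
+```bash
+# Filter-wise pruning-sensitivity analysis of a trained ResNet20 on CIFAR10.
+# The results are written to a CSV file, which notebooks like this one can load.
+$ python3 compress_classifier.py -a=resnet20_cifar ../../../data.cifar10 --resume=checkpoint.resnet20_cifar.pth.tar --sense=filter
+```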
<center> <img src="imgs/resnet18-sensitivity.png"></center>
-## Set up the classification datasets
-The sample application for compressing image classification DNNs, ```compress_classifier.py``` located at ```distiller/examples/classifier_compression```, uses both [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) and [ImageNet](http://www.image-net.org/) image datasets.<br>
-
-The ```compress_classifier.py``` application will download the CIFAR10 automatically the first time you try to use it (thanks to TorchVision). The example invocations used throughout Distiller's documentation assume that you have downloaded the images to directory ```distiller/../data.cifar10```, but you can place the images anywhere you want (you tell ```compress_classifier.py``` where the dataset is located - or where you want the application to download the dataset to - using a command-line parameter).
-
-ImageNet needs to be [downloaded](http://image-net.org/download) manually, due to copyright issues. Facebook has created a [set of scripts](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset) to help download and extract the dataset.
-
-Again, the Distiller documentation assumes the following directory structure for the datasets, but this is just a suggestion:
-```
-distiller
-  examples
-    classifier_compression
-data.imagenet/
-    train/
-    val/
-data.cifar10/
-    cifar-10-batches-py/
-        batches.meta
-        data_batch_1
-        data_batch_2
-        data_batch_3
-        data_batch_4
-        data_batch_5
-        readme.html
-        test_batch
-```
-
-
## Running the tests
We are currently light on tests, and this is an area where contributions will be much appreciated.<br>
There are two types of tests: system tests and unit tests. To invoke the unit tests:
diff --git a/examples/GNMT/README.md b/examples/GNMT/README.md
index 893ce1f3f3554aa64cf07f1791beb22efd64a457..0ce9e433edd66c41eb6d76e56d29803cac1f9de0 100644
--- a/examples/GNMT/README.md
+++ b/examples/GNMT/README.md
@@ -4,7 +4,18 @@ and show different configurations of quantization to achieve the highest accuracy
Note that this folder contains only code required to run evaluation. All training code was removed. A link to a pre-trained model is provided below.
-For a summary on the quantization results see [below](#results).
+## Summary of Post-Training Quantization Results
+
+| Precision | Mode       | Per-Channel | Clip Activations                                               | BLEU Score |
+|-----------|------------|-------------|---------------------------------------------------------------|------------|
+| FP32      | N/A        | N/A         | N/A                                                            | 22.16      |
+| INT8      | Symmetric  | No          | No                                                             | 18.05      |
+| INT8      | Asymmetric | No          | No                                                             | 18.52      |
+| INT8      | Asymmetric | Yes         | AVG in all layers                                              | 9.63       |
+| INT8      | Asymmetric | Yes         | AVG in all layers except attention block                       | 16.94      |
+| INT8      | Asymmetric | Yes         | AVG in all layers except attention block and final classifier  | 21.49      |
+
+For details on how the model is quantized, see [below](#what-is-quantized).
## Running the Example
@@ -61,17 +72,6 @@ The following operations do not have a quantized implementation.
The operations
quant_dequant(y)
```
-### Results
-
-| Precision | Mode       | Per-Channel | Clip Activations                                               | Bleu Score |
-|-----------|------------|-------------|---------------------------------------------------------------|------------|
-| FP32      | N/A        | N/A         | N/A                                                            | 22.16      |
-| INT8      | Symmetric  | No          | No                                                             | 18.05      |
-| INT8      | Asymmetric | No          | No                                                             | 18.52      |
-| INT8      | Asymmetric | Yes         | AVG in all layers                                              | 9.63       |
-| INT8      | Asymmetric | Yes         | AVG in all layers except attention block                       | 16.94      |
-| INT8      | Asymmetric | Yes         | AVG in all layers except attention block and final classifier  | 21.49      |
-
## Dataset / Environment
### Publication / Attribution
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6735743d78c1a1f491c1eaced1a274e129715b12
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,13 @@
+# Compression Examples
+
+Distiller comes with sample applications and tutorials covering a range of model types:
+
+| Model Type | Sparsity | Post-train quant | Quant-aware training | Auto Compression (AMC) | In Directories |
+|------------|:--------:|:----------------:|:--------------------:|:----------------------:|----------------|
+| **Image classification** | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | [classifier_compression](https://github.com/NervanaSystems/distiller/tree/master/examples/classifier_compression), [auto_compression/amc](https://github.com/NervanaSystems/distiller/tree/master/examples/auto_compression/amc) |
+| **Word-level language model** | :white_check_mark: | :white_check_mark: | | | [word_language_model](https://github.com/NervanaSystems/distiller/tree/master/examples/word_language_model) |
+| **Translation (GNMT)** | | :white_check_mark: | | | [GNMT](https://github.com/NervanaSystems/distiller/tree/master/examples/GNMT) |
+| **Recommendation System (NCF)** | | :white_check_mark: | | | [ncf](https://github.com/NervanaSystems/distiller/tree/master/examples/ncf) |
+| **Object Detection** | :white_check_mark: | | | | [object_detection_compression](https://github.com/NervanaSystems/distiller/tree/master/examples/object_detection_compression) |
+
+The directories listed in the table contain the code implementing each modality. The rest of the sub-directories in this directory are each dedicated to a specific compression method, and contain YAML schedules and other files that can be used with the sample applications. Most of these files contain details on the results obtained and how to reproduce them.
diff --git a/examples/classifier_compression/README.md b/examples/classifier_compression/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3eae671630e31a2af2562db41e66b946c27d6c71
--- /dev/null
+++ b/examples/classifier_compression/README.md
@@ -0,0 +1,85 @@
+# Image Classifiers Compression
+
+This is Distiller's main example application for compressing image classification models.
+
+- [Image Classifiers Compression](#image-classifiers-compression)
+  - [Usage](#usage)
+  - [Compression Methods](#compression-methods)
+    - [Sparsity - Pruning and Regularization](#sparsity---pruning-and-regularization)
+    - [Quantization](#quantization)
+    - [Knowledge Distillation](#knowledge-distillation)
+  - [Models Supported](#models-supported)
+  - [Datasets Supported](#datasets-supported)
+  - [Re-usable Image Classification Code](#re-usable-image-classification-code)
+
+## Usage
+
+Please see the [docs](https://nervanasystems.github.io/distiller/usage.html) for usage details. In addition, run `compress_classifier.py -h` to show the extensive list of command-line options available.
+
+## Compression Methods
+
+**Follow the links for more details on each method and its experiment results.**
+
+### Sparsity - Pruning and Regularization
+
+A non-exhaustive list of the methods implemented:
+
+- [AGP](https://github.com/NervanaSystems/distiller/tree/master/examples/agp-pruning)
+- [DropFilter](https://github.com/NervanaSystems/distiller/tree/master/examples/drop_filter)
+- [Lottery-Ticket Hypothesis](https://github.com/NervanaSystems/distiller/tree/master/examples/lottery_ticket)
+- [Network Surgery](https://github.com/NervanaSystems/distiller/tree/master/examples/network_surgery)
+- [Network Trimming](https://github.com/NervanaSystems/distiller/tree/master/examples/network_trimming)
+- [Hybrids](https://github.com/NervanaSystems/distiller/tree/master/examples/hybrid): These are examples where multiple pruning strategies are combined.
+
+### Quantization
+
+- [Post-training quantization](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/post_train_quant/command_line.md) based on the TensorFlow quantization scheme (originally GEMMLOWP) with additional capabilities.
+- [Quantization-aware training](https://github.com/NervanaSystems/distiller/tree/master/examples/quantization/quant_aware_train): TensorFlow scheme, DoReFa, PACT
+
+### Knowledge Distillation
+
+See details in the [docs](https://nervanasystems.github.io/distiller/schedule.html#knowledge-distillation), and see these YAML schedules for training ResNet on CIFAR-10 with knowledge distillation: [FP32](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/fp32_baselines/preact_resnet_cifar_base_fp32.yaml); [DoReFa](https://github.com/NervanaSystems/distiller/blob/master/examples/quantization/quant_aware_train/preact_resnet_cifar_dorefa.yaml).
+
+## Models Supported
+
+The sample app integrates with [TorchVision](https://pytorch.org/docs/master/torchvision/models.html#classification) and [Cadene's pre-trained models](https://github.com/Cadene/pretrained-models.pytorch). Barring specific issues, any model from these two repositories can be specified from the command line and used.
+
+We've implemented additional models, which can be found [here](https://github.com/NervanaSystems/distiller/tree/master/distiller/models).
+
+## Datasets Supported
+
+The application supports ImageNet, CIFAR-10 and MNIST.
+
+The `compress_classifier.py` application will download the CIFAR-10 and MNIST datasets automatically the first time you try to use them (thanks to TorchVision).
The example invocations used throughout Distiller's documentation assume that you have downloaded the images to directory `distiller/../data.cifar10`, but you can place the images anywhere you want (you tell `compress_classifier.py` where the dataset is located - or where you want the application to download the dataset to - using a command-line parameter).
+
+ImageNet needs to be [downloaded](http://image-net.org/download) manually, due to copyright issues. Facebook has created a [set of scripts](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset) to help download and extract the dataset.
+
+Again, the Distiller documentation assumes the following directory structure for the datasets, but this is just a suggestion:
+```
+distiller
+  examples
+    classifier_compression
+data.imagenet/
+    train/
+    val/
+data.cifar10/
+    cifar-10-batches-py/
+        batches.meta
+        data_batch_1
+        data_batch_2
+        data_batch_3
+        data_batch_4
+        data_batch_5
+        readme.html
+        test_batch
+data.mnist/
+    MNIST/
+        processed/
+        raw/
+```
+
+## Re-usable Image Classification Code
+
+We borrow the main flow code from PyTorch's ImageNet classification training sample application ([see here](https://github.com/pytorch/examples/tree/master/imagenet)). Much of the flow was refactored into a class called `ClassifierCompressor`, which can be reused to build other scripts that compress image classifiers. Its implementation can be found in [`distiller/apputils/image_classifier.py`](https://github.com/NervanaSystems/distiller/blob/master/distiller/apputils/image_classifier.py).
+
+The [AMC auto-compression](https://github.com/NervanaSystems/distiller/tree/master/examples/auto_compression/amc) sample is another application that uses this building block.
diff --git a/examples/ncf/README.md b/examples/ncf/README.md
index e8286a223714567fe2d01a38c151a69af2de927e..f7b2ad8c0ec2cb313ee3adc44e24882b9b118e59 100644
--- a/examples/ncf/README.md
+++ b/examples/ncf/README.md
@@ -23,6 +23,16 @@ The sample command lines provided [below](#running-the-sample) focus on **post-training quantization**
This task benchmarks recommendation with implicit feedback on the [MovieLens 20 Million (ml-20m) dataset](https://grouplens.org/datasets/movielens/20m/) with a [Neural Collaborative Filtering](http://dl.acm.org/citation.cfm?id=3052569) model. The model trains on binary information about whether or not a user interacted with a specific item.
+## Summary of Post-Training Quantization Results
+
+| Precision | Mode       | Per-Channel | Split Final Layer | HR@10 |
+|-----------|------------|-------------|-------------------|-------|
+| FP32      | N/A        | N/A         | N/A               | 63.55 |
+| INT8      | Asymmetric | Yes         | No                | 49.54 |
+| INT8      | Asymmetric | Yes         | Yes               | 62.78 |
+
+Details on how to run the experiments, including what we mean by "split final layer", are provided [below](#running-the-sample).
+
## Setup
* Install `unzip` and `curl`
diff --git a/examples/word_language_model/README.md b/examples/word_language_model/README.md
index 7184ddbacb74041959f6a677e680bf773e27fb8e..c8d18332f48a26edd9b9b0c25d4701c7a117af2b 100644
--- a/examples/word_language_model/README.md
+++ b/examples/word_language_model/README.md
@@ -1,6 +1,15 @@
# Word-level language modeling RNN
This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task.
+It is based on the PyTorch example found [here](https://github.com/pytorch/examples/tree/master/word_language_model).
Note that we're using an earlier version that doesn't include the Transformer implementation.
+
+- [Word-level language modeling RNN](#word-level-language-modeling-rnn)
+  - [Running the example](#running-the-example)
+  - [Compression: Pruning](#compression-pruning)
+  - [Compression: Post-Training Quantization](#compression-post-training-quantization)
+
+## Running the example
+
By default, the training script uses the provided Wikitext-2 dataset. The trained model can then be used by the generate script to generate new text.
@@ -45,12 +54,61 @@ With these arguments, a variety of models can be tested.
As an example, the following arguments produce slower but better models:
```bash
-python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40           # Test perplexity of 80.97
-python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied    # Test perplexity of 75.96
-python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40        # Test perplexity of 77.42
-python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied # Test perplexity of 72.30
+python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
+python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
+python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
+python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
```
Perplexities on PTB are equal or better than [Recurrent Neural Network Regularization (Zaremba et al. 2014)](https://arxiv.org/pdf/1409.2329.pdf) and are similar to [Using the Output Embedding to Improve Language Models (Press & Wolf 2016)](https://arxiv.org/abs/1608.05859) and [Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016)](https://arxiv.org/pdf/1611.01462.pdf), though both of these papers have improved perplexities by using a form of recurrent dropout [(variational dropout)](http://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks).
+
+## Compression: Pruning
+
+[**Tutorial: Using Distiller to prune a PyTorch language model**](https://nervanasystems.github.io/distiller/tutorial-lang_model.html)
+
+We modified the `main.py` script to allow pruning via Distiller's scheduling mechanism. The tutorial linked above provides a step-by-step description of how these modifications were done. It then shows how to use [AGP (automated gradual pruning)](https://arxiv.org/abs/1710.01878) to prune the model to various levels.
+
+The following table summarizes the pruning results obtained in the tutorial. The parameters used here are:
+`--emsize 650 --nhid 1500 --dropout 0.65 --tied --wd 1e-6`
+
+| Sparsity      | # Non-zero parameters | Validation ppl | Test ppl |
+|---------------|:---------------------:|:--------------:|:--------:|
+| Baseline - 0% | 85,917,000            | 87.49          | 83.85    |
+| 70%           | 25,487,550            | 90.67          | 85.96    |
+| 70%           | 25,487,550            | 90.59          | 85.84    |
+| 70%           | 25,487,550            | 87.40          | 82.93    |
+| **80.4%**     | **16,847,550**        | **89.31**      | **83.64**|
+| 90%           | 8,591,700             | 90.70          | 85.67    |
+| 95%           | 4,295,850             | 98.42          | 92.79    |
+
+We can see that we are able to maintain the original perplexity using only ~20% of the parameters.
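+
+To give a flavor of how such pruning is specified, here is a minimal sketch of an AGP schedule in Distiller's YAML format. The overall structure (`pruners`, `policies`, `AutomatedGradualPruner`) follows Distiller's schedule files, but the tensor names, sparsity targets and epoch numbers below are illustrative assumptions -- see the tutorial for the actual schedules:
+```yaml
+version: 1
+pruners:
+  agp_pruner:
+    class: AutomatedGradualPruner
+    initial_sparsity: 0.05
+    final_sparsity: 0.70
+    # Hypothetical tensor names -- list the actual LSTM weights you want pruned
+    weights: [rnn.weight_ih_l0, rnn.weight_hh_l0]
+
+policies:
+  - pruner:
+      instance_name: agp_pruner
+    starting_epoch: 1
+    ending_epoch: 30
+    frequency: 2
+```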
+
+## Compression: Post-Training Quantization
+
+[**Tutorial: Post-Training Quantization of a Language Model using Distiller** (Jupyter Notebook)](https://github.com/NervanaSystems/distiller/blob/master/examples/word_language_model/quantize_lstm.ipynb)
+
+(Note that post-training quantization is NOT implemented in the `main.py` script; it is shown only in the notebook tutorial.)
+
+The tutorial covers the following:
+
+* Converting the model to use Distiller's modular LSTM implementation, which allows flexible quantization of internal LSTM operations
+* Collecting activation statistics prior to quantization
+* Creating a `PostTrainLinearQuantizer` and preparing the model for quantization
+* The "net-aware quantization" capability of `PostTrainLinearQuantizer`
+* Progressively tweaking the quantization settings to improve accuracy
+
+The following table summarizes the post-training quantization experiments shown in the tutorial:
+
+| Precision       | INT8: Mode | INT8: Per-channel | INT8: Clipping | FP16 Modules                       | Test ppl |
+|-----------------|------------|-------------------|----------------|------------------------------------|----------|
+| FP32            | N/A        | N/A               | N/A            | N/A                                | 86.87    |
+| Full FP16       | N/A        | N/A               | N/A            | Entire Model                       | 86.80    |
+| Full INT8       | Symmetric  | No                | No             | None                               | 104.2    |
+| Full INT8       | Asymmetric | Yes               | No             | None                               | 100.45   |
+| Full INT8       | Asymmetric | Yes               | Averaging      | None                               | 88.85    |
+| Mixed INT8/FP16 | Asymmetric | Yes               | No             | Encoder, decoder, Eltwise add/mult | 86.77    |
+| Mixed INT8/FP16 | Asymmetric | Yes               | Averaging      | Encoder, decoder                   | 88.96    |
+
+For more details, see the tutorial itself.
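+
+As a very rough sketch of the API flow the notebook follows, the snippet below builds a `PostTrainLinearQuantizer` for a toy model. The keyword arguments and the dummy input passed to `prepare_model` are assumptions that may differ between Distiller versions; the notebook is the authoritative reference:
+```python
+import torch
+import torch.nn as nn
+from distiller.quantization import PostTrainLinearQuantizer, LinearQuantMode
+
+# Toy stand-in for the converted language model used in the notebook
+model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
+
+quantizer = PostTrainLinearQuantizer(
+    model,
+    bits_activations=8,
+    bits_parameters=8,
+    mode=LinearQuantMode.ASYMMETRIC_UNSIGNED,  # asymmetric, as in the better rows above
+    per_channel_wts=True)                      # per-channel weight quantization
+quantizer.prepare_model(torch.rand(1, 20))     # some versions expect a dummy input here
+
+out = quantizer.model(torch.rand(1, 20))       # run inference with simulated quantization
+```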