
License DOI

Distiller is an open-source Python package for neural network compression research.

Network compression can reduce the memory footprint of a neural network, increase its inference speed and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms, such as sparsity-inducing methods and low-precision arithmetic.

Note on Release 0.3 - Possible BREAKING Changes

As of release 0.3, we've moved some code around to enable proper packaging and installation of Distiller. In addition, we updated Distiller to support PyTorch 1.X, which might also cause older code to break due to some API changes.
If you are updating from an earlier revision of the code, please follow the instructions in the install section to ensure that Distiller and all of its dependencies are installed properly.

What's New in November?

  • Quantization: Several new features in range-based linear quantization:
    • Asymmetric post-training quantization (until now, only symmetric quantization was supported); see the sketch after this list for how the two modes differ.
    • Quantization aware training for range-based (min-max) symmetric and asymmetric quantization
    • Per-channel weights quantization support (per output channel) in both training and post-training
    • Averaging-based activation clipping in post-training quantization, to improve quantization results.
    • More control over post-training quantization configuration: Additional command line arguments in image classification sample.
  • Added an implementation of Dynamic Network Surgery for Efficient DNNs with:
    • A sample implementation on ResNet50, which achieves 82.5% compression at 75.5% Top1 accuracy (-0.6% from the TorchVision baseline).
    • A new SplicingPruner pruning algorithm.
    • New features for PruningPolicy:
    1. The pruning policy can use two copies of the weights: one is used during the forward pass, the other during the backward pass. You can control when the mask is frozen and always applied.
    2. Scheduling at the training-iteration granularity (i.e. at the mini-batch granularity). Until now, pruning could only be scheduled at the epoch granularity.
  • A bunch of new schedules showing AGP in action, including hybrid schedules that combine structured pruning and element-wise pruning.
  • Filter and channel pruning
    • Fixed problems arising from non-trivial data dependencies.
    • Added documentation
    • Changed the YAML API to express complex dependencies when pruning channels and filters.
  • Fixed a bunch of bugs
  • Image classifier compression sample:
    • Added a new command-line argument to report the top N best accuracy scores, instead of just the highest score.
    • Added an option to load a model in serialized mode.
  • We've fixed a couple of Early Exit bugs, and improved the documentation
  • We presented Distiller at AIDC 2018 Beijing and @haim-barad presented his Early Exit research implemented using Distiller.
  • We've looked up our star-gazers (that might be you ;-) and where they are located:
    The map was generated by this utility.
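
As a quick illustration of how the symmetric and asymmetric modes of range-based linear quantization differ, here is a minimal, self-contained PyTorch sketch. It is not Distiller's API: the function names, the simulated quantize/dequantize round-trip, and the clamping details are assumptions made for the example.

```python
import torch

def linear_quantize_params(x, num_bits=8, asymmetric=True):
    """Compute scale and zero-point for range-based linear quantization.

    Symmetric mode maps [-max|x|, max|x|] onto a signed integer range, so the
    zero-point is always 0. Asymmetric mode maps [min(x), max(x)] onto an
    unsigned integer range, which wastes no quantization bins when the tensor
    is one-sided (e.g. post-ReLU activations).
    """
    if asymmetric:
        qmin, qmax = 0, 2 ** num_bits - 1
        x_min = min(x.min().item(), 0.0)  # the represented range must contain 0
        x_max = max(x.max().item(), 0.0)
        scale = max((x_max - x_min) / (qmax - qmin), 1e-12)
        zero_point = int(round(qmin - x_min / scale))
    else:
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        scale = max(x.abs().max().item() / qmax, 1e-12)
        zero_point = 0
    return scale, zero_point, qmin, qmax

def quantize_dequantize(x, scale, zero_point, qmin, qmax):
    """Simulate quantization: round to integers, clamp, then dequantize."""
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale
```

Per-channel weights quantization follows the same recipe, except that the scale and zero-point are computed separately for each output channel instead of once per tensor.
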
What's New in October?

We've added two new Jupyter notebooks:

  • The first notebook contrasts what sparse and dense versions of ResNet50 "look at".
  • The second notebook shows a simple application of Truncated SVD to the linear layer in ResNet50; a rough sketch of the idea appears just below.
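
The notebook contains the full details; the following is only a rough sketch of the underlying idea, written against plain PyTorch rather than the notebook's code. The helper name, the chosen rank, and the use of torch.linalg.svd are assumptions made for the example.

```python
import torch
import torch.nn as nn

def truncated_svd_linear(fc: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate a Linear layer by two smaller ones via truncated SVD.

    W (out x in) is factored as U[:, :r] @ diag(S[:r]) @ Vh[:r, :], so
    y = W x + b becomes y = U' (S'Vh' x) + b, reducing the parameter count
    from out*in to roughly r*(out + in) when r is small enough.
    """
    W = fc.weight.data                                   # shape: (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(fc.in_features, rank, bias=False)
    second = nn.Linear(rank, fc.out_features, bias=fc.bias is not None)
    first.weight.data = torch.diag(S[:rank]) @ Vh[:rank, :]   # (rank, in)
    second.weight.data = U[:, :rank]                          # (out, rank)
    if fc.bias is not None:
        second.bias.data = fc.bias.data.clone()
    return nn.Sequential(first, second)

# Example (hypothetical rank): replace ResNet50's final fully-connected layer.
# resnet.fc = truncated_svd_linear(resnet.fc, rank=200)
```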

We've added collection of activation statistics!

Activation statistics can be leveraged to make pruning and quantization decisions, and so we added support to collect these data. Two types of activation statistics are supported: summary statistics, and detailed records per activation. Currently we support the following summaries:

  • Average activation sparsity, per layer
  • Average L1-norm for each activation channel, per layer
  • Average sparsity for each activation channel, per layer

For the detailed records, we collect statistics for each individual activation and store them in a record. This collection method generates much more detailed data, but also consumes more time, so use it with care. (A minimal sketch of summary-statistics collection appears after the list below.)

  • You can collect activation data for the different training phases: training/validation/test.
  • You can access the data directly from each module that you chose to collect stats for.
  • You can also create an Excel workbook with the stats.
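
As a rough illustration of what summary-statistics collection involves, average activation sparsity per layer can be gathered with plain PyTorch forward hooks. This is not Distiller's collector API; the class name and the choice of hooking ReLU modules are assumptions made for the sketch.

```python
import torch.nn as nn
from collections import defaultdict

class SparsityCollector:
    """Collect average activation sparsity per layer via forward hooks."""

    def __init__(self, model, module_types=(nn.ReLU,)):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        # Register a hook on every module of the requested types.
        self.handles = [
            module.register_forward_hook(self._make_hook(name))
            for name, module in model.named_modules()
            if isinstance(module, module_types)
        ]

    def _make_hook(self, name):
        def hook(module, inputs, output):
            # Fraction of exactly-zero elements in this batch's output.
            self.sums[name] += (output == 0).float().mean().item()
            self.counts[name] += 1
        return hook

    def summary(self):
        return {name: self.sums[name] / self.counts[name] for name in self.sums}

    def remove(self):
        for handle in self.handles:
            handle.remove()
```

Run a few validation batches through the model, call summary() to obtain the per-layer averages, and remove() to detach the hooks when you are done.
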

Table of Contents

Highlighted features

  • Automatic Compression
  • Weight pruning
    • Element-wise pruning using magnitude thresholding, sensitivity thresholding, target sparsity level, and activation statistics
  • Structured pruning
    • Convolution: 2D (kernel-wise), 3D (filter-wise), 4D (layer-wise), and channel-wise structured pruning.
    • Fully-connected: column-wise and row-wise structured pruning.
    • Structure groups (e.g. structures of 4 filters).
    • Structure ranking using weights or activations criteria (Lp-norm, APoZ, gradients, random, etc.).
    • Support for new structures (e.g. block pruning)
  • Control
    • Soft (mask on forward-pass only) and hard pruning (permanently disconnect neurons)
    • Dual weight copies (compute loss on masked weights, but update unmasked weights)
    • Model thinning (AKA "network garbage removal") to permanently remove pruned neurons and connections.
  • Schedule
    • Flexible scheduling of pruning, regularization, and learning rate decay (compression scheduling)
    • One-shot and iterative pruning (and fine-tuning) are supported.
    • Easily control what is performed in each training step (e.g. from greedy layer-by-layer pruning to full-model pruning).
    • Automatic gradual schedule (AGP) for pruning individual connections and complete structures (a sketch of the AGP schedule appears after this list).
    • The compression schedule is expressed in a YAML file so that a single file captures the details of experiments. This dependency injection design decouples the Distiller scheduler and library from future extensions of algorithms.
  • Element-wise and filter-wise pruning sensitivity analysis (using L1-norm thresholding). Examine the data from some of the networks we analyzed, using this notebook.
  • Regularization
    • L1-norm element-wise regularization
    • Group Lasso and group variance regularization
  • Quantization
    • Automatic mechanism to transform existing models to quantized versions, with customizable bit-width configuration for different layers. No need to re-write the model for different quantization methods.
    • Post-training quantization of trained full-precision models, dynamic and static (statistics-based)
    • Support for quantization-aware training in the loop
  • Knowledge distillation
    • Training with knowledge distillation, in conjunction with the other available pruning / regularization / quantization methods.
  • Conditional computation
    • Sample implementation of Early Exit
  • Low rank decomposition
  • Lottery Ticket Hypothesis training
  • Export statistics summaries using Pandas dataframes, which makes it easy to slice, query, display and graph the data.
  • A set of Jupyter notebooks to plan experiments and analyze compression results. The graphs and visualizations you see on this page originate from the included Jupyter notebooks.
    • Take a look at this notebook, which compares visual aspects of dense and sparse Alexnet models.
    • This notebook creates performance indicator graphs from model data.
  • Sample implementations of published research papers, using library-provided building blocks. See the research papers discussions in our model-zoo.
  • Logging to the console, text file and TensorBoard-formatted file.
  • Export to ONNX (export of quantized models pending ONNX standardization)
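
As a companion to the AGP bullet above, here is a minimal sketch of the gradual sparsity schedule described in Zhu & Gupta's "To prune, or not to prune" paper, which AGP implements. The function name and arguments are illustrative, not Distiller's API.

```python
def agp_target_sparsity(step, initial_sparsity, final_sparsity, start_step, end_step):
    """Automated Gradual Pruning sparsity target for a given training step.

    Sparsity ramps from initial_sparsity to final_sparsity over
    [start_step, end_step] along a cubic curve: pruning is aggressive at
    first, when many weights are redundant, and slows down as the network
    approaches its final sparsity.
    """
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Example: halfway through a 0% -> 90% schedule the target sparsity is
# 0.9 + (0.0 - 0.9) * 0.5**3 = 0.7875.
assert abs(agp_target_sparsity(50, 0.0, 0.9, 0, 100) - 0.7875) < 1e-9
```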

Installation

These instructions will help get Distiller up and running on your local machine.

  1. Clone Distiller
  2. Create a Python virtual environment
  3. Install the package

Notes:

  • Distiller has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.
  • If you are not using a GPU, you might need to make small adjustments to the code.

Clone Distiller

Clone the Distiller code repository from GitHub: