Automated Gradual Pruner (AGP) Pruning Examples
Introduction
In *To prune, or not to prune: exploring the efficacy of pruning for model compression*, authors Michael Zhu and Suyog Gupta provide an algorithm for scheduling iterative level pruning.
> We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value (usually 0) to a final sparsity value over a span of n pruning steps. The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.
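For reference, the cubic sparsity ramp the quote refers to as equation (1) is, in the paper's notation (initial sparsity $s_i$, final sparsity $s_f$, pruning starting at training step $t_0$ and repeated every $\Delta t$ steps for $n$ steps):

```latex
% Equation (1): target sparsity s_t at pruning step t
s_t = s_f + (s_i - s_f)\left(1 - \frac{t - t_0}{n\,\Delta t}\right)^{3}
\quad \text{for } t \in \{t_0,\ t_0 + \Delta t,\ \ldots,\ t_0 + n\,\Delta t\}
```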
The authors describe AGP:
- Our automated gradual pruning algorithm prunes the smallest-magnitude weights to achieve a preset level of network sparsity.
- It doesn't require much hyper-parameter tuning.
- It has been shown to perform well across different models.
- It does not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.
Distiller
- The original AGP paper described the application of AGP for fine-grained pruning, and in Distiller we also implemented AGP for structured-pruning.
- We also provide examples of applying AGP for pruning language models. The results and methodology are discussed at length in the documentation.
Examples
The tables below summarize the results of the experimental pruning schedules that appear in this directory. Each example YAML schedule file contains the command line used to run the experiment, along with further details.
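For orientation, an element-wise AGP pruner in one of these schedule files is typically declared along the following lines. This is a minimal sketch: the layer name, sparsity targets, and epoch numbers are illustrative rather than taken from any specific schedule, so consult the YAML files in this directory for working configurations.

```yaml
version: 1

pruners:
  # Fine-grained (element-wise) AGP pruner: the sparsity of the listed tensors
  # is ramped from initial_sparsity to final_sparsity using the cubic schedule.
  fc_pruner:
    class: AutomatedGradualPruner
    initial_sparsity: 0.05
    final_sparsity: 0.80
    weights: [module.fc.weight]      # illustrative layer name

policies:
  # Invoke the pruner every two epochs between epochs 0 and 30 (illustrative).
  - pruner:
      instance_name: fc_pruner
    starting_epoch: 0
    ending_epoch: 30
    frequency: 2
```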
Element-wise sparsity
Model | Granularity | Sparsity (%) | Top1 | Baseline Top1 |
---|---|---|---|---|
AlexNet | Fine | 88.3 | 56.528 | 56.55 |
MobileNet v1 (width=1) | Fine | 51.6 | 68.8 | 68.9 |
ResNeXt-101-32x4d | Fine | 75.0 | 78.66 | 78.19 |
ResNet-18 | Fine | 59.9 | 69.87 | 69.76 |
ResNet-50 | Fine | 26.0 | 76.54 | 76.15 |
ResNet-50 | Fine | 80.0 | 75.99 | 76.15 |
ResNet-50 | Fine | 84.6 | 75.66 | 76.15 |
Block sparsity
Model | Granularity | Sparsity (%) | Top1 | Baseline Top1 |
---|---|---|---|---|
ResNet-50 | 1x1x8 | 36.7 | 76.36 | 76.15 |
Filter pruning with thinning
Our objective here is to minimize compute by performing thinning. Therefore, sparsity is often at 0%, but the number of parameters is reduced because filters are physically removed from the network. In this table we are looking for low values of Parameters Kept (%) and, more importantly, Compute Kept (%).
Model | Granularity | Sparsity (%) | Parameters Kept (%) | Compute Kept (%) | Top1 | Baseline Top1 |
---|---|---|---|---|---|---|
ResNet-50 | Filters | 0.0 | 43.37 | 44.56 | 74.47 | 76.15 |
ResNet-50 (2) | Filters | 0.0 | 49.69 | 49.82 | 74.78 | 76.15 |
ResNet-50 (3) | Filters | 0.0 | 67.95 | 67.33 | 75.75 | 76.15 |
ResNet-50 (w/ FC) | Filters | 11.6 | 42.74 | 44.56 | 74.56 | 76.15 |
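As a rough sketch of how these filter-pruning schedules are put together, a structured AGP pruner is paired with a network-thinning extension. The class names below follow Distiller's structured-pruning examples, but the layer name, sparsity targets, architecture, and epochs are illustrative assumptions; refer to the actual YAML files in this directory for working configurations.

```yaml
version: 1

pruners:
  # Structured AGP pruner: ranks whole filters by L1-norm and ramps the
  # fraction of pruned filters with the same cubic AGP schedule.
  filter_pruner:
    class: L1RankedStructureParameterPruner_AGP
    initial_sparsity: 0.05
    final_sparsity: 0.50
    group_type: Filters
    weights: [module.layer1.0.conv1.weight]   # illustrative layer name

extensions:
  # The thinning extension physically removes the zeroed filters from the
  # model, which is why parameter and compute counts shrink while the
  # reported element-wise sparsity stays near 0%.
  net_thinner:
    class: FilterRemover
    thinning_func_str: remove_filters
    arch: resnet50
    dataset: imagenet

policies:
  - pruner:
      instance_name: filter_pruner
    starting_epoch: 0
    ending_epoch: 30
    frequency: 2

  - extension:
      instance_name: net_thinner
    epochs: [31]                              # illustrative epoch
```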