Automated Gradual Pruner (AGP) Pruning Examples

Introduction

In "To prune, or not to prune: exploring the efficacy of pruning for model compression", authors Michael Zhu and Suyog Gupta present an algorithm for scheduling iterative level pruning.

We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value (usually 0) to a final sparsity value over a span of n pruning steps. The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.
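
For reference, the sparsity schedule the quote refers to (equation (1) in the paper) is:

$$
s_t = s_f + (s_i - s_f)\left(1 - \frac{t - t_0}{n\,\Delta t}\right)^{3},
\qquad t \in \{t_0,\; t_0 + \Delta t,\; \ldots,\; t_0 + n\,\Delta t\},
$$

where $s_i$ is the initial sparsity, $s_f$ is the final sparsity, pruning starts at training step $t_0$, and the weight masks are updated every $\Delta t$ steps over a span of $n$ pruning steps.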

The authors describe AGP (a conceptual sketch follows the list):

  • Our automated gradual pruning algorithm prunes the smallest magnitude weights to achieve a preset level of network sparsity.
  • It does not require much hyper-parameter tuning.
  • It has been shown to perform well across different models.
  • It does not make any assumptions about the structure of the network or its constituent layers, and is therefore more generally applicable.
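
To make the mechanism concrete, here is a minimal, self-contained sketch of element-wise AGP pruning: the cubic schedule from equation (1) sets the target sparsity at each pruning step, and a magnitude threshold zeroes the smallest weights. This is an illustration only, not Distiller's implementation; the function names and numbers are made up for the example.

```python
# Conceptual sketch of AGP fine-grained (element-wise) pruning.
# The schedule follows equation (1) from Zhu & Gupta; the masking code
# is illustrative and is NOT Distiller's pruner implementation.
import torch


def agp_target_sparsity(step, start_step, total_steps,
                        initial_sparsity=0.0, final_sparsity=0.9):
    """Cubic ramp from initial_sparsity to final_sparsity (equation 1)."""
    progress = min(max(step - start_step, 0), total_steps) / total_steps
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3


def magnitude_mask(weights, sparsity):
    """Binary mask that zeroes the smallest-magnitude elements."""
    k = int(sparsity * weights.numel())
    if k == 0:
        return torch.ones_like(weights)
    threshold = weights.abs().flatten().kthvalue(k).values
    return (weights.abs() > threshold).float()


# Example: halfway through a 1000-step pruning span, the target sparsity
# has already covered ~88% of the distance to its final value of 0.85.
w = torch.randn(64, 128)
s = agp_target_sparsity(step=500, start_step=0, total_steps=1000,
                        final_sparsity=0.85)
w = w * magnitude_mask(w, s)
print(f"target sparsity: {s:.3f}, actual: {(w == 0).float().mean().item():.3f}")
```

Note the front-loaded shape of the schedule: most of the pruning happens early, while the redundant connections are still abundant, and the final steps change the mask only slightly.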

Distiller

  • The original AGP paper described the application of AGP to fine-grained pruning; in Distiller we also implemented AGP for structured pruning (see the sketch following this list).
  • We also provide examples of applying AGP to prune language models. The results and methodology are discussed at length in the documentation.
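
Below is a similarly hedged sketch of what structured (filter-wise) magnitude pruning with thinning does conceptually: filters are ranked by L1 norm and the weakest ones are removed outright, physically shrinking the weight tensor. It is an illustration, not Distiller's pruner or thinning code, and the shapes are arbitrary.

```python
# Conceptual sketch of filter (structured) pruning followed by "thinning":
# whole output filters of a Conv2d weight are ranked by L1 norm and the
# weakest are removed, so the tensor physically shrinks. Illustration only.
import torch


def keep_strongest_filters(conv_weight, keep_ratio):
    """Return a thinned weight tensor containing only the strongest filters.

    conv_weight has shape (out_channels, in_channels, kH, kW).
    """
    num_filters = conv_weight.size(0)
    num_keep = max(1, int(round(keep_ratio * num_filters)))
    scores = conv_weight.abs().sum(dim=(1, 2, 3))      # L1 norm per filter
    keep_idx = torch.topk(scores, num_keep).indices.sort().values
    return conv_weight[keep_idx]


w = torch.randn(64, 32, 3, 3)                      # a hypothetical Conv2d weight
thin_w = keep_strongest_filters(w, keep_ratio=0.5)
print(w.shape, "->", thin_w.shape)                 # (64, 32, 3, 3) -> (32, 32, 3, 3)
```

In a real network, removing filters also requires adjusting the input channels of the following layer and any attached BatchNorm parameters; Distiller's thinning handles that bookkeeping, which is why sparsity can read 0% while parameter and compute counts drop (see the filter-pruning table below).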

Examples

The tables below provide the results of the experimental pruning schedules that appear in this directory. Each example YAML schedule file contains the command line used to execute the experiment, along with further details.

Element-wise sparsity

| Model | Granularity | Sparsity (%) | Top1 (%) | Baseline Top1 (%) |
|-------|-------------|--------------|----------|-------------------|
| AlexNet | Fine | 88.3 | 56.528 | 56.55 |
| MobileNet v1 (width=1) | Fine | 51.6 | 68.8 | 68.9 |
| ResNeXt-101-32x4d | Fine | 75.0 | 78.66 | 78.19 |
| ResNet-18 | Fine | 59.9 | 69.87 | 69.76 |
| ResNet-50 | Fine | 26.0 | 76.54 | 76.15 |
| ResNet-50 | Fine | 80.0 | 75.99 | 76.15 |
| ResNet-50 | Fine | 84.6 | 75.66 | 76.15 |

Block sparsity

| Model | Granularity | Sparsity (%) | Top1 (%) | Baseline Top1 (%) |
|-------|-------------|--------------|----------|-------------------|
| ResNet-50 | 1x1x8 | 36.7 | 76.36 | 76.15 |

Filter pruning with thinning

Our objective here is to minimize compute by performing thinning. Therefore, sparsity is often 0%, but the number of parameters is reduced because filters are physically removed from the model.

In this table, lower values of Parameters Kept (%) and, more importantly, Compute Kept (%) are better.

| Model | Granularity | Sparsity (%) | Parameters Kept (%) | Compute Kept (%) | Top1 (%) | Baseline Top1 (%) |
|-------|-------------|--------------|---------------------|------------------|----------|-------------------|
| ResNet-50 | Filters | 0.0 | 43.37 | 44.56 | 74.47 | 76.15 |
| ResNet-50 (2) | Filters | 0.0 | 49.69 | 49.82 | 74.78 | 76.15 |
| ResNet-50 (3) | Filters | 0.0 | 67.95 | 67.33 | 75.75 | 76.15 |
| ResNet-50 (w/ FC) | Filters | 11.6 | 42.74 | 44.56 | 74.56 | 76.15 |