# Automated Gradual Pruning Schedule

Michael Zhu and Suyog Gupta, ["To prune, or not to prune: exploring the efficacy of pruning for model compression"](https://arxiv.org/pdf/1710.01878), 2017 NIPS Workshop on Machine Learning of Phones and other Consumer Devices<br>
<br>
After completing sensitivity analysis, decide on your pruning schedule.

## Table of Contents
1. [Implementation of the gradual sparsity function](#Implementation-of-the-gradual-sparsity-function)
2. [Visualize pruning schedule](#Visualize-pruning-schedule)
3. [References](#References)

```python
import matplotlib.pyplot as plt
import torch
from torch.autograd import Variable
from ipywidgets import widgets, interact
```

## Implementation of the gradual sparsity function

The function ```sparsity_target``` implements the gradual sparsity schedule from [[1]](#zhu-gupta):<br><br>
<b><i>"We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value $s_i$ (usually 0) to a final sparsity value $s_f$ over a span of $n$ pruning steps, starting at training step $t_0$ and with pruning frequency $\Delta t$."</i></b><br>
<br>

<div id="eq:zhu_gupta_schedule"></div>
<center>
$\large
s_t = s_f + (s_i - s_f) \left(1 - \frac{t - t_0}{n\Delta t}\right)^3
\quad \text{for} \quad
t \in \{t_0,\ t_0 + \Delta t,\ \ldots,\ t_0 + n\Delta t\}
$
</center>
<br>
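In this notebook a training step is an epoch: $t_0$ is the starting epoch and $\Delta t$ is measured in epochs. Pruning happens at the beginning of an epoch, once every $\Delta t$ epochs, until the pruning duration ($n\Delta t$ epochs) has elapsed. After pruning ends, training continues without further pruning, but the pruned weights are kept at zero.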
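Differentiating the schedule with respect to $t$ (for $t_0 \le t \le t_0 + n\Delta t$) makes its shape explicit: the rate of change is largest at $t_0$ and falls to zero at $t_0 + n\Delta t$. The network is therefore pruned rapidly at first, while redundant weights are abundant, and the last pruning steps change the sparsity only slightly.

$$
\frac{ds_t}{dt} = \frac{3\,(s_f - s_i)}{n\Delta t}\left(1 - \frac{t - t_0}{n\Delta t}\right)^2
$$

For example, with $s_i = 0\%$, $s_f = 90\%$, $t_0 = 0$ and $n\Delta t = 28$ epochs, halfway through the schedule the target is $s_{14} = 90 - 90 \cdot (1 - 0.5)^3 = 78.75\%$, so 87.5% of the total sparsity increase is already done. The rate starts at $3 \cdot 90 / 28 \approx 9.64$ percentage points per epoch and decays to 0.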

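```python
def sparsity_target(starting_epoch, ending_epoch, initial_sparsity, final_sparsity, current_epoch):
    """Target sparsity (%) of the AGP schedule at `current_epoch`.

    `current_epoch` may be a number or a scalar torch tensor; passing a tensor
    with requires_grad=True lets autograd differentiate the schedule.
    """
    if final_sparsity < initial_sparsity:
        raise ValueError("final_sparsity must be >= initial_sparsity")
    # Before pruning starts, the sparsity stays at its initial value
    if current_epoch < starting_epoch:
        return initial_sparsity

    span = ending_epoch - starting_epoch
    # Once pruning has ended (or if the span is empty), hold the final value
    if span <= 0 or current_epoch >= ending_epoch:
        return final_sparsity
    progress = (current_epoch - starting_epoch) / span
    target_sparsity = (final_sparsity +
                       (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3)
    return target_sparsity
```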
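The function above measures time in epochs, while the paper states the schedule in training steps. As a sketch of that discrete formulation, the function below evaluates the same cubic formula at each point of the grid $t_0, t_0+\Delta t, \ldots, t_0+n\Delta t$. The name `agp_schedule`, the use of percent for $s_i$ and $s_f$, and the example numbers are illustrative assumptions, not part of any library.

```python
def agp_schedule(s_i, s_f, t0, n, delta_t):
    """Return (t, s_t) pairs for t in {t0, t0 + delta_t, ..., t0 + n * delta_t}.

    Hypothetical helper (not a Distiller API): s_i and s_f are in percent,
    and t counts training steps rather than epochs.
    """
    if n <= 0 or delta_t <= 0:
        raise ValueError("n and delta_t must be positive")
    schedule = []
    for k in range(n + 1):
        t = t0 + k * delta_t
        # (t - t0) / (n * delta_t) simplifies to k / n
        s_t = s_f + (s_i - s_f) * (1.0 - k / n) ** 3
        schedule.append((t, s_t))
    return schedule


# Example: prune from 0% to 80% sparsity every 500 steps, starting at step 1000
for t, s_t in agp_schedule(s_i=0.0, s_f=80.0, t0=1000, n=4, delta_t=500):
    print(t, s_t)
```

This prints the targets 0.0, 46.25, 70.0, 78.75 and 80.0 at steps 1000 through 3000. Working with the integer index `k` avoids accumulating floating-point error in `t`.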

## Visualize pruning schedule
When using the Automated Gradual Pruning (AGP) schedule, you may want to visualize how the sparsity changes as a function of the epoch number; this is called the *sparsity function*. The widget below plots it, together with its rate of change (the derivative of the sparsity with respect to the epoch).<br>
There are four knobs you can use to change the schedule:
- ```duration```: the number of epochs over which to use the AGP schedule ($n\Delta t$).
- ```initial_sparsity```: the initial sparsity $s_i$, in percent.
- ```final_sparsity```: the final sparsity $s_f$, in percent.
- ```frequency```: the pruning frequency ($\Delta t$), i.e. the number of epochs between pruning steps.
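With $t_0 = 0$, `duration` plays the role of $n\Delta t$ and `frequency` the role of $\Delta t$, so the number of pruning steps after $t_0$ is $n$ = `duration / frequency`. The sketch below lists the epochs at which pruning happens for a given pair of knob values; `pruning_epochs` is a hypothetical helper, and the warning for a `duration` that is not a multiple of `frequency` is an assumption about how one might want to flag that case.

```python
import warnings


def pruning_epochs(duration, frequency, starting_epoch=0):
    """Return the epochs t0, t0 + Δt, ..., t0 + nΔt at which AGP prunes.

    Hypothetical helper: `duration` plays the role of nΔt and `frequency`
    the role of Δt, with t0 = starting_epoch.
    """
    if frequency <= 0:
        raise ValueError("frequency must be a positive number of epochs")
    n, remainder = divmod(duration, frequency)
    if remainder:
        # The last pruning step lands before t0 + duration, so it stops short of s_f
        warnings.warn(f"duration={duration} is not a multiple of frequency={frequency}")
    return [starting_epoch + k * frequency for k in range(n + 1)]


epochs = pruning_epochs(duration=28, frequency=2)
print("n =", len(epochs) - 1)   # number of pruning steps after t0
print(epochs[0], epochs[1], epochs[-1])
```

With the widget's default values this gives $n = 14$ pruning steps, at epochs 0, 2, ..., 28.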

```python
def draw_pruning(duration, initial_sparsity, final_sparsity, frequency):
    epochs = []
    sparsity_levels = []
    # The derivative of the sparsity (i.e. sparsity rate of change)
    d_sparsity = []

    # `frequency` comes from a text box: default to pruning every epoch
    frequency = 1 if frequency == '' else max(1, int(frequency))
    sparsity = initial_sparsity  # sparsity held between pruning steps
    for epoch in range(max(40, duration + 1)):
        epochs.append(epoch)
        # Prune at epochs 0, frequency, 2*frequency, ..., up to and including duration
        if epoch <= duration and epoch % frequency == 0:
            current_epoch = torch.tensor(float(epoch), requires_grad=True)
            target = sparsity_target(
                starting_epoch=0,
                ending_epoch=duration,
                initial_sparsity=initial_sparsity,
                final_sparsity=final_sparsity,
                current_epoch=current_epoch)
            if torch.is_tensor(target):
                # Use autograd to get d(sparsity)/d(epoch) at this epoch
                target.backward()
                d_sparsity.append(current_epoch.grad.item())
                sparsity = target.item()
            else:
                # A clamped (constant) target has zero rate of change
                d_sparsity.append(0)
                sparsity = target
        else:
            # Between pruning steps the sparsity does not change
            d_sparsity.append(0)
        sparsity_levels.append(sparsity)

    plt.plot(epochs, sparsity_levels, label='sparsity (%)')
    plt.plot(epochs, d_sparsity, label='d(sparsity)/d(epoch)')
    plt.ylabel('sparsity (%)')
    plt.xlabel('epoch')
    plt.title('Pruning Rate')
    plt.ylim(0, 100)
    plt.legend()
    plt.show()


duration_widget = widgets.IntSlider(min=0, max=100, step=1, value=28)
si_widget = widgets.IntSlider(min=0, max=100, step=1, value=0)
interact(draw_pruning,
         duration=duration_widget,
         initial_sparsity=si_widget,
         final_sparsity=(0, 100, 1),
         frequency='2');
```
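The widget obtains the rate of change with autograd. As a sanity check that needs neither PyTorch nor ipywidgets, the closed-form derivative of the cubic schedule can be compared against a central finite difference; at the epochs where the widget prunes, it should match the non-zero values the widget plots. This is a sketch: `agp_sparsity` and `agp_rate` are hypothetical helper names, and the schedule parameters ($s_i = 0\%$, $s_f = 90\%$, 28 epochs) are arbitrary.

```python
def agp_sparsity(t, t0, span, s_i, s_f):
    """Cubic AGP target sparsity at time t, with span = n * delta_t (clamped to the schedule)."""
    progress = min(max((t - t0) / span, 0.0), 1.0)
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3


def agp_rate(t, t0, span, s_i, s_f):
    """Closed-form ds_t/dt = 3 * (s_f - s_i) / span * (1 - progress) ** 2 inside the schedule, 0 outside."""
    if t < t0 or t > t0 + span:
        return 0.0
    progress = (t - t0) / span
    return 3.0 * (s_f - s_i) / span * (1.0 - progress) ** 2


# Compare the closed form with a central finite difference at a few epochs
h = 1e-4
for epoch in (0.5, 7, 14, 21, 27.5):
    numeric = (agp_sparsity(epoch + h, 0, 28, 0, 90)
               - agp_sparsity(epoch - h, 0, 28, 0, 90)) / (2 * h)
    print(f"epoch {epoch}: closed form {agp_rate(epoch, 0, 28, 0, 90):.4f}, numeric {numeric:.4f}")
```

The sample epochs stay strictly inside the schedule so that the finite difference never straddles the clamped ends, where the function is not differentiable from both sides.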

<div id="toc"></div>
## References
1. <div id="zhu-gupta"></div> **Michael Zhu and Suyog Gupta**. 
    [*To prune, or not to prune: exploring the efficacy of pruning for model compression*](https://arxiv.org/pdf/1710.01878),
    NIPS Workshop on Machine Learning of Phones and other Consumer Devices,
    2017.