Compression scheduler

In iterative pruning, we create some kind of pruning regimen that specifies how to prune, and what to prune, at each stage of the pruning and training process. This motivated the design of CompressionScheduler: it needed to be part of the training loop, and to be able to make and implement pruning, regularization and quantization decisions. We wanted to be able to change the particulars of the compression schedule without touching the code, and settled on using YAML as a container for this specification. We found that when we run many experiments on the same code base, it is easier to maintain all of them if we decouple the experiment particulars from the code base. Therefore, we also added support for learning-rate decay scheduling to the scheduler because, again, we wanted the freedom to change the LR-decay policy without changing code.

High level overview

Let's briefly discuss the main mechanisms and abstractions: A schedule specification is composed of a list of sections defining instances of Pruners, Regularizers, Quantizers, LR-scheduler and Policies.

  • Pruners, Regularizers and Quantizers are very similar: they each implement a pruning, regularization or quantization algorithm, respectively.
  • An LR-scheduler specifies the LR-decay algorithm.

These define the what part of the schedule.

The Policies define the when part of the schedule: at which epoch to start applying the Pruner/Regularizer/Quantizer/LR-decay, the epoch to end, and how often to invoke the policy (frequency of application). A policy also defines the instance of Pruner/Regularizer/Quantizer/LR-decay it is managing.
The CompressionScheduler is configured from a YAML file or from a dictionary, but you can also manually create Policies, Pruners, Regularizers and Quantizers from code.
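For illustration, here is a minimal sketch of configuring the scheduler from code. It assumes the file_config and dict_config helpers that Distiller's sample application uses to build a CompressionScheduler; the helper names and argument order follow that sample and may differ between versions, so treat this as a sketch rather than the definitive API. The file name schedule.yaml is a placeholder.

import yaml
import distiller

# `model` and `optimizer` are assumed to already exist, wrapped so that the
# parameter names match those used in the schedule (e.g. 'features.module.0.weight').

# Option 1: build the scheduler directly from the YAML file.
compression_scheduler = distiller.file_config(model, optimizer, 'schedule.yaml')

# Option 2: load the YAML yourself and configure from a dictionary, which makes
# it easy to modify the schedule programmatically before handing it over.
with open('schedule.yaml') as f:
    sched_dict = yaml.safe_load(f)
compression_scheduler = distiller.dict_config(model, optimizer, sched_dict)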

Syntax through example

We'll use alexnet.schedule_agp.yaml to explain some of the YAML syntax for configuring Sensitivity Pruning of Alexnet.

version: 1
pruners:
  my_pruner:
    class: 'SensitivityPruner'
    sensitivities:
      'features.module.0.weight': 0.25
      'features.module.3.weight': 0.35
      'features.module.6.weight': 0.40
      'features.module.8.weight': 0.45
      'features.module.10.weight': 0.55
      'classifier.1.weight': 0.875
      'classifier.4.weight': 0.875
      'classifier.6.weight': 0.625

lr_schedulers:
   pruning_lr:
     class: ExponentialLR
     gamma: 0.9

policies:
  - pruner:
      instance_name : 'my_pruner'
    starting_epoch: 0
    ending_epoch: 38
    frequency: 2

  - lr_scheduler:
      instance_name: pruning_lr
    starting_epoch: 24
    ending_epoch: 200
    frequency: 1

There is only one version of the YAML syntax, and the version number is not verified at the moment. However, to be future-proof it is probably better to let the YAML parser know that you are using version-1 syntax, in case there is ever a version 2.

version: 1

In the pruners section, we define the instances of pruners we want the scheduler to instantiate and use.
We define a single pruner instance, named my_pruner, of algorithm SensitivityPruner. We will refer to this instance in the Policies section.
Then we list the sensitivity multipliers, \(s\), of each of the weight tensors.
You may list as many Pruners as you want in this section, as long as each has a unique name. You can mix several types of pruners in one schedule.

pruners:
  my_pruner:
    class: 'SensitivityPruner'
    sensitivities:
      'features.module.0.weight': 0.25
      'features.module.3.weight': 0.35
      'features.module.6.weight': 0.40
      'features.module.8.weight': 0.45
      'features.module.10.weight': 0.55
      'classifier.1.weight': 0.875
      'classifier.4.weight': 0.875
      'classifier.6.weight': 0.625
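As a rough intuition for what these multipliers do (this mirrors the sensitivity-pruning criterion described in Distiller's pruning documentation; the snippet is an illustration, not the pruner's actual implementation): each sensitivity \(s\) scales the standard deviation of its weight tensor to form a magnitude threshold, and weights whose absolute value falls below that threshold are pruned.

import torch

def sensitivity_mask(weights, s):
    # Illustrative only: threshold = s * std(W); weights whose magnitude falls
    # below the threshold are pruned (mask value 0), the rest are kept (1).
    threshold = s * weights.std()
    return (weights.abs() > threshold).float()

w = torch.randn(64, 256)
mask = sensitivity_mask(w, s=0.25)
print('fraction pruned: {:.3f}'.format(1.0 - mask.mean().item()))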

Next, we want to specify the learning-rate decay scheduling in the lr_schedulers section. We assign a name to this instance: pruning_lr. As in the pruners section, you may use any name, as long as all LR-schedulers have a unique name. At the moment, only one instance of LR-scheduler is allowed. The LR-scheduler must be a subclass of PyTorch's _LRScheduler. You can use any of the schedulers defined in torch.optim.lr_scheduler (see here). In addition, we've implemented some additional schedulers in Distiller (see here). The keyword arguments (kwargs) are passed directly to the LR-scheduler's constructor, so that as new LR-schedulers are added to torch.optim.lr_scheduler, they can be used without changing the application code.

lr_schedulers:
   pruning_lr:
     class: ExponentialLR
     gamma: 0.9
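Because the kwargs are forwarded verbatim, the gamma: 0.9 entry above corresponds to constructing the PyTorch scheduler directly, along these lines (a self-contained sketch; the throwaway parameter and optimizer exist only to make it runnable):

import torch

# Throwaway parameter and optimizer, purely so the example stands alone.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1)

# Equivalent of:  class: ExponentialLR, gamma: 0.9
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
# Each lr_scheduler.step() then multiplies the learning rate by 0.9.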

Finally, we define the policies section, which specifies the actual scheduling. A Policy manages an instance of a Pruner, Regularizer, Quantizer, or LR-scheduler, referencing it by its instance name. In the example below, a PruningPolicy uses the pruner instance named my_pruner: it invokes it once every 2 epochs (i.e. every other epoch), starting at epoch 0 and ending at epoch 38.

policies:
  - pruner:
      instance_name : 'my_pruner'
    starting_epoch: 0
    ending_epoch: 38
    frequency: 2

  - lr_scheduler:
      instance_name: pruning_lr
    starting_epoch: 24
    ending_epoch: 200
    frequency: 1

This is iterative pruning (see the training-loop sketch after this list):

  1. Train Connectivity
  2. Prune Connections
  3. Retrain Weights
  4. Goto 2
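To make this concrete, here is a sketch of a training loop with the CompressionScheduler callbacks that realize the iterative pruning flow. The callback names follow the ones used by Distiller's sample application; the exact argument lists may vary between versions, so treat this as an outline rather than the definitive API.

def train_with_scheduler(model, criterion, optimizer, train_loader,
                         compression_scheduler, num_epochs):
    steps_per_epoch = len(train_loader)
    for epoch in range(num_epochs):
        # Policies whose epoch range covers this epoch get to act here
        # (e.g. the pruning policy asks its pruner for updated masks).
        compression_scheduler.on_epoch_begin(epoch)
        for step, (inputs, target) in enumerate(train_loader):
            compression_scheduler.on_minibatch_begin(epoch, step, steps_per_epoch)
            output = model(inputs)
            loss = criterion(output, target)
            # Gives regularization policies a chance to add their term to the loss.
            compression_scheduler.before_backward_pass(epoch, step, steps_per_epoch, loss)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # E.g. re-applies the pruning masks so that pruned weights stay at zero.
            compression_scheduler.on_minibatch_end(epoch, step, steps_per_epoch)
        compression_scheduler.on_epoch_end(epoch)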