    78e98a51
Bug fix: Resuming from checkpoint ignored the masks stored in the checkpoint (#76)
    Neta Zmora authored
When we resume from a checkpoint, we usually want to continue using the checkpoint’s
masks.  I say “usually” because there is a situation where we want to prune a model
and checkpoint it, and then resume with the intention of fine-tuning without keeping
the masks.  This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training
(https://arxiv.org/abs/1607.04381).  For the time being, I didn’t want to add another
argument to ```compress_classifier.py```, so we ignore DSD.
    
There are two possible situations when we resume from a checkpoint that contains a
serialized ```CompressionScheduler``` with pruning masks (both cases are illustrated
in the sketch after this list):
1. We plan on using a new ```CompressionScheduler``` that is defined in a schedule
YAML file.  In this case, we want to copy the masks from the serialized
```CompressionScheduler``` to the new ```CompressionScheduler``` that we are
constructing from the YAML file.  This is the first fix.
2. We resume from a checkpoint without using a YAML schedule file.  In this case we
want to use the ```CompressionScheduler``` that we loaded from the checkpoint file.
All this ```CompressionScheduler``` does is keep applying the masks as we train, so
that we don’t lose them.  This is the second fix.
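
To make the two cases concrete, here is a minimal, self-contained sketch.
```TinyScheduler```, ```MaskedParam``` and ```resume_scheduler``` are stand-in names
invented for this illustration; Distiller’s actual ```CompressionScheduler``` and its
mask dictionary differ in detail.

```python
import torch
import torch.nn as nn


class MaskedParam:
    """Stand-in for one entry in a scheduler's mask dictionary."""
    def __init__(self, mask=None):
        self.mask = mask


class TinyScheduler:
    """Minimal stand-in for a CompressionScheduler: it holds one binary
    mask per parameter and re-applies the masks after weight updates."""
    def __init__(self, model):
        self.zeros_mask_dict = {name: MaskedParam()
                                for name, _ in model.named_parameters()}

    def apply_masks(self, model):
        # Zero out the pruned weights so training doesn't revive them.
        with torch.no_grad():
            for name, param in model.named_parameters():
                masked = self.zeros_mask_dict[name]
                if masked.mask is not None:
                    param.mul_(masked.mask)


def resume_scheduler(loaded_scheduler, yaml_scheduler=None):
    # Case 1: a YAML schedule was given -> copy the checkpoint's masks
    # into the freshly constructed scheduler and use that one.
    if yaml_scheduler is not None:
        for name, masked in loaded_scheduler.zeros_mask_dict.items():
            yaml_scheduler.zeros_mask_dict[name].mask = masked.mask
        return yaml_scheduler
    # Case 2: no YAML schedule -> keep the loaded scheduler, whose only
    # remaining job is to re-apply the stored masks during training.
    return loaded_scheduler


# Usage: the masks from the checkpoint's scheduler survive in the new one.
model = nn.Linear(4, 2)
old = TinyScheduler(model)
old.zeros_mask_dict['weight'].mask = (torch.rand(2, 4) > 0.5).float()
new = resume_scheduler(old, TinyScheduler(model))
new.apply_masks(model)
```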
    
For DSD, we would need a new flag that overrides the use of the
```CompressionScheduler``` that we load from the checkpoint.
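
For illustration only, such a flag could look like the sketch below.
```--reset-masks```, ```add_dsd_flag``` and ```choose_scheduler``` are hypothetical
names, not existing ```compress_classifier.py``` arguments or functions.

```python
import argparse


def add_dsd_flag(parser: argparse.ArgumentParser) -> None:
    # Hypothetical flag -- NOT an existing compress_classifier.py argument.
    parser.add_argument('--reset-masks', action='store_true',
                        help='resume from a checkpoint but discard its '
                             'pruning masks (DSD-style dense retraining)')


def choose_scheduler(args, loaded_scheduler):
    # Discarding the loaded scheduler lets the pruned weights grow back
    # during fine-tuning, which is the "dense" phase of DSD.
    return None if args.reset_masks else loaded_scheduler
```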