    78e98a51
Bug fix: Resuming from checkpoint ignored the masks stored in the checkpoint (#76)
    Neta Zmora authored
When we resume from a checkpoint, we usually want to continue using the checkpoint’s
masks.  I say “usually” because there is a situation where we want to prune a model
and checkpoint it, and then resume with the intention of fine-tuning without keeping
the masks.  This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training
(https://arxiv.org/abs/1607.04381).  For the time being, I didn’t want to add another
argument to ```compress_classifier.py```, so we ignore DSD.
    
There are two possible situations when we resume from a checkpoint that contains a
serialized ```CompressionScheduler``` with pruning masks (both cases are illustrated
in the sketch after this list):
1. We plan on using a new ```CompressionScheduler``` that is defined in a schedule
YAML file.  In this case, we want to copy the masks from the serialized
```CompressionScheduler``` to the new ```CompressionScheduler``` that we are
constructing from the YAML file.  This is the first fix.
2. We resume from a checkpoint without using a YAML schedule file.  In this case we
want to use the ```CompressionScheduler``` that we loaded from the checkpoint file.
All this ```CompressionScheduler``` does is keep applying the masks as we train, so
that we don’t lose them.  This is the second fix.
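
To make the two cases concrete, here is a minimal, self-contained sketch.
```TinyScheduler```, ```MaskedParam``` and ```resume_scheduler``` are stand-in names
invented for this illustration; Distiller’s actual ```CompressionScheduler``` and its
mask dictionary differ in detail.

```python
import torch
import torch.nn as nn


class MaskedParam:
    """Stand-in for one entry in a scheduler's mask dictionary."""
    def __init__(self, mask=None):
        self.mask = mask


class TinyScheduler:
    """Minimal stand-in for a CompressionScheduler: it holds one binary
    mask per parameter and re-applies the masks after weight updates."""
    def __init__(self, model):
        self.zeros_mask_dict = {name: MaskedParam()
                                for name, _ in model.named_parameters()}

    def apply_masks(self, model):
        # Zero out the pruned weights so training doesn't revive them.
        with torch.no_grad():
            for name, param in model.named_parameters():
                masked = self.zeros_mask_dict[name]
                if masked.mask is not None:
                    param.mul_(masked.mask)


def resume_scheduler(loaded_scheduler, yaml_scheduler=None):
    # Case 1: a YAML schedule was given -> copy the checkpoint's masks
    # into the freshly constructed scheduler and use that one.
    if yaml_scheduler is not None:
        for name, masked in loaded_scheduler.zeros_mask_dict.items():
            yaml_scheduler.zeros_mask_dict[name].mask = masked.mask
        return yaml_scheduler
    # Case 2: no YAML schedule -> keep the loaded scheduler, whose only
    # remaining job is to re-apply the stored masks during training.
    return loaded_scheduler


# Usage: the masks from the checkpoint's scheduler survive in the new one.
model = nn.Linear(4, 2)
old = TinyScheduler(model)
old.zeros_mask_dict['weight'].mask = (torch.rand(2, 4) > 0.5).float()
new = resume_scheduler(old, TinyScheduler(model))
new.apply_masks(model)
```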
    
For DSD, we would need a new flag that overrides the use of the
```CompressionScheduler``` that we load from the checkpoint.
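
For illustration only, such a flag could look like the sketch below.
```--reset-masks```, ```add_dsd_flag``` and ```choose_scheduler``` are hypothetical
names, not existing ```compress_classifier.py``` arguments or functions.

```python
import argparse


def add_dsd_flag(parser: argparse.ArgumentParser) -> None:
    # Hypothetical flag -- NOT an existing compress_classifier.py argument.
    parser.add_argument('--reset-masks', action='store_true',
                        help='resume from a checkpoint but discard its '
                             'pruning masks (DSD-style dense retraining)')


def choose_scheduler(args, loaded_scheduler):
    # Discarding the loaded scheduler lets the pruned weights grow back
    # during fine-tuning, which is the "dense" phase of DSD.
    return None if args.reset_masks else loaded_scheduler
```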