- Jun 30, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
You no longer need to use --momentum=0 when removing structures dynamically.

The SGD momentum update (velocity) depends on the weights, which PyTorch optimizers cache internally. This caching is not a problem for filter/channel removal (thinning), because although we dynamically change the shapes of the weight tensors, we don't replace the weight tensors themselves. However, PyTorch's SGD creates tensors to store the momentum updates, and these tensors have the same shape as the weight tensors, so when we change the weight tensors we must make the corresponding changes in the Optimizer, or disable the momentum. We added a new function - thinning.optimizer_thinning() - to do this (see the sketch below). This function is brittle: it is tested only on optim.SGD, and it relies on the internal representation of the SGD optimizer, which can change without notice. For example, optim.Adam uses state['exp_avg'] and state['exp_avg_sq'], which also depend on the shape of the weight tensors.

We needed to pass the Optimizer instance to the thinning policies (ChannelRemover, FilterRemover) via the callbacks, which required us to change the callback interface. In the future we plan a bigger change to the callback API, to allow passing arbitrary context from the training environment to Distiller.

Also in this commit:
* compress_classifier.py had special handling for resnet layer-removal, which is used in examples/ssl/ssl_4D-removal_training.yaml. This is a brittle and ugly hack; until we have a more elegant solution, I'm removing support for layer-removal.
* Added to the tests an invocation of forward and backward passes over a model. This exercises more of the real flows, which use the optimizer and construct gradient tensors.
* Added a test of a special case of convolution filter-pruning which occurs when the next layer is fully-connected (linear).
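Below is a minimal sketch of the stale-momentum-buffer problem and a crude fix. The function name sync_sgd_momentum is hypothetical; Distiller's real thinning.optimizer_thinning() removes the same rows/channels from the buffers that thinning removed from the weights, whereas this sketch simply zeroes any buffer whose shape has gone stale (which momentarily disables momentum for that parameter):

```python
import torch
import torch.nn as nn
import torch.optim as optim

def sync_sgd_momentum(optimizer):
    """Hypothetical sketch: reset any SGD momentum buffer whose shape no
    longer matches its (possibly thinned) weight tensor."""
    for group in optimizer.param_groups:
        for p in group['params']:
            state = optimizer.state.get(p, {})
            buf = state.get('momentum_buffer')
            if buf is not None and buf.shape != p.data.shape:
                # Crude fix: replace the stale buffer with zeros at the new shape
                state['momentum_buffer'] = torch.zeros_like(p.data)

conv = nn.Conv2d(3, 16, kernel_size=3)
opt = optim.SGD(conv.parameters(), lr=0.1, momentum=0.9)
# ... training steps create momentum buffers; thinning then replaces
# conv.weight.data with a tensor holding fewer filters ...
sync_sgd_momentum(opt)  # buffers now match the new weight shapes
```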
-
- Jun 21, 2018
-
-
Guy Jacob authored
-
- Jun 19, 2018
-
-
Guy Jacob authored
* Modify 'create_png' to use the correct data structures (dicts instead of lists, etc.)
* Handle the case where an op was called not from a module. This relates to:
  * ONNX -> "User-Friendly" name conversion, to account for such cases
  * Detection of an existing op with the same name
  In both cases, use the ONNX op type in addition to the op name (illustrated below).
* Return an "empty" shape instead of None when ONNX couldn't infer a parameter's shape
* Expose the option of a PNG summary with parameters to the user
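A rough illustration of the "op type in addition to the op name" idea (the helper below is hypothetical, not the actual summary-graph code):

```python
def disambiguate_op_name(op_name, op_type, existing_ops):
    """Hypothetical sketch: if an op with this name is already recorded,
    append the ONNX op type so the two graph nodes stay distinct."""
    if op_name in existing_ops:
        return '{} ({})'.format(op_name, op_type)
    return op_name

existing = {'features.conv1': {'type': 'Conv'}}
print(disambiguate_op_name('features.conv1', 'Relu', existing))
# -> features.conv1 (Relu)
```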
-
- May 17, 2018
-
-
Neta Zmora authored
The latest changes to the logger caused the CI tests to fail, because the test assumes that the logging.conf file is present in the same directory as the sample application script. The sample application instead looked in the current working directory, and so did not find the log configuration file.
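A minimal sketch of the kind of fix involved: resolve logging.conf relative to the script's own location rather than the current working directory (the file name comes from the message above; the rest is illustrative):

```python
import logging.config
import os

# Look for logging.conf next to this script, not in os.getcwd()
script_dir = os.path.dirname(os.path.abspath(__file__))
logging.config.fileConfig(os.path.join(script_dir, 'logging.conf'))
```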
-
- May 16, 2018
-
-
Neta Zmora authored
Soon we will be reusing this function in other sample apps, so let's move it to app_utils.
-
Neta Zmora authored
The 'master' branch now uses PyTorch 0.4, which has API changes that are not backward compatible with PyTorch 0.3. After upgrading Distiller's internal implementation to be compatible with PyTorch 0.4, we added a check that you are using the correct PyTorch version. Note that we only perform this check in the sample image classifier compression application.
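A sketch of such a version guard (the error message is illustrative, not the exact code in the sample application):

```python
import sys
import torch

# Distiller's master branch requires the PyTorch 0.4 API
major, minor = (int(v) for v in torch.__version__.split('.')[:2])
if (major, minor) < (0, 4):
    print("Error: Distiller requires PyTorch 0.4 or later "
          "(found {})".format(torch.__version__))
    sys.exit(1)
```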
-
Neta Zmora authored
Eventually we will want to use this code in other sample applications, so let's move the logger configuration code to a separate function. There's a bit of ugly hacking in the current implementation, because I've added variable members to logging.logger. These are actually config-once variables that convey the logging directory and filename. I did not want to add more names to the global namespace, so I hacked a temporary solution in which logging.logger acts as a conveyor and private namespace. We'll get that cleaned up as we do more refactoring.
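A condensed sketch of the idea (the names and the config-file placeholder are illustrative; the real code lives in app_utils): configure logging once, then hang the config-once values off the logger instead of introducing new globals:

```python
import logging
import logging.config
import os
import time

def config_pylogger(log_cfg_file, experiment_name, output_dir='logs'):
    """Configure the Python logger and stash the log directory/filename
    on the logger object -- the 'conveyor' hack described above."""
    timestr = time.strftime("%Y.%m.%d-%H%M%S")
    os.makedirs(output_dir, exist_ok=True)
    log_filename = os.path.join(
        output_dir, '%s___%s.log' % (experiment_name or 'experiment', timestr))
    # Assumes logging.conf references %(logfilename)s in its file handler
    logging.config.fileConfig(log_cfg_file,
                              defaults={'logfilename': log_filename})
    logger = logging.getLogger()
    # Config-once variables, hung off the logger to avoid new global names
    logger.logdir = output_dir
    logger.log_filename = log_filename
    return logger
```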
-
Neta Zmora authored
This is a niche feature that lets you print the names of the modules in a model from the command line. Non-leaf nodes are excluded from this list. Other caveats are documented in the code.
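A minimal sketch of how such a listing can be produced (the function name is illustrative; the real feature is wired into the sample app's command-line handling):

```python
import torchvision.models as models

def print_leaf_module_names(model):
    # List only leaf modules; containers such as nn.Sequential, which
    # have children, are skipped (as is the unnamed root module).
    for name, module in model.named_modules():
        if name and len(list(module.children())) == 0:
            print(name)

print_leaf_module_names(models.resnet18())
```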
-
Neta Zmora authored
Data parallel models may execute faster on multiple GPUs, but rendering them creates visually complex and illegible graphs. Therefore, when creating models for a PNG summary, we opt to use non-parallel models.
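A simplified sketch of de-parallelizing a model before drawing it (this only unwraps a top-level nn.DataParallel; the helper name is illustrative):

```python
import copy
import torch.nn as nn

def make_non_parallel_copy(model):
    # Work on a copy so the training model keeps its DataParallel wrapper
    model = copy.deepcopy(model)
    if isinstance(model, nn.DataParallel):
        model = model.module
    return model
```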
-
Neta Zmora authored
-
- May 14, 2018
-
-
Guy Jacob authored
-
- Apr 24, 2018
-
-
Neta Zmora authored
-