- Jun 22, 2018
-
-
Thomas Fan authored
Reviewed and looking good. We have to set a convention for naming files.
-
- Jun 21, 2018
-
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
-
Neta Zmora authored
Fixed a bug in module name normalization for modules with a name ending in ".module" (e.g. "features.module" in the case of VGG). Made the tests more robust, and refactored the common code into distiller/utils.py.
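A minimal sketch of the kind of normalization involved, assuming a helper along the lines of the one refactored into distiller/utils.py (the actual function there may differ); the example name 'features.module.0' mirrors a DataParallel-wrapped VGG 'features' block:

```python
def normalize_module_name(name):
    """Drop the '.module' path components that nn.DataParallel wrapping inserts.
    A hypothetical sketch, not necessarily the helper in distiller/utils.py."""
    return '.'.join(part for part in name.split('.') if part != 'module')

print(normalize_module_name('features.module.0'))  # 'features.0'
print(normalize_module_name('features.module'))    # 'features'
```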
-
- Jun 19, 2018
-
-
Guy Jacob authored
* Modify 'create_png' to use the correct data structures (dicts instead of lists, etc.)
* Handle the case where an op was called not from a module. This relates to:
  * ONNX -> "user-friendly" name conversion, to account for such cases
  * Detection of an existing op with the same name
  In both cases, use the ONNX op type in addition to the op name.
* Return an "empty" shape instead of None when ONNX couldn't infer a parameter's shape.
* Expose the option of a PNG summary with parameters to the user.
-
- Jun 15, 2018
-
-
Neta Zmora authored
-
- Jun 14, 2018
-
-
Neta Zmora authored
When removing channels and thinning, the number of filters of the next layer was not set correctly. When loading a model that has already been thinned (e.g. loading a model, thinning, saving, and loading again), don't crash on wrong tensor sizes. Cache the thinning recipe in the model when loading from a checkpoint. Without this, a loaded thin model loses its recipes when saved to a checkpoint.
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
This reverts commit ecade1b2. This simply does not work, so reverting until we find a correct solution. For example, in the language model the encoder and decoder weights are tied and use the same memory, and yet I can't see how to determine that they are the same parameter.
-
- Jun 13, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
Replace the original "homebrew" optimizer and LR-decay schedule with PyTorch's SGD and ReduceLROnPlateau. SGD with momentum=0 and weight_decay=0, and ReduceLROnPlateau with patience=0 and factor=0.5 will give the same behavior as in the original PyTorch example. Having a standard optimizer and LR-decay schedule gives us the flexibility to experiment with these during the training process.
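For reference, a minimal sketch of the optimizer/scheduler setup described above (the model and the learning-rate value are stand-ins, not the sample application's actual code):

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 2)  # stand-in for the language model

# Plain SGD (momentum=0, weight_decay=0) reproduces the original example's update rule
optimizer = optim.SGD(model.parameters(), lr=20.0, momentum=0.0, weight_decay=0.0)
# patience=0, factor=0.5: halve the LR as soon as the validation loss stops improving
scheduler = ReduceLROnPlateau(optimizer, mode='min', patience=0, factor=0.5)

for epoch in range(3):
    val_loss = torch.rand(1).item()  # stand-in for a real validation loss
    scheduler.step(val_loss)         # decays the LR when val_loss plateaus
```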
-
Neta Zmora authored
-
Neta Zmora authored
In language models, we might use "weight tying", which means that the same weights tensor is used in several different places. If tying is used, we'd like to log the tensor information, but exclude it from the total sparsity calculation.
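To make the tying scenario concrete, here is a toy tied model and one possible way to spot the shared tensor when summarizing a state_dict; this is an illustrative sketch, not Distiller's sparsity-summary code:

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Minimal language model with tied encoder/decoder weights (illustrative only)."""
    def __init__(self, ntokens=100, emsize=32):
        super(TinyLM, self).__init__()
        self.encoder = nn.Embedding(ntokens, emsize)
        self.rnn = nn.LSTM(emsize, emsize)
        self.decoder = nn.Linear(emsize, ntokens)
        self.decoder.weight = self.encoder.weight   # weight tying

model = TinyLM()
# state_dict() lists the tied tensor under both names, so a naive sum over it
# double-counts.  One possible way to detect the duplicate is to compare storage
# pointers (a sketch, not necessarily how Distiller handles it):
seen, total_elems = set(), 0
for name, tensor in model.state_dict().items():
    duplicate = tensor.data_ptr() in seen
    seen.add(tensor.data_ptr())
    print(name, list(tensor.shape), '(tied duplicate)' if duplicate else '')
    if not duplicate:
        total_elems += tensor.numel()
```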
-
- Jun 10, 2018
-
-
Neta Zmora authored
* Large update containing a new thinning algorithm (see the sketch below).
  Thinning a model is the process of taking a dense network architecture with parameters that exhibit structure-sparsity (filters or channels) in the weights tensors of convolution layers, and making changes in the network architecture and parameters in order to completely remove the structures. The new architecture is smaller (condensed), with fewer channels and filters in some of the convolution layers. Linear and BatchNormalization layers are also adjusted as required.
  To perform thinning, we create a SummaryGraph ('sgraph') of our model. We use the 'sgraph' to infer the data-dependency between the modules in the PyTorch network. This entire process is not trivial and will be documented in a different place.
  Large refactoring of SummaryGraph to support the new thinning requirement of traversing successors and predecessors:
  - Operations (ops) are now stored in a dictionary, so that they can be accessed quickly by name.
  - Refactored the Operation construction code.
  - Added support for searching a node's predecessors and successors. You can search for all predecessors/successors by depth, or by type.
  - create_png now supports an option to display the parameter nodes.
  Updated schedules with the new thinning syntax.
* Thinning: support iterative thinning of models.
  There's a caveat with this commit: when using this code you will need to train with SGD momentum=0. The momentum update depends on the weights, and because we dynamically change the weights' shapes, we need to either make the appropriate changes in the Optimizer, or disable the momentum. For now, we disable the momentum.
* Thinning: move the application of FilterRemover to on_minibatch_begin.
* Thinning: fix a syntax error.
* Word-level language model compression.
  Added an implementation of Baidu's RNN pruning scheme:
  Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
  Added an example of word-level language model compression. The language model is based on PyTorch's example: https://github.com/pytorch/examples/tree/master/word_language_model
  Added an AGP pruning schedule and an RNN pruning schedule to demonstrate compression of the language model.
* Thinning: remove dead code.
* Remove resnet18 filter pruning, since the scheduler script is incomplete.
* Thinning: fix an indentation error.
* Thinning: remove dead code.
* Thinning: updated resnet20-CIFAR filter-removal reference checkpoints.
* Thinning: updated resnet20-CIFAR filter-removal reference schedules. These are for use with the new thinning schedule algorithm.
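As a rough illustration of what thinning does to adjacent layers (not the actual Distiller thinning code, which derives the layer dependencies from SummaryGraph), here is a toy example that removes output filters from one convolution and adjusts the following BatchNorm and convolution to match:

```python
import torch
import torch.nn as nn

# Two adjacent convolutions; removing output filters from conv1 means conv2
# must lose the matching input channels.
conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn1   = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)

keep = torch.tensor([0, 2, 3, 5])          # indices of the filters we keep

thin_conv1 = nn.Conv2d(3, len(keep), kernel_size=3, padding=1)
thin_conv1.weight.data = conv1.weight.data.index_select(0, keep).clone()
thin_conv1.bias.data   = conv1.bias.data.index_select(0, keep).clone()

thin_bn1 = nn.BatchNorm2d(len(keep))
thin_bn1.weight.data  = bn1.weight.data.index_select(0, keep).clone()
thin_bn1.bias.data    = bn1.bias.data.index_select(0, keep).clone()
thin_bn1.running_mean = bn1.running_mean.index_select(0, keep).clone()
thin_bn1.running_var  = bn1.running_var.index_select(0, keep).clone()

thin_conv2 = nn.Conv2d(len(keep), 16, kernel_size=3, padding=1)
thin_conv2.weight.data = conv2.weight.data.index_select(1, keep).clone()  # dim 1 = input channels
thin_conv2.bias.data   = conv2.bias.data.clone()

x = torch.randn(1, 3, 32, 32)
y = thin_conv2(thin_bn1(thin_conv1(x)))    # shapes line up after thinning
print(y.shape)                             # torch.Size([1, 16, 32, 32])
```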
-
- Jun 07, 2018
-
-
Neta Zmora authored
Added an implementation of Baidu's RNN pruning scheme:
Narang, Sharan & Diamos, Gregory & Sengupta, Shubho & Elsen, Erich. (2017). Exploring Sparsity in Recurrent Neural Networks. (https://arxiv.org/abs/1704.05119)
Added an example of word-level language model compression. The language model is based on PyTorch's example: https://github.com/pytorch/examples/tree/master/word_language_model
Added an AGP pruning schedule and RNN pruning schedule to demonstrate compression of the language model.
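For context, the AGP schedule mentioned here ramps the target sparsity along a cubic curve (Zhu & Gupta, 2017, "To prune, or not to prune"). A small sketch of that ramp, with illustrative parameter names rather than Distiller's schedule-YAML keys:

```python
def agp_target_sparsity(epoch, initial_sparsity, final_sparsity,
                        starting_epoch, ending_epoch):
    """Automated Gradual Pruning schedule: sparsity ramps from initial_sparsity
    to final_sparsity along a cubic curve.  A sketch; argument names are
    illustrative, not Distiller's API."""
    if epoch <= starting_epoch:
        return initial_sparsity
    if epoch >= ending_epoch:
        return final_sparsity
    progress = float(epoch - starting_epoch) / (ending_epoch - starting_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Example: ramp from 0% to 80% sparsity between epochs 2 and 20.
for e in (2, 5, 10, 20):
    print(e, round(agp_target_sparsity(e, 0.0, 0.80, 2, 20), 3))
```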
-
- May 29, 2018
-
-
Neta Zmora authored
This is a temporary implementation that allows filter-removal and network thinning for VGG. The implementation continues the present design for network thinning, which is problematic because parts of the solution are specific to each model. Leveraging some new features in PyTorch 0.4, we are now able to provide a more generic solution to thinning, which we will push to 'master' soon. This commit bridges the feature gap for VGG filter-removal in the meantime.
-
Neta Zmora authored
This is a temporary implementation that allows filter-removal and network thinning for VGG. The implementation continues the present design for network thinning, which is problematic because parts of the solution are specific to each model. Leveraging some new features in PyTorch 0.4, we are now able to provide a more generic solution to thinning, which we will push to 'master' soon. This commit bridges the feature gap for VGG filter-removal in the meantime.
-
- May 22, 2018
-
-
Neta Zmora authored
Two places in the documentation gave the wrong path to the example Alexnet sensitivity pruning schedule.
-
- May 17, 2018
-
-
Neta Zmora authored
The latest changes to the logger caused the CI tests to fail, because the test assumes that the logging.conf file is present in the same directory as the sample application script. The sample application used cwd() instead, and did not find the log configuration file.
-
- May 16, 2018
-
-
Neta Zmora authored
Soon we will be reusing this function in other sample apps, so let's move it to app_utils.
-
Neta Zmora authored
The 'master' branch now uses PyTorch 0.4, which has API changes that are not backward compatible with PyTorch 0.3. After upgrading Distiller's internal implementation to be compatible with PyTorch 0.4, we added a check that you are using the correct PyTorch version. Note that we only perform this check in the sample image classifier compression application.
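A minimal sketch of such a version gate (the wording and exact placement in the sample application may differ):

```python
import sys
import torch

# Simplified version check: require PyTorch 0.4 or later.
major, minor = (int(x) for x in torch.__version__.split('.')[:2])
if (major, minor) < (0, 4):
    print("This code requires PyTorch version 0.4 or later "
          "(detected version {}).".format(torch.__version__))
    sys.exit(1)
```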
-
Neta Zmora authored
Work on the 'master' branch uses pre-release version numbers. After releasing v0.1.0 with PyTorch 0.3, we have upgraded 'master' to support PyTorch 0.4, which contains API changes that are not backward compatible.
-
Guy Jacob authored
-
Neta Zmora authored
Eventually we will want to use this code in other sample applications, so let's move the logger configuration code to a separate function. There's a bit of ugly hacking in the current implementation because I've added variable members to logging.logger. These are actually config-once variables that convey the logging directory and filename. I did not want to add more names to the global namespace, so I hacked a temporary solution in which logging.logger acts as a conveyor and private namespace. We'll get that cleaned up as we do more refactoring.
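A sketch of the approach described above, assuming a logging.conf that accepts a 'logfilename' substitution; the function name, directory layout, and attribute names are illustrative, not Distiller's exact API:

```python
import logging
import logging.config
import os
import time

def config_logger(app_name, log_cfg_file='logging.conf', output_dir='logs'):
    """Configure logging from a config file, then stash the run directory and
    filename on the root logger so other code can find them (the 'hack')."""
    timestr = time.strftime("%Y.%m.%d-%H%M%S")
    logdir = os.path.join(output_dir, app_name + '_' + timestr)
    os.makedirs(logdir)
    log_filename = os.path.join(logdir, app_name + '.log')
    logging.config.fileConfig(log_cfg_file, defaults={'logfilename': log_filename})
    logger = logging.getLogger()
    # Config-once values hung on the logger, used as a private namespace.
    logger.logdir = logdir
    logger.log_filename = log_filename
    return logger
```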
-
Neta Zmora authored
PyTorch 0.4 now fully supports the ONNX export features that are needed in order to create a SummaryGraph, which is sort of a "shadow graph" for PyTorch models. The big advantage of SummaryGraph is that it gives us information about the connectivity of nodes. With connectivity information we can compute per-node MAC (compute) and BW, and better yet, we can remove channels, filters, and layers (more on this in future commits). In this commit we (1) replace the long and overly-verbose ONNX node names, with PyTorch names; and (2) move MAC and BW attributes from the Jupyter notebook to the SummaryGraph.
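As an example of the per-node compute attribute mentioned above, the MAC count of a convolution node can be derived from its shapes alone; the helper below is an illustrative formula, not the SummaryGraph API:

```python
def conv_macs(in_channels, out_channels, kernel_size, out_h, out_w, groups=1):
    """Back-of-the-envelope MACs for a 2D convolution node (illustrative)."""
    k_h, k_w = kernel_size
    return out_h * out_w * out_channels * (in_channels // groups) * k_h * k_w

# Example: a 3->64 channel, 7x7 convolution with a 112x112 output feature map:
print(conv_macs(3, 64, (7, 7), 112, 112))   # 118013952 MACs
```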
-
Neta Zmora authored
This is a niche feature that lets you print the names of the modules in a model from the command line. Non-leaf nodes are excluded from this list. Other caveats are documented in the code.
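The gist of the feature, sketched with a toy model (the real command-line plumbing and the documented caveats live in the sample application):

```python
import torch.nn as nn

def leaf_module_names(model):
    """List only the leaf modules (those with no children), excluding containers."""
    return [name for name, module in model.named_modules()
            if len(list(module.children())) == 0 and name != '']

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Sequential(nn.Linear(16, 10), nn.ReLU()),
)
print(leaf_module_names(model))   # ['0', '1', '2.0', '2.1']
```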
-
Neta Zmora authored
Data parallel models may execute faster on multiple GPUs, but rendering them creates visually complex and illegible graphs. Therefore, when creating models for a PNG summary, we opt to use non-parallel models.
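A minimal sketch of the idea, assuming the DataParallel wrapper sits at the top level (Distiller's own helper may also handle nested wrappers, e.g. a wrapped 'features' block):

```python
import torch.nn as nn

def unwrap_data_parallel(model):
    """Return the underlying module when the model is wrapped in nn.DataParallel.
    Hypothetical helper: handles only a top-level wrapper; nested wrappers
    would need a recursive walk."""
    if isinstance(model, nn.DataParallel):
        return model.module
    return model

net = nn.DataParallel(nn.Linear(10, 2))
print(type(unwrap_data_parallel(net)))  # <class 'torch.nn.modules.linear.Linear'>
```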
-
Neta Zmora authored
When we are traversing the forward path of a graph, by invoking each module's forward_hook callback, we sometimes want to know the full name of the module. Previously, to infer the module name, we looked up the name of the self.weight parameter and used that to get the module name. In PyTorch 0.4 we can directly look up the module name using model_find_module_name.
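A sketch of that kind of lookup using named_modules(); the commit's model_find_module_name helper may be implemented differently:

```python
import torch.nn as nn

def find_module_name(model, module_to_find):
    """Return the fully-qualified name of a module by identity (illustrative sketch)."""
    for name, module in model.named_modules():
        if module is module_to_find:
            return name
    return None

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
print(find_module_name(model, model[0]))  # '0'
```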
-
Neta Zmora authored
Various small changes due to the changes in the semantics and syntax of the PyTorch 0.4 API. Note that currently distiller.model_performance_summary() returns wrong results on graphs containing torch.nn.DataParallel layers.
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
Following https://pytorch.org/2018/04/22/0_4_0-migration-guide.html, we need to be more precise in how we use .type()
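The relevant distinction from the migration guide, in miniature:

```python
import torch

x = torch.DoubleTensor([1, 1, 1])

print(type(x))        # <class 'torch.Tensor'> - no longer reflects the data type
print(x.type())       # 'torch.DoubleTensor' - be explicit when the dtype matters
print(x.dtype)        # torch.float64
print(isinstance(x, torch.DoubleTensor))  # True
```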
-