- May 19, 2019
Guy Jacob authored
* Added scale factor approximation in post-training quantization using integer multiply + shift. # of bits for integer multiplier is user configurable
* Updated documentation
* Updated post-train quant command line examples readme file
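A rough sketch of the integer multiply + shift idea (illustrative only, not Distiller's actual implementation; the function name and the example scale are made up):

```python
def approx_scale_as_mult_and_shift(fp_scale, mult_bits=8):
    """Approximate fp_scale as (mult / 2**shift), with mult fitting in at most
    mult_bits bits, so scaling becomes an integer multiply plus a right shift.
    Assumes 0 < fp_scale; very large scales would need an extra clamp."""
    assert fp_scale > 0
    shift = 0
    while fp_scale * (2 ** (shift + 1)) < (2 ** mult_bits):
        shift += 1
    mult = int(round(fp_scale * (2 ** shift)))
    return mult, shift

mult, shift = approx_scale_as_mult_and_shift(0.0123, mult_bits=8)
# mult=202, shift=14: 202 / 2**14 ~= 0.012329, i.e. y = (x * 202) >> 14
```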
Neta Zmora authored
1. Add basic annotations to Conv layers when generating model PNG diagrams
2. Refactor: replace dataset_dummy_input with the global utility distiller.get_dummy_input
- May 16, 2019
Neta Zmora authored
Remove the multiple instances of code that generates dummy input per dataset.
Neta Zmora authored
A wrong model was used
Neta Zmora authored
The previous PR merge introduced a couple of small errors when using the --summary flag.
Bar authored
Introduced a new utility function to export image classifiers to ONNX: export_img_classifier_to_onnx. The functionality is not new, just refactored. In the sample application compress_classifier.py, added --export-onnx as a stand-alone cmd-line flag specifically for exporting ONNX models. This new flag can take an optional argument which is used to name the exported ONNX model file. The option to export models was removed from the --summary argument; multiple --summary options can now be used together. Added a basic test for exporting ONNX.
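Under the hood such an export boils down to PyTorch's standard ONNX exporter; a minimal stand-alone sketch (the model, input shape and output file name here are placeholders, not the helper's actual signature):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=False).eval()
dummy_input = torch.randn(1, 3, 224, 224)   # input shape depends on the dataset
torch.onnx.export(model, dummy_input, "resnet18.onnx")
```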
Guy Jacob authored
- May 15, 2019
Neta Zmora authored
The weights_vol attribute reflects the size (volume) of an SG node’s weights tensor. The calculation of the weights volume was wrong. This does not have any significant impact because this attribute is not used.
Neta Zmora authored
This reverts commit a3f2ce2d.
Lev Zlotnik authored
* Recursively traverses the entire model and replaces all submodules of type `nn.LSTM` and `nn.LSTMCell` with Distiller versions
Neta Zmora authored
The weights_vol attribute reflects the size (volume) of an SG node’s weights tensor. The calculation of the weights volume was wrong. This does not have any significant impact because this attribute is not used.
Guy Jacob authored
Added a collector for activation histograms (sub-class of ActivationStatsCollector). It is stats-based, meaning it requires pre-computed min/max stats per tensor. This is done in order to avoid having to save all of the activation tensors throughout the run. The stats are expected in the format generated by QuantCalibrationStatsCollector. Details:
* Implemented ActivationHistogramsCollector
* Added Jupyter notebook showcasing activation histograms
* Implemented helper function that performs the stats collection pass and histograms pass in one go
* Also added a separate helper function just for quantization stats collection
* Integrated in the image classification sample
* data_loaders.py: Added option to use a fixed subset throughout the same session. Using it to keep the same subset between the stats collection and histograms collection phases.
* Other changes:
  * Calling assign_layer_fq_names in the base class of collectors. We do this since the collectors, as implemented so far, assume this is done, so it makes sense to do it in the base class instead of expecting the user to do it.
  * Enforcing a non-parallel model for quantization stats and histograms collectors
  * Jupyter notebooks: added a utility function to enable loggers in notebooks. This allows us to see any logging done by Distiller APIs called from notebooks.
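A minimal sketch of the stats-based idea: with min/max known in advance, a fixed-range histogram can be accumulated on the fly, so the raw activation tensors never need to be stored (names below are illustrative, not Distiller's API):

```python
import torch

def update_histogram(hist, activation, pre_min, pre_max, nbins=1024):
    # Accumulate a fixed-range histogram using pre-computed min/max stats;
    # values outside [pre_min, pre_max] are ignored by torch.histc.
    new_counts = torch.histc(activation.detach().float().flatten(),
                             bins=nbins, min=pre_min, max=pre_max)
    return hist + new_counts

hist = torch.zeros(1024)
for _ in range(10):                      # stand-in for a forward-pass loop
    act = torch.randn(32, 64)
    hist = update_histogram(hist, act, pre_min=-4.0, pre_max=4.0)
```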
Guy Jacob authored
* Set _force_outplace when calling get_trace_graph. This is a workaround for losing scope information for certain in-place operations.
* Switch all dicts to OrderedDicts
- May 14, 2019
- May 06, 2019
Bar authored
In an earlier commit, distiller was changed to accept checkpoints that do not contain an 'optimizer' entry. However, that change was not reflected in the relevant test.
- May 05, 2019
Neta Zmora authored
This is in contrast to weights-filters removal
Neta Zmora authored
Support loading a model from a checkpoint file that does not have an Optimizer instance. Before this change, loading such a model required using ```load_lean_checkpoint``` (or --exp-load-weights-from from the compress_classifier.py command line), so this change is for convenience only.
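The pattern amounts to treating the optimizer state as optional when restoring a checkpoint. A minimal sketch, assuming the common 'state_dict' / 'optimizer' checkpoint keys (the tiny model and file name are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                                  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

checkpoint = torch.load("checkpoint.pth.tar", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])

# Tolerate checkpoints that were saved without optimizer state
opt_state = checkpoint.get("optimizer")
if opt_state is not None:
    optimizer.load_state_dict(opt_state)
```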
- May 02, 2019
Lev Zlotnik authored
- May 01, 2019
Neta Zmora authored
Added a link to the FAQ wiki page.
Lev Zlotnik authored
- Apr 30, 2019
Lev Zlotnik authored
* Update test_lstm_impl.py
* Added PackedSequence functionality
* Refactored forward implementation
- Apr 18, 2019
Bar authored
Also:
* The single-worker limitation is not needed anymore; it has been fixed in PyTorch since v0.4.0 (https://github.com/pytorch/pytorch/pull/4640)
* compress_classifier.py: If run in evaluation mode (--eval), enable deterministic mode.
* Call utils.set_deterministic at data loader creation if the deterministic argument is set (don't assume the user calls it outside)
* Disable CUDNN benchmark mode in utils.set_deterministic (https://pytorch.org/docs/stable/notes/randomness.html#cudnn)
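For reference, a deterministic-mode helper along these lines typically seeds every RNG and disables the cuDNN auto-tuner; a rough sketch (not necessarily Distiller's exact utils.set_deterministic):

```python
import random
import numpy as np
import torch

def set_deterministic(seed=0):
    # Seed all RNGs and make cuDNN behave deterministically
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False   # benchmark mode is non-deterministic
```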
- Apr 16, 2019
Lev Zlotnik authored
* Introduce a modular, Python-level implementation of LSTM/LSTMCell using existing PyTorch nn.Modules as building blocks
* This allows quantization of weights and internal activations of LSTM layers using the existing Quantizer. (In the PyTorch implementation of RNN/LSTM only the weights are exposed at the Python level, whereas the internal activations are "hidden" in C++ code.)
* Supports stacked (multi-layer) and bi-directional LSTM
* Implemented conversion functions from PyTorch LSTM module to our LSTM module and vice-versa
* Tests for modular implementation correctness and for conversions
* Jupyter notebook showing post-training quantization of a language model
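To illustrate why a modular implementation helps, here is a minimal LSTM cell built only from nn.Linear and element-wise ops (a sketch of the idea, not the actual Distiller implementation), so every gate computation is visible to a Python-level quantizer:

```python
import torch
import torch.nn as nn

class SimpleLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # All weights live in plain nn.Linear modules that a quantizer can wrap
        self.fc_gate_x = nn.Linear(input_size, 4 * hidden_size)
        self.fc_gate_h = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.fc_gate_x(x) + self.fc_gate_h(h)
        i, f, g, o = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c_next = f * c + i * g              # internal activations now visible
        h_next = o * torch.tanh(c_next)
        return h_next, c_next
```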
Lev Zlotnik authored
- Apr 14, 2019
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: clip_acts as a boolean flag (either in the command line or in a function signature) will fail. An error message with the valid values is displayed.
* Implemented clipping activations at mean + N * std (N is user configurable)
* Additional tests
* Updated docs
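The mean + N * std clipping bound is straightforward to compute from an activation tensor; a small illustrative sketch (the function name is made up):

```python
import torch

def clip_bound_mean_n_stds(activation, n_stds=2.0):
    # Clip at mean + n_stds * std instead of the observed max,
    # which makes the bound less sensitive to rare outliers.
    flat = activation.detach().float().flatten()
    return flat.mean() + n_stds * flat.std()

act = torch.randn(64, 256) * 3.0
print(clip_bound_mean_n_stds(act, n_stds=2.0))   # ~6.0 for this distribution
```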
Guy Jacob authored
- Apr 11, 2019
Guy Jacob authored
* Replace the optional 'best_top1' parameter with a generic optional dict which the caller can populate as needed.
* Saved in the checkpoint under the key 'extras'
Guy Jacob authored
* In all logger types (PythonLogger, TensorBoardLogger, CSVLogger)
* Exact behavior varies per logger type and is documented in the code.
* To enable this in CSVLogger, changed its API to take a file name prefix (optionally empty) instead of the full name, and use a hard-coded name for logging weights sparsity.
* Also fixed the signature of log_training_progress in the base DataLogger class to match the signature used in the sub-classes.
- Apr 09, 2019
Neta Zmora authored
Also added tests
Bar authored
This commit simplifies the SummaryGraph API by removing from the client the burden of handling the differences between models with and without DataParallel layers.

DataParallel layers in PyTorch change the fully-qualified names (FQNs) of PyTorch modules. A module's FQN unambiguously identifies a module within a model, by encoding the path to the module from the root of the model. For example, ```module.layer2.1.conv1``` and ```module.layer2.0.conv1``` are FQNs of two different modules named ```conv1``` in the same model. Because a module's FQN reflects the module's hierarchy, adding/removing a DataParallel node also changes its FQN. Distiller uses FQNs to refer to modules and parameters (e.g. from YAML files), and non-functional changes to the model hierarchy, such as using DataParallel modules, are handled by converting FQNs using ```utils.{de,}normalize_module_name()```.

Before this commit, the SummaryGraph API assumed that the API client would convert layer names using ```utils.normalize_module_name()``` before invoking the API. This led to needlessly verbose client code, which was also error-prone and harder to read and maintain. This commit fixes these shortcomings by relaxing the API and handling the FQN naming differences internally. The thinning implementation is simplified somewhat by refactoring to the new API's lenient requirements.

Added a named_params_layers method to SummaryGraph that yields a 3-tuple of: layer name, param name, and param. When using the new method, SummaryGraph communicates the true layer name with respect to the model it was initiated with.
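For intuition, the normalization essentially strips the ```module.``` prefix that ```nn.DataParallel``` prepends to every submodule name; a simplified sketch of what such helpers do (illustrative, not the exact Distiller implementation):

```python
def normalize_module_name(name):
    # 'module.layer2.0.conv1' -> 'layer2.0.conv1'
    return name[len("module."):] if name.startswith("module.") else name

def denormalize_module_name(model_module_names, normalized_name):
    # Map a normalized FQN back to the name the given model actually uses.
    for candidate in model_module_names:
        if normalize_module_name(candidate) == normalized_name:
            return candidate
    return normalized_name

names = ["module.layer2.0.conv1", "module.layer2.1.conv1"]
print(normalize_module_name(names[0]))                     # layer2.0.conv1
print(denormalize_module_name(names, "layer2.1.conv1"))    # module.layer2.1.conv1
```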
- Apr 08, 2019
Neta Zmora authored
Unfortunately, we maintain 2 copies of documentation images (one for the documentation source; another for the generated documentation). We need to solve this, as it makes the repository disproportionately large.
Neta Zmora authored
Add finer control over the pruning logic, to accommodate more pruning use-cases. The full description of the new logic is available in the updated [documentation of the CompressionScheduler](https://nervanasystems.github.io/distiller/schedule.html#pruning-fine-control), which is also part of this PR. In this PR:
* Added a new callback to the CompressionScheduler: compression_scheduler.before_parameter_optimization, which is invoked after the gradients are computed, but before the weights are updated by the optimizer.
* We provide an option to mask the gradients before the weights are updated by the optimizer. We register to the parameter backward hook in order to mask the gradients. This gives us finer control over the parameter updates.
* Added several DropFilter schedules. DropFilter is a method to regularize networks, and it can also be used to "prepare" a network for permanent filter pruning.
* Added documentation of pruning fine-control
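Conceptually, masking gradients via a parameter backward hook looks like the following generic PyTorch sketch (not the CompressionScheduler code; the layer and mask are placeholders):

```python
import torch
import torch.nn as nn

layer = nn.Linear(8, 4)
mask = (torch.rand_like(layer.weight) > 0.5).float()   # illustrative binary mask

# Register a hook on the parameter so its gradient is masked before the
# optimizer step, rather than masking only the weights after the step.
layer.weight.register_hook(lambda grad: grad * mask)

loss = layer(torch.randn(2, 8)).sum()
loss.backward()          # gradients of masked-out weights are zeroed by the hook
```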
tacker-oh authored
Fixes #198. Previously 0s were being mapped to 0, effectively yielding a third quantization level. This fix maps 0s to 1.
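The usual way to get a strict {-1, +1} binarization (avoiding sign(0) = 0 as a third level) is to send zeros to +1; a generic sketch of that mapping, not the exact code of this fix:

```python
import torch

def binary_sign(x):
    # torch.sign(x) returns 0 for x == 0, which would leak a third level
    # into a binary {-1, +1} quantizer; map zeros to +1 instead.
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

print(binary_sign(torch.tensor([-0.5, 0.0, 0.7])))   # tensor([-1.,  1.,  1.])
```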
Lev Zlotnik authored
Neta Zmora authored
Dropout layers were not handled properly in SummaryGraph, and caused the indexing of layer names to change. The root cause is that ONNX uses the same node name for Dropout and Linear layers that are processed in sequence. ONNX nodes can be identified by three components: the ONNX node name, type, and instance. In SummaryGraph we ignore the node type when naming a node. Specifically in AlexNet, the Dropout layers before a Linear layer have the same node name and instance, and are only distinguished by their type. SummaryGraph, ignorant of the type, skipped the Dropout layers and gave SG nodes the wrong name. Thus 'classifier.0', which is a Dropout node, became a Linear node. The fix is to stop ignoring duplicate (node name, instance) pairs, and instead increment the instance when a duplicate is encountered.
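A toy sketch of the disambiguation idea, bumping the instance whenever a (name, instance) pair repeats (illustrative only, not SummaryGraph's code):

```python
def disambiguate_node_names(node_names):
    # Assign each node a (name, instance) id, incrementing the instance on
    # duplicates so e.g. Dropout 'classifier.0' and Linear 'classifier.0'
    # stay distinct even though they share the same ONNX node name.
    seen = {}
    ids = []
    for name in node_names:
        instance = seen.get(name, -1) + 1
        seen[name] = instance
        ids.append((name, instance))
    return ids

print(disambiguate_node_names(["classifier.0", "classifier.0", "classifier.1"]))
# [('classifier.0', 0), ('classifier.0', 1), ('classifier.1', 0)]
```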