- May 15, 2019
-
-
Guy Jacob authored
Added a collector for activation histograms (a sub-class of ActivationStatsCollector). It is stats-based, meaning it requires pre-computed min/max stats per tensor. This avoids having to save all of the activation tensors throughout the run. The stats are expected in the format generated by QuantCalibrationStatsCollector.
Details:
* Implemented ActivationHistogramsCollector
* Added a Jupyter notebook showcasing activation histograms
* Implemented a helper function that performs the stats collection pass and the histograms pass in one go
* Also added a separate helper function just for quantization stats collection
* Integrated into the image classification sample
* data_loaders.py: Added an option to use a fixed subset throughout the same session. This keeps the same subset between the stats collection and histograms collection phases.
* Other changes:
  * Calling assign_layer_fq_names in the base class of collectors. The collectors, as implemented so far, assume this is done, so it makes sense to do it in the base class instead of expecting the user to do it.
  * Enforcing a non-parallel model for the quantization stats and histograms collectors
  * Jupyter notebooks: added a utility function to enable loggers in notebooks, so that any logging done by Distiller APIs called from notebooks is visible.
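The commit describes the mechanism rather than the exact API, so here is a small self-contained sketch (plain PyTorch, not Distiller code) of the stats-based idea: with pre-computed min/max stats per tensor, per-batch histograms can be accumulated with fixed bin edges via forward hooks, without storing the activation tensors themselves.

```python
import torch
import torch.nn as nn

# Toy model and pre-computed min/max stats per layer output
# (in Distiller these would come from QuantCalibrationStatsCollector).
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
stats = {'0': (-3.0, 3.0), '1': (0.0, 3.0), '2': (-2.0, 2.0)}  # illustrative values

histograms = {name: torch.zeros(256) for name in stats}

def make_hook(name):
    lo, hi = stats[name]
    def hook(module, inputs, output):
        # Fixed bin edges derived from the pre-computed stats, so histograms from
        # different batches can be accumulated without keeping the activations.
        histograms[name] += torch.histc(output.detach().float(), bins=256, min=lo, max=hi)
    return hook

for name, module in model.named_children():
    module.register_forward_hook(make_hook(name))

for _ in range(10):                     # pretend this is the fixed data subset
    model(torch.randn(32, 8))
print({k: int(v.sum()) for k, v in histograms.items()})
```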
-
Guy Jacob authored
* Set _force_outplace when calling get_trace_graph. This is a workaround for losing scope information for certain in-place operations.
* Switch all dicts to OrderedDicts
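A rough sketch of the workaround. torch.jit.get_trace_graph was a private PyTorch API (present around PyTorch 1.0/1.1 and removed since), and the exact signature shown here is an assumption.

```python
import torch
import torchvision

model = torchvision.models.resnet18()
dummy_input = torch.randn(1, 3, 224, 224)

# _force_outplace makes the tracer replace in-place ops with out-of-place
# equivalents, preserving scope (module-hierarchy) information that is
# otherwise lost for some in-place operations.
trace, _ = torch.jit.get_trace_graph(model, dummy_input, _force_outplace=True)
graph = trace.graph()
```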
-
- May 14, 2019
- May 06, 2019
-
-
Bar authored
In a previous commit, Distiller was changed to accept checkpoints that do not contain an 'optimizer' argument. However, that change was not reflected in the relevant test.
-
- May 05, 2019
-
-
Neta Zmora authored
Support loading a model from a checkpoint file that does not have an Optimizer instance. Before this change, loading such a model required using ```load_lean_checkpoint``` (or --exp-load-weights-from from the compress_classifier.py command line), so this change is for convenience only.
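A minimal sketch of the two loading paths mentioned above; the import path and return values are assumptions based on the surrounding commits, not verified against the repo at this point in history.

```python
import torchvision
# Import path and return values below are assumptions, not quoted from the repo.
from distiller.apputils import load_checkpoint, load_lean_checkpoint

model = torchvision.models.resnet18()

# Full checkpoint load; with this change, 'optimizer' comes back as None when
# the checkpoint was saved without an Optimizer instance.
model, compression_scheduler, optimizer, start_epoch = load_checkpoint(model, 'checkpoint.pth.tar')

# Weights-only load (what --exp-load-weights-from maps to in compress_classifier.py).
model = load_lean_checkpoint(model, 'checkpoint.pth.tar')
```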
-
- May 02, 2019
-
-
Lev Zlotnik authored
-
- May 01, 2019
-
-
Neta Zmora authored
Added a link to the FAQ wiki page.
-
Lev Zlotnik authored
-
- Apr 30, 2019
-
-
Lev Zlotnik authored
* Update test_lstm_impl.py
* Added PackedSequence functionality
* Refactored forward implementation
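For reference, this is how a PackedSequence is fed to a stock nn.LSTM in PyTorch; per the commit, the modular LSTM implementation now handles the same input type. This is a standard PyTorch sketch, not code taken from test_lstm_impl.py.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)

# Two sequences of lengths 5 and 3, padded to length 5: (seq_len, batch, features).
padded = torch.randn(5, 2, 10)
lengths = torch.tensor([5, 3])  # must be sorted in descending order

packed = pack_padded_sequence(padded, lengths)
packed_out, (h_n, c_n) = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape, out_lengths)  # torch.Size([5, 2, 20]) tensor([5, 3])
```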
-
- Apr 18, 2019
-
-
Bar authored
Also:
* The single-worker limitation is not needed anymore; it has been fixed in PyTorch since v0.4.0 (https://github.com/pytorch/pytorch/pull/4640)
* compress_classifier.py: If run in evaluation mode (--eval), enable deterministic mode
* Call utils.set_deterministic at data-loader creation if the deterministic argument is set (don't assume the user calls it outside)
* Disable CUDNN benchmark mode in utils.set_deterministic (https://pytorch.org/docs/stable/notes/randomness.html#cudnn)
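A rough sketch of what a set_deterministic-style helper typically does in PyTorch; the actual distiller.utils.set_deterministic may differ in details.

```python
import random
import numpy as np
import torch

def set_deterministic(seed=0):
    # Illustrative stand-in: seed all RNGs and configure cuDNN for reproducibility.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    # Benchmark mode auto-tunes convolution algorithms per input size, which can
    # select non-deterministic kernels, so it is disabled as well.
    torch.backends.cudnn.benchmark = False

set_deterministic(seed=42)
```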
-
- Apr 16, 2019
-
-
Lev Zlotnik authored
* Introduce a modular, Python-level implementation of LSTM/LSTMCell using existing PyTorch nn.Modules as building blocks
* This allows quantization of weights and internal activations of LSTM layers using the existing Quantizer. (In the PyTorch implementation of RNN/LSTM only the weights are exposed at the Python level, whereas the internal activations are "hidden" in C++ code.)
* Supports stacked (multi-layer) and bi-directional LSTM
* Implemented conversion functions from PyTorch LSTM module to our LSTM module and vice-versa
* Tests for modular implementation correctness and for conversions
* Jupyter notebook showing post-training quantization of a language model
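A sketch of how the conversion functions might be used. The class and method names (DistillerLSTM, from_pytorch_impl, to_pytorch_impl) are assumptions about the new module, not quoted from the commit.

```python
import torch
import torch.nn as nn
from distiller.modules import DistillerLSTM  # name assumed

pt_lstm = nn.LSTM(input_size=300, hidden_size=650, num_layers=2)

# Convert to the modular, Python-level implementation (quantizable with the
# existing Quantizer, since its internal activations are regular nn.Modules)...
modular_lstm = DistillerLSTM.from_pytorch_impl(pt_lstm)
# ...and back again.
pt_lstm_again = modular_lstm.to_pytorch_impl()

x = torch.randn(35, 16, 300)  # (seq_len, batch, input_size)
y_ref, _ = pt_lstm(x)
y_mod, _ = modular_lstm(x)
print(torch.allclose(y_ref, y_mod, atol=1e-5))
```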
-
Lev Zlotnik authored
-
- Apr 14, 2019
-
-
Guy Jacob authored
* Some refactoring to enable multiple clipping methods
* BREAKING: Passing clip_acts as a boolean flag (either on the command line or in a function signature) will now fail. An error message listing the valid values is displayed.
* Implemented clipping of activations at mean + N * std (N is user-configurable)
* Additional tests
* Updated docs
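A standalone sketch of the mean + N * std clipping rule itself (illustrative only; Distiller computes these statistics inside its range-based quantizers, per tensor or per channel).

```python
import torch

def clip_bounds_n_std(t, n_stds=2.0):
    # Clip range derived from the tensor statistics: mean +/- n_stds * std.
    # For a max-only clip (e.g. post-ReLU activations), keep just the upper bound.
    mean, std = t.mean(), t.std()
    return mean - n_stds * std, mean + n_stds * std

acts = torch.randn(1000) * 3
lo, hi = clip_bounds_n_std(acts, n_stds=2.0)
clipped = acts.clamp(lo.item(), hi.item())
```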
-
Guy Jacob authored
-
- Apr 11, 2019
-
-
Guy Jacob authored
* Replace the optional 'best_top1' parameter with a generic optional dict which the caller can populate as needed
* Saved in the checkpoint under the key 'extras'
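A sketch of how the generic dict might be passed; the save_checkpoint signature shown here is an assumption based on the commit description.

```python
import torch
import torchvision
from distiller.apputils import save_checkpoint  # signature below is assumed

model = torchvision.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The generic 'extras' dict replaces the old 'best_top1' parameter and is
# stored in the checkpoint under the key 'extras'.
extras = {'best_top1': 76.2, 'best_epoch': 87}
save_checkpoint(epoch=88, arch='resnet18', model=model, optimizer=optimizer,
                extras=extras, name='resnet18_best')
```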
-
Guy Jacob authored
* In all logger types (PythonLogger, TensorBoardLogger, CSVLogger)
* Exact behavior varies per logger type and is documented in the code
* To enable this in CSVLogger, changed its API to take a file-name prefix (optionally empty) instead of the full name, and use a hard-coded name for logging weights sparsity
* Also fixed the signature of log_training_progress in the base DataLogger class to match the signature used in the sub-classes
-
- Apr 09, 2019
-
-
Neta Zmora authored
Also added tests
-
Bar authored
This commit simplifies the SummaryGraph API by removing from the client the burden of handling the differences between models with and without DataParallel layers.

DataParallel layers in PyTorch change the fully-qualified names (FQNs) of PyTorch modules. A module's FQN unambiguously identifies a module within a model by encoding the path to the module from the root of the model. For example, ```module.layer2.1.conv1``` and ```module.layer2.0.conv1``` are FQNs of two different modules named ```conv1```, each nested in a different parent module. Because a module's FQN reflects the module's hierarchy, adding/removing a DataParallel node also changes its FQN. Distiller uses FQNs to refer to modules and parameters (e.g. from YAML files), and non-functional changes to the model hierarchy, such as the use of DataParallel modules, are handled by converting FQNs using ```utils.{de,}normalize_module_name()```.

Before this commit, the SummaryGraph API assumed that the API client would convert layer names using ```utils.normalize_module_name()``` before invoking the API. This led to needlessly verbose client code, which was also error-prone and harder to read and maintain. This commit fixes these shortcomings by relaxing the API and handling the FQN naming differences internally. The thinning implementation is simplified somewhat by refactoring it to the new API's more lenient requirements.

Added a named_params_layers method to SummaryGraph that yields 3-tuples of: layer name, param name, and param. When using the new method, SummaryGraph reports the true layer name with respect to the model it was initialized with.
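For illustration, here is how DataParallel changes module FQNs, and how the helper referenced above strips the prefix. This is a sketch assuming normalize_module_name simply removes the DataParallel 'module' component; the printed values are what that assumption implies.

```python
import torch.nn as nn
import torchvision
from distiller.utils import normalize_module_name  # referenced in the commit message

model = torchvision.models.resnet18()
parallel_model = nn.DataParallel(model)

# Wrapping in DataParallel prefixes every FQN with 'module.'
print([name for name, _ in model.named_modules()][:3])           # ['', 'conv1', 'bn1']
print([name for name, _ in parallel_model.named_modules()][:3])  # ['', 'module', 'module.conv1']

# Normalizing strips the DataParallel prefix so the same name can be used
# regardless of how the model is wrapped:
print(normalize_module_name('module.layer2.1.conv1'))  # 'layer2.1.conv1'
```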
-
- Apr 08, 2019
-
-
Neta Zmora authored
Unfortunately, we maintain two copies of the documentation images (one for the documentation source; another for the generated documentation). We need to solve this, as it makes the repository disproportionately large.
-
Neta Zmora authored
Add finer control over the pruning logic, to accommodate more pruning use-cases. The full description of the new logic is available in the updated [documentation of the CompressionScheduler](https://nervanasystems.github.io/distiller/schedule.html#pruning-fine-control), which is also part of this PR.
In this PR:
* Added a new callback to the CompressionScheduler: compression_scheduler.before_parameter_optimization, which is invoked after the gradients are computed but before the weights are updated by the optimizer.
* We provide an option to mask the gradients before the weights are updated by the optimizer. We register to the parameter backward hook in order to mask the gradients. This gives us finer control over the parameter updates.
* Added several DropFilter schedules. DropFilter is a method to regularize networks, and it can also be used to "prepare" a network for permanent filter pruning.
* Added documentation of pruning fine-control
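A minimal plain-PyTorch sketch of gradient masking via a parameter backward hook, which is the mechanism described above (not the Distiller implementation itself; the mask here is an arbitrary example).

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10, bias=False)
mask = (torch.rand_like(model.weight) > 0.5).float()  # illustrative binary mask

# The hook runs after backward() computes the gradient but before
# optimizer.step() applies it - the same point at which the new
# before_parameter_optimization callback fires.
model.weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()  # masked gradient entries contribute no weight update
```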
-
tacker-oh authored
Fixes #198. Previously 0s were being mapped to 0, effectively yielding a third quantization level. This fix maps 0s to 1.
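One way the described fix can be expressed (an illustrative sketch, not the actual patched code): torch.sign() maps 0 to 0, so zeros must be remapped to +1 to keep the output strictly binary.

```python
import torch

def binarize(t):
    # sign() would map 0 -> 0, creating a third quantization level;
    # remap zeros to +1 so the result is in {-1, +1} only.
    out = torch.sign(t)
    out[out == 0] = 1
    return out

print(binarize(torch.tensor([-0.5, 0.0, 2.0])))  # tensor([-1., 1., 1.])
```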
-
Lev Zlotnik authored
-
Neta Zmora authored
Dropout layers were not handled properly in SummaryGraph, and caused the indexing of layer names to change. The root cause is that ONNX uses the same node name for Dropout and Linear layers that are processed in sequence. ONNX nodes can be identified by three components: the ONNX node name, type, and instance. In SummaryGraph we ignored the node type when naming a node. Specifically, in AlexNet the Dropout nodes preceding a Linear layer have the same node name and instance, and are only distinguished by their type. SummaryGraph, ignorant of the type, skipped the Dropout layers and gave SG nodes the wrong name. Thus 'classifier.0', which is a Dropout node, became a Linear node. The fix is to stop ignoring duplicate (node name, instance) pairs; instead, the instance is incremented so that each node gets a unique name.
-
- Apr 04, 2019
-
-
Lev Zlotnik authored
-
- Apr 03, 2019
- Apr 01, 2019
-
-
Lev Zlotnik authored
* Bias handling:
  * Add a 'bits_bias' parameter to explicitly specify the number of bits for the bias, similar to weights and activations
  * BREAKING: Remove the now-redundant 'quantize_bias' boolean parameter
* Custom overrides:
  * Expand the semantics of the overrides dict to allow overriding of other parameters in addition to bit-widths
  * Functions registered in the quantizer's 'replacement_factory' can define keyword arguments. Non bit-width entries in the overrides dict are checked against the function signature and passed through.
* BREAKING:
  * Changed the name of 'bits_overrides' to simply 'overrides'
  * Bit-width overrides must now be defined using the full parameter names ('bits_activations/weights/bias') instead of the short-hands 'acts' and 'wts' which were used so far
* Added/updated relevant tests
* Modified all quantization YAMLs under 'examples' to reflect these changes
* Updated docs
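A sketch of what an overrides spec might look like after this change, written in Python. The layer-name patterns and bit-width values are purely illustrative; in practice the spec is usually defined in the quantization YAML and ends up in the quantizer's 'overrides' argument.

```python
from collections import OrderedDict

# Illustrative only - layer-name patterns and bit-widths are made up.
overrides = OrderedDict([
    # Full parameter names are now required ('bits_weights', not the old 'wts' short-hand):
    ('conv1', OrderedDict(bits_weights=8, bits_activations=8)),
    # Entries that are not bit-widths are matched against the keyword arguments of the
    # function registered in the quantizer's 'replacement_factory' and passed through:
    ('fc.*', OrderedDict(bits_weights=4, bits_activations=4, bits_bias=32)),
])
```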
-
Neta Zmora authored
Fix copy-paste mistake
-
Neta Zmora authored
The code that installs distiller tries to import distiller.
-
Neta Zmora authored
-
Bar authored
-
Bar authored
Load optimizer from checkpoint (BREAKING - see details) (#182)
* Fixes issues #70, #145 and replaces PR #74
* checkpoint.py:
  * save_checkpoint will now save the optimizer type in addition to its state
  * load_checkpoint will now instantiate an optimizer based on the saved type and load its state
* config.py: file/dict_config now accept the resumed epoch to pass to LR schedulers
* policy.py: LRPolicy now passes the current epoch to the LR scheduler
* Classifier compression sample:
  * New flag '--resume-from' for properly resuming a saved training session, including optimizer state and epoch #
  * Flag '--reset-optimizer' added to allow discarding a loaded optimizer
* BREAKING:
  * The previous flag '--resume' is deprecated and is mapped to '--resume-from' + '--reset-optimizer'
  * But the old resuming behavior had an inconsistency where the epoch count would continue from the saved epoch, while the LR scheduler was set up as if we were starting from epoch 0
  * Using '--resume-from' + '--reset-optimizer' now simply RESETS the epoch count to 0 for the whole environment
  * This means that scheduling configurations (in YAML or code) which assumed use of '--resume' might need to be changed to reflect the fact that the epoch count now starts from 0
  * All relevant YAML files under 'examples' were modified to reflect this change
* Initial support for ReduceLROnPlateau (#161):
  * Allow passing **kwargs to policies via the scheduler
  * Image classification now passes the validation loss to the scheduler, to be used by ReduceLROnPlateau
  * The current implementation is experimental and subject to change
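Roughly what the checkpoint.py change boils down to, shown with plain PyTorch (an illustrative sketch, not the actual Distiller code).

```python
import torch
import torchvision

model = torchvision.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Save the optimizer type in addition to its state.
checkpoint = {
    'state_dict': model.state_dict(),
    'optimizer_type': type(optimizer),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': 42,
}
torch.save(checkpoint, 'checkpoint.pth.tar')

# On resume, instantiate an optimizer of the saved type and restore its state
# (load_state_dict also restores hyper-parameters such as lr and momentum).
ckpt = torch.load('checkpoint.pth.tar')
model.load_state_dict(ckpt['state_dict'])
optimizer = ckpt['optimizer_type'](model.parameters(), lr=0.1)
optimizer.load_state_dict(ckpt['optimizer_state_dict'])
```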
-
Neta Zmora authored
-
Neta Zmora authored
This fix does not change the behavior. The previous code worked correctly because 'weights' and '.weight' have the same length.
-
- Mar 31, 2019
-
-
Guy Jacob authored
-
- Mar 29, 2019
-
-
Songyi Blair Han authored
-
- Mar 28, 2019
-
-
Lev Zlotnik authored
* Added distiller.utils.convert_recursively_to, and replaced _treetuple2device in SummaryGraph with it
* Renamed to convert_tensors_recursively_to
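An illustrative re-implementation of the idea, to show what such a helper does; the actual distiller.utils function may differ in details.

```python
import torch

def convert_tensors_recursively_to(val, *args, **kwargs):
    # Apply tensor.to(*args, **kwargs) to every tensor inside an arbitrarily
    # nested tuple/list structure, leaving non-tensor values untouched.
    if isinstance(val, torch.Tensor):
        return val.to(*args, **kwargs)
    if isinstance(val, (tuple, list)):
        return type(val)(convert_tensors_recursively_to(item, *args, **kwargs) for item in val)
    return val

nested = (torch.randn(2), [torch.randn(3), (torch.randn(4), 'not-a-tensor')])
as_double = convert_tensors_recursively_to(nested, dtype=torch.float64)
```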
-