- Feb 23, 2020
levzlotnik authored
- Feb 09, 2020
levzlotnik authored
- Feb 06, 2020
Guy Jacob authored
Convert Distiller PTQ models to "native" PyTorch PTQ (#458)
* New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
* Can also be called from a PostTrainLinearQuantizer instance: quantizer.convert_to_pytorch()
* Can also be triggered from the command line in the image classification sample
* Can save/load converted modules via apputils.load/save_checkpoint
* Added a Jupyter notebook tutorial
* Converted modules contain only the absolutely necessary quant-dequant operations. For a fully quantized model, this means just quantization of the model input and de-quantization of the model output. If a user keeps specific internal layers in FP32, quant-dequant operations are added as needed.
* Can configure either the 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we take care of preventing overflows (aka "reduce_range" in the PyTorch API).
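A minimal usage sketch of the conversion flow described in this commit. The exact constructor arguments, the need for a dummy input, and the model-creation helper are assumptions based on the Distiller post-training quantization API and may differ from the actual signatures.

```python
import torch
from distiller.models import create_model
from distiller.quantization import PostTrainLinearQuantizer

# Assumed setup: a pretrained FP32 ImageNet classifier and a matching dummy input.
model = create_model(pretrained=True, dataset='imagenet', arch='resnet18')
dummy_input = torch.randn(1, 3, 224, 224)

# Prepare a Distiller post-training quantizer (stats collection / config omitted).
quantizer = PostTrainLinearQuantizer(model)
quantizer.prepare_model(dummy_input)

# Convert to a "native" PyTorch post-training quantized model.
# The commit also describes a standalone call (argument names assumed):
#   pyt_model = distiller.quantization.convert_distiller_ptq_model_to_pytorch(model, dummy_input)
pyt_model = quantizer.convert_to_pytorch(dummy_input)
```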
- Feb 03, 2020
Guy Jacob authored
- Feb 02, 2020
Lev Zlotnik authored
"Loss Aware Post-Training Quantization" (Nahshan et al., 2019) Paper: https://arxiv.org/abs/1911.07190 Reference implementation: https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq Proper documentation is still TODO, for now see the example YAML file at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml' * Implemented in distiller/quantization/ptq_coordinate_search.py * At the moment that file both the model-independent algorithm implementation and image-classification specific sample script. Still TODO: Refactor that * Post train quantization changes (range_linear): * Added getters/setters for quantization parameters (scale/zero_point) and clipping values * Add option to save backup of FP32 weights to allow re-quantization after quantizer was created. * Add option to clip weights in addition to activations * Fix fusions to not occur only when activations aren't quantized * RangeLinearFakeQuantWrapper: * Make inputs quantization optional * In case of ReLU + ACIQ, clip according to input stats * Data loaders: * Add option to not load train set at all from disk (to speed up loading time in post-training runs) * Modified "image_classifier.py" accordingly
- Jan 19, 2020
Neta Zmora authored
Temp patch until moving to torchvision 0.5. See https://github.com/pytorch/vision/issues/1712#issuecomment-575036523
- Jan 18, 2020
Neta Zmora authored
Fix the formatting of the ValueError raised when a module is missing an attribute while collecting activation statistics.
- Jan 15, 2020
Guy Jacob authored
(We use 8-bit values below, but this applies to any bit-width.)
* We use the notion of "full" and "restricted" quantized range for symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full" range when saturating after quantization, but calculated the scale factor as if the range was restricted. This means we weren't making full utilization of the quantized range.
* On the other hand, some other implementations of quantization (e.g. TensorFlow) use the "restricted" range.
* So, we make it an option to use either the proper "full" range (q_min = -128) or the "restricted" range (q_min = -127).
* LinearQuantMode.SYMMETRIC now means the "full" range is used, and LinearQuantMode.SYMMETRIC_RESTRICTED was added for using the "restricted" range.
* Updated tests and documentation.
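A small numeric sketch of the full vs. restricted distinction described above. The exact scale formula Distiller uses may differ; this uses one common convention, mapping the saturation value to 2^(b-1) for the full range and to 2^(b-1) - 1 for the restricted range.

```python
import torch

def symmetric_quant_dequant(x: torch.Tensor, sat_val: float, num_bits: int = 8,
                            restricted: bool = False) -> torch.Tensor:
    # Illustrative convention only; a given library's exact scale formula may differ.
    # restricted: range [-(2^(b-1) - 1), 2^(b-1) - 1], e.g. [-127, 127] for 8 bits
    # full:       range [-2^(b-1), 2^(b-1) - 1],       e.g. [-128, 127] for 8 bits
    n = 2 ** (num_bits - 1)                      # 128 for 8 bits
    q_min = -(n - 1) if restricted else -n
    q_max = n - 1
    scale = ((n - 1) if restricted else n) / sat_val
    q = torch.clamp(torch.round(x * scale), q_min, q_max)
    return q / scale

# Example: with sat_val=1.0, restricted=True maps +/-1.0 onto +/-127, while
# restricted=False also uses the extra negative level, so -1.0 maps to -128.
```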
- Jan 06, 2020
Guy Jacob authored
* Fake quant wrapper now also works on (fake) quantized inputs
* Remove 'requires_quantized_inputs' flag
* Unrelated: Moved LinearQuantMode enum to q_utils
- Dec 30, 2019
Guy Jacob authored
In PostTrainLinearQuantizer and QuantAwareTrainRangeLinearQuantizer
- Dec 29, 2019
Neta Zmora authored
- Dec 26, 2019
Guy Jacob authored
- Dec 18, 2019
Bar authored
Add directionality to SummaryActivationStatsCollector to allow collecting statistics on incoming as well as outgoing activations/feature-maps, instead of just outgoing activations. Also includes some code refactoring.
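A conceptual sketch of collecting a statistic on either the incoming or the outgoing activations of each module, using plain PyTorch forward hooks. This only illustrates the idea of directionality; it is not the SummaryActivationStatsCollector API.

```python
import torch.nn as nn

def collect_mean_abs_stats(model: nn.Module, direction: str = "output"):
    """Record the mean absolute activation per Conv2d/Linear module.

    direction="input" collects stats on incoming activations,
    direction="output" on outgoing ones.
    """
    stats, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            tensor = inputs[0] if direction == "input" else output
            stats[name] = tensor.detach().abs().mean().item()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            handles.append(module.register_forward_hook(make_hook(name)))
    return stats, handles

# Usage: attach the hooks, run a forward pass over some data, read `stats`,
# then call handle.remove() on each returned handle.
```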
- Dec 12, 2019
Guy Jacob authored
- Dec 11, 2019
- Dec 09, 2019
Guy Jacob authored
Guy Jacob authored
Guy Jacob authored
* Make it easier to find sample apps for different workload types
* Add READMEs for sample apps that didn't have any
* Update READMEs with experiment results where applicable
Lev Zlotnik authored
Added tables of results for 85% sparsity
- Dec 08, 2019
Guy Jacob authored
* Weights-only PTQ:
  * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in which case it acts as a simple pass-through during forward
  * In RangeLinearQuantParamLayerWrapper, if bits_activations is None and num_bits_params > 0, perform quant and de-quant of the parameters instead of just quant
* Activations-only PTQ:
  * Enable activations-only quantization for conv/linear modules. When PostTrainLinearQuantizer detects # bits != None for activations and # bits == None for weights, a fake-quantization wrapper is used.
* Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command line arguments to invoke weights-only / activations-only quantization, respectively
* Minor refactoring for clarity in PostTrainLinearQuantizer's replace_* functions
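A hedged sketch of the "quant and de-quant of the parameters" idea mentioned above, applied to a single weight tensor with asymmetric linear quantization. This is illustrative only, not the RangeLinearQuantParamLayerWrapper implementation.

```python
import torch

def fake_quantize_weights(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Asymmetric linear quantization followed immediately by de-quantization,
    # so downstream compute stays in FP32 but "sees" quantized weight values.
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = qmin - torch.round(w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale
```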
- Dec 03, 2019
SunYiran authored
- Dec 02, 2019
Neta Zmora authored
compute-summary and png-summary currently work with image classifiers only.
Neta Zmora authored
When multi-processing, we want only one process to generate the summary, while the other processes do nothing (lazy bums!)
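A minimal sketch of the "only one process generates the summary" pattern described above, assuming torch.distributed is used for multi-processing; `generate_summary` is a hypothetical placeholder for whatever produces the summary.

```python
import torch.distributed as dist

def maybe_generate_summary(model, generate_summary):
    # `generate_summary` is a placeholder callable (assumption, not a Distiller API).
    is_main_process = (not dist.is_available() or not dist.is_initialized()
                       or dist.get_rank() == 0)
    if is_main_process:
        generate_summary(model)          # only rank 0 does the work
    if dist.is_available() and dist.is_initialized():
        dist.barrier()                   # keep the other ("lazy") processes in step
```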
levzlotnik authored
Lev Zlotnik authored
Add an example of compressing object-detection (OD) PyTorch models. In this example we compress torchvision's object detection models - FasterRCNN / MaskRCNN / KeypointRCNN. We've modified the reference object detection code to allow easy compression scheduling with YAML configuration.
levzlotnik authored
of a model by name relative to the root of the model.
- Nov 28, 2019
Neta Zmora authored
- define ALMOST_ONE
- define op_type
- remove sanity assert (need to understand what tolerance value to use in the assert)
Co-authored-by: csc12138
Co-authored-by: wangyidong3
Neta Zmora authored
- define ALMOST_ONE
- define op_type
- remove sanity assert (need to understand what tolerance value to use in the assert)
- Nov 27, 2019
Neta Zmora authored
This will help define and use different performance sorting schemes, e.g. it will address the problem raised in issue #411.
Neta Zmora authored
Small variances can occur when using different cuDNN versions, even when the environment and Distiller version are the same.
Neta Zmora authored
Said commit was wrong: the default initializations in PyTorch are not the same as in our code. For example, the default convolution weight initialization uses Kaiming-uniform, while we used Kaiming-normal. For backward compatibility of the model behavior, we need to revert to the old behavior. This reverts commit 6913687f.
- Nov 25, 2019
Neta Zmora authored
Update the definition of the exits using info from Haim. This is still very unsatisfactory because we don't have working examples to show users :-(
- Nov 17, 2019
Neta Zmora authored
- Nov 16, 2019
Neta Zmora authored
Neta Zmora authored
Neta Zmora authored
Except for the case of VGG, our parameter initialization code matched the default PyTorch initialization (per torch.nn operation), so writing the initialization code ourselves only leads to more code and maintenance; we also would not benefit from improvements made at the PyTorch level (e.g. if FB finds a better initialization for nn.conv2d than today's Kaiming init). The VGG initialization we had was "suspicious", so reverting to the default seems reasonable.
- Nov 14, 2019
Guy Jacob authored