- Apr 12, 2020
-
-
Neta Zmora authored
Added citations
-
Neta Zmora authored
Change Distiller citation as suggested in issue #492
-
Neta Zmora authored
Remove warning regarding Distiller release 0.3 (breaking backward compat)
-
- Mar 31, 2020
-
-
Guy Jacob authored
-
- Feb 26, 2020
-
-
Guy Jacob authored
The gitdb versioning issue is resolved internally in gitpython 3.1.0, so move to that version and remove the specific gitdb requirements
-
- Feb 23, 2020
-
-
levzlotnik authored
-
levzlotnik authored
-
levzlotnik authored
-
- Feb 17, 2020
-
-
Guy Jacob authored
-
Guy Jacob authored
* BUGFIX: Fixed wrong attribute name for zero-point in conversion of eltwise add/mult and concat
* Add PyTorch PTQ convert for embedding (converted to FP32 embedding + quant op)
* Fix conversion function to work with tuple/list model inputs
-
Guy Jacob authored
* Move image-classification-specific setup code to a separate script at examples/classifier_compression/ptq_lapq.py
* Make the ptq_coordinate_search function completely independent of command line arguments
* Change the LAPQ command line args function to update a pre-existing parser (changed the CLA prefix to 'lapq' for more clarity)
* Enable LAPQ from compress_classifier.py (trigger with --qe-lapq)
* Add pointers in documentation
-
- Feb 13, 2020
-
-
Guy Jacob authored
-
- Feb 09, 2020
-
-
levzlotnik authored
-
- Feb 06, 2020
-
-
Guy Jacob authored
Convert Distiller PTQ models to "native" PyTorch PTQ (#458)
* New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
* Can also be called from a PostTrainLinearQuantizer instance: quantizer.convert_to_pytorch()
* Can also be triggered from the command line in the image classification sample
* Can save/load converted modules via apputils.load/save_checkpoint
* Added a Jupyter notebook tutorial
* Converted modules have only the absolutely necessary quant-dequant operations. For a fully quantized model, this means just quantization of the model input and de-quantization of the model output. If a user keeps specific internal layers in FP32, quant-dequant operations are added as needed
* Can configure either the 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we take care of preventing overflows (aka "reduce_range" in the PyTorch API)
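A rough usage sketch of the new conversion flow. The class and method names appear in this commit; the exact arguments (the dummy input and the backend keyword) are assumptions for illustration, not the verified API:

```python
import torch
from torchvision.models import resnet18
from distiller.quantization import PostTrainLinearQuantizer

# Hypothetical flow -- argument names/values are assumptions, not the verified API.
model = resnet18(pretrained=True).eval()
quantizer = PostTrainLinearQuantizer(model)
quantizer.prepare_model(torch.randn(1, 3, 224, 224))       # apply Distiller PTQ

# New in this commit: convert to PyTorch's own quantized modules.
# 'fbgemm' targets x86 (with overflow prevention, aka "reduce_range"); 'qnnpack' targets ARM.
native_model = quantizer.convert_to_pytorch(backend='fbgemm')  # may also require a dummy input
```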
-
- Feb 03, 2020
-
-
Guy Jacob authored
-
- Feb 02, 2020
-
-
Lev Zlotnik authored
"Loss Aware Post-Training Quantization" (Nahshan et al., 2019) Paper: https://arxiv.org/abs/1911.07190 Reference implementation: https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq Proper documentation is still TODO, for now see the example YAML file at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml' * Implemented in distiller/quantization/ptq_coordinate_search.py * At the moment that file both the model-independent algorithm implementation and image-classification specific sample script. Still TODO: Refactor that * Post train quantization changes (range_linear): * Added getters/setters for quantization parameters (scale/zero_point) and clipping values * Add option to save backup of FP32 weights to allow re-quantization after quantizer was created. * Add option to clip weights in addition to activations * Fix fusions to not occur only when activations aren't quantized * RangeLinearFakeQuantWrapper: * Make inputs quantization optional * In case of ReLU + ACIQ, clip according to input stats * Data loaders: * Add option to not load train set at all from disk (to speed up loading time in post-training runs) * Modified "image_classifier.py" accordingly
-
- Jan 19, 2020
-
-
Neta Zmora authored
Temp patch until moving to torchvision 0.5. See https://github.com/pytorch/vision/issues/1712#issuecomment-575036523
-
- Jan 18, 2020
-
-
Neta Zmora authored
Fix the formatting of a ValueError raised when a module is missing an attribute, when collecting activation statistics.
-
- Jan 15, 2020
-
-
Guy Jacob authored
(We use 8-bit values below, but this applies to any bit-width.)
* We use the notion of "full" and "restricted" quantized range for symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full" range when saturating after quantization, but calculated the scale factor as if the range was restricted. This means we weren't making full utilization of the quantized range.
* On the other hand, some other implementations of quantization (e.g. TensorFlow) use the "restricted" range.
* So, we make it an option to use either the proper "full" range (q_min = -128) or the "restricted" range (q_min = -127).
* LinearQuantMode.SYMMETRIC now means the "full" range is used, and LinearQuantMode.SYMMETRIC_RESTRICTED was added for using the "restricted" range.
* Updated tests and documentation.
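To make the difference concrete, here is a small illustration of how the scale factor could be derived in each mode (assumed formulas for illustration, not copied from Distiller's code):

```python
def symmetric_qparams(sat_val, num_bits=8, restricted=False):
    """Derive (scale, q_min, q_max) for symmetric quantization with saturation value sat_val."""
    q_max = 2 ** (num_bits - 1) - 1                   # 127 for 8 bits
    q_min = -q_max if restricted else -(q_max + 1)    # -127 (restricted) vs. -128 (full)
    # Restricted range derives the scale from 127 levels; full range can use all 128,
    # so the same sat_val yields a larger scale and fuller utilization of the range.
    scale = (q_max if restricted else q_max + 1) / sat_val
    return scale, q_min, q_max

print(symmetric_qparams(1.0))                   # full:       (128.0, -128, 127)
print(symmetric_qparams(1.0, restricted=True))  # restricted: (127.0, -127, 127)
```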
-
- Jan 06, 2020
-
-
Guy Jacob authored
* Fake quant wrapper now also works on (fake) quantized inputs
* Remove the 'requires_quantized_inputs' flag
* Unrelated: moved the LinearQuantMode enum to q_utils
-
- Dec 30, 2019
-
-
Guy Jacob authored
In PostTrainLinearQuantizer and QuantAwareTrainRangeLinearQuantizer
-
- Dec 29, 2019
-
-
Neta Zmora authored
-
- Dec 26, 2019
-
-
Guy Jacob authored
-
- Dec 18, 2019
-
-
Bar authored
Add directionality to SummaryActivationStatsCollector to allow collection of statistics on incoming and outgoing activations/feature-maps, instead of just outgoing activations. Also includes some code refactoring.
-
- Dec 12, 2019
-
-
Guy Jacob authored
-
- Dec 11, 2019
- Dec 09, 2019
-
-
Guy Jacob authored
-
Guy Jacob authored
-
Guy Jacob authored
* Make it easier to find sample apps for different workload types
* Add READMEs for sample apps that didn't have any
* Update READMEs with experiment results where applicable
-
Lev Zlotnik authored
Added tables of results for 85% sparsity
-
- Dec 08, 2019
-
-
Guy Jacob authored
* Weights-only PTQ:
  * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in which case it'll act as a simple pass-through during forward
  * In RangeLinearQuantParamLayerWrapper, if bits_activations is None and num_bits_params > 0, perform quant and de-quant of the parameters instead of just quant
* Activations-only PTQ:
  * Enable activations-only quantization for conv/linear modules. When PostTrainLinearQuantizer detects # bits != None for activations and # bits == None for weights, a fake-quantization wrapper will be used
* Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command line arguments to invoke weights/activations-only quantization, respectively
* Minor refactoring for clarity in PostTrainLinearQuantizer's replace_* functions
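The weights-only path boils down to a quantize-then-dequantize ("fake quantization") of the parameters, roughly like this sketch (assumed symmetric per-tensor math, not the actual RangeLinearQuantParamLayerWrapper code):

```python
import torch

def fake_quantize_weights(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Round weights to the integer grid and map them straight back to FP32."""
    q_max = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / q_max                              # symmetric, per-tensor scale
    w_q = torch.clamp(torch.round(w / scale), -q_max, q_max)   # quantize
    return w_q * scale                                         # de-quantize
```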
-
- Dec 03, 2019
-
-
SunYiran authored
-
- Dec 02, 2019
-
-
Neta Zmora authored
compute-summary and png-summary currently work with image classifiers only.
-
Neta Zmora authored
When multi-processing, we want only one process to generate the summary, while the other processes do nothing (lazy bums!)
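The pattern here is the usual "only the master rank does the work" guard; a generic sketch (the print is a stand-in for the actual summary-generation call):

```python
import torch.distributed as dist

def is_master_process() -> bool:
    """True when not running distributed, or when this process is rank 0."""
    return not (dist.is_available() and dist.is_initialized()) or dist.get_rank() == 0

if is_master_process():
    # Only rank 0 generates the summary; the other ranks fall through and do nothing.
    print("generating summary...")
```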
-
levzlotnik authored
-
Lev Zlotnik authored
Add an example of compressing object-detection (OD) PyTorch models. In this example we compress torchvision's object detection models: FasterRCNN / MaskRCNN / KeypointRCNN. We've modified the reference code for object detection to allow easy compression scheduling with YAML configuration.
-
levzlotnik authored
of a model by name relative to the root of the model.
-
- Nov 28, 2019
-