- Oct 27, 2020
Guy Jacob authored
- May 11, 2020
Guy Jacob authored
- Apr 30, 2020
- Apr 20, 2020
Neta Zmora authored
This script shows how to specify a compression schedule directly through Distiller's API, instead of using a YAML specification: examples/scheduling_api/direct_api_pruning.py
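A minimal sketch of what such a direct-API schedule can look like (the pruner and policy argument names here are assumptions patterned on Distiller's YAML examples; the script referenced above is the authoritative version):

```python
import torchvision
import distiller
from distiller.pruning import AutomatedGradualPruner

model = torchvision.models.resnet18()

# Build the schedule in code instead of YAML: an AGP pruner wrapped in a pruning policy
scheduler = distiller.CompressionScheduler(model)
pruner = AutomatedGradualPruner(name='agp_pruner',
                                initial_sparsity=0.05, final_sparsity=0.85,
                                weights=['layer1.0.conv1.weight'])
scheduler.add_policy(distiller.PruningPolicy(pruner, pruner_args=None),
                     starting_epoch=0, ending_epoch=30, frequency=2)

# The training loop then drives the schedule via scheduler.on_epoch_begin(),
# scheduler.on_minibatch_begin(), etc., exactly as with a YAML-defined schedule.
```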
- Feb 17, 2020
Guy Jacob authored
Guy Jacob authored
* Move image-classification-specific setup code to a separate script at examples/classifier_compression/ptq_lapq.py
* Make the ptq_coordinate_search function completely independent of command-line arguments
* Change the LAPQ command-line args function to update a pre-existing parser (changed the CLA prefix to 'lapq' for more clarity)
* Enable LAPQ from compress_classifier.py (triggered with --qe-lapq)
* Add pointers in the documentation
- Feb 06, 2020
Guy Jacob authored
Convert Distiller PTQ models to "native" PyTorch PTQ (#458)
* New API: distiller.quantization.convert_distiller_ptq_model_to_pytorch()
* Can also be called from a PostTrainLinearQuantizer instance: quantizer.convert_to_pytorch()
* Can also be triggered from the command line in the image classification sample
* Converted modules can be saved/loaded via apputils.load/save_checkpoint
* Added a Jupyter notebook tutorial
* Converted modules have only the absolutely necessary quant-dequant operations. For a fully quantized model, this means just quantization of the model input and de-quantization of the model output. If a user keeps specific internal layers in FP32, quant-dequant operations are added as needed
* Can configure either the 'fbgemm' or 'qnnpack' backend. For 'fbgemm' we take care of preventing overflows (aka "reduce_range" in the PyTorch API)
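A hedged sketch of the conversion flow (argument names beyond those named in the commit, such as the dummy input, are assumptions; the added Jupyter notebook tutorial is the authoritative reference):

```python
import torch
import torchvision
from distiller.quantization import PostTrainLinearQuantizer, convert_distiller_ptq_model_to_pytorch

model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Post-train quantize with Distiller (in practice you would also pass activation stats
# collected during calibration; they are omitted here for brevity)
quantizer = PostTrainLinearQuantizer(model)
quantizer.prepare_model(dummy_input)

# Convert to a "native" PyTorch PTQ model, targeting the 'fbgemm' backend
pytorch_model = convert_distiller_ptq_model_to_pytorch(quantizer.model, dummy_input,
                                                       backend='fbgemm')
# Equivalent shortcut from the quantizer instance:
# pytorch_model = quantizer.convert_to_pytorch(dummy_input, backend='fbgemm')
```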
- Feb 03, 2020
Guy Jacob authored
- Feb 02, 2020
Lev Zlotnik authored
"Loss Aware Post-Training Quantization" (Nahshan et al., 2019) Paper: https://arxiv.org/abs/1911.07190 Reference implementation: https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq Proper documentation is still TODO, for now see the example YAML file at 'examples/quantization/post_train_quant/resnet18_imagenet_post_train_lapq.yaml' * Implemented in distiller/quantization/ptq_coordinate_search.py * At the moment that file both the model-independent algorithm implementation and image-classification specific sample script. Still TODO: Refactor that * Post train quantization changes (range_linear): * Added getters/setters for quantization parameters (scale/zero_point) and clipping values * Add option to save backup of FP32 weights to allow re-quantization after quantizer was created. * Add option to clip weights in addition to activations * Fix fusions to not occur only when activations aren't quantized * RangeLinearFakeQuantWrapper: * Make inputs quantization optional * In case of ReLU + ACIQ, clip according to input stats * Data loaders: * Add option to not load train set at all from disk (to speed up loading time in post-training runs) * Modified "image_classifier.py" accordingly
- Jan 15, 2020
Guy Jacob authored
(We use 8-bit values below, but this applies to any bit-width.)
* We use the notion of "full" and "restricted" quantized range for symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127]; "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full" range when saturating after quantization, but calculated the scale factor as if the range was restricted. This means we weren't making full utilization of the quantized range.
* On the other hand, some other quantization implementations (e.g. TensorFlow) use the "restricted" range.
* So, we make it an option to use either the proper "full" range (q_min = -128) or the "restricted" range (q_min = -127).
* LinearQuantMode.SYMMETRIC now means the "full" range is used, and LinearQuantMode.SYMMETRIC_RESTRICTED was added for using the "restricted" range.
* Updated tests and documentation.
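A small numeric illustration of the distinction (the exact formulas in Distiller may be organized differently; this only shows how the scale factor differs between the two range options for 8 bits):

```python
def symmetric_scale(sat_val, num_bits=8, restricted=False):
    # "Full" range:       [-2^(n-1), 2^(n-1) - 1], e.g. [-128, 127]
    # "Restricted" range: [-(2^(n-1) - 1), 2^(n-1) - 1], e.g. [-127, 127]
    q_range = (2 ** (num_bits - 1) - 1) if restricted else 2 ** (num_bits - 1)
    return q_range / sat_val

print(symmetric_scale(2.0))                   # SYMMETRIC (full):      64.0
print(symmetric_scale(2.0, restricted=True))  # SYMMETRIC_RESTRICTED:  63.5
```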
- Dec 30, 2019
Guy Jacob authored
In PostTrainLinearQuantizer and QuantAwareTrainRangeLinearQuantizer
- Dec 29, 2019
Neta Zmora authored
- Dec 26, 2019
Guy Jacob authored
- Dec 09, 2019
Guy Jacob authored
Guy Jacob authored
* Make it easier to find sample apps for different workload types
* Add READMEs for sample apps that didn't have any
* Update READMEs with experiment results where applicable
Lev Zlotnik authored
Added tables of results for 85% sparsity
- Dec 08, 2019
Guy Jacob authored
* Weights-only PTQ:
  * Allow RangeLinearQuantWrapper to accept num_bits_acts = None, in which case it acts as a simple pass-through during forward
  * In RangeLinearQuantParamLayerWrapper, if bits_activations is None and num_bits_params > 0, perform quant and de-quant of the parameters instead of just quant
* Activations-only PTQ:
  * Enable activations-only quantization for conv/linear modules. When PostTrainLinearQuantizer detects # bits != None for activations and # bits == None for weights, a fake-quantization wrapper is used
* Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command-line arguments to invoke weights-only / activations-only quantization, respectively
* Minor refactoring for clarity in PostTrainLinearQuantizer's replace_* functions
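A schematic sketch of the two modes from Python (the constructor argument values are assumptions based on the description above; from the command line the same effect is achieved by passing 0 to the respective --qe-bits-* flag):

```python
import torchvision
from distiller.quantization import PostTrainLinearQuantizer

# Weights-only PTQ: activations pass through, parameters are quantized and de-quantized
wts_only = PostTrainLinearQuantizer(torchvision.models.resnet18().eval(),
                                    bits_activations=None, bits_parameters=8)

# Activations-only PTQ: conv/linear outputs are fake-quantized, weights stay in FP32
acts_only = PostTrainLinearQuantizer(torchvision.models.resnet18().eval(),
                                     bits_activations=8, bits_parameters=None)
```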
- Dec 03, 2019
SunYiran authored
- Dec 02, 2019
Neta Zmora authored
compute-summary and png-summary currently work with image classifiers only.
Neta Zmora authored
When multi-processing, we want only one process to generate the summary, while the other processes do nothing (lazy bums!)
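A minimal sketch of such a guard (not Distiller's actual code; it assumes torch.distributed is the multi-processing mechanism and uses a placeholder summary callable):

```python
import torch.distributed as dist

def is_main_process():
    # Rank 0 is the only process that should write the summary
    return not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0

def maybe_generate_summary(model, summary_fn):
    # summary_fn is a placeholder for whichever summary generator is being invoked;
    # all other ranks simply skip the call
    if is_main_process():
        summary_fn(model)
```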
Lev Zlotnik authored
Add an example of compressing PyTorch object detection (OD) models. In this example we compress torchvision's object detection models: FasterRCNN / MaskRCNN / KeypointRCNN. We've modified the reference code for object detection to allow easy compression scheduling with a YAML configuration.
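Roughly, the wiring looks like the following (the API details and file name here are assumptions; the example's own training script is the authoritative version):

```python
import torch
import torchvision
import distiller

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# 'compression.yaml' is a placeholder path to the YAML scheduling configuration
scheduler = distiller.file_config(model, optimizer, 'compression.yaml')

for epoch in range(10):
    scheduler.on_epoch_begin(epoch)
    # ... the torchvision detection training loop goes here, with
    # scheduler.on_minibatch_begin()/on_minibatch_end() calls around each step ...
    scheduler.on_epoch_end(epoch)
```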
- Nov 28, 2019
Neta Zmora authored
- define ALMOST_ONE
- define op_type
- remove sanity assert (need to understand what tolerance value to use in the assert)
Co-authored-by: csc12138
Co-authored-by: wangyidong3
Neta Zmora authored
- define ALMOST_ONE
- define op_type
- remove sanity assert (need to understand what tolerance value to use in the assert)
- Nov 17, 2019
Neta Zmora authored
- Nov 16, 2019
Neta Zmora authored
Neta Zmora authored
- Nov 13, 2019
Bar authored
* Previous implementation:
  * Stats collection required a separate run with `--qe-calibration`.
  * Specifying `--quantize-eval` without `--qe-stats-file` triggered dynamic quantization.
  * Running with `--quantize-eval --qe-calibration <num>` only ran stats collection and ignored `--quantize-eval`.
* New implementation:
  * Running `--quantize-eval --qe-calibration <num>` will now perform stats collection according to the calibration flag, and then quantize the model with the collected stats (and run evaluation).
  * Specifying `--quantize-eval` without `--qe-stats-file` will trigger the same flow as in the bullet above, as if `--qe-calibration 0.05` was used (i.e. 5% of the test set will be used for stats).
  * Added a new flag: `--qe-dynamic`. From now on, to do dynamic quantization you need to explicitly run: `--quantize-eval --qe-dynamic`
  * As before, you can still run `--qe-calibration` without `--quantize-eval` to perform "stand-alone" stats collection
  * The following flags, which all represent different ways to control creation of stats or use of existing stats, are now mutually exclusive: `--qe-calibration`, `--qe-stats-file`, `--qe-dynamic`, `--qe-config-file`
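The mutual-exclusivity rule in the last bullet can be expressed with a standard argparse group; this is an illustrative sketch, not the sample application's actual parser code, and the argument types are assumptions:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--quantize-eval', action='store_true')

# The different ways of creating or consuming stats may not be combined
stats_group = parser.add_mutually_exclusive_group()
stats_group.add_argument('--qe-calibration', type=float, default=None)
stats_group.add_argument('--qe-stats-file', type=str, default=None)
stats_group.add_argument('--qe-dynamic', action='store_true')
stats_group.add_argument('--qe-config-file', type=str, default=None)

args = parser.parse_args(['--quantize-eval', '--qe-calibration', '0.05'])
```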
- Nov 11, 2019
Neta Zmora authored
* Pruning: add an option to virtually fold BN into Conv2D for ranking.
  PruningPolicy can be configured using a new control argument, fold_batchnorm: when set to `True`, the weights of BatchNorm modules are folded into the weights of Conv2D modules (if Conv2D->BN edges exist in the model graph). Each weight filter is attenuated using a different pair of (gamma, beta) coefficients, so `fold_batchnorm` is relevant for fine-grained and filter-ranking pruning methods. We attenuate using the running values of the mean and variance, as is done in quantization. This control argument is only supported for Conv2D modules (i.e. other convolution operation variants and Linear operations are not supported). E.g.:
    policies:
      - pruner:
          instance_name: low_pruner
          args:
            fold_batchnorm: True
        starting_epoch: 0
        ending_epoch: 30
        frequency: 2
* AGP: non-functional refactoring.
  distiller/pruning/automated_gradual_pruner.py: change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`, which is a more appropriate function name as we don't really prune in this function.
* Simplify GEMM weights input-channel ranking logic.
  Ranking weight matrices by input channels is similar to ranking 4D Conv weights by input channels, so there is no need for duplicate logic.
  distiller/pruning/ranked_structures_pruner.py: change `prune_to_target_sparsity` to `_set_param_mask_by_sparsity_target`, which is a more appropriate function name as we don't really prune in this function; remove the code handling ranking of matrix rows.
  distiller/norms.py: remove rank_cols.
  distiller/thresholding.py: in expand_binary_map, treat the `channels` group_type the same as the `cols` group_type when dealing with 2D weights.
* AGP: add an example of ranking filters with virtual BN folding. Also update the resnet20 AGP examples.
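For reference, a conceptual sketch of the weight-scaling part of BN folding (this is not Distiller's implementation; the beta term is omitted since it affects the bias rather than the weight magnitudes used for ranking):

```python
import torch

def fold_bn_scale_into_conv_weights(conv_w, bn_gamma, bn_running_var, eps=1e-5):
    # conv_w: [out_channels, in_channels, kh, kw]; one (gamma, running_var) pair per filter
    scale = bn_gamma / torch.sqrt(bn_running_var + eps)
    return conv_w * scale.view(-1, 1, 1, 1)
```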
- Nov 07, 2019
Neta Zmora authored
Fix the early-exit (EE) code so that it works with the current 'master' branch, and add a test for high-level EE regression.
- Nov 06, 2019
Guy Jacob authored
Co-authored-by: Bar <29775567+barrh@users.noreply.github.com>
Co-authored-by: Guy Jacob <guy.jacob@intel.com>
- Oct 31, 2019
- Oct 23, 2019
Neta Zmora authored
Force loading on the CPU, which always has more memory than a single GPU. This is useful for models that cannot be loaded onto a single GPU.
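The essence of that behavior in plain PyTorch (a minimal sketch; 'checkpoint.pth.tar' is a placeholder path):

```python
import torch

# map_location='cpu' remaps all saved tensors to host memory instead of their original GPU
checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')
```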
Neta Zmora authored
Neta Zmora authored
As documented in issue #395, some of the command-line examples in the AMC notebooks are incorrect. Also, fix some bugs that were introduced with the refactoring of the low-level pruning API.
- Oct 07, 2019
Guy Jacob authored
* Greedy search script for post-training quantization settings
  * Iterates over each layer in the model in order. For each layer, checks a user-defined set of quantization settings and chooses the best one based on validation accuracy
  * Provided a sample that searches for the best activations-clipping mode per layer, on image classification models
* Proper handling of mixed-quantization settings in post-train quant:
  * By default, the quantization settings for each layer apply only to output quantization
  * Propagate quantization settings for activation tensors through the model during execution
  * For non-quantized inputs to layers that require quantized inputs, fall back to quantizing according to the settings used for the output
  * In addition, provide a mechanism to override input quantization settings via the YAML configuration file
  * By default, all modules are now quantized. For module types that don't have a dedicated quantized implementation, "fake" quantization is performed
* Misc. changes:
  * Fuse ReLU/ReLU6 into their predecessor during post-training quantization
  * Fixes to ACIQ clipping in the half-range case
Co-authored-by: Lev Zlotnik <lev.zlotnik@intel.com>
Co-authored-by: Guy Jacob <guy.jacob@intel.com>
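Schematically, the greedy per-layer search works like this (hypothetical names; not the actual script's API):

```python
def greedy_ptq_search(layers, candidate_settings, evaluate_accuracy):
    # For each layer in order, try every candidate setting on top of the choices already
    # made for earlier layers, and keep the one with the best validation accuracy.
    chosen = {}
    for layer in layers:
        best_acc, best_setting = float('-inf'), None
        for setting in candidate_settings:
            trial = dict(chosen, **{layer: setting})
            acc = evaluate_accuracy(trial)   # quantize per 'trial' and run validation
            if acc > best_acc:
                best_acc, best_setting = acc, setting
        chosen[layer] = best_setting
    return chosen
```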
Neta Zmora authored