- Feb 13, 2019
-
-
Neta Zmora authored
Merging the 'amc' branch with 'master'. This updates the automated compression code in 'master', and adds a greedy filter-pruning algorithm.
-
- Feb 12, 2019
-
-
Neta Zmora authored
This commit fixes (and adds a test for) the case where we wish to load a thinned GPU checkpoint onto the CPU.
-
Guy Jacob authored
-
Neta Zmora authored
The root cause of issue #148 is that DataParallel modules cannot execute on the CPU on machines that have both CPUs and GPUs. Therefore, we don't wrap models loaded for the CPU in DataParallel, but we do wrap models loaded on the GPUs with DataParallel (to make them run faster). The names of the module keys saved in a checkpoint file depend on whether the modules are wrapped by a DataParallel module or not, so loading a checkpoint that was created on the GPU onto a CPU model (and vice versa) fails on the keys. This is PyTorch behavior, and despite the community asking for a fix - e.g. https://github.com/pytorch/pytorch/issues/7457 - it is still pending. This commit adds code to catch key errors when loading a GPU-generated model (i.e. wrapped with DataParallel) onto a CPU, and to convert the names of the keys. This PR also merges refactoring of load_checkpoint.py done by @barrh, who also added a test to further exercise checkpoint loading.
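A minimal sketch of this kind of key conversion, assuming a checkpoint dictionary with a 'state_dict' entry; the function name and structure are illustrative, not Distiller's actual load_checkpoint.py code:

```python
import torch
from collections import OrderedDict

def load_state_dict_any_device(model, checkpoint_path):
    """Load a checkpoint onto `model`, tolerating a DataParallel 'module.' prefix mismatch."""
    # map_location='cpu' lets a GPU-generated checkpoint be deserialized on a CPU-only machine
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    state_dict = checkpoint.get('state_dict', checkpoint)
    try:
        model.load_state_dict(state_dict)
    except RuntimeError:
        # Key mismatch: add or strip the 'module.' prefix that DataParallel introduces
        model_is_parallel = isinstance(model, torch.nn.DataParallel)
        converted = OrderedDict()
        for name, tensor in state_dict.items():
            if name.startswith('module.') and not model_is_parallel:
                name = name[len('module.'):]
            elif not name.startswith('module.') and model_is_parallel:
                name = 'module.' + name
            converted[name] = tensor
        model.load_state_dict(converted)
    return model
```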
-
- Feb 11, 2019
-
-
Neta Zmora authored
Added recent Distiller citations
-
Guy Jacob authored
-
Guy Jacob authored
Summary of changes:
1. Post-train quantization based on pre-collected statistics
2. Quantized concat, element-wise addition / multiplication and embeddings
3. Move post-train quantization command line args out of sample code
4. Configure post-train quantization from YAML for more fine-grained control
(See PR #136 for more detailed change descriptions)
-
- Feb 10, 2019
-
-
Guy Jacob authored
* For CIFAR-10 / ImageNet only
* Refactor data_loaders.py and reduce code duplication
* Implemented a custom sampler (sketched below)
* Integrated into the image classification sample
* Since we now shuffle the test set, the expected results in 2 full_flow_tests that do evaluation had to be updated
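A rough illustration of the sampler-based split that such a refactoring enables; the function name, dataset choice and seed are hypothetical, not the actual data_loaders.py implementation:

```python
import numpy as np
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

def cifar10_loaders(data_dir, batch_size=128, valid_fraction=0.1, seed=2019):
    """Build train/validation loaders over one CIFAR-10 dataset using index samplers."""
    dataset = datasets.CIFAR10(data_dir, train=True, download=True,
                               transform=transforms.ToTensor())
    indices = np.arange(len(dataset))
    np.random.RandomState(seed).shuffle(indices)   # deterministic shuffle of the indices
    split = int(len(dataset) * valid_fraction)
    valid_idx = indices[:split].tolist()
    train_idx = indices[split:].tolist()
    train_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader
```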
-
- Feb 06, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
A parameter was missing from one of the function calls.
-
Neta Zmora authored
-
Neta Zmora authored
Expand the command line arguments to recreate the original command line invocation.
-
Neta Zmora authored
Using DataParallel in conjunction with SummaryGraph causes various small problems. The best solution is to force SummaryGraph to use a non-data-parallel version of the model, and to always normalize node names when accessing SummaryGraph operations.
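A sketch of the two pieces described above, unwrapping DataParallel and normalizing node names; the helper names are illustrative, not Distiller's actual API:

```python
import torch.nn as nn

def unwrap_data_parallel(model):
    """Return the wrapped module if `model` is a DataParallel instance, else the model itself."""
    return model.module if isinstance(model, nn.DataParallel) else model

def normalize_node_name(name):
    """Strip the leading 'module.' prefix that DataParallel adds, so node names stay stable."""
    return name[len('module.'):] if name.startswith('module.') else name

# Used along the lines of: sgraph = SummaryGraph(unwrap_data_parallel(model), dummy_input)
```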
-
- Jan 31, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
Specifically, gracefully handle a missing 'epoch' key in a loaded checkpoint file.
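For example, instead of indexing the checkpoint dictionary directly, the loader can fall back to a default when the key is missing (a hypothetical snippet, with a made-up checkpoint path):

```python
import torch

checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')   # hypothetical path
# Gracefully handle checkpoints that were saved without an 'epoch' entry
start_epoch = checkpoint.get('epoch', 0)   # assume epoch 0 when the key is absent
```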
-
- Jan 27, 2019
-
-
JoyFreemanYan authored
-
- Jan 24, 2019
- Jan 23, 2019
- Jan 22, 2019
-
-
inner authored
-
- Jan 21, 2019
- Jan 16, 2019
-
-
Bar authored
* Support for multi-phase activation logging: enable logging activations during both training and validation in the same session.
* Refactoring: move the parser to its own file
  * The parser is moved from compress_classifier into its own file.
  * The Torch version check is moved to precede the main() call.
  * The main definition is moved to the top of the file.
* Make parser choices case-insensitive
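One common way to implement case-insensitive choices is to normalize the argument before argparse validates it; a small illustrative sketch (the --optimizer option is hypothetical, used only to show the technique):

```python
import argparse

parser = argparse.ArgumentParser(description='Image classification compression sample')
# `type=str.lower` normalizes user input before it is checked against `choices`,
# so "SGD", "sgd" and "Sgd" are all accepted.
parser.add_argument('--optimizer', type=str.lower, choices=['sgd', 'adam'], default='sgd')

args = parser.parse_args(['--optimizer', 'SGD'])
assert args.optimizer == 'sgd'
```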
-
Neta Zmora authored
-
- Jan 15, 2019
-
-
Neta Zmora authored
-
Neta Zmora authored
Fix a mismatch between the device on which the model resides and the device on which the computation runs.
-
- Jan 13, 2019
-
-
Neta Zmora authored
When masks are loaded from a checkpoint file, they should use the same device as the model.
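In other words, roughly the following, assuming the loaded masks are keyed by parameter name (illustrative, not the actual Distiller code):

```python
def masks_to_model_device(model, loaded_masks):
    """Move each loaded mask onto the device of the parameter it masks."""
    params = dict(model.named_parameters())
    for name, mask in loaded_masks.items():
        loaded_masks[name] = mask.to(params[name].device)
    return loaded_masks
```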
-
Neta Zmora authored
-
- Jan 10, 2019
-
-
Gal Novik authored
In compress_classifier.py we added a new application argument, --cpu, which you can use to force compute (training/inference) to run on the CPU when you invoke compress_classifier.py on a machine that has Nvidia GPUs. If your machine lacks Nvidia GPUs, the compute will now run on the CPU (and you do not need the new flag). Caveat: we did not fully test CPU support for the code in the Jupyter notebooks. If you find a bug, we apologize and appreciate your feedback.
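A minimal sketch of how such a flag typically interacts with device selection; the --cpu name matches the commit, while the surrounding code and placeholder model are illustrative:

```python
import argparse
import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument('--cpu', action='store_true',
                    help='force training/inference to run on the CPU even if GPUs are present')
args = parser.parse_args()

# Use the CPU if it was explicitly requested, or if no CUDA device is available
device = torch.device('cpu' if args.cpu or not torch.cuda.is_available() else 'cuda')
model = nn.Linear(10, 2).to(device)   # placeholder model, just to show device placement
```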
-
- Jan 09, 2019
-
-
Neta Zmora authored
These show a couple of the networks on the top1/sparsity curve.
-
- Jan 08, 2019
-
-
Bar authored
Block pruning: support specifying the block shape from the YAML file.

Block pruning refers to pruning 4-D structures of a specific shape. This is why it is sometimes called structure-pruning or group-pruning (confusing, I know). A specific example of block pruning is filter or channel pruning, which have a highly-regular block shape. This commit adds support for pruning blocks/groups/structures with irregular shapes that accelerate inference on a specific hardware platform. You can read more about the regularity of shapes in [Exploring the Regularity of Sparse Structure in Convolutional Neural Networks](https://arxiv.org/pdf/1705.08922.pdf).

When we want to introduce sparsity in order to reduce the compute load of a certain layer, we need to understand how the HW and SW perform the layer's operation, and how that operation is vectorized. Then we can induce sparsity to match the vector shape. For example, Intel AVX-512 provides SIMD instructions that apply the same instruction (Single Instruction) to a vector of inputs (Multiple Data). The following single instruction performs an element-wise multiplication of two vectors of 16 32-bit elements:

    __m512i result = _mm512_mullo_epi32(vec_a, vec_b);

If either vec_a or vec_b is partially sparse, we still need to perform the multiplication, and the sparsity does not help reduce the cost (power, latency) of the computation. However, if either vec_a or vec_b contains only zeros, then we can eliminate the instruction entirely. In this case, we say that we would like to have group sparsity of 16 elements, i.e. the HW/SW benefits from sparsity induced in blocks of 16 elements.

Things are a bit more involved, because we also need to understand how the software maps layer operations to the hardware. For example, a 3x3 convolution can be computed as a direct convolution, as a matrix multiplication, or as a Winograd matrix operation (to name a few ways of computation). These low-level operations are then mapped to SIMD instructions. Finally, the low-level SW needs to support a block-sparse storage format for weight tensors (see, for example: http://www.netlib.org/linalg/html_templates/node90.html).
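To make the "group sparsity of 16 elements" idea concrete, here is a hedged sketch that ranks and zeroes fixed-size blocks of a weight tensor by their L1 magnitude; the function name and heuristics are mine, and Distiller's YAML-configured block pruner is more general:

```python
import torch

def prune_blocks_by_magnitude(weights, block_size=16, fraction_to_prune=0.5):
    """Return a copy of `weights` with the lowest-L1-magnitude blocks of
    `block_size` contiguous elements zeroed out."""
    flat = weights.reshape(-1, block_size)                 # rows of SIMD-width blocks
    block_mags = flat.abs().sum(dim=1)                     # L1 magnitude of each block
    num_to_prune = max(1, int(fraction_to_prune * block_mags.numel()))
    threshold = block_mags.kthvalue(num_to_prune)[0]       # k-th smallest block magnitude
    keep = (block_mags > threshold).float().unsqueeze(1)   # 1 = keep block, 0 = zero it
    return (flat * keep).reshape(weights.shape)

# Example: a weight tensor whose size is a multiple of the assumed 16-element vector width
w = torch.randn(64, 128)
w_block_sparse = prune_blocks_by_magnitude(w, block_size=16, fraction_to_prune=0.5)
```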
-
- Dec 26, 2018
-
-
Gal Novik authored
Minor command line fix in the post training example
-
- Dec 23, 2018
-
-
Neta Zmora authored
-
- Dec 19, 2018
-
-
Neta Zmora authored
If compression_scheduler==None, then we need to set the value of losses[OVERALL_LOSS_KEY] (so it is the same as losses[OBJECTIVE_LOSS_KEY]). This was overlooked.
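In other words, roughly the following; the key names come from the commit message, while the surrounding variables are hypothetical:

```python
if compression_scheduler is None:
    # No compression/regularization losses were added this step, so the overall
    # loss is identical to the objective (task) loss.
    losses[OVERALL_LOSS_KEY] = losses[OBJECTIVE_LOSS_KEY]
```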
-
Neta Zmora authored
-
Neta Zmora authored
-