- Nov 25, 2018
-
-
Neta Zmora authored
-
- Nov 24, 2018
-
-
Guy Jacob authored
-
Neta Zmora authored
Thanks to Dan Alistarh for bringing this issue to my attention. The activations of Linear layers have shape (batch_size, output_size) while those of Convolution layers have shape (batch_size, num_channels, width, height), and this distinction in shape was not handled correctly. This commit also fixes sparsity computation for very large activations, as seen in VGG16, which can lead to memory exhaustion. One solution is to use smaller batch sizes, but this commit takes a different approach: it counts zeros “manually”, using less memory. Also in this commit: - Added a “caveats” section to the documentation. - Added more tests.
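A minimal sketch of what counting zeros “manually” can look like, iterating over the batch so that no large boolean tensor for the whole activation is ever materialized (the helper name and loop structure are illustrative, not Distiller's actual code):

```python
import torch

def activation_sparsity(act: torch.Tensor) -> float:
    """Fraction of zero elements in an activation tensor.

    Handles both Linear activations (batch, features) and Convolution
    activations (batch, channels, h, w) by flattening everything past
    the batch dimension, then accumulates plain Python counters instead
    of keeping large intermediate tensors alive.
    """
    num_zeros, num_elems = 0, 0
    for sample in act.view(act.size(0), -1):   # one sample at a time
        num_zeros += int((sample == 0).sum().item())
        num_elems += sample.numel()
    return num_zeros / num_elems
```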
-
- Nov 22, 2018
-
-
Neta Zmora authored
The super() method of the wrong subclass was used. In this case there were no practical implications, but we need to move to the less error-prone Python 3.x syntax, which does not require us to specify the class. I changed the super() invocations in the entire file and ran two schedules for ResNet56, and actually got better results than previously. I don't think these results are related to this change, and I cannot explain them. Nonetheless, I am committing these new results, because I also fixed the command-line parameters of resnet56_cifar_filter_rank_v2.yaml, which had a copy & paste error in it.
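For illustration, the two super() forms side by side (class names here are hypothetical):

```python
class ParentBlock:
    def __init__(self):
        self.initialized = True

class BasicBlock(ParentBlock):
    def __init__(self, planes):
        # Python 2-style call; naming the wrong class here is exactly the
        # kind of mistake this commit fixes:
        #   super(SomeOtherBlock, self).__init__()
        # Python 3-style call: no class name to keep in sync.
        super().__init__()
        self.planes = planes
```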
-
Neta Zmora authored
* Fix issue #79: change the default values so that the following scheduler meta-data keys are always defined: 'starting_epoch', 'ending_epoch', 'frequency'
* compress_classifier.py: add a new argument that allows specifying, from the command line, the range of pruning levels scanned when doing sensitivity analysis
* Add a regression test for issue #79
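A rough sketch of the defaulting described in the first bullet (the function name and default values are illustrative, not Distiller's exact internals):

```python
DEFAULT_POLICY_META = {'starting_epoch': 0, 'ending_epoch': 1, 'frequency': 1}

def normalize_policy_meta(meta):
    """Ensure the scheduler meta-data keys are always defined."""
    normalized = dict(DEFAULT_POLICY_META)
    normalized.update(meta or {})
    return normalized
```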
-
- Nov 21, 2018
-
-
Neta Zmora authored
In our patched ResNet version, we change TorchVision's code so that ReLU module instances are used only once in a network.
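An illustrative sketch of the pattern (not the actual TorchVision/Distiller diff): every call site gets its own nn.ReLU instance, so each module appears exactly once in the forward graph and per-module statistics remain unambiguous.

```python
import torch.nn as nn

class PatchedBasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # Two distinct ReLU modules instead of reusing a single self.relu.
        self.relu1 = nn.ReLU(inplace=True)
        self.relu2 = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu2(out + x)
```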
-
Neta Zmora authored
When detecting a module that is used multiple times, stop execution and print an explanation to the user.
-
Neta Zmora authored
Trying to simplify the code.
-
Neta Zmora authored
-
Neta Zmora authored
Add docs/conditional_computation.md which was accidentally left out of an earlier commit.
-
- Nov 20, 2018
-
-
Neta Zmora authored
* Bug fix: the value of best_top1 stored in the checkpoint may be wrong. If you invoke compress_classifier.py with --num-best-scores=n, with n>1, then the value of best_top1 stored in checkpoints is wrong.
-
Neta Zmora authored
When we resume from a checkpoint, we usually want to continue using the checkpoint’s masks. I say “usually” because I can see a situation where we want to prune a model and checkpoint it, and then resume with the intention of fine-tuning w/o keeping the masks. This is what’s done in Song Han’s Dense-Sparse-Dense (DSD) training (https://arxiv.org/abs/1607.04381). But I didn’t want to add another argument to ```compress_classifier.py``` for the time being, so we ignore DSD.
There are two possible situations when we resume a checkpoint that has a serialized ```CompressionScheduler``` with pruning masks:
1. We are planning on using a new ```CompressionScheduler``` that is defined in a schedule YAML file. In this case, we want to copy the masks from the serialized ```CompressionScheduler``` to the new ```CompressionScheduler``` that we are constructing from the YAML file. This is one fix.
2. We are resuming a checkpoint, but without using a YAML schedule file. In this case, we want to use the ```CompressionScheduler``` that we loaded from the checkpoint file. All this ```CompressionScheduler``` does is keep applying the masks as we train, so that we don’t lose them. This is the second fix.
For DSD, we would need a new flag that would override using the ```CompressionScheduler``` that we load from the checkpoint.
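A hypothetical sketch of the two resume paths described above (the attribute and function names are illustrative, not Distiller's exact API):

```python
def resume_compression_scheduler(loaded_scheduler, yaml_scheduler=None):
    if yaml_scheduler is not None:
        # Case 1: a new scheduler was built from a YAML file -- copy the
        # pruning masks that were serialized with the checkpoint into it.
        for name, masker in loaded_scheduler.zeros_mask_dict.items():
            yaml_scheduler.zeros_mask_dict[name].mask = masker.mask
        return yaml_scheduler
    # Case 2: no YAML schedule -- keep using the loaded scheduler, whose
    # only remaining job is to re-apply the masks after each weight update.
    return loaded_scheduler
```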
-
- Nov 09, 2018
-
-
Neta Zmora authored
Another schedule for ResNet20: filter-wise pruning with 64.6% sparsity, 25.4% compute reduction and Top1 91.47% (vs. 91.78 baseline).
-
- Nov 08, 2018
-
-
Neta Zmora authored
Top1 is 75.492 (at epoch 93) vs. the published TorchVision baseline Top1 of 76.15 (-0.66). Total sparsity: 80.05%.
-
Neta Zmora authored
Change the LR from 0.2 to 0.3, as was actually used to generate the results in the remark.
-
Haim Barad authored
* Updated stats computation - fixes issues with validation stats
* Clarification of output (docs)
* Update
* Moved validation stats to a separate function
-
Neta Zmora authored
-
Guy Jacob authored
-
- Nov 07, 2018
-
-
Neta Zmora authored
Add missing files from previous commit
-
Neta Zmora authored
-
- Nov 06, 2018
-
-
Neta Zmora authored
We recently changed the signature of the on_minibatch_begin() callback from the scheduler (added 'meta') and the callback client in the thinning module was not updated.
-
Neta Zmora authored
By default, when we create a model we wrap it with DataParallel to benefit from data-parallelism across GPUs (mainly for convolution layers). But sometimes we don't want the sample application to do this: for example when we receive a model that was trained serially. This commit adds a new argument to the application to prevent the use of DataParallel.
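A minimal sketch of the conditional wrapping this describes (the flag name and helper are illustrative, not the sample application's actual argument):

```python
import torch
import torch.nn as nn

def create_model(arch_fn, use_data_parallel=True):
    """Build a model and optionally wrap it with DataParallel."""
    model = arch_fn()                       # e.g. torchvision.models.resnet50
    if use_data_parallel and torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.cuda() if torch.cuda.is_available() else model
```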
-
Haim Barad authored
* Fixed validation stats and added new summary stats
* Trimmed some comments
* Improved figure for documentation
* Minor updates
-
- Nov 05, 2018
-
-
Neta Zmora authored
Added an implementation of Dynamic Network Surgery for Efficient DNNs (Yiwen Guo, Anbang Yao, Yurong Chen. NIPS 2016, https://arxiv.org/abs/1608.04493).
- Added SplicingPruner: a pruner that both prunes and splices connections.
- Included an example schedule on ResNet20 CIFAR.
- New features for compress_classifier.py:
  1. Added the “--masks-sparsity” argument which, when enabled, logs the sparsity of the weight masks during training.
  2. Added a new command-line argument to report the top N best accuracy scores, instead of just the highest score. This is sometimes useful when pruning a pre-trained model that has its best Top1 accuracy in the first few pruning epochs.
- New features for PruningPolicy:
  1. The pruning policy can use two copies of the weights: one is used during the forward pass, the other during the backward pass. This is controlled by the “mask_on_forward_only” argument.
  2. If we enable “mask_on_forward_only”, we probably want to permanently apply the mask at some point (usually once the pruning phase is done). This is controlled by the “keep_mask” argument.
  3. We introduce a first implementation of scheduling at the training-iteration (i.e. mini-batch) granularity. Until now we could only schedule pruning at epoch granularity. This is controlled by the “mini_batch_pruning_frequency” argument (disable by setting it to zero).
Some of the abstractions may have leaked from PruningPolicy to CompressionScheduler. We need to reexamine this in the future.
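A hedged sketch of the prune-and-splice mask update at the heart of Dynamic Network Surgery (the hysteresis thresholds and function name are illustrative; this is not SplicingPruner's exact code):

```python
import torch

def splice_mask(weight, mask, low_thresh, hi_thresh):
    """Hysteresis mask update: prune weights whose magnitude falls below
    low_thresh, splice back (re-enable) weights that grow above hi_thresh,
    and leave the rest of the mask unchanged."""
    magnitude = weight.abs()
    new_mask = mask.clone()
    new_mask[magnitude < low_thresh] = 0.0   # prune
    new_mask[magnitude > hi_thresh] = 1.0    # splice
    return new_mask
```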
-
- Nov 04, 2018
-
-
Neta Zmora authored
-
- Nov 02, 2018
-
-
Neta Zmora authored
Changed the description of the feature set. Updated the README a little, since a lot has changed since we released six months ago. There is still a lot to add/remove/change.
-
- Nov 01, 2018
-
-
Neta Zmora authored
-
Guy Jacob authored
* Added command-line arguments for this and other post-training quantization settings in the image classification sample.
-
- Oct 31, 2018
-
-
Neta Zmora authored
Small improvement in the results
-
- Oct 29, 2018
-
-
Neta Zmora authored
Fix issue #65: missing link in the README file
-
Neta Zmora authored
This short notebook performs a forward pass on a single ImageNet image, and saves the intermediate results to a file for later inspection.
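A brief sketch of how such intermediate results can be captured with forward hooks (the model choice and file name are illustrative, not the notebook's actual code):

```python
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()
intermediates = {}

def save_output(name):
    def hook(module, inputs, output):
        intermediates[name] = output.detach().cpu()
    return hook

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        module.register_forward_hook(save_output(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))   # stand-in for a real ImageNet image
torch.save(intermediates, 'intermediate_activations.pt')
```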
-
Neta Zmora authored
A notebook to visualize the data dependencies in non-trivial networks such as ResNet. This is meant to help in planning filter/channel pruning of such networks.
-
- Oct 27, 2018
-
-
Neta Zmora authored
-
- Oct 26, 2018
-
-
Neta Zmora authored
-
- Oct 25, 2018
-
-
Neta Zmora authored
After commit f396c34a362731c765370d368877c2ca367ad651, we now always apply the pruning mask at the end of each mini-batch. This is because weights may be updated by SGD+momentum even when they are masked. Therefore, there is no need to mask the gradients: we always mask the weights at the end of the mini-batch. See issue #53 for more details.
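A minimal sketch of the training-loop ordering this implies, assuming a simple dict of per-parameter masks (the helper structure is hypothetical):

```python
import torch

def train_step(model, loss_fn, optimizer, inputs, targets, masks):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()                # momentum may update masked weights here
    # Re-apply the masks after the weight update, so pruned weights cannot
    # drift back to non-zero values between mini-batches.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    return loss.item()
```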
-
- Oct 23, 2018
-
-
Neta Zmora authored
-
Neta Zmora authored
-
Neta Zmora authored
This notebook provides visualizations of what ResNet50 models pay attention to when they classify. In other words, what the models are looking at. This information can be used for localization, but here it is provided merely to build our intuition. Based on: B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning Deep Features for Discriminative Localization. CVPR'16 (arXiv:1512.04150, 2015). https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/
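A rough Class Activation Mapping (CAM) sketch in the spirit of Zhou et al. (the random input is a stand-in for a real image; this is not the notebook's actual code):

```python
import torch
import torchvision.models as models

resnet = models.resnet50(pretrained=True).eval()
features = {}
resnet.layer4.register_forward_hook(
    lambda module, inputs, output: features.update(conv=output))

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)
    class_idx = resnet(image).argmax(dim=1).item()
    # Weight the final conv feature maps (2048 x 7 x 7) by the classifier
    # weights of the predicted class, then normalize to [0, 1].
    fc_weights = resnet.fc.weight[class_idx]                     # (2048,)
    cam = (fc_weights.view(-1, 1, 1) * features['conv'][0]).sum(dim=0)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```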
-
Neta Zmora authored
This is a simple application of Truncated SVD, just to get a feeling of what happens to the accuracy if we use TruncatedSVD w/o fine-tuning. We apply Truncated SVD on the linear layer found at the end of ResNet50, and run a test over the validation dataset to measure the impact on the classification accuracy.
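A hedged sketch of the factorization (the rank is illustrative, and this is not the notebook's actual code): ResNet50's final Linear layer (2048 -> 1000) is replaced by two smaller Linear layers built from a truncated SVD of its weight matrix.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def truncated_svd_linear(fc, rank):
    W = fc.weight.data                       # (out_features, in_features)
    U, S, V = torch.svd(W)                   # W ~= U @ diag(S) @ V.t()
    U, S, V = U[:, :rank], S[:rank], V[:, :rank]

    first = nn.Linear(fc.in_features, rank, bias=False)
    first.weight.data = (V * S).t()          # (rank, in_features)
    second = nn.Linear(rank, fc.out_features, bias=True)
    second.weight.data = U                   # (out_features, rank)
    second.bias.data = fc.bias.data
    return nn.Sequential(first, second)

model = models.resnet50(pretrained=True)
model.fc = truncated_svd_linear(model.fc, rank=250)
```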
-
Neta Zmora authored
The problem reported in issue #60 occurs when a user downloads an archive of Distiller instead of performing "git clone". When downloading an archive there is no ".git" directory and we need to handle this gracefully. For more details, see the issue itself.
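One way to handle this gracefully, assuming GitPython is used to read the commit hash (a sketch, not the repository's actual fix):

```python
import logging
from git import Repo, InvalidGitRepositoryError  # GitPython

def log_git_state(repo_root):
    """Log the current commit hash, or a note when there is no .git
    directory (e.g. the sources were downloaded as an archive)."""
    try:
        logging.info("Git commit: %s", Repo(repo_root).head.commit.hexsha)
    except InvalidGitRepositoryError:
        logging.info("Cannot find a .git directory; skipping git info")
```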
-