Knowledge distillation fixes (#503)
Fixed two long-standing bugs in knowledge distillation:
- Distillation loss needs to be scaled by T^2 (#122)
- Use tensor.clone instead of new_tensor when caching student logits (#234)

Updated example results and uploaded the script used to generate them.
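For context, the sketch below is a minimal, self-contained PyTorch approximation of the two fixes. The names (`distillation_loss`, `LogitsCache`) are hypothetical and this is not the code in `distiller/knowledge_distillation.py`; it only illustrates the T^2 scaling and the clone-vs-new_tensor caching issue.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature):
    # Soften both distributions with the temperature T.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # KL divergence between softened distributions, scaled by T^2 so the
    # soft-target gradients keep a magnitude comparable to the hard-label
    # loss (the scaling referenced by #122).
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * (temperature ** 2)

class LogitsCache:
    """Caches the student's logits for later use in the distillation loss."""
    def __init__(self):
        self.logits = None

    def cache(self, student_logits):
        # clone() keeps the cached tensor connected to the autograd graph,
        # so gradients from the distillation loss flow back into the student.
        # new_tensor() would copy only the data and drop the gradient history
        # (the bug referenced by #234).
        self.logits = student_logits.clone()

# Hypothetical usage inside a training step:
cache = LogitsCache()
student_logits = torch.randn(8, 10, requires_grad=True)  # stand-in student output
teacher_logits = torch.randn(8, 10)                       # stand-in teacher output
cache.cache(student_logits)
loss = distillation_loss(cache.logits, teacher_logits, temperature=4.0)
loss.backward()  # gradients reach student_logits because of clone()
```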
Showing 6 changed files
- README.md: 3 additions, 3 deletions
- distiller/knowledge_distillation.py: 13 additions, 7 deletions
- examples/README.md: 7 additions, 7 deletions
- examples/quantization/fp32_baselines/preact_resnet_cifar_base_fp32.yaml: 93 additions, 50 deletions
- examples/quantization/preact_resnet_cifar_quant_distill_tests.sh: 72 additions, 0 deletions
- examples/quantization/quant_aware_train/preact_resnet_cifar_dorefa.yaml: 102 additions, 49 deletions