Guy Jacob authored

Fixed two long-standing bugs in knowledge distillation:
* Distillation loss needs to be scaled by T^2 (#122)
* Use tensor.clone instead of new_tensor when caching student logits (#234)

Updated example results and uploaded the script to generate them
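The sketch below illustrates the two fixes in a generic PyTorch distillation loss; it is not Distiller's actual implementation, and the function name, T, and alpha values are illustrative assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Hypothetical sketch of a combined distillation + hard-label loss.
    # Soft-target loss between temperature-softened student and teacher outputs.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    )
    # Fix for #122: dividing the logits by T shrinks the soft-target gradients
    # by 1/T^2, so the soft-target loss is multiplied by T^2 to restore their scale.
    soft_loss = soft_loss * (T * T)

    # Hard-label loss on the unscaled student logits.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Fix for #234: when caching the student logits for later use in the loss,
# clone() keeps the cached tensor in the autograd graph, while new_tensor()
# copies the data into a detached tensor, blocking gradient flow:
#   cached = student_logits.clone()       # stays in the graph
#   cached = student_logits.new_tensor(student_logits)  # detached (the old bug)
```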