Knowledge distillation fixes (#503)
Fixed two long-standing bugs in knowledge distillation:
- Distillation loss needs to be scaled by T^2 (#122)
- Use tensor.clone instead of new_tensor when caching student logits (#234)

Updated example results and uploaded the script used to generate them.
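For context, the sketch below is a minimal, self-contained PyTorch approximation of the two fixes. The names (`distillation_loss`, `LogitsCache`) are hypothetical and this is not the code in `distiller/knowledge_distillation.py`; it only illustrates the T^2 scaling and the clone-vs-new_tensor caching issue.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature):
    # Soften both distributions with the temperature T.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # KL divergence between softened distributions, scaled by T^2 so the
    # soft-target gradients keep a magnitude comparable to the hard-label
    # loss (the scaling referenced by #122).
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * (temperature ** 2)

class LogitsCache:
    """Caches the student's logits for later use in the distillation loss."""
    def __init__(self):
        self.logits = None

    def cache(self, student_logits):
        # clone() keeps the cached tensor connected to the autograd graph,
        # so gradients from the distillation loss flow back into the student.
        # new_tensor() would copy only the data and drop the gradient history
        # (the bug referenced by #234).
        self.logits = student_logits.clone()

# Hypothetical usage inside a training step:
cache = LogitsCache()
student_logits = torch.randn(8, 10, requires_grad=True)  # stand-in student output
teacher_logits = torch.randn(8, 10)                       # stand-in teacher output
cache.cache(student_logits)
loss = distillation_loss(cache.logits, teacher_logits, temperature=4.0)
loss.backward()  # gradients reach student_logits because of clone()
```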
Showing 6 changed files
- README.md: 3 additions, 3 deletions
- distiller/knowledge_distillation.py: 13 additions, 7 deletions
- examples/README.md: 7 additions, 7 deletions
- examples/quantization/fp32_baselines/preact_resnet_cifar_base_fp32.yaml: 93 additions, 50 deletions
- examples/quantization/preact_resnet_cifar_quant_distill_tests.sh: 72 additions, 0 deletions
- examples/quantization/quant_aware_train/preact_resnet_cifar_dorefa.yaml: 102 additions, 49 deletions