ModelSummary: adapt sparsity accounting to correctly handle "weight tying"
Language models may use "weight tying", where the same weights tensor is used in several different places. When tying is used, we'd like to log the tensor information for every place it appears, but exclude the repeated occurrences from the total sparsity calculation so that the shared tensor is counted only once.
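A minimal sketch of this accounting, not the actual ModelSummary code. It assumes that tying means the same object appears under several parameter names, and it uses plain Python lists as stand-ins for tensors. The function name `sparsity_summary` and the parameter names are hypothetical. Each occurrence is still logged as a row, but only the first occurrence of each object, compared by identity, contributes to the totals:

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in: a "tensor" is a flat list of floats. Weight tying is
# modeled as the SAME list object appearing under more than one name.
Row = Tuple[str, int, float, bool]  # (name, numel, sparsity, is_tied_duplicate)


def sparsity_summary(named_params: Iterable[Tuple[str, List[float]]]) -> Tuple[List[Row], float]:
    seen_ids = set()      # identities of tensors already counted in the totals
    rows: List[Row] = []
    total_elems = 0
    total_zeros = 0
    for name, weights in named_params:
        zeros = sum(1 for x in weights if x == 0)
        is_dup = id(weights) in seen_ids
        # Log every occurrence, including tied duplicates.
        rows.append((name, len(weights), zeros / len(weights), is_dup))
        if not is_dup:
            # Only the first occurrence contributes to the total sparsity.
            seen_ids.add(id(weights))
            total_elems += len(weights)
            total_zeros += zeros
    total_sparsity = total_zeros / total_elems if total_elems else 0.0
    return rows, total_sparsity


if __name__ == "__main__":
    embedding = [0.0, 1.0, 0.0, 2.0]  # shared by the encoder and the decoder
    fc = [1.0, 1.0]
    params = [("encoder.weight", embedding), ("fc.weight", fc), ("decoder.weight", embedding)]
    rows, total = sparsity_summary(params)
    for row in rows:
        print(row)
    print("total sparsity:", total)
```

In this example, counting the tied tensor twice would give 4 zeros out of 10 elements (0.4). Deduplicating gives 2 zeros out of 6 elements (about 0.333). The decoder row is still logged, and it is marked as a tied duplicate.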