For any of the methods below that require quantization-aware training, please see the quantization-aware training documentation for details on how to invoke it.

Let's break down the terminology we use here:

- **Linear:** Means a float value is quantized by multiplying with a numeric constant (the **scale factor**).
- **Range-Based:** Means that in order to calculate the scale factor, we look at the actual range of the tensor's values. In the most naive implementation, we use the actual min/max values of the tensor. Alternatively, we use some derivation based on the tensor's range / distribution to come up with a narrower min/max range, in order to remove possible outliers. This is in contrast to the other methods described here, which we could call **clipping-based**, as they impose an explicit clipping function on the tensors (using either a hard-coded value or a learned value). Both terms are illustrated in the sketch below.
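
To make the two terms above concrete, here is a minimal sketch of range-based linear quantization. It uses a simple symmetric scheme; the function names, the `num_bits` default and the exact scale-factor formula are illustrative assumptions and not Distiller's actual quantizer implementation, which is described in the sections that follow.

```python
import torch

def linear_quantize_symmetric(x: torch.Tensor, num_bits: int = 8):
    """Illustrative range-based linear quantization (symmetric variant)."""
    # Range-based: derive the saturation value from the tensor's actual values
    # (here, the naive choice: the maximum absolute value).
    sat_val = x.abs().max()
    # Linear: quantization is just multiplication by a scale factor, then rounding.
    n = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = n / sat_val
    q = torch.clamp(torch.round(x * scale), -n, n)
    return q, scale

def linear_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Map the integer values back to an approximation of the original floats."""
    return q / scale

x = torch.randn(4, 4)
q, scale = linear_quantize_symmetric(x)
print((x - linear_dequantize(q, scale)).abs().max())   # quantization error
```

An asymmetric variant would additionally derive an offset (zero-point) from the min/max range; the trade-offs between the two are covered in the next section.
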
### Asymmetric vs. Symmetric
...
This method requires training the model with quantization-aware training, as discussed above.

### Notes:
- The paper proposed widening of layers as a means to reduce accuracy loss. This isn't implemented as part of `WRPNQuantizer` at the moment. To experiment with this, modify your model implementation to have wider layers (a sketch of what this could look like is shown after these notes).
- The paper defines special handling for binary weights which isn't supported in Distiller yet.
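
For the widening experiment mentioned in the first note, a minimal sketch of what "wider layers" could look like is shown below. The model, the layer sizes and the `width_mult` parameter are purely illustrative assumptions and not part of Distiller or `WRPNQuantizer`; the idea is simply to scale up the hidden channel counts in your own model definition while keeping the input and output dimensions fixed.

```python
import torch.nn as nn

class WideCNN(nn.Module):
    """Toy CNN whose internal layers are widened by `width_mult` (e.g. 2x, 3x)."""
    def __init__(self, num_classes: int = 10, width_mult: int = 2):
        super().__init__()
        # Only the hidden channel counts grow; the 3-channel input and the
        # number of output classes stay the same.
        c1, c2 = 32 * width_mult, 64 * width_mult
        self.features = nn.Sequential(
            nn.Conv2d(3, c1, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c1, c2, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(c2, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

The widened model can then be quantized with `WRPNQuantizer` just like the original one.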