# Quantization Algorithms
The following quantization methods are currently implemented in Distiller:
## DoReFa
(As proposed in [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160))
In this method, we first define the quantization function \(quantize_k\), which takes a real value \(a_f \in [0, 1]\) and outputs a discrete-valued \(a_q \in \left\{ \frac{0}{2^k-1}, \frac{1}{2^k-1}, ... , \frac{2^k-1}{2^k-1} \right\}\), where \(k\) is the number of bits used for quantization.
\[a_q = quantize_k(a_f) = \frac{1}{2^k-1} round \left( \left(2^k - 1 \right) a_f \right)\]
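For illustration, a minimal PyTorch sketch of \(quantize_k\) could look as follows. This is a simplified example, not Distiller's implementation; in particular, the straight-through estimator used during training to pass gradients through the rounding operation is omitted.

```python
import torch

def quantize_k(a_f: torch.Tensor, k: int) -> torch.Tensor:
    """Map a tensor with values in [0, 1] onto the 2^k discrete levels i / (2^k - 1)."""
    n = float(2 ** k - 1)
    return torch.round(n * a_f) / n
```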
Activations are clipped to the \([0, 1]\) range and then quantized as follows:

\[x_q = quantize_k(x_f)\]
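Continuing the sketch above, the activation path is simply a clamp followed by \(quantize_k\) (here `x_f` denotes an activation tensor and `k` the bit width; again, this is illustrative only):

```python
# Clip activations to [0, 1], then quantize to k bits.
x_q = quantize_k(torch.clamp(x_f, 0.0, 1.0), k)
```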
For weights, we define the following function \(f\), which takes an unbounded real-valued input and outputs a real value in \([0, 1]\):

\[f(w) = \frac{tanh(w)}{2 max(|tanh(w)|)} + \frac{1}{2}\]

Now we can use \(quantize_k\) to get quantized weight values, as follows:

\[w_q = 2 quantize_k \left( f(w_f) \right) - 1\]
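The weight path can be sketched in the same way (`dorefa_quantize_weights` is a name chosen here for illustration; this is not Distiller's internal code):

```python
def dorefa_quantize_weights(w_f: torch.Tensor, k: int) -> torch.Tensor:
    """Squash weights into [0, 1] with f(w), quantize to k bits, then map back to [-1, 1]."""
    w = torch.tanh(w_f)
    f_w = w / (2 * torch.max(torch.abs(w))) + 0.5   # f(w) as defined above
    return 2 * quantize_k(f_w, k) - 1
```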
This method requires quantization-aware training, as discussed here. Use the `DorefaQuantizer` class to transform an existing model into a model suitable for training with quantization using DoReFa.
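A hypothetical usage sketch is shown below. The argument names (`bits_activations`, `bits_weights`) and the `prepare_model()` call follow Distiller's general Quantizer pattern, but exact signatures may differ between versions, so consult the Distiller API documentation before relying on this.

```python
# Hypothetical usage sketch -- check the Distiller docs for the exact signature.
from distiller.quantization import DorefaQuantizer

quantizer = DorefaQuantizer(model, optimizer, bits_activations=8, bits_weights=3)
quantizer.prepare_model()   # transform the model so training proceeds with quantization
# ... continue with the regular training loop ...
```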
### Notes:
- Gradient quantization as proposed in the paper is not supported yet.
- The paper defines special handling for binary weights which isn't supported in Distiller yet.
## WRPN
(As proposed in [WRPN: Wide Reduced-Precision Networks](https://arxiv.org/abs/1709.01134))
In this method, activations are clipped to \([0, 1]\) and quantized as follows (\(k\) is the number of bits used for quantization):

\[x_q = \frac{1}{2^k-1} round \left( \left(2^k - 1 \right) x_f \right)\]
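As with DoReFa activations, this amounts to a clamp followed by rounding onto \(2^k\) levels. A minimal sketch, assuming `x_f` is a PyTorch tensor, `k` is the bit width, and `torch` is imported as in the earlier example:

```python
def wrpn_quantize_activations(x_f: torch.Tensor, k: int) -> torch.Tensor:
    """Clip activations to [0, 1] and round onto 2^k uniform levels (illustrative only)."""
    n = float(2 ** k - 1)
    return torch.round(n * torch.clamp(x_f, 0.0, 1.0)) / n
```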
Weights are clipped to \([-1, 1]\) and quantized as follows: