
# Quantization Algorithms

The following quantization methods are currently implemented in Distiller:

## DoReFa

(As proposed in DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients)

In this method, we first define the quantization function \(quantize_k\), which takes a real value \(a_f \in [0, 1]\) and outputs a discrete-valued \(a_q \in \left\{ \frac{0}{2^k-1}, \frac{1}{2^k-1}, \ldots, \frac{2^k-1}{2^k-1} \right\}\), where \(k\) is the number of bits used for quantization.

\[a_q = quantize_k(a_f) = \frac{1}{2^k-1} round \left( \left(2^k - 1 \right) a_f \right)\]
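
As a concrete illustration, here is a minimal sketch of this quantization function in PyTorch (the function name mirrors the formula above and is purely illustrative, not Distiller's internal implementation):

```python
import torch

def quantize_k(a_f, k):
    """Map a real value in [0, 1] to one of 2^k evenly spaced levels in [0, 1]."""
    n = float(2 ** k - 1)
    return torch.round(a_f * n) / n
```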

Activations are clipped to the \([0, 1]\) range and then quantized as follows:

\[x_q = quantize_k(x_f)\]
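
Continuing the sketch above, activation quantization is then just a clamp to \([0, 1]\) followed by \(quantize_k\) (again, an illustrative sketch rather than Distiller's API):

```python
def dorefa_quantize_activation(x_f, k):
    """Clip activations to [0, 1], then quantize to k bits."""
    return quantize_k(torch.clamp(x_f, 0.0, 1.0), k)
```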

For weights, we define the following function \(f\), which takes an unbounded real-valued input and outputs a real value in \([0, 1]\):

\[f(w) = \frac{tanh(w)}{2 max(|tanh(w)|)} + \frac{1}{2}\]

Now we can use \(quantize_k\) to get quantized weight values, as follows:

\[w_q = 2 quantize_k \left( f(w_f) \right) - 1\]
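
Putting the two formulas together, a sketch of the DoReFa weight quantization looks like this (forward pass only; the function name is illustrative and gradient handling is omitted):

```python
def dorefa_quantize_weight(w_f, k):
    """Squash weights into [0, 1] with f(w), quantize to k bits, then rescale to [-1, 1]."""
    w = torch.tanh(w_f)
    f_w = w / (2 * torch.max(torch.abs(w))) + 0.5
    return 2 * quantize_k(f_w, k) - 1
```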

This method requires training the model with quantization, as discussed here. Use the `DorefaQuantizer` class to transform an existing model into a model suitable for training with quantization using DoReFa.
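
The snippet below is a hypothetical usage sketch: the import path matches the class name above, but the constructor arguments (`bits_activations`, `bits_weights`) and the `prepare_model()` call are assumptions about Distiller's Quantizer interface and may differ between versions.

```python
import torch
import torch.nn as nn
from distiller.quantization import DorefaQuantizer  # assumed import path

model = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Assumed constructor arguments and preparation call -- verify against the
# Distiller version in use.
quantizer = DorefaQuantizer(model, optimizer, bits_activations=8, bits_weights=8)
quantizer.prepare_model()

# ... continue with the usual training loop; the model now trains with
# DoReFa-quantized weights and activations ...
```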

Notes:

- Gradient quantization, as proposed in the paper, is not supported yet.
- The paper defines special handling for binary weights, which isn't supported in Distiller yet.

## WRPN

(As proposed in WRPN: Wide Reduced-Precision Networks)

In this method, activations are clipped to \([0, 1]\) and quantized as follows (\(k\) is the number of bits used for quantization):

\[x_q = \frac{1}{2^k-1} round \left( \left(2^k - 1 \right) x_f \right)\]
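
As a worked example (values chosen purely for illustration), with \(k = 2\) bits an activation \(x_f = 0.4\) is quantized to \(x_q = \frac{1}{3} round(3 \cdot 0.4) = \frac{1}{3} \approx 0.33\).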

Weights are clipped to \([-1, 1]\) and quantized as follows: