Guy Jacob authored
(we use 8-bit values below, but this applies to any bit-width)
* We use the notion of "full" and "restricted" quantized range for
  symmetric quantization (see section 2.2 in https://arxiv.org/abs/1806.08342)
* "Full" quantized range ==> [-128, 127], "restricted" ==> [-127, 127]
* Until now, when doing symmetric quantization we assumed a "full"
  range when saturating after quantization, but calculated the scale
  factor as if the range were restricted. This means we weren't fully
  utilizing the quantized range.
* On the other hand, in some other implementations of quantization (e.g.
  TensorFlow), the "restricted" range is used.
* So, we make it an option to use either the proper "full" range
  (q_min = -128) or "restricted" range (q_min = -127).
* LinearQuantMode.SYMMETRIC now means the "full" range is used. Added
  LinearQuantMode.SYMMETRIC_RESTRICTED for using the "restricted"
  range.
* Updated tests and documentation.
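
A minimal sketch of the difference between the two ranges, for
illustration only. The helper names (symmetric_quant_params, quantize)
are hypothetical and are not the library's API, and the "full" scale
formula used here, (2^n - 1) / 2 divided by the max absolute value, is
an assumed common formulation rather than a copy of this commit's code:

```python
def symmetric_quant_params(max_abs, num_bits=8, restricted=False):
    """Return (scale, q_min, q_max) for symmetric linear quantization.

    Hypothetical helper: names and exact formulas are assumptions.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    q_max = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    if restricted:
        q_min = -q_max                       # -127 for 8 bits
        half_levels = q_max                  # 127 steps per side
    else:
        q_min = -q_max - 1                   # -128 for 8 bits
        half_levels = (2 ** num_bits - 1) / 2  # 127.5 steps per side
    scale = half_levels / max_abs
    return scale, q_min, q_max


def quantize(x, scale, q_min, q_max):
    """Scale, round to the nearest integer, then saturate."""
    q = round(x * scale)
    return max(q_min, min(q_max, q))


# Tensor values assumed to lie in [-1.0, 1.0].
full = symmetric_quant_params(1.0, num_bits=8, restricted=False)
restr = symmetric_quant_params(1.0, num_bits=8, restricted=True)

print(quantize(-1.0, *full), quantize(1.0, *full))
print(quantize(-1.0, *restr), quantize(1.0, *restr))
```

In this sketch the "full" mode can reach q_min = -128, while the
"restricted" mode never goes below -127. The old behavior described
above corresponds to using the restricted scale together with the full
saturation bounds, which leaves -128 unused.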
78255ee0