Activation vs Weight Quantization
Quantization in neural networks can target either the weights or the activations of each layer. The quantization error introduced in each case has distinct mathematical properties and implications for model performance.
For weights, let the original weight value be w and the quantized value be Q(w). The quantization error is then:
$$\varepsilon_w = Q(w) - w$$

Assuming uniform quantization with step size $\Delta_w$, and that weights are distributed uniformly within the quantization interval, the mean squared quantization error per weight is:

$$\mathbb{E}[\varepsilon_w^2] = \frac{\Delta_w^2}{12}$$

For activations, let the original activation be $a$ and the quantized activation be $Q(a)$. The quantization error is:

$$\varepsilon_a = Q(a) - a$$

If activations are quantized with step size $\Delta_a$, the mean squared quantization error per activation is similarly:

$$\mathbb{E}[\varepsilon_a^2] = \frac{\Delta_a^2}{12}$$

However, the distribution and dynamic range of activations can vary significantly between layers and even between inputs, making the choice of $\Delta_a$ more complex than $\Delta_w$. Weights are typically fixed after training, so their range is static and easier to analyze, while activations depend on both the input data and the network's internal transformations.
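The $\Delta^2/12$ result is easy to check numerically. The sketch below is a minimal illustration, assuming NumPy and a simple round-to-nearest uniform quantizer; the step sizes and example distributions are arbitrary choices, not taken from the text.

```python
import numpy as np

def uniform_quantize(x, step):
    # Round-to-nearest uniform quantizer with step size `step` (no clipping).
    return np.round(x / step) * step

rng = np.random.default_rng(0)

# Stand-in tensors: Gaussian "weights" and non-negative, ReLU-like "activations".
w = rng.normal(0.0, 0.5, size=100_000)
a = np.abs(rng.normal(0.0, 1.0, size=100_000))

for name, x, step in [("weights", w, 0.02), ("activations", a, 0.05)]:
    err = uniform_quantize(x, step) - x
    print(f"{name}: measured MSE = {np.mean(err**2):.3e}, "
          f"step**2 / 12 = {step**2 / 12:.3e}")
```

For both tensors the measured mean squared error lands close to $\Delta^2/12$, since within each quantization bin the values are approximately uniformly distributed.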
When quantizing activations, clipping and saturation become important considerations. Clipping occurs when an activation value falls outside the representable range of the quantizer and is forcibly set to the maximum or minimum allowed value. This can result in information loss if significant portions of the activation distribution are clipped. Saturation refers to the repeated mapping of many input values to the same quantized output, reducing the effective resolution.
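To make clipping and saturation concrete, here is a hedged sketch of a clipped uniform quantizer; the 8-bit level count, the range, and the example values are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def clipped_uniform_quantize(x, x_min, x_max, num_levels=256):
    # Clip to the representable range, then apply round-to-nearest
    # uniform quantization with num_levels codes across [x_min, x_max].
    step = (x_max - x_min) / (num_levels - 1)
    x_clipped = np.clip(x, x_min, x_max)
    codes = np.round((x_clipped - x_min) / step)
    return x_min + codes * step

x = np.array([-0.2, 0.0, 0.3, 1.0, 4.0, 9.0])   # illustrative activations
xq = clipped_uniform_quantize(x, x_min=0.0, x_max=2.0)
print(xq)  # -0.2 clips to 0.0; 4.0 and 9.0 both saturate to 2.0
```

The two large inputs collapse onto the same end code, which is exactly the loss of resolution described above.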
In the context of nonlinear activation functions such as ReLU, these effects interact with quantization in complex ways. For instance, ReLU outputs are strictly non-negative, often with a heavy tail, which means a large proportion of activations may be close to zero, while a few may be very large. If the quantization range is not set appropriately, many activations may be clipped, or the quantization steps may be too coarse for small values, introducing large errors. Nonlinearities can also mask errors when small quantization noise is zeroed out by the nonlinearity, or amplify errors if they occur near the activation threshold.
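The range-setting problem is easy to reproduce. In this hypothetical sketch (the exponential distribution, the injected outlier, and the 8-bit setting are all assumptions for illustration), a single large value in a ReLU-style output forces the quantization step to be so coarse that most small activations round to zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# ReLU-style activations: mostly small values plus one large outlier.
act = rng.exponential(scale=0.1, size=10_000)
act[0] = 50.0                        # a single outlier dominates the maximum

num_levels = 256                     # unsigned 8-bit range [0, max]
step = act.max() / (num_levels - 1)
quantized = np.round(act / step) * step

print(f"step size: {step:.4f}")
print(f"fraction of activations quantized to exactly 0: "
      f"{np.mean(quantized == 0.0):.2%}")
```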
The activation dynamic range is the interval between the minimum and maximum values that an activation can take in a neural network layer. This range is crucial for quantization, as it determines the quantizer's step size and affects how much of the activation distribution is subject to clipping or saturation. Choosing an appropriate dynamic range helps minimize quantization error and information loss.
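One common way to pick the activation dynamic range is to observe activations on calibration data and derive the step size from that range; using a high percentile instead of the absolute maximum trades a small amount of clipping for much finer steps. The sketch below is a minimal, assumed implementation of that idea (the 99.9th percentile and the 8-bit setting are illustrative choices, not prescribed by the text).

```python
import numpy as np

def calibrate_step(activations, num_levels=256, percentile=99.9):
    # Estimate an unsigned dynamic range [0, upper] from calibration data,
    # then derive the uniform quantization step for num_levels codes.
    upper = np.percentile(activations, percentile)
    return upper / (num_levels - 1), upper

rng = np.random.default_rng(2)
calib_act = rng.exponential(scale=0.1, size=10_000)   # stand-in calibration data
calib_act[0] = 50.0                                   # same outlier as before

step_max = calib_act.max() / 255                      # range set by the maximum
step_pct, upper = calibrate_step(calib_act)           # range set by a percentile

print(f"max-based step: {step_max:.4f}")
print(f"percentile-based step: {step_pct:.4f} (range clipped at {upper:.3f})")
```

The percentile-based range clips the rare outliers but represents the bulk of the distribution far more finely.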
Nonlinearities such as ReLU, sigmoid, or tanh can have a strong effect on how quantization errors propagate through a network. For instance, ReLU sets all negative values to zero, which means any quantization error that brings a value below zero will be completely masked. Conversely, if quantization noise pushes a value just above zero, it may be amplified in subsequent layers. Nonlinearities may also compress or expand the dynamic range of activations, affecting both the magnitude and distribution of quantization noise. This complex interplay means that quantization errors in activations may not simply accumulate linearly, but can be modified or distorted by the network's nonlinear structure.
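A small numerical experiment makes the masking and flipping effects visible. The setup below is hypothetical: the pre-activation values, the step size, and the choice to apply quantization before the ReLU are all assumptions for illustration.

```python
import numpy as np

def uniform_quantize(x, step):
    return np.round(x / step) * step

step = 0.1
pre_act = np.array([-2.0, -0.06, -0.04, 0.04, 0.06, 2.0])  # illustrative values

noisy = uniform_quantize(pre_act, step)                      # quantize before ReLU
err_before = noisy - pre_act                                 # raw quantization error
err_after = np.maximum(noisy, 0) - np.maximum(pre_act, 0)    # error surviving ReLU

for p, eb, ea in zip(pre_act, err_before, err_after):
    print(f"pre-activation {p:+.2f}: error before ReLU {eb:+.3f}, after ReLU {ea:+.3f}")
```

For clearly negative values the quantization error is zeroed out by the ReLU, while for values near the threshold the error passes through (and can even move a value across zero), illustrating why activation errors do not simply accumulate linearly.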