Backpropagation Implementation
General Approach
In forward propagation, each layer l takes the outputs from the previous layer, a^{l-1}, as inputs and computes its own outputs. Therefore, the forward() method of the Layer class takes the vector of previous outputs as its only parameter, while the rest of the needed information is stored within the class.
In backward propagation, each layer l only needs da^l to compute the respective gradients and return da^{l-1}, so the backward() method takes the da^l vector as its parameter. The rest of the required information is already stored in the Layer class.
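To make this interface concrete, below is a minimal sketch of what the forward pass of such a Layer class might look like. The constructor signature, the random weight initialization, and the column-vector shape convention are assumptions made purely for illustration; only the forward() interface and the stored attributes come from this section.

import numpy as np

class Layer:
    def __init__(self, n_inputs, n_neurons, activation):
        # Hypothetical initialization; the exact scheme is not prescribed here.
        self.weights = 0.01 * np.random.randn(n_neurons, n_inputs)
        self.biases = np.zeros((n_neurons, 1))
        self.activation = activation

    def forward(self, inputs):
        # Store a^{l-1} (inputs) and z^l (outputs) for later use in backward().
        self.inputs = inputs
        self.outputs = np.dot(self.weights, inputs) + self.biases
        return self.activation(self.outputs)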
Activation Function Derivatives
Since the derivatives of activation functions are required for backpropagation, activation functions such as ReLU and sigmoid should be implemented as classes rather than standalone functions. This structure makes it possible to define both components clearly:
- The activation function itself, implemented using the __call__() method, so it can be applied directly in the Layer class with self.activation(z);
- Its derivative, implemented using the derivative() method, allowing efficient computation during backpropagation via self.activation.derivative(z).
Representing activation functions as objects makes it easy to pass them to different layers and apply them dynamically during both forward and backward propagation.
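For instance, assuming a Layer constructor like the sketch above (its exact signature is an assumption), two layers can be given different activations simply by passing in different objects, using the ReLU and Sigmoid classes defined below:

# Hypothetical layer sizes, chosen only to illustrate passing activation objects.
hidden_layer = Layer(n_inputs=4, n_neurons=8, activation=ReLU())
output_layer = Layer(n_inputs=8, n_neurons=1, activation=Sigmoid())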
ReLU
The derivative of the ReLU activation function is as follows, where z_i is an element of the vector of pre-activations z:
f'(z_i) = \begin{cases} 1, & z_i > 0 \\ 0, & z_i \le 0 \end{cases}

class ReLU:
    def __call__(self, z):
        return np.maximum(0, z)

    def derivative(self, z):
        return (z > 0).astype(float)
Sigmoid
The derivative of the sigmoid activation function is as follows:
f'(z_i) = f(z_i) \cdot (1 - f(z_i))

class Sigmoid:
    def __call__(self, z):
        return 1 / (1 + np.exp(-z))

    def derivative(self, z):
        sig = self(z)
        return sig * (1 - sig)
For both activation functions, the function itself and its derivative are applied to the entire vector z at once. NumPy automatically performs the computation element-wise, meaning each element of the vector is processed independently.
For example, if the vector z contains three elements, the derivative is computed as:
f'(z) = f'\left(\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}\right) = \begin{bmatrix} f'(z_1) \\ f'(z_2) \\ f'(z_3) \end{bmatrix}
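As a quick check, both classes above can be applied to an arbitrary three-element vector; the sample values below are made up purely for illustration.

import numpy as np

z = np.array([-1.5, 0.0, 2.0])

relu = ReLU()
sigmoid = Sigmoid()

print(relu(z))                # [0. 0. 2.]
print(relu.derivative(z))     # [0. 0. 1.]
print(sigmoid.derivative(z))  # approximately [0.149 0.25 0.105]

Each element is transformed independently, matching the formula above.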
The backward() Method

The backward() method is responsible for computing the gradients using the formulas below, where \odot denotes element-wise multiplication:

dz^l = da^l \odot f'(z^l)
dW^l = dz^l \cdot (a^{l-1})^T
db^l = dz^l
da^{l-1} = (W^l)^T \cdot dz^l
a^{l-1} and z^l are stored as the inputs and outputs attributes in the Layer class, respectively. The activation function f is stored as the activation attribute.
Once all the required gradients are computed, the weights and biases can be updated since they are no longer needed for further computation:
W^l = W^l - \alpha \cdot dW^l
b^l = b^l - \alpha \cdot db^l

Therefore, learning_rate (α) is another parameter of this method.
def backward(self, da, learning_rate):
dz = ...
d_weights = ...
d_biases = ...
da_prev = ...
self.weights -= learning_rate * d_weights
self.biases -= learning_rate * d_biases
return da_prev
The * operator performs element-wise multiplication, while the np.dot() function performs the dot product in NumPy. The .T attribute transposes an array.
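Putting the pieces together, one possible way to fill in the placeholders is shown below. It follows the gradient formulas from this section and assumes self.inputs holds a^{l-1} and self.outputs holds z^l as column vectors; treat it as a sketch rather than the definitive implementation.

import numpy as np

class Layer:
    # __init__() and forward() are omitted here; only backward() is shown.

    def backward(self, da, learning_rate):
        # dz^l = da^l ⊙ f'(z^l): element-wise product with the activation derivative
        dz = da * self.activation.derivative(self.outputs)
        # dW^l = dz^l · (a^{l-1})^T
        d_weights = np.dot(dz, self.inputs.T)
        # db^l = dz^l
        d_biases = dz
        # da^{l-1} = (W^l)^T · dz^l, passed back to the previous layer
        da_prev = np.dot(self.weights.T, dz)

        # Gradient descent update of this layer's parameters
        self.weights -= learning_rate * d_weights
        self.biases -= learning_rate * d_biases
        return da_prev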