Learn Model Training | Neural Network from Scratch

Training a neural network involves an iterative process in which the model gradually improves by adjusting its weights and biases to minimize the loss function. This process is known as gradient-based optimization, and it follows a structured algorithm.

General Algorithm

The dataset is passed through the network multiple times in a loop, and each complete pass is called an epoch. At the start of each epoch, the data is shuffled so that the model does not learn patterns from the order of the training examples. Shuffling also varies the sequence of updates from one epoch to the next, which tends to make the trained model more robust.
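The epoch loop and the shuffling step can be sketched as follows. The toy arrays `X` and `y` and the seeded generator `rng` are assumptions made for illustration; the point is that one permutation indexes both arrays, so each example stays paired with its label.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Hypothetical toy dataset: 5 examples with 2 features each, one label per example
X = np.arange(10, dtype=float).reshape(5, 2)
y = np.arange(5, dtype=float).reshape(5, 1)

n_epochs = 3
for epoch in range(n_epochs):
    # Draw a fresh random ordering at the start of every epoch
    indices = rng.permutation(X.shape[0])
    # Index both arrays with the same permutation so pairs stay aligned
    X_shuffled, y_shuffled = X[indices], y[indices]
    print(f"epoch {epoch}: order = {indices}")
```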

For each training example, the model performs forward propagation, where inputs pass through the network, layer by layer, producing an output. This output is then compared to the actual target value to compute the loss.
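As a minimal sketch of this step, the snippet below runs one example through a small network and computes a loss. The layer sizes, the sigmoid activation, and the squared-error loss are assumptions chosen for illustration, not details taken from the lesson.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical two-layer network: 2 inputs -> 3 hidden units -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))

x = np.array([[0.5], [-1.0]])   # one training example as a column vector
target = np.array([[1.0]])

# Forward propagation: each layer transforms the previous layer's output
a1 = sigmoid(W1 @ x + b1)
output = sigmoid(W2 @ a1 + b2)

# Squared error is used here only as an example loss
loss = 0.5 * np.sum((output - target) ** 2)
print(f"output = {output.item():.4f}, loss = {loss:.4f}")
```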

Next, the model applies backpropagation, which computes the gradient of the loss with respect to the weights and biases of each layer, and then updates those parameters to reduce the loss.
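For a single layer, one backpropagation step followed by a parameter update might look like the sketch below. It assumes one sigmoid neuron, a squared-error loss, and zero-initialized parameters, all chosen to keep the arithmetic easy to follow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single layer with one sigmoid neuron and two inputs
W = np.zeros((1, 2))
b = np.zeros((1, 1))
x = np.array([[1.0], [2.0]])
target = np.array([[1.0]])
learning_rate = 0.1

# Forward pass and loss before the update
a = sigmoid(W @ x + b)
loss_before = 0.5 * np.sum((a - target) ** 2)

# Backward pass: chain rule from the loss back to the parameters
da = a - target              # dL/da for the squared-error loss
dz = da * a * (1.0 - a)      # dL/dz, using sigmoid'(z) = a * (1 - a)
dW = dz @ x.T                # dL/dW
db = dz                      # dL/db

# Gradient descent step: move each parameter against its gradient
W -= learning_rate * dW
b -= learning_rate * db

loss_after = 0.5 * np.sum((sigmoid(W @ x + b) - target) ** 2)
print(loss_before, loss_after)
```

In this setup the loss after the update is lower than before it, which is the behavior the update rule is designed to produce for a sufficiently small step.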

This process repeats for multiple epochs, allowing the network to refine its parameters gradually. As training progresses, the network learns to make increasingly accurate predictions. However, careful tuning of hyperparameters such as the learning rate is crucial to ensure stable and efficient training.

The learning rate (η) determines the step size in weight updates. If it is too high, the model might overshoot the optimal values and fail to converge. If it is too low, training becomes slow and might get stuck in a suboptimal solution. Choosing an appropriate learning rate balances speed and stability in training. Typical values range from 0.001 to 0.1, depending on the problem and network size.
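The effect of the step size can be illustrated on a deliberately simple problem. The sketch below minimizes f(w) = w², whose minimum is at w = 0, with three learning rates; the function `descend` and the chosen values are illustrative assumptions rather than recommendations.

```python
def descend(learning_rate, w0=1.0, steps=50):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w -= learning_rate * grad    # step against the gradient
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: final w = {descend(lr):.6g}")
```

With 0.001 the weight has barely moved from its starting point after 50 steps, with 0.1 it is very close to the minimum, and with 1.1 each step overshoots so the weight grows in magnitude instead of converging.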

The graph below shows how an appropriate learning rate enables the loss to decrease steadily at an optimal pace:

Finally, stochastic gradient descent (SGD) plays a vital role in training efficiency. Instead of computing weight updates after processing the entire dataset, SGD updates the parameters after each individual example. This makes training faster and introduces slight variations in updates, which can help the model escape local minima and reach a better overall solution.
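The difference in update frequency can be seen in a small sketch. It assumes a one-parameter linear model fitted to hypothetical data generated by y = 2x, and compares one epoch of full-batch gradient descent with one epoch of SGD at the same learning rate.

```python
import numpy as np

# Hypothetical data generated by y = 2 * x, so the ideal weight is 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x
learning_rate = 0.01

# Batch gradient descent: one update per epoch, using the average gradient
w_batch = 0.0
grad = np.mean((w_batch * x - y) * x)
w_batch -= learning_rate * grad

# SGD: one update per example, so four updates in the same epoch
# (shuffling is omitted here to keep the sketch short)
w_sgd = 0.0
for xi, yi in zip(x, y):
    grad_i = (w_sgd * xi - yi) * xi
    w_sgd -= learning_rate * grad_i

print(f"after one epoch: batch w = {w_batch:.3f}, SGD w = {w_sgd:.3f}")
```

On this toy problem the SGD weight ends the epoch closer to 2 because it has been updated four times rather than once. This illustrates the update frequency only; it is not a general claim about which method reaches a better solution.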

The fit() Method

The fit() method in the Perceptron class is responsible for training the model using stochastic gradient descent.

def fit(self, training_data, labels, epochs, learning_rate):
    # Iterating over multiple epochs
    for epoch in range(epochs):
        # Shuffling the data  
        indices = np.random.permutation(training_data.shape[0])
        training_data = training_data[indices]
        labels = labels[indices]
        # Iterating through each training example
        for i in range(training_data.shape[0]):
            inputs = training_data[i, :].reshape(-1, 1)
            target = labels[i, :].reshape(-1, 1)

            # Forward propagation
            output = ...

            # Computing the gradient of the loss function w.r.t. output
            da = ...

            # Backward propagation through all layers
            for layer in self.layers[::-1]:
                da = ...
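The listing above leaves the forward pass, the loss gradient, and the per-layer backward step elided. The sketch below shows one possible way such a training loop could be completed so that it runs end to end. Everything outside the `fit()` signature is an assumption and is not the lesson's own solution: the `Layer` class, its `forward()` and `backward()` methods, the sigmoid activation, the squared-error loss, and the `mean_loss()` helper are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Layer:
    """Hypothetical fully connected layer with a sigmoid activation."""

    def __init__(self, n_inputs, n_neurons, rng):
        self.weights = rng.normal(scale=0.5, size=(n_neurons, n_inputs))
        self.biases = np.zeros((n_neurons, 1))

    def forward(self, inputs):
        # Cache the input and output; backward() needs both
        self.inputs = inputs
        self.outputs = sigmoid(self.weights @ inputs + self.biases)
        return self.outputs

    def backward(self, da, learning_rate):
        # Gradient w.r.t. the pre-activation (sigmoid derivative is a * (1 - a))
        dz = da * self.outputs * (1.0 - self.outputs)
        # Gradient w.r.t. this layer's input, computed before the weights change
        da_prev = self.weights.T @ dz
        # Gradient descent step on this layer's parameters
        self.weights -= learning_rate * (dz @ self.inputs.T)
        self.biases -= learning_rate * dz
        return da_prev

class Perceptron:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, inputs):
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def mean_loss(self, data, labels):
        # Average squared-error loss over a dataset (assumed loss function)
        losses = [0.5 * np.sum((self.forward(x.reshape(-1, 1)) - t.reshape(-1, 1)) ** 2)
                  for x, t in zip(data, labels)]
        return float(np.mean(losses))

    def fit(self, training_data, labels, epochs, learning_rate):
        for epoch in range(epochs):
            # Shuffle examples and labels with the same permutation
            indices = np.random.permutation(training_data.shape[0])
            training_data = training_data[indices]
            labels = labels[indices]
            for i in range(training_data.shape[0]):
                inputs = training_data[i, :].reshape(-1, 1)
                target = labels[i, :].reshape(-1, 1)
                output = self.forward(inputs)
                # Derivative of 0.5 * (output - target)^2 w.r.t. output
                da = output - target
                # Propagate the gradient from the last layer to the first
                for layer in self.layers[::-1]:
                    da = layer.backward(da, learning_rate)

# Usage on the logical OR function (toy data chosen for illustration)
np.random.seed(0)
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)
model = Perceptron([Layer(2, 3, rng), Layer(3, 1, rng)])
loss_before = model.mean_loss(X, y)
model.fit(X, y, epochs=500, learning_rate=0.5)
loss_after = model.mean_loss(X, y)
print(f"mean loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In this sketch each layer updates its own parameters inside `backward()`, which keeps `fit()` short. An alternative design would have `backward()` only return gradients and apply the updates in a separate step.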
