PyTorch Essentials
Training of the Model
Preparing for Training
First, you need to ensure that the model, loss function, and optimizer are properly defined. Let's go through each step:
- Loss function: for classification, you can use CrossEntropyLoss, which expects raw continuous values (logits) as input and automatically applies softmax;
- Optimizer: you can use the Adam optimizer for efficient gradient updates.
In PyTorch, cross-entropy loss combines log-softmax and negative log-likelihood (NLL) loss into a single loss function:

$$\text{CE}(z, y) = -\log\left(\frac{e^{z_y}}{\sum_{c=1}^{C} e^{z_c}}\right)$$

where:
- $z_y$ is the logit corresponding to the correct class;
- $C$ is the total number of classes.
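To make this equivalence concrete, here is a small sketch with arbitrary logits comparing the manual formula to the built-in loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary logits for one sample with C = 3 classes; class 0 is the target
z = torch.tensor([[1.5, 0.3, -0.8]])
y = torch.tensor([0])

# Manual computation: -log(softmax(z)_y)
manual = -torch.log(F.softmax(z, dim=1)[0, y[0]])

# Built-in equivalent: log-softmax + NLL combined in one function
builtin = nn.CrossEntropyLoss()(z, y)

print(manual.item(), builtin.item())  # both print the same value
```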
It is also important to split the data into training and validation sets (ideally, a separate test set would exist as well). Since the dataset is relatively small (1143 rows), we use an 80/20 split, and the validation set will also serve as the test set.
Moreover, the resulting NumPy arrays should be converted to tensors, as PyTorch models require tensor inputs for computations.
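A short sketch of these two steps, using random stand-in arrays with the same number of rows (the 11 features and 3 classes are assumptions for illustration only):

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split

# Stand-in data: 1143 rows, 11 features, 3 classes (placeholder values)
X = np.random.rand(1143, 11).astype(np.float32)
y = np.random.randint(0, 3, size=1143)

# 80/20 split; the 20% part serves as both validation and test set here
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Convert NumPy arrays to tensors: float32 for features, long for class labels
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)
```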
Training Loop
The training loop involves the following steps for each epoch:
- Forward pass: pass the input features through the model to generate predictions;
- Loss calculation: compare the predictions with the ground truth using the loss function;
- Backward pass: compute gradients with respect to the model parameters using backpropagation;
- Parameter update: adjust model parameters using the optimizer;
- Monitoring progress: print the loss periodically to observe convergence.
As you can see, the training process is similar to that of linear regression.
```python
import torch.nn as nn
import torch
import torch.optim as optim
import matplotlib.pyplot as plt
import os
os.system('wget https://staging-content-media-cdn.codefinity.com/courses/1dd2b0f6-6ec0-40e6-a570-ed0ac2209666/section_3/model_definition.py 2>/dev/null')
from model_definition import model, X, y
from sklearn.model_selection import train_test_split

# Set manual seed for reproducibility
torch.manual_seed(42)
# Reinitialize model after setting seed
model.apply(lambda m: m.reset_parameters() if hasattr(m, "reset_parameters") else None)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

# Define the loss function (Cross-Entropy for multi-class classification)
criterion = nn.CrossEntropyLoss()
# Define the optimizer (Adam with a learning rate of 0.01)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Number of epochs
epochs = 100
# Store losses for plotting
training_losses = []

# Training loop
for epoch in range(epochs):
    # Zero out gradients from the previous step
    optimizer.zero_grad()
    # Compute predictions
    predictions = model(X_train)
    # Compute the loss
    loss = criterion(predictions, y_train)
    # Compute gradients
    loss.backward()
    # Update parameters
    optimizer.step()
    # Store the loss
    training_losses.append(loss.item())

# Plot the training loss
plt.plot(range(epochs), training_losses, label="Training Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss over Epochs")
plt.legend()
plt.show()
```
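The script above stores every loss value for the plot; step 5 of the list (monitoring progress) can also be handled by printing the loss every few epochs. A standalone sketch of that pattern, with a stand-in value in place of the real loss.item():

```python
epochs = 100

for epoch in range(epochs):
    loss_value = 1.0 / (epoch + 1)  # stand-in for loss.item() from the real loop

    # Print the loss every 10 epochs to observe convergence
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss_value:.4f}")
```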
Observing Convergence
In addition to training the model, we also record the training loss at each epoch and plot it over time. As shown in the graph, the training loss initially decreases rapidly and then gradually stabilizes around epoch 60. Beyond this point, the loss decreases at a much slower rate, suggesting that the model has likely converged; training much beyond roughly 60 epochs therefore provides little additional benefit for this model.
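One way to confirm convergence more rigorously is to also evaluate the loss on the validation set and compare the two curves; a minimal sketch, assuming a placeholder model and validation tensors shaped like those above:

```python
import torch
import torch.nn as nn

# Placeholder model and validation data standing in for the ones defined earlier
model = nn.Linear(11, 3)
criterion = nn.CrossEntropyLoss()
X_val = torch.randn(229, 11)          # ~20% of 1143 rows
y_val = torch.randint(0, 3, (229,))

# Evaluate without tracking gradients, then switch back to training mode
model.eval()
with torch.no_grad():
    val_loss = criterion(model(X_val), y_val)
print(f"Validation loss: {val_loss.item():.4f}")
model.train()
```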