Lesson 2
Building a Multi-Class Classification Model with PyTorch
Lesson Overview

Excellent work on preprocessing the Wine dataset! Now it's time to leverage those efforts by building a multi-class classification model using PyTorch. In this lesson, we will walk you through the entire process, from loading the preprocessed data to defining and training our model. We'll also explore the concepts of loss functions and optimizers, which are crucial for improving our model's performance. Let's dive in and build our classification model step by step!

Loading the Preprocessed Dataset

Before we start building our multi-class classification model, we need to load our preprocessed dataset. To maintain modular code, we use the load_preprocessed_data function from our previous lesson, stored in data_preprocessing.py. This function handles loading, splitting, scaling, and converting the Wine dataset into PyTorch tensors, providing the data in a format that is ready to train our model.

Load the preprocessed dataset:

Python
from data_preprocessing import load_preprocessed_data

X_train, X_test, y_train, y_test = load_preprocessed_data()

Here's a brief recap of data_preprocessing.py:

Python
import torch
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def load_preprocessed_data():
    # Load the Wine dataset
    wine = load_wine()
    X, y = wine.data, wine.target

    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

    # Scale the features
    scaler = StandardScaler().fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Convert to PyTorch tensors
    X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
    X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train, dtype=torch.long)
    y_test_tensor = torch.tensor(y_test, dtype=torch.long)

    return X_train_tensor, X_test_tensor, y_train_tensor, y_test_tensor
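
As a quick, optional sanity check, you can print the shapes of the returned tensors. With the 70/30 split above on the 178-sample Wine dataset, the output should look roughly like this (exact sizes come from the split):

Python
print(X_train.shape, y_train.shape)  # e.g., torch.Size([124, 13]) torch.Size([124])
print(X_test.shape, y_test.shape)    # e.g., torch.Size([54, 13]) torch.Size([54])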

With the dataset loaded, we're ready to build our multi-class classification model.

Building a Multi-Class Neural Network with PyTorch

A machine learning model in PyTorch is generally composed of layers. Our model is a feed-forward neural network for multi-class classification, consisting of three linear layers interleaved with two ReLU (Rectified Linear Unit) activation layers. This model structure is defined using nn.Sequential. The nn.Linear module applies a linear transformation, and nn.ReLU introduces non-linearity to the model, which is essential for learning complex patterns.

Let's break the following code down:

Python
import torch
import torch.nn as nn

# Define the model using nn.Sequential
model = nn.Sequential(
    nn.Linear(13, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

# Display model's architecture
print(model)

We begin with an input layer nn.Linear(13, 10), which accepts input tensors of size 13 (the number of features in our dataset) and outputs tensors of size 10. Next, a ReLU activation layer applies an element-wise non-linear transformation. A second hidden layer nn.Linear(10, 10) followed by another ReLU repeats this pattern. Finally, the output layer nn.Linear(10, 3) takes an input of size 10 and returns an output of size 3, corresponding to our three wine classes.

The output of our model is a tensor of raw scores (logits), one for each of the three classes; the higher a class's logit, the more confident the model is in that class. These logits can be converted to probabilities by applying a softmax function, which, as we'll see shortly, is handled automatically by our loss function.

The model's architecture printed by the code above will be:

Plain text
Sequential(
  (0): Linear(in_features=13, out_features=10, bias=True)
  (1): ReLU()
  (2): Linear(in_features=10, out_features=10, bias=True)
  (3): ReLU()
  (4): Linear(in_features=10, out_features=3, bias=True)
)

This output describes the neural network architecture we defined. It shows each layer in sequence, including the type (Linear or ReLU), the number of input and output features for Linear layers, and whether a bias term is included.
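
To make the idea of logits more concrete, here's a small illustrative snippet (the logit values are made up for demonstration) showing how torch.softmax would convert raw scores into probabilities:

Python
import torch

# Made-up logits for a single sample across our three wine classes
logits = torch.tensor([[2.0, 0.5, -1.0]])

# Softmax converts logits into probabilities that sum to 1
probabilities = torch.softmax(logits, dim=1)
print(probabilities)  # approximately tensor([[0.7856, 0.1753, 0.0391]])

Remember, we don't add this softmax step to the model itself; as explained next, our loss function applies it internally.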

Defining the Loss Function

Now that we've defined our model structure, we need to specify how to measure its performance.

The performance is measured by the loss function, which calculates the disparity between the model's predictions and the actual labels. Simply put, the loss function tells us how wrong our model is. For our multi-class classification problem, the appropriate loss function is Cross-Entropy loss. Here's how you define it in PyTorch:

Python
criterion = nn.CrossEntropyLoss()

Cross-Entropy loss is used when the model's output should represent a probability distribution over classes: it measures how different the predicted probabilities are from the actual labels. Because of the way CrossEntropyLoss works, it expects raw scores (also known as logits) as input and internally applies the softmax function to convert these scores into probabilities. So, even though softmax is typically used for multi-class classification problems, we don't need to add a separate softmax layer to our model; the loss function handles it for us. This simplifies our model design and improves numerical stability.
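
As a small illustration (with made-up logits and labels), note that nn.CrossEntropyLoss takes the raw logits and the integer class labels directly:

Python
# Made-up logits for two samples, plus their true class labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
labels = torch.tensor([0, 1])

loss = criterion(logits, labels)
print(loss.item())  # roughly 0.34, averaged over the two samples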

Defining the Optimizer

To improve the performance of our model, we need to update the model parameters (weights and biases). This is where an optimizer comes in. The optimizer helps to adjust these parameters to reduce the loss. The optimizer we use in this case is Adam. Adam, short for Adaptive Moment Estimation, is an algorithm that adjusts the model's parameters based on the gradients (which tell us how much the loss would change if we changed the parameters). Here's how you set it up in PyTorch:

Python
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)

The Adam optimizer includes settings like the learning rate (lr), which controls how big a step we take when updating the model's parameters. The model.parameters() call returns an iterator over all the parameters (weights and biases) in our model that need to be optimized; these are updated during training to minimize the loss. A learning rate of 0.001 is generally a good starting point for many problems: it lets us move steadily towards a solution without taking steps so large that we overshoot the optimal parameters.
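
If you're curious what model.parameters() exposes, a quick illustrative loop over named_parameters() lists the weight and bias tensors of our three linear layers (the names come from each layer's index in nn.Sequential):

Python
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (10, 13)
# 0.bias (10,)
# 2.weight (10, 10)
# 2.bias (10,)
# 4.weight (3, 10)
# 4.bias (3,)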

By defining the loss function and the optimizer, we lay the groundwork for our model to learn and improve through training.

Training the Model

After defining the model and its evaluation metrics, our next step is to train the model using our training data.

Python
# Train the model
num_epochs = 150
history = {'loss': [], 'val_loss': []}
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    history['loss'].append(loss.item())

    model.eval()
    with torch.no_grad():
        outputs_val = model(X_test)
        val_loss = criterion(outputs_val, y_test)
        history['val_loss'].append(val_loss.item())

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Validation Loss: {val_loss.item():.4f}')

Here's a step-by-step breakdown of the code:

  1. Set Number of Epochs: Define how many times the model will iterate over the entire training dataset.
  2. Initialize History: Create a dictionary to store loss and validation loss values for each epoch.
  3. Training Loop: Loop through the training process for the specified number of epochs.
  4. Training Phase: Switch to training mode, clear gradients, make predictions, calculate loss, perform backpropagation, update parameters, and store training loss.
  5. Evaluation Phase: Switch to evaluation mode, disable gradient calculation, and make predictions on the validation data (outputs_val), which is not used to train the model but only to evaluate how it performs on unseen data; then calculate and store the validation loss.
  6. Print Progress: Every 10 epochs, print the current epoch, training loss, and validation loss to monitor progress.

The output of our complete training process might look like this:

Plain text
Epoch [10/150], Loss: 1.1324, Validation Loss: 1.1123
Epoch [20/150], Loss: 1.1020, Validation Loss: 1.0844
Epoch [30/150], Loss: 1.0713, Validation Loss: 1.0547
Epoch [40/150], Loss: 1.0366, Validation Loss: 1.0204
...
Epoch [150/150], Loss: 0.3014, Validation Loss: 0.3216
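
As an optional sanity check after training, here's a minimal sketch that converts the model's logits into predicted classes with torch.argmax and computes accuracy on the test set (the exact value will vary from run to run):

Python
# Evaluate accuracy on the test set
model.eval()
with torch.no_grad():
    logits = model(X_test)
    predictions = torch.argmax(logits, dim=1)
    accuracy = (predictions == y_test).float().mean().item()
print(f'Test Accuracy: {accuracy:.4f}')
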
Lesson Summary

Congratulations on reaching the end of this lesson! You have learned how to construct a neural network model in PyTorch, from defining the model structure to training it. You also learned how the loss function and optimizer work in tandem to reduce the model's error, and how to define and use them.

To reinforce what you've learned, your next exercises will involve building the model and training process for our Wine dataset. This will give you the practical experience necessary to confidently tackle real-world machine learning problems using PyTorch. Practice makes perfect!
