Lesson 2

Excellent work on preprocessing the Wine dataset! Now, it's time to leverage those efforts by building a **multi-class classification model using PyTorch**. In this lesson, we will walk you through the entire process: from loading the preprocessed data to defining and training our model. We'll also explore the concepts of loss functions and optimizers, crucial for improving our model's performance. Let's dive in and build our classification model step-by-step!

Before we start building our multi-class classification model, we need to load our preprocessed dataset. To keep our code modular, we use the `load_preprocessed_data` function from our previous lesson, stored in `data_preprocessing.py`. This function handles loading, splitting, scaling, and converting the Wine dataset into PyTorch tensors, providing the data in a format that is ready to train our model.

Load the preprocessed dataset:

```python
from data_preprocessing import load_preprocessed_data

X_train, X_test, y_train, y_test = load_preprocessed_data()
```

Here's a brief recap of `data_preprocessing.py`:

```python
import torch
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def load_preprocessed_data():
    # Load the Wine dataset
    wine = load_wine()
    X, y = wine.data, wine.target

    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

    # Scale the features
    scaler = StandardScaler().fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Convert to PyTorch tensors
    X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
    X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train, dtype=torch.long)
    y_test_tensor = torch.tensor(y_test, dtype=torch.long)

    return X_train_tensor, X_test_tensor, y_train_tensor, y_test_tensor
```

With the dataset loaded, we're ready to build our multi-class classification model.

A machine learning model in PyTorch is generally composed of layers. Our model is a multi-class feed-forward neural network with three linear layers and two ReLU (Rectified Linear Unit) activation layers. This structure is defined using `nn.Sequential`. `nn.Linear` applies a linear transformation, and `nn.ReLU` introduces non-linearity to the model, which is essential for learning complex patterns.

Let's break the following code down:

```python
import torch
import torch.nn as nn

# Define the model using nn.Sequential
model = nn.Sequential(
    nn.Linear(13, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

# Display the model's architecture
print(model)
```

We begin with an input layer, `nn.Linear(13, 10)`, which accepts input tensors of size 13 (the number of features in our dataset) and outputs tensors of size 10. Next, a ReLU activation layer applies an element-wise non-linear transformation. This pattern repeats for another hidden layer. Finally, we have an output layer, `nn.Linear(10, 3)`, which takes an input of size 10 and returns an output of size 3, corresponding to our three wine classes.

The output of our model is a tensor of raw scores (logits) for each of the three classes, representing the model's confidence in each class. These logits can be converted to probabilities by applying a softmax function, which, as we'll see shortly, is handled automatically by our loss function.
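To see what this conversion looks like in practice, here is a minimal sketch using a made-up logits tensor for a single sample; softmax squashes the raw scores into probabilities that sum to 1:

```python
import torch

# Hypothetical logits for one sample over our three wine classes
logits = torch.tensor([2.0, 0.5, -1.0])

# Softmax converts raw scores into a probability distribution
probs = torch.softmax(logits, dim=0)
print(probs)            # three probabilities summing to 1
print(probs.argmax())   # index of the most confident class (0 here)
```

The class with the largest logit always receives the largest probability, so for picking a predicted class, `argmax` over the logits gives the same answer as `argmax` over the probabilities.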

The model's architecture displayed from the above code will be:

```
Sequential(
  (0): Linear(in_features=13, out_features=10, bias=True)
  (1): ReLU()
  (2): Linear(in_features=10, out_features=10, bias=True)
  (3): ReLU()
  (4): Linear(in_features=10, out_features=3, bias=True)
)
```

This output describes the neural network architecture we defined. It shows each layer in sequence, including the type (Linear or ReLU), the number of input and output features for Linear layers, and whether a bias term is included.
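A quick way to sanity-check these layer sizes is to pass a dummy batch through the model and inspect the output shape. This sketch uses random made-up data, not the Wine dataset:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(13, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 3)
)

# A dummy batch of 4 samples, each with 13 features
dummy = torch.randn(4, 13)
out = model(dummy)
print(out.shape)  # torch.Size([4, 3]): one logit per class, per sample
```

If the input size didn't match the first layer's `in_features` (13), this forward pass would raise a shape-mismatch error, which makes it a handy early check before training.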

Now that we've defined our model structure, we need to specify how to measure its performance.

The performance is determined by the loss function, which calculates the disparity between the model's predictions and the actual labels. Simply put, the loss function tells us how wrong our model is. For our classification problem, the suitable loss function is Cross-Entropy loss. Here's how you define it in PyTorch:

```python
criterion = nn.CrossEntropyLoss()
```

Cross-Entropy loss is used when the output should represent a probability distribution over classes: it measures how far the predicted probabilities are from the actual labels. Because of the way `CrossEntropyLoss` works, it expects raw scores (logits) as input and internally applies the softmax function to convert these scores into probabilities. So, even though softmax is typically used for multi-class classification problems, we don't need to add a separate softmax layer to our model; the loss function handles it for us. This simplifies our model design and improves numerical stability.
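You can verify this built-in softmax behavior directly: `CrossEntropyLoss` applied to raw logits gives the same value as applying `log_softmax` yourself and then the negative log-likelihood loss. A small sketch with hypothetical logits and a target class:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0]])  # hypothetical raw model output (batch of 1)
target = torch.tensor([0])                 # true class index

# CrossEntropyLoss on logits...
ce = nn.CrossEntropyLoss()(logits, target)

# ...equals log_softmax followed by negative log-likelihood
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), target)

print(torch.isclose(ce, manual))  # tensor(True)
```

This is why feeding already-softmaxed probabilities into `CrossEntropyLoss` is a common bug: the loss would apply softmax a second time.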

To improve the performance of our model, we need to update the model parameters (weights and biases). This is where an optimizer comes in. The optimizer helps to adjust these parameters to reduce the loss. The optimizer we use in this case is Adam. Adam, short for Adaptive Moment Estimation, is an algorithm that adjusts the model's parameters based on the gradients (which tell us how much the loss would change if we changed the parameters). Here's how you set it up in PyTorch:

```python
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.001)
```

The `Adam` optimizer takes settings such as the learning rate (`lr`), which controls how big a step we take when updating the model's parameters. The `model.parameters()` call returns an iterator over all the parameters (weights and biases) in our model that need to be optimized; these parameters are updated during training to minimize the loss. A learning rate of `0.001` is generally a good starting point for many problems: it moves steadily toward a solution without taking steps so large that we overshoot the optimal parameters.
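To get a feel for what `model.parameters()` hands the optimizer, you can count the trainable values in our architecture. Each `nn.Linear(in, out)` layer contributes `in * out` weights plus `out` biases:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(13, 10), nn.ReLU(),   # 13*10 + 10 = 140 parameters
    nn.Linear(10, 10), nn.ReLU(),   # 10*10 + 10 = 110 parameters
    nn.Linear(10, 3)                # 10*3  + 3  =  33 parameters
)

total = sum(p.numel() for p in model.parameters())
print(total)  # 283
```

All 283 of these values are what Adam nudges on every `optimizer.step()` call; the ReLU layers contribute no parameters.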

By defining the loss function and the optimizer, we lay the groundwork for our model to learn and improve through training.

After defining the model and its evaluation metrics, our next step is to train the model using our training data.

```python
# Train the model
num_epochs = 150
history = {'loss': [], 'val_loss': []}
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    history['loss'].append(loss.item())

    model.eval()
    with torch.no_grad():
        outputs_val = model(X_test)
        val_loss = criterion(outputs_val, y_test)
        history['val_loss'].append(val_loss.item())

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Validation Loss: {val_loss.item():.4f}')
```

Here's a step-by-step breakdown of the code:

- **Set Number of Epochs**: Define how many times the model will iterate over the entire training dataset.
- **Initialize History**: Create a dictionary to store training and validation loss values for each epoch.
- **Training Loop**: Loop through the training process for the specified number of epochs.
- **Training Phase**: Switch to training mode, clear gradients, make predictions, calculate the loss, perform backpropagation, update parameters, and store the training loss.
- **Evaluation Phase**: Switch to evaluation mode, disable gradient calculation, make predictions on the validation data (`outputs_val`), which is not used to train the model but to evaluate how it performs on unseen data, calculate the validation loss, and store it.
- **Print Progress**: Every 10 epochs, print the current epoch, training loss, and validation loss to monitor progress.

The output of our complete training process might look like this:

```
Epoch [10/150], Loss: 1.1324, Validation Loss: 1.1123
Epoch [20/150], Loss: 1.1020, Validation Loss: 1.0844
Epoch [30/150], Loss: 1.0713, Validation Loss: 1.0547
Epoch [40/150], Loss: 1.0366, Validation Loss: 1.0204
...
Epoch [150/150], Loss: 0.3014, Validation Loss: 0.3216
```
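Once training finishes, a natural follow-up is to measure classification accuracy on the test set: take the `argmax` of the logits for each sample and compare against the true labels. This sketch uses made-up logits for four samples; in practice you would use `model(X_test)` under `torch.no_grad()`:

```python
import torch

# Hypothetical logits for 4 test samples over 3 classes
logits = torch.tensor([[2.1, 0.3, -1.0],
                       [0.2, 1.8,  0.1],
                       [0.5, 0.4,  2.2],
                       [1.5, 1.6,  0.0]])
labels = torch.tensor([0, 1, 2, 0])   # true class indices

preds = logits.argmax(dim=1)          # highest-scoring class per sample
accuracy = (preds == labels).float().mean().item()
print(accuracy)  # 0.75 (three of four predictions match)
```

Note that the last sample is misclassified (class 1 narrowly beats class 0), which is exactly the kind of case a raw loss number can hide; accuracy gives a complementary, more interpretable view of performance.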

Congratulations on reaching the end of this lesson! You have learned how to construct a neural network model in PyTorch, from defining the model structure to training it. You also learned how the loss function and optimizer work in tandem to reduce the model's error, and how to define and use them.

To reinforce what you've learned, your next exercises will involve building the model and training process for our Wine dataset. This will give you the practical experience necessary to tackle real-world machine learning problems using PyTorch confidently. Enjoy practicing to make perfect!