Today, we dive into an integral piece of the deep learning puzzle: training your neural network. In this lesson, we will demystify what training entails and learn how to implement it using TensorFlow. During training, the neural network learns from the input data, gradually adjusting its parameters (weights and biases) to minimize the error in its predictions.
Training a neural network is akin to teaching a child to recognize shapes. The child learns from repeated exposure and feedback, just as a neural network learns from training data. In practice, training involves many rounds of passing input data forward through the network, calculating the error (the difference between the network's output and the desired output), and adjusting the weights and biases to minimize that error, much like a child refining their understanding of shapes based on feedback!
This iterative process lets the network learn directly from the data and eventually produce accurate predictions or classifications, enabling us to build powerful predictive models.
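To make this loop concrete, here is a minimal sketch of a single hand-written training step using TensorFlow's `GradientTape`; the model, loss, and optimizer shown are placeholder choices, and in the rest of this lesson `model.fit()` will run this loop for us.

```python
import tensorflow as tf

# Placeholder model, loss, and optimizer for illustration only
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch)          # forward pass through the network
        loss = loss_fn(y_batch, predictions)  # error between output and target
    # Compute gradients of the loss and adjust weights and biases to reduce it
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```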
The `model.fit()` method in TensorFlow is our main tool for training a neural network. This method takes in inputs and their corresponding target values, fitting the model to this data over a certain number of iterations known as epochs. Here are the key parameters we need to understand:
- `X`: Input data. This is the data from which your model will learn.
- `y`: Target data. These are the answers or results that your model should learn to predict.
- `epochs`: One epoch is one complete pass through the entire training dataset.
- `batch_size`: The number of samples per gradient update. It's akin to breaking our dataset into smaller chunks and updating the model's learning parameters after each chunk (the sketch after this list shows the arithmetic).
- `validation_split`: A value between 0 and 1 that determines the fraction of your training data set aside for validation. Validation data guides the training process by providing a measure of model performance on unseen data.
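To see how `batch_size` and `validation_split` interact in practice, here is a quick back-of-the-envelope sketch using the dataset sizes from the example below (scikit-learn's digits dataset has 1,797 samples); the exact splits may differ by a sample or two, but the arithmetic explains the 36 steps per epoch you'll see in the training output:

```python
# Rough arithmetic for the run below (actual splits may differ slightly)
n_total = 1797                     # samples in scikit-learn's digits dataset
n_train = int(n_total * 0.8)       # test_size=0.2 leaves ~1437 for training
n_fit = int(n_train * (1 - 0.2))   # validation_split=0.2 -> ~1149 samples actually fit
steps_per_epoch = -(-n_fit // 32)  # batch_size=32 -> ceiling division -> 36 updates/epoch
print(n_fit, steps_per_epoch)      # 1149 36
```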
Let's see this method in action with some code:
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense

# Load data
digits = load_digits()
X = digits.data
y = digits.target

# Convert to one-hot encoding
y = to_categorical(y)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create model
model = Sequential()
model.add(Dense(64, input_dim=len(X[0]), activation='relu'))
model.add(Dense(len(y[0]), activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
```
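Note that the split above also created `X_test` and `y_test`, which `model.fit()` never sees; once training is done, you would typically measure generalization with Keras's `model.evaluate()`. A minimal sketch:

```python
# Evaluate the trained model on the held-out test set (never seen during fitting)
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```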
Understanding how well the training is progressing is key when training a model. This is where the `history` object returned by `model.fit()` comes into play. This object contains the training and validation accuracy and loss for each epoch, which we can use to track the learning progress of our network.
We can extract this data and visualize it with plots. The following code creates a simple line plot of the training and validation accuracy for each epoch, giving us insight into how effectively our model is learning.
```python
import matplotlib.pyplot as plt

# Plot the training history
plt.plot(history.history['accuracy'], label='accuracy')          # Training accuracy
plt.plot(history.history['val_accuracy'], label='val_accuracy')  # Validation accuracy
plt.xlabel('Epoch')            # Label for x-axis
plt.ylabel('Accuracy')         # Label for y-axis
plt.ylim([0, 1])               # Limit for y-axis
plt.legend(loc='lower right')  # Position the legend
plt.show()                     # Display the plot
```
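Loss curves can be read the same way; as a small companion sketch, this plots the training and validation loss from the same `history` object (here we want the curves to fall, not rise, over the epochs):

```python
# Plot training and validation loss from the same history object
plt.plot(history.history['loss'], label='loss')          # Training loss
plt.plot(history.history['val_loss'], label='val_loss')  # Validation loss
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(loc='upper right')
plt.show()
```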
Output:
```
 1/36 [..............................] - ETA: 0s - loss: 0.4314 - accuracy: 0.8125
13/36 [=========>....................] - ETA: 0s - loss: 0.3962 - accuracy: 0.8774
25/36 [===================>..........] - ETA: 0s - loss: 0.3419 - accuracy: 0.8875
36/36 [==============================] - 0s 7ms/step - loss: 0.3615 - accuracy: 0.8842 - val_loss: 0.3589 - val_accuracy: 0.8785
```
In the graph, you can see the accuracy (both training and validation) plotted against the number of epochs. This visualization shows how learning progresses over time and can guide adjustments to the training parameters if needed.
You can notice that with each passing epoch, both our `accuracy` (measured on the training set) and our `val_accuracy` (measured on the validation set, i.e., unseen data) improve, but we are still below 0.9 on both. How can we increase them? Given the trend we are seeing, increasing the number of epochs should be the right approach.
When we double the number of epochs, we get the following code:
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt

# Load data
digits = load_digits()
X = digits.data
y = digits.target

# Convert to one-hot encoding
y = to_categorical(y)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create model
model = Sequential()
model.add(Dense(64, input_dim=len(X[0]), activation='relu'))
model.add(Dense(len(y[0]), activation='softmax'))

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model for twice as many epochs
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Plot the training history
plt.plot(history.history['accuracy'], label='accuracy')          # Training accuracy
plt.plot(history.history['val_accuracy'], label='val_accuracy')  # Validation accuracy
plt.xlabel('Epoch')            # Label for x-axis
plt.ylabel('Accuracy')         # Label for y-axis
plt.ylim([0, 1])               # Limit for y-axis
plt.legend(loc='lower right')  # Position the legend
plt.show()                     # Display the plot
```
Output:
```
 1/36 [..............................] - ETA: 0s - loss: 0.0531 - accuracy: 1.0000
14/36 [==========>...................] - ETA: 0s - loss: 0.1044 - accuracy: 0.9621
25/36 [===================>..........] - ETA: 0s - loss: 0.1212 - accuracy: 0.9625
36/36 [==============================] - ETA: 0s - loss: 0.1192 - accuracy: 0.9669
36/36 [==============================] - 0s 7ms/step - loss: 0.1192 - accuracy: 0.9669 - val_loss: 0.1665 - val_accuracy: 0.9514
```
As you can see, we now have above 0.95 accuracy on both the training and validation data, and simply increasing the number of epochs further is unlikely to change much, since our `val_accuracy` curve has mostly flattened out. This metric is already excellent, so we don't need to tweak the model much more. If we did, however, one of the first things to try would be adjusting the network architecture, for example by adding more layers or changing their width, as sketched below.
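As a rough illustration (not a tuned recipe), here is a sketch of one such variant, reusing the imports and data from the code above; the layer widths of 128 and 64 are arbitrary choices:

```python
# A deeper, wider variant of the same model; layer sizes are arbitrary picks
model = Sequential()
model.add(Dense(128, input_dim=len(X[0]), activation='relu'))  # wider first hidden layer
model.add(Dense(64, activation='relu'))                        # extra hidden layer
model.add(Dense(len(y[0]), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```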
You have wonderfully navigated your way through the intricacies of training and evaluating a neural network in this lesson. You've grasped the vital role of training a neural network and understood how it can be performed and visualized using TensorFlow.
We have a few more exercises for you to complete and you'll be done with this course. Let's do this!