Compiling and Training Neural Networks with TensorFlow

Lesson 3

Lesson Introduction

Hello again! By now, you should be familiar with building a Neural Network model's architecture in TensorFlow, so let's move on to finally compiling and training a neural network. In this lesson we'll use TensorFlow to compile our model with the Adam optimizer, Binary Crossentropy loss, and Accuracy metric. Then, we'll train the model using the fit() function. By the end of this lesson, you will understand how to compile and train a Neural Network model in TensorFlow.

Recap: Building the Neural Network Model

Before we dive into compiling and training, let's quickly recap how we can build a neural network model with TensorFlow. Our task will be to predict whether a student will pass or fail based on two input features: the number of hours studied and the number of hours slept. To accomplish this, we define a simple neural network model. Here's the code snippet to illustrate our model architecture:

Python
import tensorflow as tf

# Define the model with 2 inputs (hours studied, hours slept) and 1 output (pass/fail)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(5, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Input Layer: We specify an input shape of (2,) since we have two input features (hours studied and hours slept).
Hidden Layer: The model has one hidden layer with 5 neurons and uses the ReLU activation function, which helps the model learn complex relationships in the data.
Output Layer: The output layer has 1 neuron with a sigmoid activation function, which produces a value between 0 and 1 that we interpret as the probability of passing (values near 0 mean fail, values near 1 mean pass).
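
To make the layer sizes above concrete, here is a minimal sketch that counts the model's trainable parameters by hand. It assumes each Dense layer holds one weight per input-unit pair plus one bias per unit, which is the default Keras behavior. The helper name dense_param_count is made up for illustration.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(2, 5)   # 2 features -> 5 hidden neurons
output_params = dense_param_count(5, 1)   # 5 hidden neurons -> 1 output neuron
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 15 6 21
```

Calling model.summary() on the model defined above should report the same total.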

With our model architecture finalized, we are now ready to move on to compiling and training the neural network.

Neural Network Model Compilation

After defining the neural network model's structure as seen in our previous lessons, the next step is to compile the model. The "compile" step in TensorFlow specifies the optimizer, loss function, and other parameters needed before we can train the model.

This is how we compile our model:

Python
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

Now, let's delve into what each of these parameters means.

Optimizer

In the code example, the compile() method takes three arguments, the first being the optimizer. This is the optimization algorithm used to update the model's parameters. We're using Adam, a popular choice because it is computationally efficient and adapts the step size for each parameter, which tends to make it robust across many problems; other optimizers such as SGD, RMSprop, and Adagrad could also be used.
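
To give a feel for what an optimizer does on each update, here is a rough pure-Python sketch of the standard Adam update rule applied to a single number. It is not TensorFlow's implementation. The function name adam_step, the toy objective f(x) = x**2, and the hyperparameter values (commonly used defaults) are all assumptions chosen for illustration.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update to a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (hypothetical): minimize f(x) = x**2, whose gradient is 2 * x
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    x, m, v = adam_step(x, 2 * x, m, v, t)
    print(f"step {t}: x = {x:.6f}")
```

Each step moves x a little closer to the minimum at 0. When compiling with optimizer='adam', TensorFlow performs updates of this kind for every weight in the model.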

Loss

The second argument to the compile() method is the loss function. This is the function that the model aims to minimize during training. We're using the Binary Crossentropy loss function, which is commonly used for binary classification problems. It takes the model's predictions ( $\hat{y}_i$ ) and actual label ( $y_i$ ) for each instance ( $i$ ) in the dataset, and calculates the average log loss over the total number of instances ( $N$ ). The formula for Binary Crossentropy is:

$$\text{Binary Crossentropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
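
The formula can be checked by hand with a few lines of plain Python. The sketch below is an illustrative re-implementation, not the TensorFlow function; the small eps clipping value and the sample predictions are assumptions made for the example.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average log loss over all instances, following the formula above."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 so log() is defined
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = binary_crossentropy(labels, confident_and_right)
loss_wrong = binary_crossentropy(labels, confident_and_wrong)
print(f"{loss_right:.4f}")  # 0.1643
print(f"{loss_wrong:.4f}")  # 1.9560
```

Predictions that are confident and correct give a loss near 0, while predictions that are confident and wrong are penalized heavily.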

Metrics

The final argument to the compile() method is metrics. This is a list of the metrics to report during training and testing; they are used for monitoring only and do not affect how the weights are updated. In this model, we're using accuracy as a metric. Accuracy calculates the ratio of correctly predicted instances to the total instances in the dataset. The formula for Accuracy is:

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$
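
Because the sigmoid output is a probability rather than a hard 0 or 1, each prediction has to be turned into a class before it can be counted as correct. The sketch below assumes a threshold of 0.5 for that step; the function name binary_accuracy and the sample values are illustrative only.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    correct = 0
    for y, p in zip(y_true, y_pred):
        predicted_class = 1 if p > threshold else 0
        if predicted_class == y:
            correct += 1
    return correct / len(y_true)

labels = [1, 0, 0, 0]
probabilities = [0.9, 0.4, 0.6, 0.2]  # thresholded: 1, 0, 1, 0

accuracy = binary_accuracy(labels, probabilities)
print(accuracy)  # 0.75
```

Three of the four thresholded predictions match their labels, so the accuracy is 3/4.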

Training the Model

After compiling the model, the next step is training it on our data, which includes hours studied and hours slept as input features, and labels indicating whether a student passed or not. For this, we use the fit() function in TensorFlow. This function repeatedly adjusts the model parameters to minimize the loss over several epochs. Each epoch is one complete pass through the entire training dataset.

In real-life scenarios, datasets are often large and complex, requiring many epochs for the model to learn their patterns, but for demonstration purposes, we are using a simple dataset and only 10 epochs:

Python
import numpy as np

# Example data: hours studied, hours slept
X = np.array([
    [4, 6], [5, 7], [2, 8], [1, 3], [3, 4], [0, 5],
    [1, 1], [2, 4], [3, 5], [5, 5], [0, 4], [4, 4],
])

# Labels: 1 if passed, 0 if failed
y = np.array([[1], [1], [1], [0], [0], [0], [0], [0], [1], [1], [0], [1]])

model.fit(X, y, epochs=10)

X is our input data (the hours studied and slept).
y is our labels (whether each student passed or not).
epochs=10 means we train over 10 complete passes through the training dataset.

TensorFlow automatically converts these NumPy arrays (X and y) into tensors internally during the training process.

The output of the above code will be:

Plain text
Epoch 1/10
1/1 [=========================] - 2s 2s/step - accuracy: 0.5000 - loss: 2.5336
Epoch 2/10
1/1 [=========================] - 0s 39ms/step - accuracy: 0.5000 - loss: 2.5243
...
Epoch 10/10
1/1 [=========================] - 0s 40ms/step - accuracy: 0.5000 - loss: 2.4508

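This shows the training progress over 10 epochs; each epoch's line reports the time taken per step, the accuracy, and the loss. In this run, the model starts with an accuracy of 0.5000 and a loss of 2.5336 and ends the 10th epoch with the same accuracy and a slightly lower loss of 2.4508. The falling loss indicates that the optimizer is moving the parameters in the right direction, but the unchanged accuracy shows that the predictions have not yet shifted enough to change any classification. Because half of the 12 labels are 1 and half are 0, an accuracy of 0.5000 is the same score that predicting a single class for every student would earn, so more epochs or further adjustments would be needed to improve the model's predictive accuracy. Your exact numbers will differ from run to run because the weights are initialized randomly.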
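
The 1/1 at the start of each progress line is the number of batches (steps) processed in that epoch. A quick calculation shows why there is only one; it assumes the batch size was left at the Keras default of 32, since the fit() call above does not set batch_size.

```python
import math

num_samples = 12   # rows in X from the example above
batch_size = 32    # assumed: the Keras default when fit() is given NumPy arrays
epochs = 10

steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1
print(total_updates)    # 10
```

With all 12 samples fitting in one batch, the weights are updated only once per epoch, so 10 epochs give just 10 updates. Passing a smaller value such as batch_size=4 to fit() would give 3 steps per epoch and therefore more updates over the same 10 epochs.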

Lesson Summary and Practice

Great job! You've learned how to compile and train a neural network in TensorFlow. In particular, you've touched on the Adam optimizer, the Binary Crossentropy loss, the accuracy metric, and the mechanics of the model.fit() function in TensorFlow.

Now that you have these basics down, you should feel confident to compile and train other types of neural network models. Remember that the choice of optimizer, loss function, and metrics can significantly affect how well a model performs, so it's crucial to understand their implications and how to use them.

Practice exercises are up next – they will reinforce your understanding and offer some hands-on experience. Happy learning!
