Hello! In this lesson, we'll examine the inner workings of the backpropagation algorithm, which is crucial for training neural networks, and implement it from scratch in Python.
A neural network consists of an input layer, one or more hidden layers, and an output layer. Each layer contains neurons (nodes) interconnected through links with associated weights. These weights, together with bias terms, determine the network's output. In our Python code, the size of the input layer is determined by the shape of `self.input`. The hidden layer has four neurons (`self.weights1`), and the output layer has one neuron (`self.weights2`).
Our activation function, the sigmoid function, maps any real number into the range between 0 and 1. Let's recall its mathematical definition:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The derivative of the sigmoid function plays an essential role in backpropagation, where it is used for the weight updates. It is given by:

$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$
These functions are implemented in Python as `sigmoid(x)` and `sigmoid_derivative(x)`.
```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output: sigma'(z) = sigma(z) * (1 - sigma(z))
    return x * (1.0 - x)
```
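As a quick sanity check (a hypothetical snippet, not part of the lesson's class), we can evaluate both functions at a few points. Note that `sigmoid_derivative` expects a value that has already been passed through `sigmoid`:

```python
# Hypothetical sanity check: sigmoid(0) is 0.5, and the derivative of the
# sigmoid at that point is 0.5 * (1 - 0.5) = 0.25.
a = sigmoid(np.array([0.0, 2.0, -2.0]))
print(a)                      # approx. [0.5, 0.881, 0.119]
print(sigmoid_derivative(a))  # approx. [0.25, 0.105, 0.105]
```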
The following methods will be defined in a class initialized like this:
```python
class NeuralNetwork:
    def __init__(self, x, y, learning_rate=0.1):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)  # input layer -> hidden layer (4 neurons)
        self.weights2 = np.random.rand(4, 1)                    # hidden layer -> output layer (1 neuron)
        self.y = y
        self.output = np.zeros(self.y.shape)
        self.learning_rate = learning_rate
```
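To make the shapes concrete, here is an illustrative check (not part of the lesson itself) using the XOR data introduced later, which has four samples with three features each:

```python
# Illustrative shape check using the XOR data from later in this lesson.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
Y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(X, Y)
print(nn.input.shape)     # (4, 3) -- four samples, three features
print(nn.weights1.shape)  # (3, 4) -- input layer to hidden layer
print(nn.weights2.shape)  # (4, 1) -- hidden layer to output layer
print(nn.output.shape)    # (4, 1) -- one prediction per sample
```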
The `self.weights1` and `self.weights2` parameters hold the weights of the connections from the input layer to the hidden layer and from the hidden layer to the output layer, respectively. `self.y` stores the target data in the instance, and `self.output` is initialized as a NumPy array of zeros that will hold the neural network's output.
Feedforward propagation is the flow of data from the input layer, through the hidden layer, to the output layer. At each layer, the inputs are multiplied by the corresponding weights, and the results are passed through the activation function (the sigmoid function, in this case).
```python
def feedforward(self):
    # Implements feedforward method using dot product and sigmoid function
    self.layer1 = sigmoid(np.dot(self.input, self.weights1))
    self.output = sigmoid(np.dot(self.layer1, self.weights2))
```
Backpropagation is crucial to the learning process of neural networks. It corrects the network's error by propagating the error from the output layer back toward the input layer, adjusting the weights to minimize the discrepancy between the predicted output (`self.output`) and the actual output (`self.y`). For the output layer, the weight update used in our code can be written as:

$$\Delta W_2 = a_1^{T}\,\bigl(2\,(y - \hat{y})\,\sigma'(\hat{y})\bigr), \qquad W_2 \leftarrow W_2 + \eta\,\Delta W_2$$

Where:

- $y$ is the target output (`self.y`) and $\hat{y}$ is the predicted output (`self.output`),
- $a_1$ is the hidden-layer activation (`self.layer1`),
- $\sigma'$ is the derivative of the sigmoid function, and
- $\eta$ is the learning rate.

The hidden-layer update follows the same chain rule, with the output-layer error propagated back through $W_2$ before being multiplied by the derivative of the hidden-layer activations.
In the `backprop` method, this error term drives the weight adjustments: the weight derivatives (`d_weights2` and `d_weights1`) are computed from the error and the derivatives of the neuron outputs, and the weights are then updated using these derivatives.
```python
def backprop(self):
    # Performs backpropagation and updates weights
    # Output-layer term: 2 * (y - y_hat) * sigma'(y_hat)
    d_weights2 = np.dot(self.layer1.T, (2 * (self.y - self.output) * sigmoid_derivative(self.output)))
    # Propagate the error back through weights2, then apply the hidden-layer derivative
    d_weights1 = np.dot(self.input.T, (np.dot(2 * (self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

    # Update the weights in the direction that reduces the error
    self.weights1 += self.learning_rate * d_weights1
    self.weights2 += self.learning_rate * d_weights2
```
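A good way to convince yourself that these formulas are correct is a numerical gradient check. The sketch below is a hypothetical helper (the names `loss` and `check_weight2_gradient` are not part of the lesson's class); it compares `d_weights2` against a finite-difference estimate of the squared-error loss. Keep in mind that `d_weights2` is the negative gradient of the loss, which is why the update adds it to the weights:

```python
# Hypothetical gradient check for a single entry of weights2.
def loss(nn):
    nn.feedforward()
    return np.sum((nn.y - nn.output) ** 2)

def check_weight2_gradient(nn, i=0, j=0, eps=1e-5):
    # Analytic value: recompute d_weights2 without updating the weights.
    nn.feedforward()
    delta = 2 * (nn.y - nn.output) * sigmoid_derivative(nn.output)
    analytic = np.dot(nn.layer1.T, delta)[i, j]

    # Numerical estimate: central difference of the loss.
    original = nn.weights2[i, j]
    nn.weights2[i, j] = original + eps
    loss_plus = loss(nn)
    nn.weights2[i, j] = original - eps
    loss_minus = loss(nn)
    nn.weights2[i, j] = original  # restore the weight
    numerical = (loss_plus - loss_minus) / (2 * eps)

    # analytic should be approximately equal to -numerical
    print(analytic, -numerical)
```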
An epoch is one complete pass through the entire training dataset. The `train` method applies feedforward and backpropagation repeatedly over several epochs to adjust the weights and minimize the error. Multiple epochs give the model many opportunities to learn from and correct its errors until it converges on weights that yield good predictions.
```python
def train(self, epochs):
    # Repeatedly performs feedforward and backpropagation for several epochs
    for epoch in range(epochs):
        self.feedforward()
        self.backprop()
```
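If you want to watch the error shrink during training, a slightly extended loop can log the mean squared error every so often. This is a hypothetical variant (the method name `train_with_logging` and the `log_every` parameter are not part of the lesson's class), shown purely for illustration:

```python
def train_with_logging(self, epochs, log_every=1000):
    # Same loop as train(), but periodically reports the mean squared error.
    for epoch in range(epochs):
        self.feedforward()
        self.backprop()
        if epoch % log_every == 0:
            mse = np.mean((self.y - self.output) ** 2)
            print(f"epoch {epoch}: mse = {mse:.4f}")
```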
Now we can define the `predict` method.
```python
def predict(self, new_input):
    layer1 = sigmoid(np.dot(new_input, self.weights1))
    output = sigmoid(np.dot(layer1, self.weights2))
    return output
```
The `predict` method computes outputs for given inputs by propagating them through the layers with dot products and the sigmoid activation function. It is very similar to the `feedforward` method.
Let's apply these concepts to the XOR (exclusive OR) problem, where the target is 1 when exactly one of the two inputs is 1 (the constant third column of `X` acts as a bias input). We initialize our neural network with inputs `X` and corresponding targets `Y`, then train it over 10,000 epochs. The weights adjust accordingly, enabling the network to predict XOR correctly.
```python
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
Y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(X, Y)

nn.train(10000)
print("\nPredictions:")
for i, x in enumerate(X):
    print(f"Input: {x} ---> Prediction: {nn.predict(np.array([x]))}, Expected: {Y[i]}")
```
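Because the sigmoid outputs are continuous values between 0 and 1, one simple way to read the results is to threshold them at 0.5. The snippet below is an illustrative post-processing step (not part of the original lesson) that turns the predictions into binary labels and measures accuracy on the four XOR cases:

```python
# Illustrative post-processing: threshold predictions at 0.5 and compare to Y.
predictions = np.array([nn.predict(np.array([x]))[0, 0] for x in X])
binary = (predictions > 0.5).astype(int)
accuracy = np.mean(binary == Y.flatten())
print(f"Binary predictions: {binary}, accuracy: {accuracy:.2f}")
```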
Congratulations! You've dissected the fundamental backpropagation algorithm, understood the mathematics behind it, and implemented it from scratch in Python. Experiment with the code, tweak the parameters, and observe how the changes affect your network's output. Keep exploring and enjoy your voyage through deep learning!