Gradient Descent Optimization in Linear Regression

Lesson 3

Gradient Descent Optimization in Linear Regression

Introduction

Hello and welcome to another session on "Regression and Gradient Descent." In today's syllabus, we will construct and fit the gradient descent algorithm into a linear regression problem. Though linear regression does have a direct solution, gradient descent is essential for computational efficiency, especially when handling larger datasets or complex models.

The Concept of Gradient Descent

Gradient descent is an iterative optimization algorithm for minimizing a function, usually a loss function, quantifying the disparity between predicted and actual results. The goal of gradient descent is to find the parameters that minimize the value of the loss function. Importantly, gradient descent navigates its way to the minimum of the function by moving iteratively toward the direction of the steepest descent. However, to leverage gradient descent, the target function must be differentiable.

Taking Steps with Gradient Descent

Gradient descent derives its name from its working mechanism: taking descents along the gradient. It operates in several iterative steps as follows:

Choose random values for initial parameters.
Calculate the cost (the difference between actual and predicted value).
Compute the gradient (the steepest slope of the function around that point).
Update the parameters using the gradient.
Repeat steps 2 to 4 until we reach an acceptable error rate or exhaust the maximum iterations.

A vital component of gradient descent is the learning rate, which determines the size of the descent towards the optimum solution. It is important to note that if the learning rate is too high, we may overshoot the minimum, and if it's too low, the convergence to the minimum may take too long.

Implementing Gradient Descent in Python: The Cost Function

Let's implement it from scratch with a basic understanding of the gradient descent algorithm. We will need two functions: one for calculating the cost and another for calculating and applying the gradient to update our parameters. Moreover, we'll add an early stop mechanism that will halt computations after a predefined number of iterations.

The cost function is as follows:

J(X, y, \theta) = \frac{1}{m} \sum_{i=1}^m (X\cdot\theta - y_i)^2

where $J$ is the cost, $X$ is the data, $y$ is the actual values, $\theta$ is the parameters, and $m$ is the length of $y$ . It is the calculation of the mean square error.

Python
1import numpy as np
2
3def cost(X, y, theta):
4    m = len(y)
5    predictions = X.dot(theta)
6    cost = (1/m) * np.sum(np.square(predictions-y)) # Compute mean square error
7    return cost

Implementing Gradient Descent in Python: The Gradient Descent

Next, for the gradient descent function, we follow the gradient descent update rule:

\theta := \theta - \alpha \frac{1}{m} X^T \cdot (X\cdot\theta - y)

Here, $\alpha$ is the learning rate, which determines the size of our steps in the descent. $X^T$ is the transpose of the data. Note, that it should have been multiplied by 2, as we take the derivative of a the mean squared error, but we can ignore this 2 and just consider it as a part of the learning rate coefficient.

Python
1def gradient_descent(X, y, theta, alpha, iterations):
2    m = len(y)
3    cost_history = np.zeros(iterations)
4    theta_history = np.zeros((iterations,2))
5    for i in range(iterations): # Iterate until convergence
6        prediction = np.dot(X,theta)  # Matrix multiplication between X and theta
7        theta = theta - (1/m)*alpha*(X.T.dot((prediction - y))) # Gradient update rule
8        theta_history[i,:] = theta.T
9        cost_history[i] = cost(X,y,theta)
10    return theta, cost_history, theta_history

Applying Gradient Descent to Linear Regression

Let's apply our gradient descent function to a simple linear regression problem. The form of linear regression is:

y = ax + b

where $a$ and $b$ are the parameters, $\theta$ , we need to learn. The following data have been generated based on this form with some noise.

Python
1X = 2 * np.random.rand(100,1)
2y = 4 +3 * X+np.random.randn(100,1)
3
4lr = 0.01 # Learning Rate
5n_iter = 1000 # Max number of iterations
6theta = np.random.randn(2,1) # Randomly initialized parameters
7X_b = np.c_[np.ones((len(X),1)),X] # add bias parameter to X
8theta, cost_history, theta_history = gradient_descent(X_b,y,theta,lr,n_iter) # Gradient Descent

Lesson Summary and Practice

Congratulations! You have mastered implementing the gradient descent algorithm and its application to linear regression. We covered theoretical explanations, derived the math behind the cost function and the gradient descent update rule, and brought these concepts to life by coding in Python.

It is now time to practice and solidify what you have learned. In the upcoming exercises, challenge yourself with different problems and experiment with varying parameters like the learning rate. Enjoy your journey into the world of gradients!

Enjoy this lesson? Now it's time to practice with Cosmo!

Practice is how you turn knowledge into actual skills.

AI Theory and Coding

Regression and Gradient DescentLesson 3

Lesson 3