Lesson 2
Basic Gradient Descent
Lesson Introduction

Welcome to the lesson on Basic Gradient Descent! In this lesson, we will learn about an optimization algorithm called gradient descent. It helps us find the minimum value of a function, which is very important in fields such as machine learning and data science.

Imagine you're on a hike in the mountains. You want to find the lowest point in a valley. Gradient descent is like taking small steps downhill until you reach the bottom. By the end of this lesson, you'll understand what gradient descent is, how it works, and how to implement it in Python.

Let's get started!

Understanding Gradient Descent

First, let's talk about what gradient descent is. Gradient descent is an algorithm used to minimize a function. It is an iterative method, similar in spirit to Newton's method, but it only uses first-order (gradient) information. This is important for us, because in machine learning we usually optimize a multivariable loss function which measures the algorithm's error.

In mathematical terms, we want to minimize a function f(x, y) by iteratively moving in the direction of the negative gradient (the direction of steepest descent) of the function. The gradient is a vector of partial derivatives that points in the direction of steepest ascent. By taking steps against the gradient, we move towards the minimum value.

Imagine you’re playing a game where you need to find the treasure buried at the lowest point of a landscape. Gradient descent helps you figure out which direction to dig by looking at the slope of the land around you.

Example: Simple Quadratic Function

To understand gradient descent better, let's use a simple example:

f(x, y) = x^2 + y^2

Here, f(x, y) represents a bowl-shaped surface, and we want to find the lowest point, or the minimum value.

Think of this function as a stretched rubber sheet. If you press down at any point, the steepest increase in height is indicated by the gradient, and moving in the opposite direction will take you towards the bottom of the sheet.

Gradient Calculation

To apply gradient descent, we need to calculate the gradient of the function. For the quadratic function f(x, y) = x^2 + y^2, the gradient is:

\nabla f(x, y) = [2x, 2y]

The gradient tells us how the function changes at a given point.

Imagine rolling a ball down a hill. The ball accelerates fastest along the steepest slope, which is exactly the direction the gradient points along (just uphill instead of downhill). The steeper the slope, the larger the gradient.
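
To make this concrete, here is a small sketch that evaluates the gradient of f(x, y) = x^2 + y^2 at two illustrative points (it uses the same quadratic_gradient helper that we will define later in this lesson); notice how the gradient shrinks as we approach the minimum at the origin:

Python
# Gradient of f(x, y) = x^2 + y^2
def quadratic_gradient(point):
    x, y = point
    return [2 * x, 2 * y]

print(quadratic_gradient([2, 2]))      # [4, 4]: steep slope far from the minimum
print(quadratic_gradient([0.5, 0.5]))  # [1.0, 1.0]: gentler slope near the minimum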

Plotting the Target Function

Let's visualize our quadratic function using a contour plot in Python.
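
Here is one way to do it, a minimal sketch assuming NumPy and Matplotlib are available; the grid range and number of contour levels are arbitrary choices for illustration:

Python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate f(x, y) = x^2 + y^2 on a grid of points around the origin
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

# Each contour line connects points where f(x, y) has the same value
contours = plt.contour(X, Y, Z, levels=20, cmap='viridis')
plt.colorbar(contours, label='f(x, y)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Contour plot of f(x, y) = x^2 + y^2')
plt.show()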

Implementing Gradient Descent: Part 1

Now, let's see how to implement a basic version of gradient descent in Python. Here is the update formula first:

\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla f(\mathbf{x}_t)

where:

  • \mathbf{x}_t is the current point.
  • \eta is the learning rate.
  • \nabla f(\mathbf{x}_t) is the gradient at the current point.

We'll start by initializing a point and updating it by moving against the gradient.
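
For example, starting at \mathbf{x}_0 = [2, 2] with \eta = 0.1 (the values used below), the gradient is \nabla f(2, 2) = [4, 4], so the first update gives \mathbf{x}_1 = [2, 2] - 0.1 \cdot [4, 4] = [1.6, 1.6], one step closer to the minimum at [0, 0].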

Python
# Gradient descent without path output
def simple_gradient_descent(f_grad, init_point, learning_rate=0.1, iterations=100):
    point = list(init_point)
    for _ in range(iterations):
        grad = f_grad(point)
        # Move against the gradient, scaled by the learning rate
        point = [p - learning_rate * g for p, g in zip(point, grad)]
    return point

Here are the inputs of the simple_gradient_descent function:

  • f_grad: The gradient function of the target function.
  • init_point: The starting point for the gradient descent algorithm.
  • learning_rate: A hyperparameter that controls the step size in each iteration.
  • iterations: The number of iterations to update the point.
Choosing the Starting Point and Learning Rate

The starting point is our initial guess of where the function's minimum might be. In our example, we start from [2, 2].

The learning rate is critical for the algorithm's performance:

  • A high learning rate might cause overshooting and divergence.
  • A low learning rate will make convergence slow.

Typically, you experiment with these values to find the best combination.
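
To see this in practice, here is a small self-contained sketch that runs the same update rule on f(x, y) = x^2 + y^2 from the starting point [2, 2]; the 20-iteration budget and the specific learning rates are arbitrary choices for illustration:

Python
# Compare learning rates on f(x, y) = x^2 + y^2, starting from [2, 2]
def run(learning_rate, iterations=20):
    point = [2.0, 2.0]
    for _ in range(iterations):
        grad = [2 * p for p in point]  # gradient of x^2 + y^2
        point = [p - learning_rate * g for p, g in zip(point, grad)]
    return point

print(run(0.1))    # converges quickly: roughly [0.02, 0.02] after 20 steps
print(run(0.01))   # converges slowly: still around [1.34, 1.34]
print(run(1.1))    # diverges: the point overshoots and grows without bound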

Implementing Gradient Descent: Part 2

Next, we will define the function we want to minimize and its gradient.

Python
def quadratic_function(point):
    x, y = point
    return x**2 + y**2

def quadratic_gradient(point):
    x, y = point
    return [2 * x, 2 * y]

Now, we run the gradient descent algorithm.

Python
optimal_point = simple_gradient_descent(quadratic_gradient, [2, 2], learning_rate=0.1, iterations=100)
print("Optimal point after basic gradient descent:", optimal_point)  # Optimal point after basic gradient descent: close to [0, 0]
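
This result is expected: for f(x, y) = x^2 + y^2 with a learning rate of 0.1, each update multiplies both coordinates by 1 - 0.1 \cdot 2 = 0.8, so after 100 iterations they shrink to about 2 \cdot 0.8^{100} \approx 4 \cdot 10^{-10}, which is effectively [0, 0].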
Code Explanation

In simple_gradient_descent:

  • We take in the gradient function, initial point, learning rate, and number of iterations.
  • We initialize the point and update it using the gradient and learning rate.
  • After iterating, we return the optimal point.

For the example functions:

  • quadratic_function computes x^2 + y^2.
  • quadratic_gradient calculates the gradient of this function.
Visualizing the Gradient Descent Path

Finally, let's visualize the gradient descent path. We will mark each gradient step as a red point on the plot, starting with [2, 2]. To make the plot clear, let's limit the gradient descent to 10 iterations.
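
Below is one possible way to produce this plot, a minimal sketch assuming Matplotlib is available and reusing the quadratic_gradient helper defined above; the grid range and styling are illustrative choices:

Python
import numpy as np
import matplotlib.pyplot as plt

# Re-run gradient descent for 10 iterations, recording every intermediate point
point = [2.0, 2.0]
path = [list(point)]
for _ in range(10):
    grad = quadratic_gradient(point)
    point = [p - 0.1 * g for p, g in zip(point, grad)]
    path.append(list(point))

# Contour plot of the target function as a backdrop
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
plt.contour(X, Y, X**2 + Y**2, levels=20, cmap='viridis')

# Mark each gradient step as a red point, starting at [2, 2]
px, py = zip(*path)
plt.plot(px, py, 'ro-')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Gradient descent path')
plt.show()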

You can observe how the point moves towards the minimum. As we approach the minimum, the function's steepness decreases, so we take smaller and smaller steps. With more iterations, we converge to the minimum, as demonstrated earlier.

Lesson Summary

Great job! You've learned about gradient descent, its importance in minimizing functions, and how to implement it in Python. You now understand how we can find the minimum value of a function by iteratively moving against the gradient.

Just like finding the lowest point in a valley by taking steps downhill, gradient descent helps us find the minimum value of complex functions, essential in optimization problems.

Next, you'll practice by coding your version of gradient descent and applying it to various functions. Dive into the practice tasks and solidify your understanding of using gradient descent for optimization tasks!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.