Lesson 4

Hello and welcome to our lesson on **"Understanding Gradient"!** In this lesson, we'll explore what gradients are, why they matter, and how to compute them in Python. By the end, you'll be able to find the gradient of a multivariable function and see how it guides the optimization of machine learning models.

Let's start with the basics. Imagine a landscape of hills and valleys. The gradient tells you the direction of the steepest climb. Formally, the gradient is a vector containing all the partial derivatives of a multivariable function, and it points in the direction in which the function increases fastest.

If you're climbing a hill and want to get to the top as quickly as possible, the gradient guides you on the steepest path.

Gradients are crucial in machine learning. When training a model, you want to find the parameters that minimize a loss function. The gradient tells you how to adjust these parameters to reduce the error. Think of it as a map that shows the quickest way to reach a lower point. For example, when tuning the weights of a neural network to improve its predictions, the gradient of the loss shows how each weight should change.

Consider a simple function: $f(x, y) = x^2 + y^2$. It's often used to illustrate gradients due to its straightforward shape. Suppose we want to find out how to climb up or down this "bowl" starting from a point like (1, -1).

A partial derivative measures how fast the function changes when you vary one variable and hold the others fixed. For $f(x, y) = x^2 + y^2$, the partial derivatives are:

- $\frac{\partial f}{\partial x} = 2x$
- $\frac{\partial f}{\partial y} = 2y$

At any point $(x, y)$, each partial derivative gives the rate of change along its own axis. Combine them into a vector to get the gradient:

$\nabla f(x, y) = [2x, 2y]$

This vector shows the direction of the steepest increase of the function.
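As a quick check of the formula above, here is a minimal sketch that evaluates the analytic gradient at the point (1, -1) and confirms that a small step along it increases the function. The names `f`, `analytic_gradient`, and `step` are made up for this illustration.

```python
# Illustrative helpers (names are assumptions, not part of the lesson's code).
def f(x, y):
    return x**2 + y**2

def analytic_gradient(x, y):
    # Partial derivatives of f: df/dx = 2x, df/dy = 2y
    return [2 * x, 2 * y]

grad_at_point = analytic_gradient(1, -1)
print(grad_at_point)  # [2, -2]

# Taking a small step along the gradient should increase f.
step = 0.01
x_new = 1 + step * grad_at_point[0]
y_new = -1 + step * grad_at_point[1]
print(f(x_new, y_new) > f(1, -1))  # True
```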

Let's visualize this concept by plotting the function and gradient vector at the point (1, -1).

This plot shows the function as a heatmap and the gradient vector at the point (1, -1) as a white arrow. The arrow points in the direction of the function's steepest ascent, meaning the fastest increase.

Conversely, the negative gradient points to the steepest descent, useful for finding the minimum value of the function. When optimizing machine learning models, we often follow the negative gradient direction to minimize errors.
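To make the descent idea concrete, here is a minimal sketch that repeatedly steps against the gradient of $f(x, y) = x^2 + y^2$, starting from (1, -1). The step size `learning_rate=0.1` and the 50 iterations are arbitrary choices for illustration, and the function names are hypothetical.

```python
# A toy gradient descent loop (all names and settings are illustrative assumptions).
def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient of f: [2x, 2y]
    return [2 * x, 2 * y]

def descend(start, learning_rate=0.1, steps=50):
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Move against the gradient to decrease f
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y

x_min, y_min = descend((1.0, -1.0))
print(x_min, y_min)
print(f(x_min, y_min))
```

With these settings each step shrinks both coordinates by a factor of 0.8, so the point moves steadily toward the minimum at (0, 0).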

Now, let's see how to calculate the gradient numerically using the central difference method. For each variable, it evaluates the function slightly ahead of and slightly behind the point of interest, then divides the difference by the distance between those two samples.
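Written out for the i-th variable, the approximation takes the following form. The symbols $h$ (a small step size) and $\mathbf{e}_i$ (the unit vector along the i-th coordinate) are notation introduced here for illustration.

$\frac{\partial f}{\partial x_i}(\mathbf{x}) \approx \frac{f(\mathbf{x} + h\,\mathbf{e}_i) - f(\mathbf{x} - h\,\mathbf{e}_i)}{2h}$

For $f(x, y) = x^2 + y^2$ and the variable $x$, this gives $\frac{(x + h)^2 - (x - h)^2}{2h} = \frac{4xh}{2h} = 2x$, which matches the exact partial derivative.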

Here's how to do this in Python:

```python
# Central-difference gradient for multivariable functions
def gradient(f, pt, h=1e-5):
    grad = []
    for i in range(len(pt)):
        forward_point = list(pt)   # copies, so pt itself is never modified
        backward_point = list(pt)
        forward_point[i] += h      # step ahead along the i-th coordinate
        backward_point[i] -= h     # step back along the i-th coordinate
        # Central difference: (f(x + h) - f(x - h)) / (2h)
        grad.append((f(*forward_point) - f(*backward_point)) / (2 * h))
    return grad
```

This function approximates the gradient of a multivariable function `f` at a given point `pt`. It iterates through each variable, nudges it by `h` in both directions, and measures the change in the function's value. The results are collected in the `grad` list and returned.

Let's break it down:

- `grad = []`: Initializes an empty list to store the partial derivatives.
- `for i in range(len(pt)):`: Iterates over each variable in the point `pt`. In our case these are `x` and `y`, but there can be any number of variables.
- `forward_point = list(pt)` and `backward_point = list(pt)`: Create two copies of `pt` to modify without altering the original point.
- `forward_point[i] += h` and `backward_point[i] -= h`: Shift the i-th coordinate by a small value `h`, forward in one copy and backward in the other.
- `grad.append((f(*forward_point) - f(*backward_point)) / (2 * h))`: Computes the approximate partial derivative for the i-th coordinate and appends it to the `grad` list. The two sample points are `2 * h` apart, hence the divisor. The star operator unpacks the list into separate arguments, so the list `[x, y]` is passed to `f` as `f(x, y)`.
- `return grad`: Returns the gradient vector as a list of partial derivatives.

This general approach lets you approximate the gradient of a function with any number of variables.

Let's continue with an example function and see the output:

```python
# Multivariable function example: f(x, y) = x^2 + y^2
multivariable_fn = lambda x, y: x**2 + y**2

# Prints approximately [2, -2]; the exact digits differ slightly due to floating-point error
print("Gradient of f(x, y) at (1, -1):", gradient(multivariable_fn, [1, -1]))
```

In this example, we define a simple quadratic function `f(x, y) = x^2 + y^2` and calculate its gradient at the point `(1, -1)`. The output is an approximation of the true gradient `[2, -2]`, demonstrating how the central difference method works.
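As a rough illustration of why the central difference is a popular choice, the sketch below compares it with a one-sided (forward) difference on a one-variable function. The function $g(x) = x^3$, the step `h=1e-3`, and the helper names are assumptions chosen for this example only.

```python
# Compare forward and central differences on g(x) = x^3 at x = 1.
# (Function, step size, and names are illustrative assumptions.)
def g(x):
    return x**3

def forward_difference(func, x, h=1e-3):
    # One-sided estimate: samples the point itself and one point ahead
    return (func(x + h) - func(x)) / h

def central_difference(func, x, h=1e-3):
    # Two-sided estimate: samples one point ahead and one point behind
    return (func(x + h) - func(x - h)) / (2 * h)

true_derivative = 3.0  # g'(x) = 3x^2, evaluated at x = 1

forward_error = abs(forward_difference(g, 1.0) - true_derivative)
central_error = abs(central_difference(g, 1.0) - true_derivative)

print("forward error:", forward_error)
print("central error:", central_error)
```

On this example the forward error is on the order of `h`, while the central error is on the order of `h**2`, so the two-sided estimate is far closer to the true derivative for the same step size.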

Great job! We've learned what gradients are, why we need them in machine learning, and how to approximate them using central differences. The gradient shows the direction in which a function increases fastest, and its negative shows the direction to step when minimizing a loss, which is what helps us optimize models.

Now it's time to practice. You'll calculate gradients for different functions and explore their real-life applications. Let's dive into the exercises and solidify your understanding!