Welcome back! Today, we'll explore Gradient Descent with Momentum. You've already familiarized yourself with basic gradient descent. However, basic gradient descent can be slow and can get stuck, especially when the path to the minimum is bumpy.
So, how do we speed it up? We use momentum. Imagine pushing a heavy shopping cart. Instead of stopping and starting, you build momentum. This helps you move faster. By the end of this lesson, you'll understand how gradient descent with momentum works, implement it in Python, and see how it improves optimization.
Momentum in optimization is like a push in physical movement. It reduces oscillations (back-and-forth movements) as you head toward the minimum point. Essentially, it helps you move down faster and more steadily.
Let's look at the formulas for momentum:

velocity = momentum * velocity - learning_rate * gradient
point = point + velocity

Here's what these terms mean:

- `velocity`: the accumulated "speed" of the updates, built up from the gradients of past steps (it starts at zero);
- `momentum`: a factor between 0 and 1 (commonly around 0.9) that controls how much of the previous velocity is carried over;
- `learning_rate`: the step size that scales the current gradient;
- `gradient`: the gradient of the function at the current point.
In basic gradient descent, the update to the parameters is directly proportional to the gradient of the function. However, this approach can be slow due to the oscillations around the minimum.
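For contrast, here is a minimal sketch of basic gradient descent (not part of this lesson's implementation, shown only for comparison). It has the same structure as the momentum version we'll write below, but it moves the point directly from the gradient at each step:

```python
# Minimal sketch of basic gradient descent, shown only for comparison:
# the point moves by -learning_rate * gradient at every step, with no memory of past steps.
def gradient_descent(f_grad, init_point, learning_rate=0.1, iterations=100):
    point = list(init_point)
    for _ in range(iterations):
        grad = f_grad(point)
        for i in range(len(point)):
            point[i] -= learning_rate * grad[i]  # update is directly proportional to the gradient
    return point
```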
With momentum, a velocity term is introduced to accelerate gradient vectors in the right directions. Here's a detailed breakdown of how the velocity term works:

- At every step, the previous velocity is scaled by the momentum factor, so past updates keep influencing the current one.
- The current gradient, scaled by the learning rate, is then subtracted, steering the velocity toward the minimum.
- When consecutive gradients point in the same direction, they reinforce each other and the velocity grows; when they flip direction (oscillations), they partially cancel out.
In essence, the velocity term accumulates the gradients of past steps to create a more stable and faster approach toward the minimum.
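To make this concrete, here is a small illustrative calculation. The values `learning_rate = 0.1`, `momentum = 0.9`, and the one-dimensional function f(x) = x² starting at x = 2 are assumptions chosen just for this example; they show how the velocity builds up over two steps:

```python
# Illustrative only: two momentum updates on f(x) = x^2, starting at x = 2.0
x, v = 2.0, 0.0
learning_rate, momentum = 0.1, 0.9

# Step 1: gradient is 2 * x = 4.0
v = momentum * v - learning_rate * (2 * x)  # v = -0.4
x = x + v                                   # x = 1.6

# Step 2: gradient is 2 * x = 3.2; the previous velocity carries over, so the step is larger
v = momentum * v - learning_rate * (2 * x)  # v = 0.9 * (-0.4) - 0.32 = -0.68
x = x + v                                   # x = 0.92
```

Notice that the second step (0.68) is larger than a plain gradient step (0.32) would have been, because the leftover velocity from step 1 is added in.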
Now, let's implement Gradient Descent with Momentum in Python. Here's the code snippet:
```python
# Gradient descent with momentum
def gradient_descent_with_momentum(f_grad, init_point, learning_rate=0.1, momentum=0.9, iterations=100):
    point = list(init_point)
    velocity = [0] * len(point)
    for _ in range(iterations):
        grad = f_grad(point)
        for i in range(len(point)):
            velocity[i] = momentum * velocity[i] - learning_rate * grad[i]
            point[i] += velocity[i]
    return point
```
Here’s a breakdown of the key lines in the code:
- `velocity = [0] * len(point)`: Initializes the velocity vector with zeros, having the same length as the starting point.
- `velocity[i] = momentum * velocity[i] - learning_rate * grad[i]`: Updates the velocity by applying the momentum and subtracting the gradient scaled by the learning rate.
- `point[i] += velocity[i]`: Updates the current point using the newly calculated velocity.

Here's the continuation of our implementation with the example function and initial point:
```python
# Example function: f(x, y) = x^2 + y^2
def quadratic_gradient(point):
    x, y = point
    return [2 * x, 2 * y]

# Initial point
init_point = [2, 2]

# Find the optimal point using gradient descent with momentum
optimal_point_momentum = gradient_descent_with_momentum(quadratic_gradient, init_point, learning_rate=0.1, iterations=100)
print("Optimal point after gradient descent with momentum:", optimal_point_momentum)  # Optimal point after gradient descent with momentum: [~0, ~0]
```
Using momentum in gradient descent offers several benefits:

- Faster convergence: the accumulated velocity lets the optimizer take larger steps along directions where the gradients consistently agree.
- Reduced oscillations: back-and-forth movements across narrow valleys partially cancel out, producing a smoother path to the minimum.
- Steadier progress through flat regions: the built-up velocity keeps the optimizer moving where plain gradient descent would slow to a crawl.
To visually understand this, let's compare the paths of basic gradient descent and gradient descent with momentum. We will take our example function f(x, y) = x² + y² and apply both methods for just 3 iterations:
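Here is a small sketch of that comparison in code. It assumes the `quadratic_gradient` function defined above and uses the same default learning rate and momentum as before, printing where each method is after every iteration:

```python
# Sketch: compare basic gradient descent and momentum for 3 iterations
# on f(x, y) = x^2 + y^2, both starting from (2, 2). Assumes quadratic_gradient is defined above.
point_basic = [2.0, 2.0]
point_momentum = [2.0, 2.0]
velocity = [0.0, 0.0]
learning_rate, momentum = 0.1, 0.9

for step in range(1, 4):
    # Basic gradient descent: move directly against the gradient
    grad = quadratic_gradient(point_basic)
    point_basic = [p - learning_rate * g for p, g in zip(point_basic, grad)]

    # Gradient descent with momentum: the velocity accumulates past gradients
    grad = quadratic_gradient(point_momentum)
    velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grad)]
    point_momentum = [p + v for p, v in zip(point_momentum, velocity)]

    print(f"Iteration {step}: basic = {point_basic}, momentum = {point_momentum}")

# After 3 iterations, basic gradient descent is still near (1.02, 1.02),
# while momentum has already reached roughly (0.12, 0.12).
```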
You can clearly see that gradient descent with momentum converges faster.
Congratulations! You've learned about Gradient Descent with Momentum. We covered its importance, how it works, and implemented it in Python. You've seen how it speeds up optimization and reduces oscillations.
Now, let’s practice. In the practice session, you'll implement Gradient Descent with Momentum and observe its effects on different functions. Get ready to solidify your understanding and see momentum in action!