Lesson 4

Welcome to our new lesson on **Logistic Regression** and its implementation using the **Gradient Descent** technique. Having familiarized yourself with the fundamentals of Regression Analysis and the operation of Gradient Descent in optimizing regression models, we'll now address a different kind of problem: Classification. While Regression Analysis is suitable for predicting continuous variables, when predicting categories such as whether an email is spam or not, we need specially designed tools — one of them being Logistic Regression.

In this lesson, we'll guide you through the basic concepts that define Logistic Regression, focusing on its unique components like the *Sigmoid function* and *Log-Likelihood*. Eventually, we'll utilize Python to engineer a straightforward Logistic Regression model using Gradient Descent. By the end of this lesson, you will have broadened your theoretical understanding of another vital machine learning concept and enhanced your practical Python coding skills.

So far, we've dealt with tasks where a continuous output needs prediction based on one or more input variables - these tasks are known as regression tasks. There is, however, another category of tasks known as classification tasks, where the objective is to predict a categorical outcome. These categories are often binary, like "spam"/"not spam" for an email or "malignant"/"benign" for a tumor. The models we've studied so far are not optimal for predicting categorical outcomes - for example, it isn't intuitive to understand what it means for an email to be "0.67" spam. Enter **Logistic Regression** - a classification algorithm that can predict the probability of a binary outcome.

While Linear Regression makes predictions by directly calculating the output, Logistic Regression does it differently. Instead of directly predicting the output, Logistic Regression calculates a raw model output, then transforms it using the sigmoid function, mapping it to a range between 0 and 1, thus making it a probability.

The Sigmoid function is defined as $S(x) = \frac{1}{1 + e^{-x}}$

We can implement it like this:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

Plotted, the Sigmoid function traces an S-shaped curve rising from 0 to 1.

For a large positive input, the output of $S(x)$ is close to 1, and for a large negative input, the output is close to 0. This property of the Sigmoid function makes it a perfect fit when we want to classify emails into two categories: "spam" or "not-spam".
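To make this concrete, here is a quick numerical check of the `sigmoid` function defined above at a few representative inputs (a minimal sketch, assuming NumPy is imported as `np`):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # exactly 0.5: the midpoint of the curve
print(sigmoid(10))   # very close to 1
print(sigmoid(-10))  # very close to 0
```

Note that $S(0) = 0.5$, which is why 0.5 is the natural midpoint between the two classes.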

The mathematical form of Logistic Regression can be expressed as follows:

$P(Y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$

Where:

- $P(Y=1 \mid x)$ is the probability of the event `Y=1` given `x`.
- $\beta_0$ and $\beta_1$ are the parameters of the model.
- $x$ is the input variable.
- $\beta_0 + \beta_1 x$ is the linear combination of parameters and feature(s).
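As a quick illustration, we can evaluate this formula directly. The parameter values below ($\beta_0 = -3$, $\beta_1 = 2$) are made up purely for demonstration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters, chosen only for illustration
beta_0, beta_1 = -3.0, 2.0

x = 2.5
p = sigmoid(beta_0 + beta_1 * x)  # P(Y=1 | x)
print(p)  # roughly 0.88
```

With these parameters, an input of $x = 2.5$ yields a probability of about 0.88 that the instance belongs to Category 1.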

*Log-Likelihood* in Logistic Regression plays a similar role to the *Least Squares method* in Linear Regression. Maximum likelihood estimation finds the parameter values that maximize the likelihood of the observations we collected. In Logistic Regression, we seek to maximize the log-likelihood, which is equivalent to minimizing the cost function defined below.

We've seen the least squares cost function in Linear Regression. However, in Logistic Regression, the cost function is defined differently.

The cost function for a single training instance can be expressed as:

$-[y\log(\hat{p}) + (1-y)\log(1-\hat{p})]$

Where $\hat{p}$ denotes the predicted probability.

We can implement it like this:

```python
def cost_function(h, y):
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
```

This function makes sense because $-\log(t)$ approaches $0$ as $t$ approaches $1$, so the cost is close to $0$ when the predicted probability is near the actual target. Conversely, $-\log(t)$ approaches $\infty$ as $t$ approaches $0$, so predicting a probability close to $0$ for a positive instance is heavily penalized. This shape of the cost function raises a related question: threshold selection. You might wonder why we usually treat a probability above 0.5 as Category 1 and below 0.5 as Category 0. This is simply a convention for binary classification and can be adjusted based on the problem at hand.
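We can verify this behavior numerically with the `cost_function` defined above. For a positive instance ($y = 1$), a confident correct prediction costs almost nothing, while a confident wrong one is penalized heavily (a minimal sketch with made-up probabilities):

```python
import numpy as np

def cost_function(h, y):
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

y = np.array([1.0])
print(cost_function(np.array([0.99]), y))  # near 0: confident and correct
print(cost_function(np.array([0.01]), y))  # large: confident and wrong
```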

As we already know, the Gradient Descent technique is efficient at finding the minimum of a function (and finds the global minimum when the function is convex, as the Logistic Regression cost is). In Logistic Regression, we use it to calculate the parameter values that result in the smallest cost. Here's a simple Python implementation of a Logistic Regression model:

```python
def logistic_regression(X, y, num_iterations, learning_rate):
    # Add intercept to X
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)

    # Weights initialization
    theta = np.zeros(X.shape[1])

    for i in range(num_iterations):
        z = np.dot(X, theta)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / y.size
        theta -= learning_rate * gradient

        z = np.dot(X, theta)
        h = sigmoid(z)
        loss = cost_function(h, y)

        if i % 10000 == 0:
            print(f'Loss: {loss}\t')

    return theta
```

In this code:

- The `sigmoid()` function computes the sigmoid of the input value.
- The `cost_function()` computes the cost for the given inputs and outputs using the weights.
- The `logistic_regression()` function applies Gradient Descent to Logistic Regression to find the optimum weights for minimizing the cost.

This simple function can be a Logistic Regression model for classifying emails as "spam" or "not-spam".

Now, we can define the `predict` function, which makes the prediction:

```python
def predict_prob(X, theta):
    # Add intercept to X
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)
    return sigmoid(np.dot(X, theta))

def predict(X, theta, threshold=0.5):
    return predict_prob(X, theta) >= threshold
```
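Putting the pieces together, here is an end-to-end sketch on a tiny synthetic dataset. The data and hyperparameters below are invented for illustration only; the functions mirror the ones defined throughout the lesson (with the loss printing omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression(X, y, num_iterations, learning_rate):
    # Add intercept and initialize weights to zeros
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        h = sigmoid(np.dot(X, theta))
        gradient = np.dot(X.T, (h - y)) / y.size
        theta -= learning_rate * gradient
    return theta

def predict_prob(X, theta):
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)
    return sigmoid(np.dot(X, theta))

def predict(X, theta, threshold=0.5):
    return predict_prob(X, theta) >= threshold

# Tiny synthetic dataset: one feature, Category 1 when the feature is large
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

theta = logistic_regression(X, y, num_iterations=5000, learning_rate=0.1)
print(predict(X, theta))
```

Because the two classes here are cleanly separable, the trained model should recover the training labels exactly.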

That wraps up our lesson on the fundamentals of Logistic Regression and its Python implementation using Gradient Descent. Throughout this lesson, we've highlighted the differences between regression and classification tasks, introduced Logistic Regression as a classification algorithm, and elaborated on the components that define it.

You'll have ample opportunities to refine these skills in our forthcoming practice exercises. Remember, the more you practice, the more fluent you'll become. So, practice away and have fun doing it!