Welcome to our new lesson on Logistic Regression and its implementation using the Gradient Descent technique. Having familiarized yourself with the fundamentals of Regression Analysis and the operation of Gradient Descent in optimizing regression models, we'll now address a different kind of problem: Classification. While Regression Analysis is suitable for predicting continuous variables, when predicting categories such as whether an email is spam or not, we need specially designed tools — one of them being Logistic Regression.
In this lesson, we'll guide you through the basic concepts that define Logistic Regression, focusing on its unique components like the Sigmoid function and Log-Likelihood. Eventually, we'll utilize Python to engineer a straightforward Logistic Regression model using Gradient Descent. By the end of this lesson, you will have broadened your theoretical understanding of another vital machine learning concept and enhanced your practical Python coding skills.
So far, we've dealt with tasks where a continuous output needs prediction based on one or more input variables - these tasks are known as regression tasks. There is, however, another category of tasks known as classification tasks, where the objective is to predict a categorical outcome. These categories are often binary, like "spam"/"not spam" for an email or "malignant"/"benign" for a tumor. The models we've studied so far are not optimal for predicting categorical outcomes - for example, it isn't intuitive to understand what it means for an email to be "0.67" spam. Enter Logistic Regression - a classification algorithm that can predict the probability of a binary outcome.
While Linear Regression makes predictions by directly calculating the output, Logistic Regression does it differently. Instead of directly predicting the output, Logistic Regression calculates a raw model output, then transforms it using the sigmoid function, mapping it to a range between 0 and 1, thus making it a probability.
The Sigmoid function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
We can implement it like this:
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
Plotted, the sigmoid traces a smooth S-shaped curve between 0 and 1.
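Here's a minimal sketch for reproducing that curve yourself, assuming `matplotlib` is available and using the `sigmoid` function defined above:

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)

plt.plot(z, sigmoid(z))        # the S-shaped sigmoid curve
plt.xlabel('z')
plt.ylabel('sigmoid(z)')
plt.title('The Sigmoid Function')
plt.show()
```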
For a large positive input, the output of $\sigma(z)$ is close to 1, and for a large negative input, the output is close to 0. This property of the Sigmoid function makes it a perfect fit when we want to classify emails into two categories: "spam" or "not-spam".
The mathematical form of Logistic Regression can be expressed as follows:

$$P(Y=1 \mid x) = \sigma(\beta_0 + \beta_1 x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

Where:

- $P(Y=1 \mid x)$ is the probability of the event $Y=1$ given $x$.
- $\beta_0$ and $\beta_1$ are the parameters of the model.
- $x$ is the input variable.
- $z = \beta_0 + \beta_1 x$ is the linear combination of parameters and feature(s).
Log-Likelihood in Logistic Regression plays a similar role to the Least Squares method in Linear Regression. A maximum likelihood estimation method estimates parameters that maximize the likelihood of making the observations we collected. In Logistic Regression, we seek to maximize the log-likelihood.
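For reference, one standard way to write the log-likelihood over $m$ training examples, with $h_i$ denoting the predicted probability for example $i$, is:

$$\ell(\beta) = \sum_{i=1}^{m} \left[ y_i \log(h_i) + (1 - y_i) \log(1 - h_i) \right]$$

Maximizing this quantity is equivalent to minimizing its negated average, which is exactly the cost function introduced below.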
We've seen the least squares cost function in Linear Regression. However, in Logistic Regression, the cost function is defined differently.
The cost function for a single training instance can be expressed as:

$$\text{cost}(h, y) = -\,y \log(h) - (1 - y) \log(1 - h)$$

Where $h$ denotes the predicted probability.
We can implement it like this:
```python
def cost_function(h, y):
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
```
Let's plot it:
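A minimal sketch of such a plot, assuming `matplotlib` is available, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

h = np.linspace(0.001, 0.999, 200)   # predicted probabilities, avoiding log(0)

plt.plot(h, -np.log(h), label='y = 1: cost = -log(h)')
plt.plot(h, -np.log(1 - h), label='y = 0: cost = -log(1 - h)')
plt.xlabel('Predicted probability h')
plt.ylabel('Cost')
plt.legend()
plt.show()
```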
This function makes sense: the cost approaches 0 when the predicted probability $h$ approaches the actual target $y$, so the cost is small whenever the prediction matches the label. However, the cost approaches infinity as $h$ approaches 0 for a positive instance ($y = 1$), so predicting a probability close to 0 for a positive instance is heavily penalized. This feature of the cost function raises another concern: threshold selection. You might wonder why we often consider a probability of more than 0.5 as belonging to Category 1, and less than 0.5 as Category 0. This is simply a convention for binary classification and can be adjusted based on the problem at hand.
As we already know, the Gradient Descent technique is highly efficient at finding the minimum of a function. In Logistic Regression, we use it to find the parameter values that result in the smallest cost. Here's a simple Python implementation of a Logistic Regression model:
```python
def logistic_regression(X, y, num_iterations, learning_rate):
    # Add intercept to X
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)

    # Weights initialization
    theta = np.zeros(X.shape[1])

    for i in range(num_iterations):
        z = np.dot(X, theta)
        h = sigmoid(z)

        # Gradient of the cost with respect to theta, averaged over all examples
        gradient = np.dot(X.T, (h - y)) / y.size
        theta -= learning_rate * gradient

        # Recompute the cost with the updated weights
        z = np.dot(X, theta)
        h = sigmoid(z)
        loss = cost_function(h, y)

        if i % 10000 == 0:
            print(f'Loss: {loss}\t')

    return theta
```
In this code:

- The `sigmoid()` function computes the sigmoid of the input value.
- The `cost_function()` computes the cost for given inputs and outputs using the weights.
- The `logistic_regression()` function applies Gradient Descent to Logistic Regression to find the optimal weights for minimizing the cost.
This simple function can serve as a Logistic Regression model for classifying emails as "spam" or "not-spam".
Now, we can define the `predict` function, which makes the prediction:
```python
def predict_prob(X, theta):
    # Add intercept to X
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)
    return sigmoid(np.dot(X, theta))

def predict(X, theta, threshold=0.5):
    # Convert predicted probabilities into class labels using the threshold
    return predict_prob(X, theta) >= threshold
```
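To see everything working together, here's a hypothetical end-to-end run on synthetic data, reusing the functions defined above; the dataset and hyperparameters below are illustrative assumptions rather than part of the lesson's setup:

```python
import numpy as np

np.random.seed(42)

# Synthetic data: one feature, 50 examples per class
X = np.vstack((np.random.normal(-2, 1, size=(50, 1)),
               np.random.normal(2, 1, size=(50, 1))))
y = np.concatenate((np.zeros(50), np.ones(50)))

theta = logistic_regression(X, y, num_iterations=30000, learning_rate=0.1)
predictions = predict(X, theta)
print(f'Accuracy: {(predictions == y).mean()}')
```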
That wraps up our lesson on the fundamentals of Logistic Regression and its Python implementation using Gradient Descent. Throughout this lesson, we've highlighted the differences between regression and classification tasks, introduced Logistic Regression as a classification algorithm, and elaborated on the components that define it.
You'll have ample opportunities to refine these skills in our forthcoming practice exercises. Remember, the more you practice, the more fluent you'll become. So, practice away and have fun doing it!