Hello and welcome! Today, we'll be diving into training a basic Gradient Boosting Model using financial data, specifically focusing on Tesla ($TSLA) stock prices. By the end of this lesson, you will understand how to implement gradient boosting for predictive analysis in stock trading within a Python framework.
Let's go!
First, let's quickly revise how to load data and prepare it for machine learning:
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
import datasets

# Load dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert the column to `datetime`
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)

# Features and target selection
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Standardizing features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)
```
In this code, we:
- Convert the `'Date'` column to `datetime` format.
- Calculate SMA with windows of 5 and 10 days.
- Calculate EMA with spans of 5 and 10 days.
- Handle missing values resulting from moving averages.
- Select relevant features and the target variable.
- Standardize the feature values for better model performance.
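If you want to confirm the preparation worked as expected, a quick optional check like the one below (assuming the code above has already been run) shows the engineered columns and the final array shapes:

```python
# Optional sanity check: inspect the engineered columns and the final array shapes
print(tesla_df[['Date', 'Adj Close', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].head())
print("Feature matrix shape:", features_scaled.shape)
print("Target shape:", target.shape)
```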
Gradient Boosting is a powerful machine learning technique used for predictive modeling tasks. Gradient Boosting Regressor is a specific application of this technique for regression tasks, where we aim to predict a continuous target variable like stock prices.
In simple terms, Gradient Boosting works by creating an ensemble (a group) of weak prediction models, which are typically simple models. It combines these weak models in a sequential manner to build a robust predictive model. Here's a simplified explanation of how it works:
- Initial Prediction: Start with an initial prediction, which is often the average of the target values.
- Calculate Residuals: Calculate the residuals, which are the differences between the actual target values and the current predictions.
- Train Weak Learners: Train a weak learner (a simple model) on the residuals to predict these errors.
- Update Predictions: Update the overall predictions by adding the predictions of the weak learner to the current predictions.
- Iterate: Repeat steps 2-4 multiple times, each time using a new weak learner to correct the errors of the previous model.
Through this iterative process, the gradient boosting regressor minimizes the errors and produces a strong predictive model.
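To make these steps concrete, here is a minimal sketch of the boosting loop written by hand on toy data, using shallow decision trees from scikit-learn as the weak learners. It illustrates the idea only; it is not the internals of scikit-learn's `GradientBoostingRegressor`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Step 1: initial prediction is the mean of the target
prediction = np.full_like(y, y.mean())

for _ in range(n_rounds):
    # Step 2: residuals between actual values and current predictions
    residuals = y - prediction
    # Step 3: train a weak learner (a shallow tree) on the residuals
    stump = DecisionTreeRegressor(max_depth=2, random_state=42)
    stump.fit(X, residuals)
    # Step 4: update predictions with a scaled version of the weak learner's output
    prediction += learning_rate * stump.predict(X)

# After repeating steps 2-4, the training error is far lower than the initial guess
print("Final training MSE:", np.mean((y - prediction) ** 2))
```

Each pass through the loop corrects part of the error left by the previous learners, which is exactly what `GradientBoostingRegressor` automates for us.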
Now, let's move on to the core part of our lesson: training the Gradient Boosting Model.
First, we need to split the dataset into training and testing sets. Then, we instantiate a Gradient Boosting Regressor and fit the model to the training data.
Here is the necessary code to accomplish this:
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.25, random_state=42)

# Instantiate and fit the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
```
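The code above uses scikit-learn's default hyperparameters. If you later want to experiment, the sketch below shows a few commonly tuned settings; the specific values are illustrative assumptions rather than recommendations from this lesson:

```python
# Illustrative (untuned) hyperparameter values -- adjust based on validation results
model = GradientBoostingRegressor(
    n_estimators=200,    # number of boosting rounds (weak learners)
    learning_rate=0.05,  # how much each weak learner contributes
    max_depth=3,         # depth of each individual regression tree
    random_state=42,
)
model.fit(X_train, y_train)
```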
Evaluating the model is crucial to understand how well it performs. We will:
- Make predictions with the trained model.
- Calculate and print the Mean Squared Error (MSE) between the predictions and the actual `y_test` values.
Here is how you can achieve this:
```python
from sklearn.metrics import mean_squared_error

# Predict and evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
```
The output of the above code will be:
```
Mean Squared Error: 0.4944244179351423
```
This output quantifies the performance of our Gradient Boosting Model as the mean squared difference between the actual and predicted stock prices. A lower MSE value indicates better predictive performance.
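To put the MSE in context, one option is to compare it against a naive baseline and to report the RMSE, which is in the same units as the stock price. This comparison is an optional addition, not part of the lesson's required steps:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Naive baseline: always predict the mean of the training target
baseline_mse = mean_squared_error(y_test, np.full_like(y_test, y_train.mean()))

print("Model MSE:   ", mse)
print("Baseline MSE:", baseline_mse)
print("Model RMSE (same units as the price):", np.sqrt(mse))
```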
Finally, let's visualize the actual vs predicted values to understand the performance of our model better:
We will plot the actual and predicted values using scatter plots. Here's the visualization code:
```python
import matplotlib.pyplot as plt

# Plotting predictions vs actual values
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
```
Here:
- `plt.figure(figsize=(10, 6))`: This line initializes a new figure with a specified size.
- `plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)`: This command creates a scatter plot of the actual values. The `range(len(y_test))` generates x-coordinates, while `y_test` provides actual stock prices. The `label` parameter is set for the legend, and `alpha=0.7` sets the transparency level.
- `plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)`: This command creates a scatter plot of the predicted values, using the same x-coordinates for comparison. The `label` parameter is set for the legend, and `alpha=0.7` sets the transparency level.
- `plt.title('Actual vs Predicted Values')`: Sets the title of the plot.
- `plt.xlabel('Sample Index')`: Sets the x-axis label to 'Sample Index'.
- `plt.ylabel('Value')`: Sets the y-axis label to 'Value'.
- `plt.legend()`: Displays the legend to differentiate between actual and predicted values.
- `plt.show()`: Renders the plot to the screen.
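As an optional extension, a residual plot can reveal whether the model systematically over- or under-predicts. This sketch reuses the variables defined above:

```python
import matplotlib.pyplot as plt

# Residuals: positive values mean the model under-predicted, negative values mean it over-predicted
residuals = y_test - predictions

plt.figure(figsize=(10, 6))
plt.scatter(range(len(residuals)), residuals, alpha=0.7)
plt.axhline(0, color='red', linestyle='--')
plt.title('Prediction Residuals')
plt.xlabel('Sample Index')
plt.ylabel('Actual - Predicted')
plt.show()
```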
In this lesson, you learned how to train and evaluate a Gradient Boosting Regressor using Tesla ($TSLA) stock data. You've reviewed data preparation, added technical indicators, trained the model, evaluated it using MSE, and visualized the results.
By understanding and implementing these steps, you are better prepared to apply machine learning models to financial data for predictive analysis. Practice them to solidify your understanding and to enhance your trading strategies with machine learning.