Hello and welcome! Today's lesson focuses on Feature Importance in Gradient Boosting Models. We will explore how to determine which features in our dataset are most influential in predicting Tesla ($TSLA) stock prices. By understanding the importance of features, we can refine our models and make more informed trading decisions.
Before diving into feature importance, let's quickly review the previous steps to ensure we have a solid foundation.
Data Preparation and Feature Engineering:
```python
import pandas as pd
import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load TSLA dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Feature Engineering: adding technical indicators as features
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by the moving-average windows
tesla_df.dropna(inplace=True)

# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# Standardize features (fit on train only to avoid leakage)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
Model Training:
```python
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate and fit the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
```
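Before reading importances off a model, it's worth a quick sanity check that it actually fits held-out data: importances from a model that fails to generalize are hard to trust. Here's a minimal sketch of such a check, using synthetic regression data from `make_regression` as a stand-in for the scaled TSLA features:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 9 scaled TSLA features
X, y = make_regression(n_samples=400, n_features=9, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

# R^2 on the held-out set; importances are only meaningful for a model that fits
print("Test R^2:", model.score(X_test, y_test))
```

With the real TSLA data you would call `model.score(X_test, y_test)` on the split created earlier instead of generating data.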
What is Feature Importance?
Feature importance refers to techniques that assign scores to input features based on their importance in predicting the target variable. In the context of a Gradient Boosting model, feature importance indicates how valuable each feature is in constructing the boosted decision trees.
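One useful property to keep in mind: scikit-learn's impurity-based importances are non-negative and normalized to sum to 1, so each score can be read as that feature's share of the model's total splitting gain. A small sketch on synthetic data (a stand-in for our price features, not the TSLA dataset) illustrates this:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# 5 synthetic features, of which only 2 carry signal
X, y = make_regression(n_samples=300, n_features=5, n_informative=2, random_state=42)

model = GradientBoostingRegressor(random_state=42)
model.fit(X, y)

# Importances are non-negative and normalized to sum to 1
print(model.feature_importances_)
print(model.feature_importances_.sum())
```

Because the scores are shares of a fixed total, a feature's importance is always relative to the other features in that particular model.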
Why is Feature Importance Useful?
Understanding feature importance helps:
- Identify and select the most influential features, potentially simplifying the model.
- Gain insights into the factors driving your predictions.
- Improve model interpretability and trustworthiness.
Once the Gradient Boosting model is trained, we can easily access the feature importances. Let's walk through the steps:
```python
# Compute feature importance
feature_importance = model.feature_importances_

# Create a DataFrame for better visualization of feature names alongside their importance
feature_names = ['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})

# Sort features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Print feature importances with names
print("Feature importance:\n", feature_importance_df)
# Output:
# Feature importance:
#    Feature    Importance
# 3    Close  9.447889e-01
# 1     High  3.668675e-02
# 0     Open  9.142875e-03
# 2      Low  8.464037e-03
# 6   SMA_10  4.800413e-04
# 7    EMA_5  2.992652e-04
# 8   EMA_10  1.326235e-04
# 5    SMA_5  5.195267e-06
# 4   Volume  3.363300e-07
```
Here's what each step is doing:
- `model.feature_importances_`: extracts the feature importance scores from the trained Gradient Boosting model.
- `feature_names = [...]`: defines a list of feature names for better readability.
- `feature_importance_df = pd.DataFrame(...)`: creates a DataFrame that links feature names with their respective importance scores.
- `feature_importance_df.sort_values(...)`: sorts the DataFrame by importance in descending order for easier interpretation.
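A caveat worth knowing: impurity-based importances are computed from the training data and can be biased toward features with many distinct values. A common cross-check is permutation importance on the held-out set, which measures how much the test score drops when one feature's values are shuffled. Here's a minimal sketch on synthetic data (a stand-in, not the TSLA dataset):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=5, n_informative=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

# Shuffle each feature on the test set and measure the average drop in R^2
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
print(result.importances_mean)
```

If both methods agree on the ranking, as they typically do when one feature dominates, you can be more confident in the interpretation.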
Visualizing the importance of features helps interpret the results more effectively. We'll use Matplotlib to create a bar chart:
```python
import matplotlib.pyplot as plt

# Reverse the order so the most important feature appears at the top of the chart
feature_importance_df = feature_importance_df.iloc[::-1]

# Plotting feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'])
plt.title('Feature Importances')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
```
The code above produces a horizontal bar chart that visually ranks the significance of each feature, making it easy to distinguish the most influential ones. This visualization is crucial for understanding how different features contribute to the model's predictions.
By examining the feature importance values and the plot, you can determine which features have the most impact on the model's predictions. In our results, the prediction of `Adj Close` relies overwhelmingly on `Close`, with `High` a distant second, so these price features are the critical factors in this model, while the moving averages and `Volume` contribute very little.
Insights and Next Steps:
- Focus on Key Features: Emphasize the most important features in further analysis and model tuning.
- Feature Selection: Consider removing less important features to simplify the model.
- Model Interpretation: Use feature importance insights to explain model predictions to stakeholders.
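The feature-selection idea above can be sketched as follows: rank features by importance, keep only the top few, and refit to check whether the smaller model holds up. This sketch uses synthetic data as a stand-in; with the TSLA data you would index into `feature_names` instead:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# 8 synthetic features, 3 of which carry signal
X, y = make_regression(n_samples=400, n_features=8, n_informative=3, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

full_model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

# Keep the top 3 features by importance and refit on just those columns
top_k = np.argsort(full_model.feature_importances_)[::-1][:3]
small_model = GradientBoostingRegressor(random_state=42).fit(X_train[:, top_k], y_train)

print("full  R^2:", full_model.score(X_test, y_test))
print("top-3 R^2:", small_model.score(X_test[:, top_k], y_test))
```

If the reduced model's test score is close to the full model's, the dropped features were adding little beyond noise.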
In this lesson, you learned about the concept of feature importance in Gradient Boosting models and its practical application to predicting Tesla ($TSLA) stock prices. You computed feature importances, visualized them with a bar chart, and interpreted the results to gain actionable insights.
Understanding which features influence your model's predictions is crucial for refining your models and making informed trading decisions. Up next, practice these concepts to solidify your understanding and enhance your skillset in machine learning for financial trading.
Great job!