Lesson 3
Time Series Data Handling in Pandas for Tesla Stock Analysis
Introduction to Time Series Data Handling

Hello, and welcome back! In today's lesson, we'll dive into the fundamentals of handling time series data using the Pandas library. Specifically, we'll focus on working with Tesla's ($TSLA) stock data. The primary goal is to make you proficient in loading, converting, and sorting time series data, which is a critical skill for financial analysis and trading.

By the end of this lesson, you'll be able to load stock data, convert it into a datetime format, set it as an index, and sort it for future analysis.

Loading the Tesla Dataset

Let's quickly revise how to load Tesla's historical stock data and convert it into a Pandas DataFrame for easier manipulation:

Python
1import pandas as pd 2import datasets 3 4# Load TSLA dataset 5tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices') 6tesla_df = pd.DataFrame(tesla_data['train']) 7 8# Display the first few rows 9print(tesla_df.head())

The output will look like this:

Plain text
1 Date Open High Low Close Adj Close Volume 20 2010-06-29 1.266667 1.666667 1.169333 1.592667 1.592667 281494500 31 2010-06-30 1.719333 2.028000 1.553333 1.588667 1.588667 257806500 42 2010-07-01 1.666667 1.728000 1.351333 1.464000 1.464000 123282000 53 2010-07-02 1.533333 1.540000 1.247333 1.280000 1.280000 77097000 64 2010-07-06 1.333333 1.333333 1.055333 1.074000 1.074000 103003500

Now that you've loaded the Tesla dataset and displayed the first few rows let's move on to handling the Date column.

Understanding Date and Time in Pandas

The Date column is crucial for time series data analysis. It's currently in string format, so we'll need to convert it to a datetime object. By converting it, you can leverage Pandas’ powerful date-time functionalities, such as resampling and shifting.

Here's how to convert the Date column:

Python
1# Convert the Date column to datetime type 2tesla_df['Date'] = pd.to_datetime(tesla_df['Date']) 3 4# Display the first few rows to verify the change 5print(tesla_df.head())

Output:

Plain text
1 Date Open High Low Close Adj Close Volume 20 2010-06-29 1.266667 1.666667 1.169333 1.592667 1.592667 281494500 31 2010-06-30 1.719333 2.028000 1.553333 1.588667 1.588667 257806500 42 2010-07-01 1.666667 1.728000 1.351333 1.464000 1.464000 123282000 53 2010-07-02 1.533333 1.540000 1.247333 1.280000 1.280000 77097000 64 2010-07-06 1.333333 1.333333 1.055333 1.074000 1.074000 103003500

Now, the Date column has been converted to datetime format, enabling us to perform further time series operations.

Setting the Date as the Index

Setting the Date column as the index is crucial for time series operations. It allows us to sort the data chronologically and makes it easier to slice by specific dates.

Here's how to set the date as the index:

Python
1# Set Date column as the index 2tesla_df.set_index('Date', inplace=True) 3 4# Display the first few rows to verify the change 5print(tesla_df.head())

Output:

Plain text
1 Open High Low Close Adj Close Volume 2Date 32010-06-29 1.266667 1.666667 1.169333 1.592667 1.592667 281494500 42010-06-30 1.719333 2.028000 1.553333 1.588667 1.588667 257806500 52010-07-01 1.666667 1.728000 1.351333 1.464000 1.464000 123282000 62010-07-02 1.533333 1.540000 1.247333 1.280000 1.280000 77097000 72010-07-06 1.333333 1.333333 1.055333 1.074000 1.074000 103003500

Now the Date column is set as the index, making our DataFrame easier to work with in time series analysis. The inplace=True argument allows you to modify the DataFrame in-place. This means it directly alters the original DataFrame without creating and returning a new one. Using inplace=True can be more memory efficient and slightly faster, as it avoids the overhead of copying the DataFrame.

Sorting the DataFrame by Date

Sorting the data by date ensures chronological order, which is essential for analysis such as plotting, calculating returns, and other time-based computations. To demonstrate sorting clearly, we'll sort the data in descending order.

Here's how to sort the DataFrame based on the index in descending order:

Python
1# Sort the DataFrame based on the index in descending order 2tesla_df.sort_index(ascending=False, inplace=True) 3 4# Display the first few rows to verify the change 5print(tesla_df.head())

The output of the above code will be:

Plain text
1 Open High ... Adj Close Volume 2Date ... 32023-10-13 258.899994 259.600006 ... 251.119995 102073800 42023-10-12 262.920013 265.410004 ... 258.869995 111508100 52023-10-11 266.200012 268.600006 ... 262.989990 103706300 62023-10-10 257.750000 268.940002 ... 263.619995 122656000 72023-10-09 255.309998 261.359985 ... 259.670013 101377900

This confirms that after setting the Date as the index and sorting in descending order, the DataFrame is now correctly sorted by the date in descending order, starting from the most recent entry in the dataset. It ensures that any analysis conducted on the dataset accounts for the temporal sequence of events.

Now, the DataFrame is sorted chronologically based on the date index in descending order.

Verifying the Changes

Finally, it's essential to verify that all the changes you made have been applied correctly. We can do this by printing the first few rows of the DataFrame again.

Here’s the complete code to verify all the steps:

Python
1import pandas as pd 2import datasets 3 4# Load TSLA dataset 5tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices') 6tesla_df = pd.DataFrame(tesla_data['train']) 7 8# Convert the Date column to datetime type and set as index 9tesla_df['Date'] = pd.to_datetime(tesla_df['Date']) 10tesla_df.set_index('Date', inplace=True) 11 12# Sort the DataFrame based on the index in descending order 13tesla_df.sort_index(ascending=False, inplace=True) 14 15# Display the first few rows to verify the changes 16print(tesla_df.head())

Output:

Plain text
1 Open High ... Adj Close Volume 2Date ... 32023-10-13 258.899994 259.600006 ... 251.119995 102073800 42023-10-12 262.920013 265.410004 ... 258.869995 111508100 52023-10-11 266.200012 268.600006 ... 262.989990 103706300 62023-10-10 257.750000 268.940002 ... 263.619995 122656000 72023-10-09 255.309998 261.359985 ... 259.670013 101377900

This confirms that our DataFrame is properly loaded, converted, indexed, and sorted in descending order, and it is now ready for further financial analysis.

Lesson Summary

Great job! In this lesson, you have mastered the basics of handling time series data in Pandas. You learned how to load Tesla stock data, convert the Date column to datetime, set it as the index, sort the DataFrame in descending order, and verify the changes. These skills are crucial for financial analysis and building predictive models.

Practice exercises will follow to reinforce these concepts. By mastering time series data manipulation, you will be better equipped to perform effective financial analysis and make informed trading decisions.

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.