Hello, and welcome back! In today's lesson, we'll dive into the fundamentals of handling time series data using the Pandas library. Specifically, we'll focus on working with Tesla's ($TSLA) stock data. The primary goal is to make you proficient in loading, converting, and sorting time series data, which is a critical skill for financial analysis and trading.
By the end of this lesson, you'll be able to load stock data, convert its date column to a datetime format, set that column as the index, and sort the data for future analysis.
Let's quickly revise how to load Tesla's historical stock data and convert it into a Pandas DataFrame for easier manipulation:
```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Display the first few rows
print(tesla_df.head())
```
The output will look like this:
```text
         Date      Open      High       Low     Close  Adj Close     Volume
0  2010-06-29  1.266667  1.666667  1.169333  1.592667   1.592667  281494500
1  2010-06-30  1.719333  2.028000  1.553333  1.588667   1.588667  257806500
2  2010-07-01  1.666667  1.728000  1.351333  1.464000   1.464000  123282000
3  2010-07-02  1.533333  1.540000  1.247333  1.280000   1.280000   77097000
4  2010-07-06  1.333333  1.333333  1.055333  1.074000   1.074000  103003500
```
Now that you've loaded the Tesla dataset and displayed the first few rows, let's move on to handling the Date column.
The `Date` column is crucial for time series data analysis. It's currently in string format, so we'll need to convert it to a datetime object. By converting it, you can leverage Pandas’ powerful date-time functionalities, such as resampling and shifting.
Here's how to convert the `Date` column:
```python
# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Display the first few rows to verify the change
print(tesla_df.head())
```
Output:
```text
         Date      Open      High       Low     Close  Adj Close     Volume
0  2010-06-29  1.266667  1.666667  1.169333  1.592667   1.592667  281494500
1  2010-06-30  1.719333  2.028000  1.553333  1.588667   1.588667  257806500
2  2010-07-01  1.666667  1.728000  1.351333  1.464000   1.464000  123282000
3  2010-07-02  1.533333  1.540000  1.247333  1.280000   1.280000   77097000
4  2010-07-06  1.333333  1.333333  1.055333  1.074000   1.074000  103003500
```
The printed rows look the same as before, but the `Date` column now holds `datetime` values rather than strings, enabling us to perform further time series operations.
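To make the benefit of the conversion concrete, here is a minimal sketch. It assumes a small hand-built frame, `sample_df` (a hypothetical stand-in holding the first five `Close` values shown above), so that it runs without downloading the dataset. It checks the column's dtype before and after conversion, then tries shifting and resampling, the two features mentioned earlier.

```python
import pandas as pd

# Hypothetical stand-in for tesla_df: first five rows from the output above
sample_df = pd.DataFrame({
    "Date": ["2010-06-29", "2010-06-30", "2010-07-01", "2010-07-02", "2010-07-06"],
    "Close": [1.592667, 1.588667, 1.464000, 1.280000, 1.074000],
})

dtype_before = sample_df["Date"].dtype  # a string/object dtype
sample_df["Date"] = pd.to_datetime(sample_df["Date"])
dtype_after = sample_df["Date"].dtype   # a datetime64 dtype
print(dtype_before, "->", dtype_after)

# Datetime columns expose the .dt accessor, e.g. the weekday name
sample_df["Weekday"] = sample_df["Date"].dt.day_name()

# shift(1) moves values down one row, giving the previous row's close
sample_df["Prev Close"] = sample_df["Close"].shift(1)

# resample groups rows by calendar period; "MS" labels each month by its start.
# on="Date" tells resample to use the column, since Date is not the index here.
monthly_mean = sample_df.resample("MS", on="Date")["Close"].mean()
print(monthly_mean)
```

Calling `resample` on the original string column would raise an error, which is why the conversion comes first.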
Setting the `Date` column as the index is crucial for time series operations. It allows us to sort the data chronologically and makes it easier to slice by specific dates.
Here's how to set the date as the index:
```python
# Set Date column as the index
tesla_df.set_index('Date', inplace=True)

# Display the first few rows to verify the change
print(tesla_df.head())
```
Output:
```text
                Open      High       Low     Close  Adj Close     Volume
Date
2010-06-29  1.266667  1.666667  1.169333  1.592667   1.592667  281494500
2010-06-30  1.719333  2.028000  1.553333  1.588667   1.588667  257806500
2010-07-01  1.666667  1.728000  1.351333  1.464000   1.464000  123282000
2010-07-02  1.533333  1.540000  1.247333  1.280000   1.280000   77097000
2010-07-06  1.333333  1.333333  1.055333  1.074000   1.074000  103003500
```
Now the `Date` column is set as the index, making our DataFrame easier to work with in time series analysis. The `inplace=True` argument modifies the DataFrame in place: the method returns `None` and updates the original object instead of returning a new one. This is mostly a matter of style. In most pandas operations, `inplace=True` still builds a new object internally, so it generally does not save memory or time, and reassigning the result (`tesla_df = tesla_df.set_index('Date')`) is equivalent.
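As an illustration of why a date index makes slicing easier, the following sketch again assumes a small hand-built frame, `sample_df` (hypothetical, with values taken from the rows shown above). It uses the reassignment form of `set_index` rather than `inplace=True`, then selects rows by month, by date range, and by a single day.

```python
import pandas as pd

# Hypothetical stand-in for tesla_df with an already-converted Date column
sample_df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2010-06-29", "2010-06-30", "2010-07-01", "2010-07-02", "2010-07-06"]
    ),
    "Close": [1.592667, 1.588667, 1.464000, 1.280000, 1.074000],
})

# Reassignment form: set_index returns a new DataFrame that we assign back
sample_df = sample_df.set_index("Date")

# Partial string indexing: every row in July 2010
july_rows = sample_df.loc["2010-07"]

# Label-based date slice; both endpoints are included
range_rows = sample_df.loc["2010-06-30":"2010-07-02"]

# Exact lookup of one value by timestamp
close_on_day = sample_df.loc[pd.Timestamp("2010-07-01"), "Close"]

print(july_rows)
print(range_rows)
print(close_on_day)
```

Date-range slices like these are most predictable when the index is sorted, which is one reason sorting is the next step.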
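Sorting the data by date gives it a well-defined chronological order, which is essential for plotting, calculating returns, and other time-based computations. To make the effect of sorting easy to see, we'll sort the data in descending order, newest first. Keep in mind that row-to-row calculations such as returns usually expect ascending (oldest-first) order, so sort with `ascending=True` before running them.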
Here's how to sort the DataFrame based on the index in descending order:
```python
# Sort the DataFrame based on the index in descending order
tesla_df.sort_index(ascending=False, inplace=True)

# Display the first few rows to verify the change
print(tesla_df.head())
```
The output of the above code will be:
```text
                  Open        High  ...   Adj Close     Volume
Date                                ...
2023-10-13  258.899994  259.600006  ...  251.119995  102073800
2023-10-12  262.920013  265.410004  ...  258.869995  111508100
2023-10-11  266.200012  268.600006  ...  262.989990  103706300
2023-10-10  257.750000  268.940002  ...  263.619995  122656000
2023-10-09  255.309998  261.359985  ...  259.670013  101377900
```
This confirms that, with `Date` as the index, the DataFrame is now sorted in descending order, starting from the most recent entry in the dataset. Having an explicit, known ordering ensures that any analysis conducted on the dataset accounts for the temporal sequence of events.
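Sort direction matters for order-dependent calculations. The sketch below assumes a tiny, deliberately shuffled frame, `sample_df` (hypothetical, built from three of the `Close` values shown earlier). It sorts the frame both ways and computes daily returns on the ascending version, because `pct_change` compares each row with the row above it.

```python
import pandas as pd

# Hypothetical stand-in for tesla_df, with rows deliberately out of order
sample_df = pd.DataFrame(
    {"Close": [1.464000, 1.592667, 1.588667]},
    index=pd.to_datetime(["2010-07-01", "2010-06-29", "2010-06-30"]),
)
sample_df.index.name = "Date"

# Newest first, as in the lesson
newest_first = sample_df.sort_index(ascending=False)

# Oldest first (ascending=True is the default)
oldest_first = sample_df.sort_index()

# Each return is relative to the previous row, so use the oldest-first frame
daily_returns = oldest_first["Close"].pct_change()

print(newest_first)
print(daily_returns)
```

Running `pct_change` on `newest_first` instead would measure each day against the following day, which is rarely what a return calculation intends.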
Finally, it's essential to verify that all the changes you made have been applied correctly. We can do this by printing the first few rows of the DataFrame again.
Here’s the complete code to verify all the steps:
```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type and set as index
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index in descending order
tesla_df.sort_index(ascending=False, inplace=True)

# Display the first few rows to verify the changes
print(tesla_df.head())
```
Output:
```text
                  Open        High  ...   Adj Close     Volume
Date                                ...
2023-10-13  258.899994  259.600006  ...  251.119995  102073800
2023-10-12  262.920013  265.410004  ...  258.869995  111508100
2023-10-11  266.200012  268.600006  ...  262.989990  103706300
2023-10-10  257.750000  268.940002  ...  263.619995  122656000
2023-10-09  255.309998  261.359985  ...  259.670013  101377900
```
This confirms that our DataFrame is properly loaded, converted, indexed, and sorted in descending order, and it is now ready for further financial analysis.
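Printing `head()` only shows the first five rows, so it can help to check the whole index programmatically as well. The sketch below is one possible way to do that. The helper name `check_time_series_frame` and the sample frame are hypothetical, introduced here for illustration only.

```python
import pandas as pd

def check_time_series_frame(df):
    """Return basic structural checks for a date-indexed DataFrame."""
    return {
        # The index should hold datetime values, not strings
        "datetime_index": isinstance(df.index, pd.DatetimeIndex),
        # Newest-first ordering, as used in this lesson
        "sorted_descending": bool(df.index.is_monotonic_decreasing),
        # Each date should appear only once
        "no_duplicate_dates": bool(df.index.is_unique),
    }

# Hypothetical stand-in for the prepared tesla_df
sample_df = pd.DataFrame(
    {"Close": [1.592667, 1.588667, 1.464000]},
    index=pd.to_datetime(["2010-06-29", "2010-06-30", "2010-07-01"]),
)
sample_df.index.name = "Date"
sample_df = sample_df.sort_index(ascending=False)

checks = check_time_series_frame(sample_df)
print(checks)
```

The same function could be called on `tesla_df` after the steps above; any `False` value points to the step that needs another look.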
Great job! In this lesson, you have mastered the basics of handling time series data in Pandas. You learned how to load Tesla stock data, convert the `Date` column to `datetime`, set it as the index, sort the DataFrame in descending order, and verify the changes. These skills are crucial for financial analysis and building predictive models.
Practice exercises will follow to reinforce these concepts. By mastering time series data manipulation, you will be better equipped to perform effective financial analysis and make informed trading decisions.