Hello! In this lesson, we will explore the fundamental techniques for inspecting financial data using the Pandas library in Python. Our goal is to enable you to load financial data, inspect its structure, and perform basic data analysis. Let's get started!
First, let's recap how to import the necessary libraries and load the dataset. In this scenario, we'll use Tesla (TSLA) historical stock prices.
- Import Libraries: We need to import `pandas` for data manipulation and the `datasets` library to load our data.
- Load the Dataset: We use the `load_dataset` function from the `datasets` library to load the Tesla dataset.
- Convert to DataFrame: We convert the loaded dataset into a Pandas DataFrame.
- Display Data: Using the `head()` and `tail()` methods, we can view the first few and last few rows of the dataset, respectively.
Here's the code to achieve this:
```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Display first 5 rows of the DataFrame
print(tesla_df.head())
```
This code snippet loads the TSLA dataset and displays the first 5 rows to help us get a quick look at the data.
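The snippet above calls only `head()`. As a minimal sketch of how `head()` and `tail()` behave together, the example below uses a small made-up DataFrame so that it runs without downloading anything. The name `sample_df` and all of its values are hypothetical and are not taken from the TSLA dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the TSLA data (values are made up)
sample_df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05",
             "2024-01-08", "2024-01-09", "2024-01-10"],
    "Close": [10.0, 10.5, 10.2, 10.8, 11.1, 10.9, 11.4],
    "Volume": [1000, 1200, 900, 1500, 1300, 1100, 1600],
})

first_rows = sample_df.head()   # first 5 rows by default
last_rows = sample_df.tail(3)   # last 3 rows; pass n to change the count

print(first_rows)
print(last_rows)
```

Both methods return a new DataFrame and default to 5 rows, so `tesla_df.tail()` would show the five most recent entries if the data is sorted by date.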
Next, we want to understand the structure of our dataset. This involves examining the columns, data types, and the number of non-null entries. The `info()` method of a Pandas DataFrame provides a concise summary of these details.
- Data Structure Information: The `info()` method reveals important aspects such as:
  - Column names and data types
  - Non-null counts for each column
Here's the code to inspect the data structure:
```python
# Print basic information about the dataset
print(tesla_df.info())
```
The output will be:
```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3347 entries, 0 to 3346
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       3347 non-null   object
 1   Open       3347 non-null   float64
 2   High       3347 non-null   float64
 3   Low        3347 non-null   float64
 4   Close      3347 non-null   float64
 5   Adj Close  3347 non-null   float64
 6   Volume     3347 non-null   int64
dtypes: float64(5), int64(1), object(1)
memory usage: 183.2+ KB
None
```
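This output shows that the dataset has 3347 entries and 7 columns, and that every column holds 3347 non-null values, so nothing is missing. It also lists each column's data type: the five price columns are `float64`, `Volume` is `int64`, and `Date` is stored as `object` rather than as a datetime type. Knowing these types is essential before performing any data manipulation or analysis. The trailing `None` appears because `info()` prints its report directly and returns `None`, which `print()` then displays; calling `tesla_df.info()` without `print()` omits it.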
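The same structural facts can also be read programmatically, which helps when you want to act on them rather than just look at them. The sketch below is illustrative only: `sample_df` and its values are made up, one value is deliberately missing, and the `pd.to_datetime` step is a common follow-up for date columns stored as text, not something the lesson's own code performs.

```python
import pandas as pd

# Hypothetical sample with the same kinds of columns (values are made up)
sample_df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "Close": [10.0, None, 10.2],   # one missing value on purpose
    "Volume": [1000, 1200, 900],
})

n_rows, n_cols = sample_df.shape              # (rows, columns)
missing_per_column = sample_df.isna().sum()   # missing values per column

# Parse the date strings into datetime64 values
sample_df["Date"] = pd.to_datetime(sample_df["Date"])

print(n_rows, n_cols)
print(missing_per_column)
print(sample_df.dtypes)
```

Here `shape` and `isna().sum()` give the entry count and missing-value counts that `info()` reports, while `dtypes` lists the type of each column.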
To gain preliminary insights into our data, we can use the `describe()` method, which provides summary statistics such as the count, mean, standard deviation, minimum, maximum, and quartiles.
- Descriptive Statistics: The `describe()` method presents these key statistics for all numerical columns in the DataFrame, helping us understand the data distribution and spot potential anomalies.
Here's the code to generate summary statistics:
```python
# Display summary statistics
print(tesla_df.describe())
```
The output will be:
```text
              Open         High  ...    Adj Close        Volume
count  3347.000000  3347.000000  ...  3347.000000  3.347000e+03
mean     67.901248    69.413435  ...    67.886520  9.643192e+07
std     100.209872   102.472746  ...   100.136888  8.058132e+07
min       1.076000     1.108667  ...     1.053333  1.777500e+06
25%      10.152000    10.432000  ...    10.081333  4.540425e+07
50%      16.793333    17.000000  ...    16.771334  8.011650e+07
75%      66.069336    67.129334  ...    65.896000  1.230548e+08
max     411.470001   414.496674  ...   409.970001  9.140820e+08

[8 rows x 6 columns]
```
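This summary describes the distribution of each of the six numeric columns: opening price, high, low, close, adjusted close, and volume. Pandas abbreviates the middle columns as `...` when the table is too wide to display, and the `Date` column is left out because it is not numeric. For the opening price, the mean (about 67.90) is far above the median (about 16.79), and the values range from about 1.08 to 411.47, which points to a strongly right-skewed distribution. Statistics like these give a first impression of the stock's price range, variability, and trading volume, which are critical for financial analysis.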
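Because `describe()` returns an ordinary DataFrame, its values can be looked up individually, and the full table can be printed without the `...` truncation. The following is a minimal sketch on made-up numbers; `sample_df` and its values are hypothetical and are not TSLA prices.

```python
import pandas as pd

# Hypothetical sample prices (values are made up, not TSLA data)
sample_df = pd.DataFrame({
    "Open": [10.0, 12.0, 11.0, 13.0, 54.0],
    "Close": [11.0, 11.5, 12.5, 14.0, 50.0],
})

stats = sample_df.describe()

# to_string() renders every column, avoiding the "..." truncation
print(stats.to_string())

# Individual statistics are looked up by row label and column name
mean_open = stats.loc["mean", "Open"]
median_open = stats.loc["50%", "Open"]
print(mean_open, median_open)
```

In this toy data a single large value pulls the mean of `Open` well above its median, the same pattern of right skew that comparing the `mean` and `50%` rows reveals in a real price series.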
In this lesson, you have learned the basics of data inspection using Pandas. We have covered how to:
- Load a dataset and convert it into a DataFrame.
- Display the data using the `head()` method.
- Inspect the data structure using the `info()` method.
- Generate summary statistics using the `describe()` method.
These fundamental skills are crucial for analyzing financial data and making informed trading decisions. Practice exercises will follow to reinforce your understanding and improve your data handling proficiency. Let's keep up the momentum and continue mastering financial data handling in Pandas!