Hello and welcome to our journey into data analysis with Python and pandas. Today we'll discover pandas DataFrames and learn about loading and viewing data.
Pandas, a fantastic Python library, simplifies data manipulation and analysis. Our focus today is the DataFrame, the go-to structure in pandas for data handling.
We will load data into a DataFrame from Python lists and dictionaries, and then explore this data. Let's begin!
Installing and importing the pandas library is like getting our recipe book ready before we start cooking. In our CodeSignal kitchen, pandas comes pre-installed. To open the book, we just need to import pandas into our script. It's as simple as:
```python
import pandas as pd  # Pandas successfully imported
```
This line sets a short alias, pd, for pandas so we don't have to write out pandas each time we use it.
In pandas, a DataFrame is like a table, with the data as the dishes on the table. Creating a DataFrame out of a list or a dictionary is a snap with pandas. Here's how:
```python
import pandas as pd

# From a list
data_list = ['apple', 'banana', 'cherry']
df_list = pd.DataFrame(data_list, columns=['Fruit'])
print(df_list)
# Output:
#     Fruit
# 0   apple
# 1  banana
# 2  cherry
```
And here is how to create a DataFrame from a dictionary:
```python
# From a dictionary
data_dict = {'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]}
df_dict = pd.DataFrame(data_dict)
print(df_dict)
# Output:
#     Fruit  Count
# 0   apple     10
# 1  banana     20
# 2  cherry     15
```
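One detail worth knowing when building a DataFrame from a dictionary of lists: the lists need to have the same length, otherwise pandas raises a ValueError. The following is only a small sketch of that behavior; the names good_data, bad_data, df_good, and built are made up for this illustration and are not part of the lesson.

```python
import pandas as pd

# Keys become column names; each list becomes one column.
good_data = {'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]}
df_good = pd.DataFrame(good_data)
print(list(df_good.columns))  # ['Fruit', 'Count']
print(df_good.shape)          # (3, 2): 3 rows, 2 columns

# Lists of different lengths cannot form a rectangular table.
bad_data = {'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20]}
try:
    pd.DataFrame(bad_data)
    built = True
except ValueError as err:
    built = False
    print('Could not build DataFrame:', err)
```

If you hit this error with your own data, checking the length of each list is usually the quickest way to find the mismatch.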
Now that we have our data in a DataFrame, how do we look at it and understand it? Pandas provides us with methods like head(), tail(), and info(). Here's how to use them:
```python
# Work with the dictionary-based DataFrame created above
df = df_dict

# First 5 rows
print(df.head())  # Output: First 5 rows of DataFrame 'df'

# Last 5 rows
print(df.tail())  # Output: Last 5 rows of DataFrame 'df'
```
In our case, we have just three rows in the DataFrame, so both head() and tail() will simply output the whole DataFrame. However, for real data with lots of rows, they are quite useful!
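To see head() and tail() actually trim the output, it helps to try them on something longer than three rows. Here is a minimal sketch using a hypothetical 10-row DataFrame (df_big and its Number column are invented for this example). Both methods also accept an optional number of rows to return.

```python
import pandas as pd

# A hypothetical 10-row DataFrame, just large enough to see the difference.
df_big = pd.DataFrame({'Number': range(1, 11)})

print(df_big.head())    # first 5 rows (Number 1 to 5)
print(df_big.tail())    # last 5 rows (Number 6 to 10)
print(df_big.head(3))   # first 3 rows
print(df_big.tail(2))   # last 2 rows
```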
Let's take a look at the DataFrame's overview:
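```python
# Overview of df, the dictionary-based DataFrame (df_dict) from earlier.
# info() prints its report directly and returns None, so no print() is needed.
df.info()
# Example output (details can vary slightly by pandas version):
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 2 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Fruit   3 non-null      object
#  1   Count   3 non-null      int64
# dtypes: int64(1), object(1)
```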
As you can see, the overview lists each column's name, its count of non-null values, and its data type.
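If you only need one piece of that overview, a DataFrame also exposes it through separate attributes. The sketch below rebuilds the same fruit table so it can run on its own, then reads shape, columns, and dtypes. The exact dtype names printed can differ between pandas versions, so no dtype output is shown here.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]})

print(df.shape)          # (3, 2): 3 rows, 2 columns
print(list(df.columns))  # ['Fruit', 'Count']
print(df.dtypes)         # one data type per column
```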
Sometimes, you might need to combine multiple DataFrames into a single one. This can be done using the pd.concat function, which works most simply when the DataFrames share the same set of columns. Here's a simple example:
```python
import pandas as pd

# Creating first DataFrame
data1 = {'Fruit': ['apple', 'banana'], 'Count': [10, 20]}
df1 = pd.DataFrame(data1)

# Creating second DataFrame
data2 = {'Fruit': ['cherry', 'date'], 'Count': [15, 25]}
df2 = pd.DataFrame(data2)

# Concatenating DataFrames
df_combined = pd.concat([df1, df2])
print(df_combined)
# Output:
#     Fruit  Count
# 0   apple     10
# 1  banana     20
# 0  cherry     15
# 1    date     25
```
In this example, pd.concat takes a list of DataFrames as its argument and combines them along their rows by default. Notice that the original indices are preserved, so the labels 0 and 1 each appear twice. If you want to ignore the original indices and create a new continuous index, you can pass the argument ignore_index=True to pd.concat:
```python
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)
# Output:
#     Fruit  Count
# 0   apple     10
# 1  banana     20
# 2  cherry     15
# 3    date     25
```
This way, the resulting DataFrame will have a new set of sequential indices.
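It can also be useful to know what happens when the column sets do not match. With its default settings, pd.concat keeps every column that appears in any input and fills the missing cells with NaN. The sketch below illustrates this with two made-up one-row DataFrames (df_a, df_b, and the Price column are hypothetical).

```python
import pandas as pd

df_a = pd.DataFrame({'Fruit': ['apple'], 'Count': [10]})
df_b = pd.DataFrame({'Fruit': ['cherry'], 'Price': [0.5]})  # 'Price' is a made-up column

# Columns missing from one input are filled with NaN in the result.
df_mixed = pd.concat([df_a, df_b], ignore_index=True)
print(df_mixed)
print(df_mixed.isna())  # True marks the cells that were filled in
```

This is why concatenation is easiest to reason about when all inputs share the same columns.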
In a DataFrame, each column is a Series object. A Series in pandas is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). It is essentially a list of values with an associated label (index) for each value. Here is an example:
```python
import pandas as pd

# Create a Series from a list
fruits = ['apple', 'banana', 'cherry']
series = pd.Series(fruits)

print(series)
# Output:
# 0     apple
# 1    banana
# 2    cherry
# dtype: object
```
Whenever you work with a single column of a DataFrame, you work with a Series. Series objects have their own set of methods, but many of them overlap with the DataFrame's methods. For example, a Series also has methods like head, tail, and describe.
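As a quick illustration of that point, the sketch below selects one column from the fruit table (rebuilt here so the snippet runs on its own) and calls a few Series methods on it. The variable name counts is just a label chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]})

counts = df['Count']                  # selecting one column returns a Series
print(isinstance(counts, pd.Series))  # True
print(counts.head(2))                 # first two values, with their index labels
print(counts.describe())              # summary statistics such as count, mean, min, and max
```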
Well done! You've learned how to load data into a pandas DataFrame and view the data. It's a solid start to data analysis.
In this lesson, we've covered what pandas and a DataFrame are, how to load data into a DataFrame, and methods to view the data.
Remember, practice makes perfect, so look forward to reinforcing your newfound skills in the upcoming practice exercises. Stick with it, and you'll build a strong foundation to excel in data analysis. Happy coding!