Welcome, future data analysts! Today, we're tackling index columns and locating elements in a Pandas DataFrame. We'll learn how to handle index columns, locate specific data, and strengthen our understanding of DataFrames. Ready, set, code!
In a Pandas DataFrame, each row is assigned an index label, much like the numbers on books in a library. When a DataFrame is created without an explicit index, Pandas assigns a default one that counts rows from 0. Let's look at an example:
```python
import pandas as pd

data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
    "City": ["New York", "Paris", "Berlin", "London"]
}

df = pd.DataFrame(data)

print(df)
"""Output:
    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London
"""
```
The numbers on the left are the default index.
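The default index can also be inspected directly through the DataFrame's `index` attribute. The following is a minimal sketch on a shortened version of the table (the two-column data is chosen here only to keep the example brief):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
})

# The default index is a RangeIndex that counts rows from 0
print(df.index)           # RangeIndex(start=0, stop=4, step=1)
print(df.index.tolist())  # [0, 1, 2, 3]
```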
Occasionally, we might need a custom index. The pandas `set_index()` method lets us use an existing column as the index. To return to the default index, we use `reset_index()`.
To better understand these methods, let's consider an example in which we create an index from unique IDs:
```python
df['ID'] = [101, 102, 103, 104]  # Adding unique IDs
df.set_index('ID', inplace=True)  # Setting 'ID' as index

print(df)
"""Output:
      Name  Age      City
ID
101   John   28  New York
102   Anna   24     Paris
103  Peter   35    Berlin
104  Linda   32    London
"""
```
In this example, the `ID` column is displayed as the index. Let's reset the index to restore the default one; `ID` becomes a regular column again:
```python
df.reset_index(inplace=True)  # Resetting index

print(df)
"""Output:
    ID   Name  Age      City
0  101   John   28  New York
1  102   Anna   24     Paris
2  103  Peter   35    Berlin
3  104  Linda   32    London
"""
```
By setting the `inplace` parameter to `True`, we ask pandas to modify the original `df` DataFrame directly. Otherwise, pandas returns a new DataFrame with the index reset and leaves the original `df` untouched.
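To make the difference concrete, here is a small sketch of the non-inplace form, where the result is assigned to a new variable. The names `indexed` and `restored` and the two-row table are illustrative choices, not part of the lesson's running example:

```python
import pandas as pd

df = pd.DataFrame({"ID": [101, 102], "Name": ["John", "Anna"]})

# Without inplace=True, set_index returns a new DataFrame ...
indexed = df.set_index("ID")
print(indexed.index.name)      # ID

# ... and the original keeps its default index and its 'ID' column
print(df.index.name)           # None
print("ID" in df.columns)      # True

# reset_index also returns a new DataFrame unless inplace=True is passed
restored = indexed.reset_index()
print(list(restored.columns))  # ['ID', 'Name']
```

Assigning the result to a variable keeps the original available for comparison, which can be handy while exploring data.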
Let's consider a DataFrame with a custom index. To select a specific row by its index label (for example, `ID = 102`), use `.loc`:
```python
import pandas as pd

data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
    "City": ["New York", "Paris", "Berlin", "London"]
}

df = pd.DataFrame(data)
df['ID'] = [101, 102, 103, 104]  # Adding unique IDs
df.set_index('ID', inplace=True)  # Setting 'ID' as index

print(df.loc[102])
'''Output:
Name     Anna
Age        24
City    Paris
Name: 102, dtype: object
'''
```
For multiple rows, simply pass a list of IDs:
```python
print(df.loc[[102, 104]])

'''Output:
      Name  Age    City
ID
102   Anna   24   Paris
104  Linda   32  London
'''
```
As you can see, the output of the `.loc` operation is a subset of the original DataFrame: a Series for a single label, and a DataFrame for a list of labels.
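The shape of the result depends on how the label is passed. The sketch below, using a shortened two-row table built just for illustration, checks the type returned for a single label and for a one-element list:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Anna"], "Age": [28, 24]},
    index=pd.Index([101, 102], name="ID"),
)

row = df.loc[102]        # single label -> Series
subset = df.loc[[102]]   # list of labels -> DataFrame, even with one label

print(type(row).__name__)     # Series
print(type(subset).__name__)  # DataFrame
print(row["Name"])            # Anna
```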
To select specific columns for these rows, provide the column labels as well:
```python
print(df.loc[[102, 104], ['Name', 'Age']])
'''Output:
      Name  Age
ID
102   Anna   24
104  Linda   32
'''
```
You can also select all rows for specific columns by passing `:` in place of the index labels:
```python
print(df.loc[:, ['Name', 'Age']])
'''Output:
      Name  Age
ID
101   John   28
102   Anna   24
103  Peter   35
104  Linda   32
'''
```
The `iloc` indexer lets us select elements in a DataFrame by their integer positions. `iloc` works like `loc`, but it expects the position of a row, counted from 0, rather than its index label. For example, we can select the row at position 3, which is the fourth row:
```python
print(df.iloc[3])
'''Output:
Name     Linda
Age         32
City    London
Name: 104, dtype: object
'''
```
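Like `loc`, `iloc` accepts a second selector for columns, given as positions rather than labels. A brief sketch, using a two-column version of the table for brevity (the variable name `picked` is illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Anna", "Peter", "Linda"], "Age": [28, 24, 35, 32]},
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

# Position 3 and label 104 refer to the same row
print(df.iloc[3].equals(df.loc[104]))  # True

# Row positions 1 and 3, column positions 0 and 1
picked = df.iloc[[1, 3], [0, 1]]
print(picked)
```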
You can also use slicing here; as with Python lists, the end position is excluded:
```python
print(df.iloc[1:3])
'''Output:
      Name  Age    City
ID
102   Anna   24   Paris
103  Peter   35  Berlin
'''
```
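One detail worth keeping in mind is that slices behave differently in the two indexers: `iloc` excludes the end position, while a label slice with `loc` includes both endpoints. The sketch below, on a one-column table made up for the comparison, selects the same two rows both ways:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Anna", "Peter", "Linda"]},
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

by_position = df.iloc[1:3]   # positions 1 and 2; position 3 is excluded
by_label = df.loc[102:103]   # labels 102 through 103; both ends included

print(by_position.index.tolist())    # [102, 103]
print(by_label.index.tolist())       # [102, 103]
print(by_position.equals(by_label))  # True
```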
That's it! We've covered the index column, how to set it, and how to locate data in a DataFrame. Exciting exercises are up next. Let's practice and strengthen the skills you've learned today. Let the fun begin!