Navigating DataFrames with Index Column and Data Locating in Pandas

Python Libraries for Data Analysis, Lesson 6

Introduction and Lesson Overview

Welcome, future data analysts! Today, we're tackling index columns and locating elements in a pandas DataFrame. We'll learn how to set and reset the index, select specific rows and columns, and strengthen our understanding of DataFrames. Ready, set, code!

Understanding the Index Column in a Pandas DataFrame

In a pandas DataFrame, each row carries an index label, much like the numbers on books in a library. When a DataFrame is created without an explicit index, pandas assigns a default integer index that starts at 0. Let's look at an example:

Python
import pandas as pd

data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
    "City": ["New York", "Paris", "Berlin", "London"]
}

df = pd.DataFrame(data)

print(df)
"""Output:
    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London
"""

The numbers on the left are the default index.
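To see what the default index looks like as an object, it can be inspected directly. The sketch below rebuilds a smaller version of the example DataFrame (the variable name default_index is only illustrative) and reads its index attribute:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
})

# With no index given, pandas uses a RangeIndex counting up from 0
default_index = df.index
print(default_index)        # RangeIndex(start=0, stop=4, step=1)
print(list(default_index))  # [0, 1, 2, 3]
```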

Setting and Modifying the Index Column

Occasionally, we might need to establish a custom index. The set_index() method lets us use an existing column as the index. To return to the default integer index, we use reset_index().

To better understand these functions, let's consider an example in which we create an index using unique IDs:

Python
df['ID'] = [101, 102, 103, 104]    # Adding unique IDs
df.set_index('ID', inplace=True)   # Setting 'ID' as index

print(df)
"""Output:
      Name  Age      City
ID
101   John   28  New York
102   Anna   24     Paris
103  Peter   35    Berlin
104  Linda   32    London
"""

In this example, the ID column is displayed as the index. Let's reset the index to restore the default integer index, which turns ID back into a regular column:

Python
df.reset_index(inplace=True)       # Resetting index

print(df)
"""Output:
    ID   Name  Age      City
0  101   John   28  New York
1  102   Anna   24     Paris
2  103  Peter   35    Berlin
3  104  Linda   32    London
"""

By setting the inplace parameter to True, we ask pandas to modify the original df DataFrame directly. Otherwise, pandas returns a new DataFrame with the index reset and leaves the original df untouched. The same applies to set_index().
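The effect of leaving inplace at its default can be checked with a small sketch. The two-row DataFrame and the names indexed and restored are illustrative assumptions, not part of the lesson's data:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [101, 102],
    "Name": ["John", "Anna"],
})

# Without inplace=True, set_index returns a new DataFrame
indexed = df.set_index("ID")

# The original still has its default index and its 'ID' column
print(list(df.columns))        # ['ID', 'Name']
print(list(indexed.columns))   # ['Name']
print(indexed.index.name)      # ID

# reset_index likewise returns a new DataFrame unless inplace=True
restored = indexed.reset_index()
print(list(restored.columns))  # ['ID', 'Name']
```

Assigning the result back, as in df = df.set_index("ID"), is a common alternative to passing inplace=True.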

Locating Elements in a DataFrame

Let's consider a DataFrame with a custom index. To select a specific row by its index label (for example, ID = 102), pass that label to loc:

Python
import pandas as pd

data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
    "City": ["New York", "Paris", "Berlin", "London"]
}

df = pd.DataFrame(data)
df['ID'] = [101, 102, 103, 104]    # Adding unique IDs
df.set_index('ID', inplace=True)   # Setting 'ID' as index

print(df.loc[102])
'''Output:
Name     Anna
Age        24
City    Paris
Name: 102, dtype: object
'''

Selecting Multiple Rows with `loc`

To select multiple rows, pass a list of index labels:

Python
print(df.loc[[102, 104]])

'''Output:
      Name  Age    City
ID
102   Anna   24   Paris
104  Linda   32  London
'''

As you can see, loc with a list of labels returns a DataFrame that contains only the requested rows.
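The shape of the result depends on what is passed to loc. As a minimal sketch (the variable names are illustrative), a single label gives a Series, while a list of labels gives a DataFrame, even when the list has only one element:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Anna", "Peter", "Linda"], "Age": [28, 24, 35, 32]},
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

one_row = df.loc[102]          # single label -> Series
two_rows = df.loc[[102, 104]]  # list of labels -> DataFrame
one_row_frame = df.loc[[102]]  # one-element list -> DataFrame with one row

print(type(one_row).__name__)   # Series
print(type(two_rows).__name__)  # DataFrame
print(one_row_frame.shape)      # (1, 2)
```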

Selecting Multiple Columns with `loc`

To select specific columns for these rows, pass a list of column labels as the second argument:

Python
print(df.loc[[102, 104], ['Name', 'Age']])
'''Output:
      Name  Age
ID
102   Anna   24
104  Linda   32
'''

You can also select all rows for specific columns by passing : as the row selector:

Python
print(df.loc[:, ['Name', 'Age']])
'''Output:
      Name  Age
ID
101   John   28
102   Anna   24
103  Peter   35
104  Linda   32
'''
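The same row and column pattern also works with single labels. The sketch below, which assumes the same example data and uses illustrative variable names, reads one cell and one whole column:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Anna", "Peter", "Linda"],
        "Age": [28, 24, 35, 32],
        "City": ["New York", "Paris", "Berlin", "London"],
    },
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

name_102 = df.loc[102, "Name"]  # one row label, one column label -> single value
ages = df.loc[:, "Age"]         # all rows, one column label -> Series

print(name_102)    # Anna
print(list(ages))  # [28, 24, 35, 32]
```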

Using `iloc` for Location by Index Position

The iloc indexer selects elements in a DataFrame by integer position rather than by label. It works like loc, but it expects zero-based row positions. For example, position 3 selects the fourth row:

Python
print(df.iloc[3])
'''Output:
Name     Linda
Age         32
City    London
Name: 104, dtype: object
'''
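Because positions count from 0, position 3 and label 104 refer to the same row in this example. A short sketch (with illustrative variable names) compares the two lookups and shows that iloc also accepts a column position and negative positions:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Anna", "Peter", "Linda"],
        "Age": [28, 24, 35, 32],
        "City": ["New York", "Paris", "Berlin", "London"],
    },
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

by_position = df.iloc[3]  # position 3 is the fourth row (counting starts at 0)
by_label = df.loc[104]    # the same row, addressed by its index label

print(by_position.equals(by_label))  # True
print(df.iloc[0, 1])                 # 28 (first row, second column)
print(df.iloc[-1]["Name"])           # Linda (negative positions count from the end)
```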

You can also use slicing here. As with Python lists, the end position is excluded, so 1:3 selects positions 1 and 2:

Python
print(df.iloc[1:3])
'''Output:
      Name  Age    City
ID
102   Anna   24   Paris
103  Peter   35  Berlin
'''
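Slices behave differently in the two indexers: iloc excludes the end position, while a label slice with loc includes both endpoints. A minimal sketch, assuming the same example data, selects the same two rows both ways:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Anna", "Peter", "Linda"],
        "Age": [28, 24, 35, 32],
        "City": ["New York", "Paris", "Berlin", "London"],
    },
    index=pd.Index([101, 102, 103, 104], name="ID"),
)

by_position = df.iloc[1:3]  # positions 1 and 2; position 3 is excluded
by_label = df.loc[102:103]  # labels 102 through 103; both ends are included

print(list(by_position.index))       # [102, 103]
print(by_position.equals(by_label))  # True
```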

Lesson Summary and Next Steps

That's it! We've covered the index column, how to set it, and how to locate data in a DataFrame. Exciting exercises are up next. Let's practice and strengthen the skills you've learned today. Let the fun begin!
