Indexing and Selecting Data in Pandas

Lesson 3

Introduction

Hello! Today we're diving into Indexing and Selecting Data in pandas, a crucial part of data manipulation and analysis. Indexing determines the labels that identify each row, while selecting is about picking out the specific rows, columns, or cells you need.

We'll delve into how to select and index data using pandas by walking you through some hands-on examples. Let's begin!

Understanding Indexing: Setting Index

In pandas, an index is more or less the address of your data. By default, pandas assigns integer labels to the rows, but we can set any column as the index. This effectively turns it into an identifier for the rows.

Here's a basic example using the DataFrame set_index() method:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John"],
    "Age": [25, 22, 30],
    "City": ["New York", "Los Angeles", "Chicago"]
})

df.set_index("Name", inplace=True)
print(df)
# Output:
#         Age          City
# Name                     
# Alice   25      New York
# Bob     22   Los Angeles
# John    30       Chicago
```
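To see exactly what set_index() changes, here is a small sketch that inspects the index before and after. It calls set_index() without inplace, so the original DataFrame is kept for comparison; the variable name indexed is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John"],
    "Age": [25, 22, 30],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Before: pandas uses a default RangeIndex of integer labels 0, 1, 2
print(df.index)                  # RangeIndex(start=0, stop=3, step=1)

# set_index returns a new DataFrame whose row labels come from "Name"
indexed = df.set_index("Name")
print(indexed.index.tolist())    # ['Alice', 'Bob', 'John']
print(indexed.index.name)        # Name

# "Name" is no longer an ordinary column; it now identifies the rows
print(indexed.columns.tolist())  # ['Age', 'City']
```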

Once the index is set, you can access rows by label with the loc[] indexer or by integer position with the iloc[] indexer. We will look at both in more detail later in this lesson.
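As a quick preview, the sketch below fetches the same row twice on the Name-indexed DataFrame: once by its label and once by its position. The variable names by_label and by_position are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John"],
    "Age": [25, 22, 30],
    "City": ["New York", "Los Angeles", "Chicago"],
}).set_index("Name")

# Label-based: the row whose index label is "Bob"
by_label = df.loc["Bob"]

# Position-based: the second row (position 1), which is also Bob
by_position = df.iloc[1]

print(by_label["City"])     # Los Angeles
print(by_position["City"])  # Los Angeles
```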

Many pandas DataFrame methods accept an inplace parameter. With inplace=True, the DataFrame is modified directly and the method returns None. With the default, inplace=False, the original DataFrame is left unchanged and the method returns a new, modified copy.

However, note that the inplace parameter may be deprecated in a future pandas release (such as pandas 3.0), so the recommended style is to assign the returned DataFrame back to the variable:

```python
df = df.set_index("Name")
```
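The difference between the two styles is easy to check. This minimal sketch (the names result and returned are illustrative) shows that the default call leaves the original untouched and returns a new DataFrame, while inplace=True changes the original and returns None.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 22]})

# Default (inplace=False): df is untouched and a new DataFrame is returned
result = df.set_index("Name")
print("Name" in df.columns)  # True: the original still has the column
print(result.index.name)     # Name

# inplace=True: df itself is modified and the method returns None
returned = df.set_index("Name", inplace=True)
print(returned)              # None
print(df.index.name)         # Name
```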

Understanding Indexing: Resetting Index

To go back to the default integer index, use reset_index(). The current index is moved back into a regular column:

```python
df.reset_index(inplace=True)
print(df)
# Output:
#     Name  Age          City
# 0  Alice   25      New York
# 1    Bob   22   Los Angeles
# 2   John   30       Chicago
```
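If you do not need the old index values at all, reset_index() also accepts drop=True, which discards them instead of turning them back into a column. A short sketch comparing the two (restored and dropped are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John"],
    "Age": [25, 22, 30],
}).set_index("Name")

# Default: the "Name" index is moved back into a regular column
restored = df.reset_index()
print(restored.columns.tolist())  # ['Name', 'Age']

# drop=True: the index labels are thrown away instead of becoming a column
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())   # ['Age']
print(dropped.index.tolist())     # [0, 1, 2]
```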

Understanding Indexing: Renaming Index

After reset_index(), the former index is an ordinary column, so renaming it is simply renaming that column. This is done with the rename() method and its columns argument:

```python
df.rename(columns={"Name": "Student Name", "Age": "Student Age"}, inplace=True)
print(df)
# Output:
#   Student Name  Student Age          City
# 0        Alice           25      New York
# 1          Bob           22   Los Angeles
# 2         John           30       Chicago
```

Here, we provide a dictionary where the key is the old name, and the value is the new name.
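If you keep "Name" as the index instead of resetting it, pandas offers two related options: rename(index=...) changes individual row labels, and rename_axis() changes the name of the index itself. A minimal sketch (relabeled and renamed_axis are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John"],
    "Age": [25, 22, 30],
}).set_index("Name")

# rename(index=...) maps old row labels to new ones
relabeled = df.rename(index={"John": "Johnny"})
print(relabeled.index.tolist())  # ['Alice', 'Bob', 'Johnny']

# rename_axis(...) renames the index itself: "Name" -> "Student Name"
renamed_axis = df.rename_axis("Student Name")
print(renamed_axis.index.name)   # Student Name
```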

Selecting Data Using Labels and Location

pandas provides two indexers, loc[] and iloc[], for accessing data in a DataFrame with array-like square-bracket syntax. loc[] selects by label (index labels and column names), while iloc[] selects by integer position.

Let's understand this with an example:

```python
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John", "Robert", "Ann"],
    "Age": [25, 22, 30, 28, 32],
    "City": ["New York", "Los Angeles", "Chicago", "San Francisco", "Houston"]
})

df.set_index("Name", inplace=True)

print(df.loc[["Alice", "John"], ["Age", "City"]])
# Output:
#         Age       City
# Name                 
# Alice   25   New York
# John    30    Chicago

print(df.iloc[[1, 3], [0, 1]])
# Output:
#         Age           City
# Name                      
# Bob      22    Los Angeles
# Robert   28  San Francisco
```

Note that we set the "Name" column as the index. With loc, we select by labels: the names in the index and the column names. With iloc, we use integer positions for both rows and columns, much like indexing a 2D NumPy array.
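A few more selection patterns on the same DataFrame are worth knowing. The sketch below shows that label slices with loc include both endpoints while position slices with iloc exclude the stop position, how to fetch a single cell, and how loc accepts a boolean mask to filter rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John", "Robert", "Ann"],
    "Age": [25, 22, 30, 28, 32],
    "City": ["New York", "Los Angeles", "Chicago", "San Francisco", "Houston"],
}).set_index("Name")

# Label slices with loc include BOTH endpoints
print(df.loc["Bob":"Robert"].index.tolist())  # ['Bob', 'John', 'Robert']

# Position slices with iloc exclude the stop, like Python lists
print(df.iloc[1:3].index.tolist())            # ['Bob', 'John']

# A single cell: row label + column name, or row position + column position
print(df.loc["Ann", "City"])                  # Houston
print(df.iloc[4, 1])                          # Houston

# A boolean mask selects the rows where the condition is True
print(df.loc[df["Age"] > 27, "City"].tolist())  # ['Chicago', 'San Francisco', 'Houston']
```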

Lesson Summary and Practice

Congrats on completing this lesson! You've learned how to index and select data in pandas, including the set_index(), reset_index(), and rename() methods and the loc[] and iloc[] indexers.

Next up are some practice exercises. These exercises will help solidify what you've learned in this lesson. It's crucial to practice when learning new programming skills.

In the next lesson, we will dive deeper into pandas and cover more useful features. Stay tuned!
