Lesson 3

Indexing and Selecting Data in Pandas


Hello! Today we're diving into Indexing and Selecting Data in pandas, a crucial part of data manipulation and analysis. Indexing helps us locate data in specific rows while selecting focuses on picking specific columns or cells.

We'll delve into how to select and index data using pandas by walking you through some hands-on examples. Let's begin!

Understanding Indexing: Setting Index

In pandas, an index is more or less the address of your data. By default, pandas assigns integer labels to the rows, but we can set any column as the index. This effectively turns it into an identifier for the rows.
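As a minimal sketch of the default behavior described above, the snippet below compares the row labels before and after a column is promoted to the index. The two-row DataFrame and the variable names are made up for illustration only.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 22]})

# With no index specified, pandas labels the rows 0, 1, ...
default_labels = list(df.index)
print(default_labels)  # [0, 1]

# set_index() returns a new frame whose row labels come from the column
indexed = df.set_index("Name")
indexed_labels = list(indexed.index)
print(indexed_labels)  # ['Alice', 'Bob']
```

After the call, "Name" is no longer listed among the columns of `indexed`; it now serves as the row identifier.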

Here's a basic example that creates a DataFrame and calls its set_index() method; reset_index() and rename() follow in the sections below:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John"],
    "Age": [25, 22, 30],
    "City": ["New York", "Los Angeles", "Chicago"]
})

df.set_index("Name", inplace=True)
print(df)
# Output:
#        Age         City
# Name
# Alice   25     New York
# Bob     22  Los Angeles
# John    30      Chicago
```

Once an index is set, you access data through it with the loc[] indexer for label-based selection and the iloc[] indexer for integer-position-based selection. Both are covered later in this lesson.

The inplace parameter is shared by many pandas DataFrame methods. If inplace=True, the method modifies the DataFrame it is called on and returns None. Otherwise (the default), the original DataFrame is left unchanged and a modified copy is returned.
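The difference between the two modes can be sketched as follows. The small DataFrame and the variable names are illustrative assumptions, not part of the lesson's dataset.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 22]})

# Without inplace, the original is untouched and a new DataFrame is returned
new_df = df.set_index("Name")
original_columns = list(df.columns)  # still ['Name', 'Age']
new_columns = list(new_df.columns)   # ['Age']

# With inplace=True, df itself is modified and the call returns None
result = df.set_index("Name", inplace=True)
print(result)            # None
print(list(df.columns))  # ['Age']
```

A common mistake is writing `df = df.set_index("Name", inplace=True)`, which rebinds `df` to None.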

However, it is important to note that the pandas developers have proposed phasing out the inplace parameter for many methods in a future major release. Reassigning the result works in every version:

```python
df = df.set_index("Name")
```

Understanding Indexing: Resetting Index

To restore the default integer index, call reset_index(). The old index moves back into a regular column:

```python
df.reset_index(inplace=True)
print(df)
# Output:
#     Name  Age         City
# 0  Alice   25     New York
# 1    Bob   22  Los Angeles
# 2   John   30      Chicago
```

Understanding Indexing: Renaming Index

Because we reset the index in the previous step, "Name" is an ordinary column again, so renaming it is simply renaming a column. This is done with the rename() method:

```python
df.rename(columns={"Name": "Student Name", "Age": "Student Age"}, inplace=True)
print(df)
# Output:
#    Student Name  Student Age         City
# 0         Alice           25     New York
# 1           Bob           22  Los Angeles
# 2          John           30      Chicago
```

Here, we provide a dictionary where the key is the old name, and the value is the new name.
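If a column is still serving as the index, its name is the index label rather than a column name, so rename(columns=...) does not affect it. One way to rename it in that case is rename_axis(). The sketch below uses a small made-up DataFrame for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 22]}).set_index("Name")

# The index label is not a column, so rename(columns=...) leaves it alone
unchanged = df.rename(columns={"Name": "Student Name"})
print(unchanged.index.name)  # Name

# rename_axis() renames the index label itself
renamed = df.rename_axis("Student Name")
print(renamed.index.name)    # Student Name
```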

Selecting Data Using Labels and Location

pandas provides two indexers for accessing data in a DataFrame, both of which use square brackets much like array indexing: loc[] selects by label, and iloc[] selects by integer position.

Let's understand this with an example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John", "Robert", "Ann"],
    "Age": [25, 22, 30, 28, 32],
    "City": ["New York", "Los Angeles", "Chicago", "San Francisco", "Houston"]
})

df.set_index("Name", inplace=True)

# Label-based: pick rows by index label and columns by name
print(df.loc[["Alice", "John"], ["Age", "City"]])
# Output:
#        Age      City
# Name
# Alice   25  New York
# John    30   Chicago

# Position-based: pick rows 1 and 3, columns 0 and 1
print(df.iloc[[1, 3], [0, 1]])
# Output:
#         Age           City
# Name
# Bob      22    Los Angeles
# Robert   28  San Francisco
```

Note that we set the "Name" column as the index. With loc, we select by labels: the index values (here, the names) for rows and the column names for columns. With iloc, we use integer positions for both rows and columns, much like indexing a 2D NumPy array.
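The same two indexers also reach a single cell or a range of rows. The sketch below rebuilds the DataFrame from the example so it runs on its own; the variable names are illustrative. One difference to keep in mind is that label slices with loc include the end label, while position slices with iloc exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "John", "Robert", "Ann"],
    "Age": [25, 22, 30, 28, 32],
    "City": ["New York", "Los Angeles", "Chicago", "San Francisco", "Houston"],
}).set_index("Name")

# Single cell: row label + column label, or row position + column position
age_by_label = df.loc["Bob", "Age"]
age_by_position = df.iloc[1, 0]
print(age_by_label, age_by_position)  # 22 22

# Label slices include the end label; position slices exclude the end position
label_slice = df.loc["Alice":"John"]
position_slice = df.iloc[0:2]
print(list(label_slice.index))     # ['Alice', 'Bob', 'John']
print(list(position_slice.index))  # ['Alice', 'Bob']
```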

Lesson Summary and Practice

Congrats on completing this lesson! You've learned how to index and select data in pandas using the set_index(), reset_index(), and rename() methods and the loc[] and iloc[] indexers.

Next up are some practice exercises. These exercises will help solidify what you've learned in this lesson. It's crucial to practice when learning new programming skills.

In the next lesson, we will dive deeper into pandas and cover more useful features. Stay tuned!
