Mastering Row Manipulation in Pandas DataFrame

Lesson 5

Introduction to Adding and Removing Rows in a Pandas DataFrame

During today's session, we will delve into how to add and remove rows from a DataFrame in Pandas. These are vital tools for data manipulation, whether adding new entries or eliminating unnecessary data.

Consider it analogous to adding a name to your contacts or deleting an item from your shopping list. We will be carrying out similar operations but with a DataFrame. Let's begin:

Python
import pandas as pd

Quick Recap on Rows in a DataFrame

A DataFrame, a central data structure in Pandas, is a tool for storing data in table form. Each row contains the values corresponding to an individual entry in our data. For instance, each row of a grocery list might represent a unique grocery item.

Each row has an index label that identifies it; by default, these labels are the integers 0, 1, 2, and so on. Now, let's create a DataFrame:

Python
import pandas as pd

data = {
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
}

grocery_df = pd.DataFrame(data)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
'''
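
To make the idea of a row index concrete, the short sketch below inspects the default index of the grocery DataFrame and looks up one row by its label. The names `labels` and `bananas_row` are introduced here purely for illustration.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})

# The default index is a run of integers starting at 0
labels = grocery_df.index.tolist()
print(labels)  # [0, 1, 2, 3]

# .loc selects a row by its index label
bananas_row = grocery_df.loc[2]
print(bananas_row['Grocery Item'])  # Bananas
```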

Adding a Row to a DataFrame

Multiple scenarios might necessitate adding new entries to our DataFrame. Let's explore how to accomplish that:

In modern pandas, we use the pd.concat() function to add new rows (the older DataFrame.append() method was removed in pandas 2.0). If you forgot to add 'Pears' to your grocery list, here's how to do it:

Python
new_row = pd.DataFrame({'Grocery Item': ['Pears'], 'Price per kg': [4.00]})

grocery_df = pd.concat([grocery_df, new_row]).reset_index(drop=True)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
4        Pears          4.00
'''

Calling reset_index(drop=True) replaces the index with the default integers 0, 1, 2, and so on, and discards the old labels. Without this step, pandas keeps the index of each original DataFrame, so both 'Apples' and 'Pears' would end up with the same index, 0.
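
The sketch below shows the duplicate label that appears when the index is left alone, and one alternative: passing `ignore_index=True` to `pd.concat()`, which renumbers the rows during concatenation. The names `kept_index` and `renumbered` are illustrative only.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})
new_row = pd.DataFrame({'Grocery Item': ['Pears'], 'Price per kg': [4.00]})

# Without a reset, each piece keeps its own labels, so 0 appears twice
kept_index = pd.concat([grocery_df, new_row])
print(kept_index.index.tolist())  # [0, 1, 2, 3, 0]

# ignore_index=True renumbers the rows as part of the concatenation
renumbered = pd.concat([grocery_df, new_row], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

Both approaches give the same final index; `ignore_index=True` simply saves the separate `reset_index` call.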

Adding Multiple Rows to a DataFrame

To add multiple rows, create a DataFrame that holds them and concatenate it with the original one:

Python
# Starting again from the original four-item grocery_df (without 'Pears')
new_rows = pd.DataFrame({
    'Grocery Item': ['Avocados', 'Blueberries'],
    'Price per kg': [2.5, 10.0]
})

grocery_df = pd.concat([grocery_df, new_rows]).reset_index(drop=True)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
4     Avocados          2.50
5  Blueberries         10.00
'''

You may wonder why we don't simply include these rows when we first create the DataFrame. That is not always possible. Imagine we have two separate grocery lists coming from different sources, for instance, from separate files. In this case, the standard way to combine them into one DataFrame is to use pd.concat().
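
As a rough illustration of the two-sources scenario, the sketch below reads two small CSV snippets and combines them. The CSV contents and the names `fruit_csv`, `berry_csv`, `fruit_df`, `berry_df`, and `combined_df` are hypothetical; `io.StringIO` stands in for real files so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical file contents; real code would pass file paths to pd.read_csv
fruit_csv = io.StringIO("Grocery Item,Price per kg\nApples,3.25\nOranges,4.50\n")
berry_csv = io.StringIO("Grocery Item,Price per kg\nBlueberries,10.00\n")

fruit_df = pd.read_csv(fruit_csv)
berry_df = pd.read_csv(berry_csv)

# ignore_index=True renumbers the combined rows from 0,
# much like calling reset_index(drop=True) afterwards
combined_df = pd.concat([fruit_df, berry_df], ignore_index=True)
print(combined_df)
```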

Removing Rows from a DataFrame

Frequently, we must delete rows from a DataFrame. To facilitate this, Pandas provides the drop() method. Suppose you want to remove 'Grapes' from your list. Here's how:

Python
# Starting again from the original four-item grocery_df
# Find the index label(s) of the rows whose item is 'Grapes'
index_to_delete = grocery_df[grocery_df['Grocery Item'] == 'Grapes'].index

grocery_df = grocery_df.drop(index_to_delete)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
'''

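Note that the .drop() method returns a new, updated DataFrame instead of changing the original one. If you assign the result to a different variable, the original DataFrame stays available should you need to return to it. In the code above we reassign grocery_df, so that name now refers to the updated data.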
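
A minimal sketch of that behaviour: assigning the result of `drop()` to a new name leaves the original DataFrame untouched. The name `without_grapes` is introduced only for this illustration.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})

index_to_delete = grocery_df[grocery_df['Grocery Item'] == 'Grapes'].index

# Store the result under a new name so the original stays available
without_grapes = grocery_df.drop(index_to_delete)

print(len(grocery_df))      # 4 (original is unchanged)
print(len(without_grapes))  # 3
```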

Removing Multiple Rows

There will be times when you will have to remove multiple rows in one go. For example, let's say you were informed that 'Apples' and 'Oranges' are out of stock, so you need to remove them from your grocery list. The drop() function allows you to do this too.

When removing multiple rows, we utilize the .isin() method, which checks whether each value in a DataFrame column appears in a list you provide. It returns a boolean Series that is True for the matching rows; filtering the DataFrame with that Series and taking .index gives the indices of those rows. Let's see it in action:

Python
# Starting again from the original four-item grocery_df
indices_to_delete = grocery_df[grocery_df['Grocery Item'].isin(['Apples', 'Oranges'])].index

grocery_df = grocery_df.drop(indices_to_delete)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
2      Bananas          2.75
3       Grapes          5.00
'''

In this block of code, the variable indices_to_delete holds the indices of the rows where the 'Grocery Item' is either 'Apples' or 'Oranges'. We then pass indices_to_delete to the drop() function, which removes the corresponding rows from the DataFrame.

Keep in mind that, just as with removing a single row, drop() doesn't change the original DataFrame. Instead, it returns a new DataFrame with the specified rows removed. If you store the result under a different name, you can always revert to the original data if needed.
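
To see what `.isin()` produces on its own, the sketch below prints the boolean mask and then uses its negation to keep only the remaining rows. Filtering with `~mask` is an alternative to `drop()` that gives the same rows here; the names `mask` and `in_stock` are illustrative.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})

# .isin() returns one True/False value per row
mask = grocery_df['Grocery Item'].isin(['Apples', 'Oranges'])
print(mask.tolist())  # [True, True, False, False]

# ~mask flips the values, so this keeps the rows that are NOT in the list
in_stock = grocery_df[~mask]
print(in_stock['Grocery Item'].tolist())  # ['Bananas', 'Grapes']
```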

Recap and Practice Announcement

Congratulations! You've now mastered adding and removing rows in a DataFrame, a crucial element in data manipulation. We discussed rows and their indexing and learned to add rows using pd.concat() and to remove them with drop(). Now, let's put this into practice! The upcoming exercises will enhance your data manipulation skills, enabling you to handle more complex operations on a DataFrame. Are you ready to give them a try?
