Lesson 5

Mastering Row Manipulation in Pandas DataFrame

Introduction to Adding and Removing Rows in a Pandas DataFrame

During today's session, we will delve into how to add and remove rows from a DataFrame in Pandas. These are vital tools for data manipulation, whether adding new entries or eliminating unnecessary data.

Consider it analogous to adding a name to your contacts or deleting an item from your shopping list. We will be carrying out similar operations but with a DataFrame. Let's begin:

Python
import pandas as pd
Quick Recap on Rows in a DataFrame

A DataFrame, a central data structure in Pandas, stores data in table form. Each row holds the values for one entry in our data. For instance, each row of a grocery list might represent a single grocery item.

Each row also has an index label that identifies it. By default, the labels are the integers 0, 1, 2, and so on. Now, let's create a DataFrame:

Python
import pandas as pd

data = {
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
}

grocery_df = pd.DataFrame(data)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
'''
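
As a small illustration of the row index described above, the sketch below rebuilds the same grocery DataFrame, lists its index labels, and looks up one row by label with .loc. The variable names index_labels and bananas_row are only illustrative.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00],
})

# The default index labels are consecutive integers starting at 0
index_labels = list(grocery_df.index)
print(index_labels)  # [0, 1, 2, 3]

# .loc selects a row by its index label
bananas_row = grocery_df.loc[2]
print(bananas_row['Grocery Item'])  # Bananas
```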
Adding a Row to a DataFrame

Multiple scenarios might necessitate adding new entries to our DataFrame. Let's explore how to accomplish that:

In modern pandas, we use the pd.concat() function to add new rows (the older DataFrame.append() method was removed in pandas 2.0). If you forgot to add 'Pears' to your grocery list, here’s how to do it:

Python
new_row = pd.DataFrame({'Grocery Item': ['Pears'], 'Price per kg': [4.00]})

grocery_df = pd.concat([grocery_df, new_row]).reset_index(drop=True)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
4        Pears          4.00
'''

Calling reset_index(drop=True) replaces the index with the default integers 0, 1, 2, and so on. Without this step, pandas keeps each DataFrame's original index labels, so both 'Apples' and 'Pears' would carry the label 0.
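
To see the duplicate label for yourself, here is a minimal sketch that concatenates without resetting the index and then with renumbering. It also uses the ignore_index=True parameter of pd.concat(), which is not part of the lesson's code but gives the same renumbered result in one step. The names kept_index and renumbered are illustrative.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00],
})
new_row = pd.DataFrame({'Grocery Item': ['Pears'], 'Price per kg': [4.00]})

# Without resetting, 'Pears' keeps its own label 0, duplicating the label of 'Apples'
kept_index = pd.concat([grocery_df, new_row])
print(list(kept_index.index))  # [0, 1, 2, 3, 0]

# Looking up label 0 now matches two rows
print(kept_index.loc[0, 'Grocery Item'].tolist())  # ['Apples', 'Pears']

# ignore_index=True renumbers the rows during concatenation
renumbered = pd.concat([grocery_df, new_row], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3, 4]
```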

Adding Multiple Rows to a DataFrame

To add several rows at once, put them in a new DataFrame and concatenate it with the original one:

Python
# Starting again from the original four-row grocery_df
new_rows = pd.DataFrame({
    'Grocery Item': ['Avocados', 'Blueberries'],
    'Price per kg': [2.5, 10.0]
})

grocery_df = pd.concat([grocery_df, new_rows]).reset_index(drop=True)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
4     Avocados          2.50
5  Blueberries         10.00
'''

You may wonder why we don't simply include these rows in the original DataFrame. That isn't always possible. Imagine we have two separate grocery lists coming from different sources, for instance, from separate files. In this case, pd.concat() is the standard way to combine them into one.
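
The two-sources scenario might look like the sketch below. The CSV contents are hypothetical stand-ins for two separate files, wrapped in io.StringIO so the example runs without touching the disk; with real files you would pass file paths to pd.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for two separate files
store_a_csv = io.StringIO("Grocery Item,Price per kg\nApples,3.25\nOranges,4.50\n")
store_b_csv = io.StringIO("Grocery Item,Price per kg\nBananas,2.75\nGrapes,5.00\n")

list_a = pd.read_csv(store_a_csv)
list_b = pd.read_csv(store_b_csv)

# Stack the two lists and renumber the rows from 0
combined = pd.concat([list_a, list_b]).reset_index(drop=True)

print(combined)
```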

Removing Rows from a DataFrame

Frequently, we must delete rows from a DataFrame. For this, Pandas provides the drop() method. Suppose you want to remove 'Grapes' from your list. Here's how:

Python
# Starting again from the original four-row grocery_df
index_to_delete = grocery_df[grocery_df['Grocery Item'] == 'Grapes'].index

grocery_df = grocery_df.drop(index_to_delete)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
'''

Note that the drop() method returns a new DataFrame instead of changing the original one, which is why the code assigns the result back to grocery_df. If you might need the original data later, assign the result to a different variable instead.
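
The following sketch shows that behavior: the result of drop() is stored under a separate, illustrative name (without_grapes), and the original DataFrame still contains all four rows afterwards.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00],
})

index_to_delete = grocery_df[grocery_df['Grocery Item'] == 'Grapes'].index

# Store the result under a new name so the original stays available
without_grapes = grocery_df.drop(index_to_delete)

print(len(grocery_df))      # 4, the original is unchanged
print(len(without_grapes))  # 3, the new DataFrame lacks 'Grapes'
```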

Removing Multiple Rows

There will be times when you will have to remove multiple rows in one go. For example, let's say you were informed that 'Apples' and 'Oranges' are out of stock, so you need to remove them from your grocery list. The drop() function allows you to do this too.

When removing multiple rows, we use the .isin() method, which checks each value in a column against a list of values and returns True or False for every row. Filtering the DataFrame with that result and taking .index gives the index labels of the matching rows. Let's see it in action:

Python
# Starting again from the original four-row grocery_df
indices_to_delete = grocery_df[grocery_df['Grocery Item'].isin(['Apples', 'Oranges'])].index

grocery_df = grocery_df.drop(indices_to_delete)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
2      Bananas          2.75
3       Grapes          5.00
'''

In this block of code, the variable indices_to_delete holds the indices of the rows where the 'Grocery Item' is either 'Apples' or 'Oranges'. We then pass indices_to_delete to the drop() function, which removes the corresponding rows from the DataFrame.
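
To make the intermediate steps visible, this sketch breaks the one-liner apart: first the True/False mask produced by isin(), then the index labels it selects. The last line shows an alternative that is not used in the lesson: keeping the rows where the mask is False, which gives the same rows without calling drop(). The names mask and remaining are illustrative.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00],
})

# isin() returns one True/False value per row
mask = grocery_df['Grocery Item'].isin(['Apples', 'Oranges'])
print(mask.tolist())  # [True, True, False, False]

# Filtering with the mask and taking .index yields the labels to drop
indices_to_delete = grocery_df[mask].index
print(list(indices_to_delete))  # [0, 1]

# Alternative: keep only the rows where the mask is False
remaining = grocery_df[~mask]
print(remaining['Grocery Item'].tolist())  # ['Bananas', 'Grapes']
```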

Keep in mind that, just as with removing a single row, drop() doesn't change the DataFrame it is called on. Instead, it returns a new DataFrame with the specified rows removed, so the original data stays available as long as you store the result under a different name.

Recap and Practice Announcement

Congratulations! You've now mastered adding and removing rows in a DataFrame, a crucial element in data manipulation. We discussed rows and their indexing and learned to add rows using pd.concat() and to remove them with drop(). Now, let's put this into practice! The upcoming exercises will enhance your data manipulation skills, enabling you to handle more complex operations on a DataFrame. Are you ready to give them a try?
