During today's session, we will delve into how to add and remove rows from a DataFrame in Pandas. These are vital tools for data manipulation, whether adding new entries or eliminating unnecessary data.
Consider it analogous to adding a name to your contacts or deleting an item from your shopping list. We will be carrying out similar operations but with a DataFrame. Let's begin:
```python
import pandas as pd
```
A DataFrame, a central data structure in Pandas, stores data in table form. Each row contains the values that belong to one entry in our data. For instance, each row of a grocery list might represent a single grocery item.
Each row has an index label that identifies it; by default, these labels are integers starting from 0. Now, let's create a DataFrame:
```python
import pandas as pd

# Each key becomes a column; each list holds that column's values
data = {
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
}

grocery_df = pd.DataFrame(data)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
'''
```
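To make the idea of the index concrete, the short sketch below rebuilds the same grocery DataFrame and inspects its index. The names `index_labels` and `second_row` are introduced here purely for illustration.

```python
import pandas as pd

# Rebuild the grocery DataFrame from the example above
grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})

# The default index is a range of integers starting at 0
index_labels = list(grocery_df.index)
print(index_labels)  # [0, 1, 2, 3]

# .loc selects a row by its index label
second_row = grocery_df.loc[1]
print(second_row['Grocery Item'])  # Oranges
```

These index labels are what the later examples use to decide which rows to remove.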
Multiple scenarios might necessitate adding new entries to our DataFrame. Let's explore how to accomplish that:
In modern pandas, we use the `pd.concat()` function to add new rows. If you forgot to add 'Pears' to your grocery list, here's how to do it:
```python
# Wrap the new row in its own single-row DataFrame
new_row = pd.DataFrame({'Grocery Item': ['Pears'], 'Price per kg': [4.00]})

# Concatenate, then renumber the index from 0
grocery_df = pd.concat([grocery_df, new_row]).reset_index(drop=True)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
4        Pears          4.00
'''
```
Calling `.reset_index(drop=True)` resets the index to the default integers. Without this step, pandas keeps each original DataFrame's index labels, so both 'Pears' and 'Apples' would share the index 0.
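The duplicate-index behavior described here can be checked directly. The sketch below starts from the original four-item list and compares the index with and without renumbering. It also uses the `ignore_index=True` parameter of `pd.concat()`, which is not covered in this lesson but gives the same renumbered result in a single call. The names `combined_raw` and `combined_clean` are illustrative.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})
new_row = pd.DataFrame({'Grocery Item': ['Pears'], 'Price per kg': [4.00]})

# Without reset_index, each input keeps its own index labels
combined_raw = pd.concat([grocery_df, new_row])
print(list(combined_raw.index))  # [0, 1, 2, 3, 0]

# ignore_index=True builds a fresh 0..n-1 index during concatenation
combined_clean = pd.concat([grocery_df, new_row], ignore_index=True)
print(list(combined_clean.index))  # [0, 1, 2, 3, 4]
```

With the duplicated label, `combined_raw.loc[0]` returns two rows ('Apples' and 'Pears'), which is usually not what you want when looking up a single entry.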
To add multiple rows, put them in a DataFrame and concatenate it with the original one:
```python
# This example starts again from the original four-item grocery_df
# (without 'Pears'), which is why 'Pears' is absent from the output
new_rows = pd.DataFrame({
    'Grocery Item': ['Avocados', 'Blueberries'],
    'Price per kg': [2.5, 10.0]
})

grocery_df = pd.concat([grocery_df, new_rows]).reset_index(drop=True)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
3       Grapes          5.00
4     Avocados          2.50
5  Blueberries         10.00
'''
```
You may wonder why we don't simply include these rows in the original DataFrame. Well, that is not always possible. Imagine we have two separate grocery lists coming from different sources, for instance, from separate files. In this case, a straightforward way to combine them into one is to use `pd.concat()`.
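As a rough illustration of the two-sources scenario, the sketch below reads two small CSV texts and combines them. The CSV contents and the names `store_a_csv`, `store_b_csv`, `list_a`, and `list_b` are hypothetical. In-memory `io.StringIO` objects stand in for real files so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for two separate files
store_a_csv = io.StringIO(
    "Grocery Item,Price per kg\nApples,3.25\nOranges,4.50\n"
)
store_b_csv = io.StringIO(
    "Grocery Item,Price per kg\nBananas,2.75\nGrapes,5.00\n"
)

# Each source is loaded into its own DataFrame
list_a = pd.read_csv(store_a_csv)
list_b = pd.read_csv(store_b_csv)

# Stack the two lists and renumber the index from 0
combined = pd.concat([list_a, list_b]).reset_index(drop=True)
print(combined)
```

With real files, the `io.StringIO` objects would be replaced by file paths passed to `pd.read_csv()`.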
Frequently, we must delete rows from a DataFrame. For this, Pandas provides the `drop()` method. Suppose you want to remove 'Grapes', or both 'Apples' and 'Oranges', from your list. Here's how to remove a single item:
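```python
# This example starts again from the original four-item grocery_df
# Find the index label(s) of the rows whose item is 'Grapes'
index_to_delete = grocery_df[grocery_df['Grocery Item'] == 'Grapes'].index

# Drop those rows and keep the result
grocery_df = grocery_df.drop(index_to_delete)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
0       Apples          3.25
1      Oranges          4.50
2      Bananas          2.75
'''
```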
Note that the `.drop()` method returns a new, updated DataFrame instead of changing the original one. The example above reassigns the result to `grocery_df`, which replaces the original. If you want to keep the original data so you can return to it, assign the result to a different variable.
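A quick way to see that `.drop()` leaves the original untouched is to store its result under a different name. In this sketch, `without_grapes` is an illustrative variable name, and the DataFrame is rebuilt so the example runs on its own.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})

index_to_delete = grocery_df[grocery_df['Grocery Item'] == 'Grapes'].index

# Store the result under a new name instead of reassigning grocery_df
without_grapes = grocery_df.drop(index_to_delete)

print(len(grocery_df))      # 4 (the original still contains 'Grapes')
print(len(without_grapes))  # 3
```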
There will be times when you have to remove multiple rows in one go. For example, let's say you were informed that 'Apples' and 'Oranges' are out of stock, so you need to remove them from your grocery list. The `drop()` method allows you to do this too.
When removing multiple rows, we use the `.isin()` method, which checks each value in a DataFrame column against a list of values you provide. It returns a boolean Series that is True for every row whose value is in the list. Filtering the DataFrame with that Series and reading `.index` gives the indices of the rows to remove. Let's see it in action:
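```python
# This example starts again from the original four-item grocery_df
# Find the index labels of the rows whose item is in the list
indices_to_delete = grocery_df[grocery_df['Grocery Item'].isin(['Apples', 'Oranges'])].index

# Drop all matching rows in one call
grocery_df = grocery_df.drop(indices_to_delete)

print(grocery_df)
'''Output:
  Grocery Item  Price per kg
2      Bananas          2.75
3       Grapes          5.00
'''
```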
In this block of code, the variable `indices_to_delete` holds the indices of the rows where the 'Grocery Item' is either 'Apples' or 'Oranges'. We then pass `indices_to_delete` to the `drop()` method, which removes the corresponding rows from the DataFrame.
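To see the intermediate steps, the sketch below splits that one-line expression into parts. The names `mask` and `remaining` are illustrative, and the last step shows an alternative that this lesson does not otherwise cover: keeping the rows where the mask is False, without calling `drop()`.

```python
import pandas as pd

grocery_df = pd.DataFrame({
    'Grocery Item': ['Apples', 'Oranges', 'Bananas', 'Grapes'],
    'Price per kg': [3.25, 4.50, 2.75, 5.00]
})

# isin() returns a boolean Series: True where the item is in the list
mask = grocery_df['Grocery Item'].isin(['Apples', 'Oranges'])
print(mask.tolist())  # [True, True, False, False]

# The index of the filtered rows is what gets passed to drop()
indices_to_delete = grocery_df[mask].index
print(list(indices_to_delete))  # [0, 1]

# Alternative: keep only the rows where the mask is False
remaining = grocery_df[~mask]
print(remaining['Grocery Item'].tolist())  # ['Bananas', 'Grapes']
```

Both routes leave 'Bananas' and 'Grapes' with their original index labels 2 and 3.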
Keep in mind that, just as with removing a single row, the `drop()` method here doesn't change the original DataFrame. Instead, it returns a new DataFrame with the specified rows removed. If you store that result under a different name, you can always return to the original data if needed.
Congratulations! You've now mastered adding and removing rows in a DataFrame, a crucial element in data manipulation. We discussed rows and their indexing and learned to add rows using `pd.concat()` and to remove them with `drop()`.
Now, let's put this into practice! The upcoming exercises will enhance your data manipulation skills, enabling you to handle more complex operations on a DataFrame. Are you ready to give them a try?