Lesson 3
Creating New Columns in Pandas
Introduction

Welcome to our session on creating new columns in Pandas. Today, we'll build on our data handling skills by learning how to add new columns to a DataFrame. This ability is essential for data cleaning and manipulation because it lets us derive new fields from the data we already have.

By the end of this session, you'll be adept at adding new columns with static values, generating new columns through operations with existing columns, and creating new columns based on specific conditions.

Why Creating New Columns Is Important

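Creating new columns is key for data analysis because the value we care about is often not stored directly but can be computed from existing fields. Consider a DataFrame of prices and quantities of goods sold. We might want the total sales for each item, which is price * quantity.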

Python
import pandas as pd

# Creating DataFrame: items, prices, quantities sold
df = pd.DataFrame({
    "Item": ["Apples", "Bananas", "Oranges"],
    "Price": [1.5, 0.5, 0.75],
    "Quantity": [10, 20, 30],
})
# Create new column "Total" which is Price * Quantity
df["Total"] = df["Price"] * df["Quantity"]
print(df)
#       Item  Price  Quantity  Total
# 0   Apples   1.50        10   15.0
# 1  Bananas   0.50        20   10.0
# 2  Oranges   0.75        30   22.5

In this code, we create a new "Total" column by assigning to df["Total"]. For DataFrames, this works much like adding a new key to a dictionary, and Pandas multiplies the two columns row by row. It's that easy!
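To make the dictionary analogy a little more concrete, here is a minimal sketch that derives a second column from the first. The DISCOUNT_RATE constant and the "DiscountedTotal" column name are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Apples", "Bananas", "Oranges"],
    "Price": [1.5, 0.5, 0.75],
    "Quantity": [10, 20, 30],
})

# Hypothetical discount rate, used only for this illustration
DISCOUNT_RATE = 0.5

# Each assignment adds a new column, computed row by row
df["Total"] = df["Price"] * df["Quantity"]
df["DiscountedTotal"] = df["Total"] * (1 - DISCOUNT_RATE)

# New columns are appended after the existing ones
print(list(df.columns))
# ['Item', 'Price', 'Quantity', 'Total', 'DiscountedTotal']

print(df["DiscountedTotal"].tolist())
# [7.5, 5.0, 11.25]
```

As with a dictionary, assigning to a name that already exists, such as df["Total"], would overwrite that column rather than add another one.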

Adding New Columns with Static Values

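Adding a new column with a static value is just as simple: assign a single value, and Pandas repeats it for every row. For example, we could add a Location column for a group of employees who all work in the same place. Here, we apply the same idea to our DataFrame of items.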

Python
# Add "Location" column with static value
df["Location"] = "New York"
print(df)
#       Item  Price  Quantity  Total  Location
# 0   Apples   1.50        10   15.0  New York
# 1  Bananas   0.50        20   10.0  New York
# 2  Oranges   0.75        30   22.5  New York
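The sketch below contrasts a single static value with a list of per-row values. The "Aisle" and "Bad" column names are made up for this example, and the DataFrame is trimmed to one column to keep it short.

```python
import pandas as pd

df = pd.DataFrame({"Item": ["Apples", "Bananas", "Oranges"]})

# A single value is repeated for every row
df["Location"] = "New York"
print(df["Location"].tolist())
# ['New York', 'New York', 'New York']

# A list supplies one value per row, so it must have one entry per row
df["Aisle"] = [1, 2, 3]  # hypothetical column

# A list of the wrong length is rejected with a ValueError
length_error = None
try:
    df["Bad"] = [1, 2]  # only 2 values for 3 rows
except ValueError as exc:
    length_error = exc
    print("Could not add column:", exc)
```

This is one reason a static value is convenient: we never need to know how many rows the DataFrame has.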

New Columns Based on Conditions

We can create new columns based on conditions from the values of the existing columns. For example, if we have a DataFrame of student scores, we can create a column that flags whether the student's score is above 40.

Here's how we can do this:

Python
import numpy as np

# DataFrame with student names and their scores
df = pd.DataFrame({"Student": ["Alice", "Bob", "Charlie"], "Score": [42, 37, 56]})
# Create new column "Status" that is "Pass" if Score > 40 else "Fail"
df["Status"] = np.where(df["Score"] > 40, "Pass", "Fail")
print(df)
#    Student  Score Status
# 0    Alice     42   Pass
# 1      Bob     37   Fail
# 2  Charlie     56   Pass

The np.where function takes three arguments: a condition, the value to use where the condition is true, and the value to use where it is false. The condition is checked separately for each row. In this example, the condition is df["Score"] > 40. For rows where it is true, the new "Status" column gets the value "Pass"; otherwise, it gets "Fail". Because the comparison is strict, a score of exactly 40 would be marked "Fail".
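To see what np.where is doing step by step, the sketch below splits the condition out into its own variable. The second part shows that the true/false values can also come from a column; the "CappedScore" name and the cap of 50 are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Student": ["Alice", "Bob", "Charlie"], "Score": [42, 37, 56]})

# Step 1: the condition yields one True/False value per row
passed = df["Score"] > 40
print(passed.tolist())
# [True, False, True]

# Step 2: np.where picks "Pass" where True and "Fail" where False
status = np.where(passed, "Pass", "Fail")
print(status.tolist())
# ['Pass', 'Fail', 'Pass']

# The chosen values can also come from an existing column:
# keep the score unless it exceeds a hypothetical cap of 50
df["CappedScore"] = np.where(df["Score"] > 50, 50, df["Score"])
print(df["CappedScore"].tolist())
# [42, 37, 50]
```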

Lesson Summary and Upcoming Practice

So far, we've covered how to create new columns in a DataFrame with static values, through operations with existing columns, and based on conditions. The more you practice, the better your understanding will get. Looking forward to our exercise session!
