Hello! Today we're delving deeper into Conditional Selection. As you might remember, it's a technique for selecting data in a data frame that meets given conditions. It's a key tool for data analysis as it allows us to focus on the most pertinent information.
In today's lesson, we're going to look at more complex conditional selection scenarios and learn about an important method, `where()`. Our journey will start with a short refresher on conditional selection, move on to more sophisticated compound conditions, and finally, we'll dive into the `where()` method. Let's get started!
Before we venture into uncharted territory, let's refresh our memory on conditional selection. Essentially, with conditional selection, we're requesting Python to sift through our data and return elements that meet our stipulations. We do this by comparing columns or rows of our data frame against certain conditions.
For instance, we have a pandas data frame `scores_df` consisting of a list of students' names and their test scores.
```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Dave'], 'Score': [88, 92, 95, 80]}
scores_df = pd.DataFrame(data)

print(scores_df)
#       Name  Score
# 0    Alice     88
# 1      Bob     92
# 2  Charlie     95
# 3     Dave     80
```
Let's find out who scored more than 90:
```python
print(scores_df[scores_df['Score'] > 90])
#       Name  Score
# 1      Bob     92
# 2  Charlie     95
```
By using `scores_df['Score'] > 90`, we've created a boolean mask and used it to keep only the rows where the mask is True. Pretty cool, right?
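To see what the mask looks like on its own, here is a minimal sketch that rebuilds the same data frame and stores the comparison in a variable. The names `high_score_mask` and `high_scorers` are made up for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Dave'], 'Score': [88, 92, 95, 80]}
scores_df = pd.DataFrame(data)

# The comparison yields a boolean Series: one True/False value per row
high_score_mask = scores_df['Score'] > 90
print(high_score_mask.tolist())
# [False, True, True, False]

# Indexing the data frame with the mask keeps only the rows marked True
high_scorers = scores_df[high_score_mask]
print(high_scorers['Name'].tolist())
# ['Bob', 'Charlie']
```

Storing the mask in a variable can make longer selections easier to read and reuse.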
In real-world scenarios, it might be necessary to select data based on more than one condition. In these cases, we would need to deploy compound conditions.
Here we introduce two operators: `&` (and) and `|` (or). `&` insists that all conditions must be true, and `|` requires at least one condition to be true.
Interestingly, we can negate a condition using `~` (not).
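Make sure to place each condition in parentheses when using `&` (and) or `|` (or). This is required because in Python `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so without parentheses the comparisons would not be evaluated before the conditions are combined.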
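As a rough illustration of why the parentheses matter, the sketch below tries the same selection with and without them. It rebuilds `scores_df` so it can run on its own; the variable names `failed` and `in_range` are made up for this example.

```python
import pandas as pd

scores_df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                          'Score': [88, 92, 95, 80]})

# Without parentheses, Python groups this as
#   scores_df['Score'] > (85 & scores_df['Score']) < 90
# which is a chained comparison that pandas cannot reduce to a single bool.
try:
    scores_df[scores_df['Score'] > 85 & scores_df['Score'] < 90]
    failed = False
except ValueError:
    failed = True
print(failed)
# True

# With parentheses, each comparison is evaluated first, then combined
in_range = scores_df[(scores_df['Score'] > 85) & (scores_df['Score'] < 90)]
print(in_range['Name'].tolist())
# ['Alice']
```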
Consider this example:
```python
print(scores_df[(scores_df['Score'] > 85) & (scores_df['Name'].str.startswith('A'))])
#     Name  Score
# 0  Alice     88
```
And there's Alice! She scored more than 85 and her name starts with an 'A'.
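The `|` operator follows the same pattern. Here is a small sketch, using the same data, that selects students who either scored above 90 or are named 'Dave'. The variable name `either` is just for illustration.

```python
import pandas as pd

scores_df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                          'Score': [88, 92, 95, 80]})

# A row is kept when at least one of the two conditions is True
either = scores_df[(scores_df['Score'] > 90) | (scores_df['Name'] == 'Dave')]
print(either['Name'].tolist())
# ['Bob', 'Charlie', 'Dave']
```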
Now, what if we want all students except for 'Bob'? Simply employ `~`:
```python
print(scores_df[~(scores_df['Name'] == 'Bob')])
#       Name  Score
# 0    Alice     88
# 2  Charlie     95
# 3     Dave     80
```
Adios, Bob!
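Negation becomes more useful when the condition is itself compound or matches several values. The sketch below, on the same data, shows two such cases; `isin()` is a pandas Series method that was not used earlier in this lesson, and the variable names are made up for illustration.

```python
import pandas as pd

scores_df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                          'Score': [88, 92, 95, 80]})

# Negate a whole compound condition: everyone who is NOT
# (scoring above 85 AND having a name that starts with 'A')
not_alice_like = scores_df[~((scores_df['Score'] > 85) &
                             (scores_df['Name'].str.startswith('A')))]
print(not_alice_like['Name'].tolist())
# ['Bob', 'Charlie', 'Dave']

# Exclude several names at once by negating an isin() mask
without_two = scores_df[~scores_df['Name'].isin(['Bob', 'Dave'])]
print(without_two['Name'].tolist())
# ['Alice', 'Charlie']
```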
Next up is the `where()` method from pandas. This method is useful when you want to select data but replace the values that don't satisfy the condition with a custom value rather than simply discarding them.
For instance, consider the example where scores of 85 or below are replaced with 'Fail':
```python
print(scores_df['Score'].where(scores_df['Score'] > 85, other='Fail'))
# 0      88
# 1      92
# 2      95
# 3    Fail
# Name: Score, dtype: object
```
The records which didn't meet the condition were replaced with 'Fail'. How handy is that!
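Two related behaviours are worth a quick look. This sketch is not part of the lesson's own examples: it shows what `where()` does when no replacement value is given, and the companion pandas method `mask()`, which replaces values where the condition is True instead of False. The variable names are made up for illustration.

```python
import pandas as pd

scores_df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                          'Score': [88, 92, 95, 80]})

# Without `other`, values that fail the condition become NaN,
# so the integer column is converted to floats
kept = scores_df['Score'].where(scores_df['Score'] > 85)
print(kept.tolist())
# [88.0, 92.0, 95.0, nan]

# mask() is the mirror image: it replaces values where the condition is True
flagged = scores_df['Score'].mask(scores_df['Score'] <= 85, other='Fail')
print(flagged.tolist())
# [88, 92, 95, 'Fail']
```

Note that in both cases the result keeps the same length and index as the original column, which is the key difference from filtering with a mask.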
Hold up! We've covered a significant portion in this stage of our adventure. We revisited what conditional selection is, learned about compound conditions, and got to know the `where()` method. Great going!
Watching tennis is fun, but it's even better when you're out there playing. The same applies to coding! Engage in some awesome practice problem-solving sessions to solidify your understanding of complex conditional selection and the application of `where()`. Every problem solved is a step closer to becoming an efficient data analyst. On your marks, get set, code!