Hey there! Today, we're diving into sorting and ranking data. These techniques help us organize and see patterns in our data, making it easier to analyze and draw conclusions. By the end of this lesson, you'll know how to sort data in different ways and rank data within groups using Pandas. We'll be working with the Titanic dataset, the same one we've used in previous lessons.
Sorting and ranking might sound a bit technical, but think of it like sorting your favorite toy collection by size or ranking your friends by age. It’s all about making data neat and meaningful!
Sorting data means arranging it in a specific order, like alphabetizing words in a dictionary or listing numbers from smallest to largest. Let's start with the basics: sorting data by a single column. Suppose we want to sort passengers by how much they paid for their tickets (`fare`).
```python
import seaborn as sns

# Load the Titanic dataset
titanic = sns.load_dataset('titanic')

# Sort by fare in descending order
titanic_sorted = titanic.sort_values(by='fare', ascending=False)
print(titanic_sorted[['fare', 'class']].head())
```
Output:
```
         fare  class
258  512.3292  First
737  512.3292  First
679  512.3292  First
88   263.0000  First
27   263.0000  First
```
Here, our data is sorted by `fare` in descending order. We control this with the `by` and `ascending` arguments of the `sort_values` method.
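The `by` and `ascending` arguments also accept lists, which lets you sort by several columns at once, each in its own direction. The sketch below uses a small made-up `passengers` DataFrame (hypothetical data, not rows from the Titanic dataset) so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical mini dataset, loosely modeled on Titanic columns
passengers = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cal', 'Dee', 'Eli'],
    'class': ['Third', 'First', 'First', 'Second', 'Third'],
    'fare': [7.25, 71.28, 53.10, 13.00, 8.05],
})

# ascending defaults to True, so this sorts fares from smallest to largest
by_fare = passengers.sort_values(by='fare')
print(by_fare)

# Sort by class (alphabetically), then by fare from highest to lowest within each class
by_class_then_fare = passengers.sort_values(by=['class', 'fare'], ascending=[True, False])
print(by_class_then_fare)
```

Each entry in the `ascending` list applies to the column in the same position of `by`. Here `class` holds plain strings, so it sorts alphabetically, which happens to match the order First, Second, Third.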
Imagine you're a librarian organizing books. Sorting helps you find books faster. Similarly, sorting data helps analysts focus on key information quickly, like the highest sales or the oldest customers.
Ranking data means assigning a rank (like 1st, 2nd, 3rd) to items in your data based on their values. Let's use a simple dataset to make this clearer. Below is a small dataset of students and their scores.
```python
import pandas as pd

# Sample dataset
data = {
    'student': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [88, 92, 85, 92]
}
students = pd.DataFrame(data)
print(students)
```
Output:
```
   student  score
0    Alice     88
1      Bob     92
2  Charlie     85
3    David     92
```
Now, let's see how to rank students by their scores.
```python
# Rank students by their score
students['score_rank'] = students['score'].rank(method='average', ascending=True)
print(students)
```
Output:
```
   student  score  score_rank
0    Alice     88         2.0
1      Bob     92         3.5
2  Charlie     85         1.0
3    David     92         3.5
```
This table shows how the students' scores are ranked. For example, Charlie has the lowest score, so he is ranked `1`. Bob and David are tied with a score of `92`. We specified the `average` tie-handling method, so since Bob and David share ranks `3` and `4`, each of them gets the average of those ranks, `3.5`.
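With `ascending=True`, the lowest score gets rank 1. For a leaderboard you usually want the opposite, so here is a minimal sketch with `ascending=False`, reusing the same four students:

```python
import pandas as pd

# Same students data as above
students = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [88, 92, 85, 92],
})

# ascending=False gives rank 1 to the highest score
students['score_rank_desc'] = students['score'].rank(method='average', ascending=False)
print(students)
```

Bob and David now share ranks 1 and 2, so each gets `1.5`, Alice gets `3.0`, and Charlie drops to `4.0`.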
There are different methods of handling ties in ranking:

- `average`: tied values get the average of the ranks they span (this is the default).
- `min`: all tied values get the smallest rank in the group.
- `max`: all tied values get the largest rank in the group.
- `first`: ranks are assigned in the order the values appear in the data.
- `dense`: like `min`, but the rank of the next group is always exactly one more than the previous group's rank, so no ranks are skipped.

For example:
```python
# Rank by score, using the 'min' method
students['score_rank_min'] = students['score'].rank(method='min')
print(students)
```
Output:
```
   student  score  score_rank  score_rank_min
0    Alice     88         2.0             2.0
1      Bob     92         3.5             3.0
2  Charlie     85         1.0             1.0
3    David     92         3.5             3.0
```
In this case, Bob and David, who share ranks `3` and `4`, are both assigned the minimum rank, `3`.
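To see all five tie-handling methods side by side, the sketch below ranks the same scores with each one. It adds a hypothetical fifth student, Eve (not part of the original dataset), because with only four students `dense` and `min` happen to give identical results.

```python
import pandas as pd

# The four students from above plus a hypothetical fifth student, Eve,
# added so that the difference between 'min' and 'dense' becomes visible
scores = pd.Series([88, 92, 85, 92, 95],
                   index=['Alice', 'Bob', 'Charlie', 'David', 'Eve'])

# Rank the same scores with every tie-handling method, one column per method
methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: scores.rank(method=m) for m in methods})

# Put the raw scores in the first column for easy comparison
comparison.insert(0, 'score', scores)
print(comparison)
```

Eve's row shows the key difference: `min` skips rank 4 after the tie and gives her 5, while `dense` gives her 4. The `first` method breaks the tie by position, so Bob (listed first) gets 3 and David gets 4, whereas `max` gives both of them 4.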
In this lesson, we learned how sorting and ranking help us organize data and extract meaningful insights from it. We covered how to:

- Sort data with `.sort_values()`.
- Rank data with `.rank()`, and explored different tie-handling methods such as `average` and `min`.

Now that you've grasped the theory, it's time to put your skills to the test! In the upcoming practice session, you'll use what you've learned to sort and rank data in various ways. This hands-on practice will help solidify your understanding and make you more comfortable with these essential data manipulation techniques.
Happy coding!