Lesson 3

Hey there! Today, we're diving into sorting and ranking data. These techniques help us organize and see patterns in our data, making it easier to analyze and draw conclusions. By the end of this lesson, you'll know how to sort data in different ways and rank data within groups using **Pandas**. We'll be working with the **Titanic** dataset, the same one we've used in previous lessons.

Sorting and ranking might sound a bit technical, but think of it like sorting your favorite toy collection by size or ranking your friends by age. It’s all about making data neat and meaningful!

Sorting data means arranging it in a specific order, like alphabetizing words in a dictionary or listing numbers from smallest to largest. Let's start with some basics. Here's how to sort data by a single column. Suppose we want to sort passengers by how much they paid for their tickets (`fare`

).

Python`1import seaborn as sns 2 3# Load the Titanic dataset 4titanic = sns.load_dataset('titanic') 5 6# Sort by fare in descending order 7titanic_sorted = titanic.sort_values(by='fare', ascending=False) 8print(titanic_sorted[['fare', 'class']].head())`

Output:

`1 fare class 2258 512.3292 First 3680 512.3292 First 4737 262.3750 First 527 263.0000 First 6311 262.3750 First`

Here, our data is sorted by `fare`

in the descending order. We control it using `by`

and `ascending`

arguments of the `sort_values`

function.

Imagine you're a librarian organizing books. Sorting helps you find books faster. Similarly, sorting data helps analysts focus on key information quickly, like the highest sales or the oldest customers.

Ranking data means assigning a rank (like 1st, 2nd, 3rd) to items in your data based on their values. Let's use a simple dataset to make this clearer. Below is a small dataset of students and their scores.

Python`1import pandas as pd 2 3# Sample dataset 4data = { 5 'student': ['Alice', 'Bob', 'Charlie', 'David'], 6 'score': [88, 92, 85, 92] 7} 8students = pd.DataFrame(data) 9print(students)`

Output:

`1 student score 20 Alice 88 31 Bob 92 42 Charlie 85 53 David 92`

Now, let's see how to rank students by their scores.

Python`1# Rank students by their score 2students['score_rank'] = students['score'].rank(method='average', ascending=True) 3print(students)`

Output:

`1 student score score_rank 20 Alice 88 2.0 31 Bob 92 3.5 42 Charlie 85 1.0 53 David 92 3.5`

This table clearly shows how the students' scores are ranked. For example, `Charlie`

has the lowest score, and he is ranked `1`

. `Bob`

and `David`

have a tie – they both hold the score of `92`

. We specified the tie handling method `average`

. As `Bob`

and `David`

share ranks `3`

and `4`

, their average rank is `3.5`

.

There are different methods of sorting ties in ranking:

`average`

: Ranks are averaged if there are ties.`min`

: The smallest rank is assigned to all ties.`max`

: The largest rank is assigned to all ties.`first`

: Ranks are assigned in the order they appear.`dense`

: Like`min`

, but the rank of the next group is just one more than the previous group.

For example:

Python`1# Rank by score, using the 'min' method 2students['score_rank_min'] = students['score'].rank(method='min') 3print(students)`

Output:

`1 student score score_rank score_rank_min 20 Alice 88 2.0 2.0 31 Bob 92 3.5 3.0 42 Charlie 85 1.0 1.0 53 David 92 3.5 3.0`

In this case, `Bob`

and `David`

, sharing ranks `3`

and `4`

, were assigned the minimum rank – `3`

.

In this lesson, we learned the importance of sorting and ranking data to organize and extract meaningful insights from it. We covered how to:

- Sort data by a single column using
`.sort_values()`

. - Rank data within groups using
`.rank()`

and explored different ranking methods like`average`

and`min`

.

Now that you've grasped the theory, it's time to put your skills to the test! In the upcoming practice session, you'll use what you've learned to sort and rank data in various ways. This hands-on practice will help solidify your understanding and make you more comfortable with these essential data manipulation techniques.

Happy coding!