Lesson 4

Today, we will explore sorting within a DataFrame using Python's `pandas`. We will delve into the `sort_values()` method, covering single- and multi-column sorting and handling missing values.

Let's consider this small dataset containing statistics of basketball players:

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'],
    'Points': [27.0, 26.0, 32.0, 24.0, 26.0],
    'Assists': [5.7, 4.7, 4.2, 6.6, 7.4]
})
```

Note that there is a tie in `Points` between `"K. Durant"` and `"K. Bryant"`.

We can sort DataFrame values using the `sort_values()` method.

```python
sorted_df = df.sort_values(by='Points', ascending=False)
print(sorted_df)
'''Output:
      Player  Points  Assists
2  M. Jordan    32.0      4.2
0   L. James    27.0      5.7
1  K. Durant    26.0      4.7
4  K. Bryant    26.0      7.4
3   S. Curry    24.0      6.6
'''
```

In this example, the DataFrame is sorted by the `Points` column in descending order using the `by` and `ascending` parameters. This lists the players from the highest to the lowest average points scored.
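As a small illustration of how a sorted result can be used, the sketch below keeps only the two highest scorers. It rebuilds the lesson's `df` so that it runs on its own; the name `top_two` is just an illustrative choice. It also shows that `sort_values()` returns a new DataFrame and leaves the original row order untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'],
    'Points': [27.0, 26.0, 32.0, 24.0, 26.0],
    'Assists': [5.7, 4.7, 4.2, 6.6, 7.4]
})

# sort_values() returns a new, sorted DataFrame; head(2) keeps its first two rows.
top_two = df.sort_values(by='Points', ascending=False).head(2)
print(top_two['Player'].tolist())  # ['M. Jordan', 'L. James']

# The original DataFrame is not modified: its index is still 0..4.
print(df.index.tolist())  # [0, 1, 2, 3, 4]
```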

In the sort by `Points` alone, the tied players `"K. Durant"` and `"K. Bryant"` appear in their original index order, with the lower index first. This order is not guaranteed in general, because the default sorting algorithm (`kind='quicksort'`) is not stable. Do you agree that a more reasonable decision would be to resolve ties explicitly by another characteristic of the players, for example by putting players with higher `Assists` scores first?

This is possible by passing multiple sorting columns to `by`.

```python
sorted_df = df.sort_values(by=['Points', 'Assists'], ascending=False)
print(sorted_df)
'''Output:
      Player  Points  Assists
2  M. Jordan    32.0      4.2
0   L. James    27.0      5.7
4  K. Bryant    26.0      7.4
1  K. Durant    26.0      4.7
3   S. Curry    24.0      6.6
'''
```

In this example, the DataFrame is sorted by the `Points` column, and any ties are resolved by the `Assists` column.
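One way to build intuition for multi-column sorting is to reproduce it with two chained single-column sorts: first by the tie-breaking column, then by the main column using a stable algorithm. The sketch below is only an illustration of that idea, not a description of how pandas implements it internally. It assumes `kind='mergesort'`, which the pandas documentation lists as a stable option, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'],
    'Points': [27.0, 26.0, 32.0, 24.0, 26.0],
    'Assists': [5.7, 4.7, 4.2, 6.6, 7.4]
})

# Multi-column sort: Points first, Assists as the tie-breaker.
multi = df.sort_values(by=['Points', 'Assists'], ascending=False)

# Chained sorts: order by the tie-breaker first, then do a stable sort
# by the main column so tied rows keep their Assists order.
chained = (
    df.sort_values(by='Assists', ascending=False)
      .sort_values(by='Points', ascending=False, kind='mergesort')
)

print(multi.index.tolist())    # [2, 0, 4, 1, 3]
print(chained.index.tolist())  # [2, 0, 4, 1, 3]
```

In practice, passing a list to `by` is the simpler choice; the chained form mainly shows why a stable sort matters when ties are involved.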

Let's alter the behavior and handle ties by sorting players alphabetically:

```python
sorted_df = df.sort_values(by=['Points', 'Player'], ascending=False)
print(sorted_df)
'''Output:
      Player  Points  Assists
2  M. Jordan    32.0      4.2
0   L. James    27.0      5.7
1  K. Durant    26.0      4.7
4  K. Bryant    26.0      7.4
3   S. Curry    24.0      6.6
'''
```

Because `ascending=False` applies to every column in `by`, the player names are also sorted in descending order, which gives reverse alphabetical order. To fix this, we can pass a list to `ascending` with one value per column, setting the sort direction separately for `'Points'` and `'Player'`:

```python
sorted_df = df.sort_values(by=['Points', 'Player'], ascending=[False, True])
print(sorted_df)
'''Output:
      Player  Points  Assists
2  M. Jordan    32.0      4.2
0   L. James    27.0      5.7
4  K. Bryant    26.0      7.4
1  K. Durant    26.0      4.7
3   S. Curry    24.0      6.6
'''
```

`'Points'` is still sorted in descending order, but `'Player'` is now sorted in ascending order.
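Missing values deserve a brief note as well. The sketch below uses a hypothetical variant of the dataset in which one `Points` value is replaced with `NaN` purely for illustration; it is not real data. It relies on the `na_position` parameter of `sort_values()`, which accepts `'first'` or `'last'` and defaults to `'last'`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the missing Points value is invented for this example.
df_missing = pd.DataFrame({
    'Player': ['L. James', 'K. Durant', 'M. Jordan'],
    'Points': [27.0, np.nan, 32.0],
})

# Default (na_position='last'): rows with NaN go to the end,
# whether the sort is ascending or descending.
nan_last = df_missing.sort_values(by='Points', ascending=False)
print(nan_last['Player'].tolist())   # ['M. Jordan', 'L. James', 'K. Durant']

# na_position='first' moves rows with NaN to the top instead.
nan_first = df_missing.sort_values(by='Points', ascending=False, na_position='first')
print(nan_first['Player'].tolist())  # ['K. Durant', 'M. Jordan', 'L. James']
```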

Great job! You've revisited pandas DataFrames, practiced sorting DataFrame values with the `sort_values()` method, and learned how to sort by a single column or by multiple columns. Now, get ready to hone these skills with some coding exercises. Happy coding!