Lesson 4

Today, we will explore how to sort data in an R data frame. The examples use a base R `data.frame`; `data.table` objects can also be sorted with `order()` in a similar way. The focus will be on the `order()` function, covering both single- and multi-column sorting and the handling of ties in our data.

Let's consider the following concise dataset of basketball players and their stats:

```r
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)
```

In this dataset, we observe a tie in the `Points` column between **'K. Durant'** and **'K. Bryant'**.
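You can also let R find ties instead of spotting them by eye. One way is base R's `duplicated()`. The sketch below assumes R 4.0 or later, where `data.frame()` keeps text columns as character vectors rather than factors. The names `is_tied` and `tied_players` are illustrative.

```r
# Same data as above
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# duplicated() flags values already seen when scanning forward;
# fromLast = TRUE flags them when scanning backward.
# OR-ing both marks every row whose Points value occurs more than once.
is_tied <- duplicated(df$Points) | duplicated(df$Points, fromLast = TRUE)
tied_players <- df$Player[is_tied]
print(tied_players)
# [1] "K. Durant" "K. Bryant"
```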

We can sort the rows of a data frame using the `order()` function in R.

```r
sorted_df <- df[order(-df$Points), ]
print(sorted_df)
```

```
     Player Points Assists
3 M. Jordan     32     4.2
1  L. James     27     5.7
2 K. Durant     26     4.7
5 K. Bryant     26     7.4
4  S. Curry     24     6.6
```

This code sorts the data frame by the `Points` column in descending order. `order()` sorts in ascending order by default, so negating the values with a minus sign puts the largest original values first. Also note the comma `,` after the `order()` call: it is part of the `[rows, columns]` indexing. We index the rows by `order(-df$Points)`, and the column index is left empty, which selects all the columns.
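It can help to look at what `order()` actually returns. It gives back the row positions that would put the key in order, not the sorted data itself. The sketch below prints that index vector. It also shows two related variations: the `decreasing = TRUE` argument of `order()` as an alternative to negation, and a non-empty column index. The variable names `idx`, `sorted_df2` and `top_points` are illustrative.

```r
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# order() returns row positions: the highest Points is in row 3, then row 1, ...
idx <- order(-df$Points)
print(idx)
# [1] 3 1 2 5 4

# Using those positions as the row index reorders the rows;
# the empty column index after the comma keeps every column.
sorted_df <- df[idx, ]

# Same descending order of Points, using decreasing = TRUE instead of a minus sign
sorted_df2 <- df[order(df$Points, decreasing = TRUE), ]

# A non-empty column index keeps only the named columns
top_points <- df[idx, c("Player", "Points")]
print(top_points)
```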

Now, we can easily identify the players with the highest average points scored.
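Once the rows are in descending order, base R's `head()` returns the top scorers directly. Here is a minimal sketch; `top3` and `best` are illustrative names.

```r
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)
sorted_df <- df[order(-df$Points), ]

# head() keeps the first n rows, i.e. the n highest scorers after the sort
top3 <- head(sorted_df, 3)
print(top3)

# Name of the single highest scorer
best <- sorted_df$Player[1]
print(best)
# [1] "M. Jordan"
```

The third slot goes to K. Durant only because he appears before K. Bryant in the original data. That is exactly the tie the next step resolves.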

When there are ties, `order()` lets us break them by passing additional vectors as arguments. Rows tied on the first key are ordered by the second key, and so on. Let's resolve the tie between 'K. Durant' and 'K. Bryant' using the `Assists` column.

```r
sorted_df <- df[order(-df$Points, -df$Assists), ]
print(sorted_df)
```

```
     Player Points Assists
3 M. Jordan     32     4.2
1  L. James     27     5.7
5 K. Bryant     26     7.4
2 K. Durant     26     4.7
4  S. Curry     24     6.6
```

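In this code, the data frame is sorted first by the `Points` column and then, among rows with equal points, by the `Assists` column. The minus signs request descending order for both columns, so K. Bryant (7.4 assists) now comes before K. Durant (4.7 assists).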
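Why does the tie-breaker matter? According to the R documentation, the sort behind `order()` is stable (except for `method = "quick"`). Ties that no key resolves therefore keep their original relative order. The sketch below shows this by reversing the rows of `df`. The names `df_rev`, `by_points` and `by_both` are illustrative.

```r
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)
df_rev <- df[nrow(df):1, ]  # same rows, reversed order

# Points only: the tied pair keeps whatever order it had in the input
by_points     <- df[order(-df$Points), ]
by_points_rev <- df_rev[order(-df_rev$Points), ]
print(by_points$Player)      # Durant before Bryant
print(by_points_rev$Player)  # Bryant before Durant

# With Assists as a tie-breaker, the result no longer depends on input order
by_both     <- df[order(-df$Points, -df$Assists), ]
by_both_rev <- df_rev[order(-df_rev$Points, -df_rev$Assists), ]
print(identical(by_both$Player, by_both_rev$Player))
# [1] TRUE
```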

Instead of resolving ties with the `Assists` column, we can mix things up a bit by sorting tied players alphabetically by name.

```r
sorted_df <- df[order(-df$Points, df$Player), ]
print(sorted_df)
```

```
     Player Points Assists
3 M. Jordan     32     4.2
1  L. James     27     5.7
5 K. Bryant     26     7.4
2 K. Durant     26     4.7
4  S. Curry     24     6.6
```

Here, the data frame is sorted by the `Points` column in descending order and then by `Player` in ascending (alphabetical) order, which is the default when no minus sign is used. Now, in the event of a tie in points, the players are listed alphabetically.
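The minus sign only works on numeric columns. Writing `-df$Player` makes R stop with an error (`invalid argument to unary operator`). Base R offers two alternatives, sketched below.

- Wrap the column in `xtfrm()`, which turns the names into numbers that sort the same way and can therefore be negated.
- Pass a per-column `decreasing` vector, which `order()` only accepts together with `method = "radix"`.

The radix method compares strings in the C locale, so its alphabetical order can differ from your locale's for mixed-case or accented names. The names `res_mixed` and `res_rev_names` are illustrative.

```r
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# Points descending, Player ascending, with no minus sign:
# one decreasing flag per key (requires method = "radix")
res_mixed <- df[order(df$Points, df$Player,
                      decreasing = c(TRUE, FALSE), method = "radix"), ]
print(res_mixed)

# Points descending, ties broken by Player in reverse alphabetical order:
# xtfrm() maps the names to sortable numbers, which can then be negated
res_rev_names <- df[order(-df$Points, -xtfrm(df$Player)), ]
print(res_rev_names)
```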

Fantastic work! You have deepened your understanding of `data.frame`s in R, mastered sorting data using the `order()` function, and learned how to sort by a single column or by several columns. It's time to solidify your understanding by practicing with various datasets. Happy R programming!