Today, we will explore how to sort data in a data frame using R's `data.table` or `data.frame`. The focus will be on the `order()` function, covering single-column and multi-column sorting and how to handle ties in our data.
Let's consider the following concise dataset of basketball players and their stats:
```r
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)
```
In this dataset, we observe a tie in the `Points` column between 'K. Durant' and 'K. Bryant'.
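With only five rows the tie is easy to spot by eye. For larger data, a minimal sketch like the following could flag tied values. It relies on base R's `duplicated()`, and the names `tied_points` and `tied_rows` are illustrative, not part of the lesson.

```r
# Same dataset as above, repeated so the snippet runs on its own.
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# Values of Points that occur more than once (illustrative name).
tied_points <- unique(df$Points[duplicated(df$Points)])

# Rows whose Points value is one of the tied values (illustrative name).
tied_rows <- df[df$Points %in% tied_points, ]
print(tied_rows)
```

Here `tied_rows` should contain the rows for 'K. Durant' and 'K. Bryant', the two players with 26 points.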
We can sort the values in a data frame using the `order()` function in R.
```r
sorted_df <- df[order(-df$Points), ]
print(sorted_df)
```
```
     Player Points Assists
3 M. Jordan     32     4.2
1  L. James     27     5.7
2 K. Durant     26     4.7
5 K. Bryant     26     7.4
4  S. Curry     24     6.6
```
This code sorts the data frame by the `Points` column in descending order. `order()` sorts in ascending order by default, so negating the values with the minus sign reverses the result for a numeric column. Also note the comma `,` after the `order()` call: it is part of the indexing. We index rows by `order(-df$Points)` and leave the column index empty, which selects all columns.
Now, we can easily identify the players with the highest average points scored.
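To see why this indexing works, it can help to look at what `order()` returns on its own: a vector of row positions, not the sorted values. The sketch below also tries the `decreasing = TRUE` argument of `order()` as an alternative to the minus sign. The variable names `idx` and `idx_alt` are illustrative.

```r
# Same dataset as above, repeated so the snippet runs on its own.
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# order() returns the row positions that would sort the column.
idx <- order(-df$Points)
print(idx)

# An alternative spelling: ask order() for descending order directly.
idx_alt <- order(df$Points, decreasing = TRUE)
print(df$Points[idx_alt])

# Using the positions as a row index reorders the whole data frame.
sorted_df <- df[idx, ]
print(sorted_df$Points)
```

Both spellings yield `Points` in descending order. The minus sign is shorter, while `decreasing = TRUE` also works for columns that cannot be negated.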
In instances of ties, R's `order()` function lets us break them by passing additional sort keys as extra arguments. Let's resolve the tie between 'K. Durant' and 'K. Bryant' using the `Assists` column.
```r
sorted_df <- df[order(-df$Points, -df$Assists), ]
print(sorted_df)
```
```
     Player Points Assists
3 M. Jordan     32     4.2
1  L. James     27     5.7
5 K. Bryant     26     7.4
2 K. Durant     26     4.7
4  S. Curry     24     6.6
```
In this code, the data frame is sorted first by the `Points` column, then by the `Assists` column. The negative sign indicates descending order for both columns.
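When several sort keys are involved, repeating `df$` for each one gets noisy. One possible way to shorten the call, sketched here as an option rather than the lesson's own approach, is base R's `with()`, which evaluates an expression using the data frame's columns as variables.

```r
# Same dataset as above, repeated so the snippet runs on its own.
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# with() lets order() refer to Points and Assists without the df$ prefix.
sorted_df <- df[with(df, order(-Points, -Assists)), ]
print(sorted_df)
```

The row order matches the `df$`-based version: points descending, with assists descending as the tie-breaker.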
Instead of resolving ties based on the `Assists` column, we can mix things up a bit by sorting alphabetically by player names in case of ties.
```r
sorted_df <- df[order(-df$Points, df$Player), ]
print(sorted_df)
```
```
     Player Points Assists
3 M. Jordan     32     4.2
1  L. James     27     5.7
5 K. Bryant     26     7.4
2 K. Durant     26     4.7
4  S. Curry     24     6.6
```
Here, the data frame is sorted by the `Points` column in descending order and by player name in ascending order. Now, in the event of a tie in points, the players are listed alphabetically.
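The minus sign only works on numeric columns, so it cannot reverse the alphabetical order of a character column such as `Player`. A minimal sketch of one alternative, assuming the `method = "radix"` option of `order()`, which accepts one `decreasing` value per sort key, is shown below. The names `name_asc` and `name_desc` are illustrative, and radix sorting compares strings in the C locale, which may differ from your locale's collation for some characters.

```r
# Same dataset as above, repeated so the snippet runs on its own.
df <- data.frame(
  Player = c('L. James', 'K. Durant', 'M. Jordan', 'S. Curry', 'K. Bryant'),
  Points = c(27.0, 26.0, 32.0, 24.0, 26.0),
  Assists = c(5.7, 4.7, 4.2, 6.6, 7.4)
)

# Points descending, Player ascending (A to Z) within ties.
name_asc <- df[order(df$Points, df$Player,
                     decreasing = c(TRUE, FALSE),
                     method = "radix"), ]

# Points descending, Player descending (Z to A) within ties.
name_desc <- df[order(df$Points, df$Player,
                      decreasing = c(TRUE, TRUE),
                      method = "radix"), ]

print(name_asc)
print(name_desc)
```

In `name_asc`, 'K. Bryant' comes before 'K. Durant', while `name_desc` reverses the two. The rest of the rows keep the same descending-points order.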
Fantastic work! You have deepened your understanding of data frames in R, practiced sorting data with the `order()` function, and learned how to sort by a single column or by several. It's time to solidify your understanding by practicing with various datasets. Happy R programming!