Welcome back! Up until now, we've covered some essential techniques for managing your datasets with the `dplyr` package in R. We've learned how to select specific columns, filter rows based on conditions, and summarize and group data. Now, it's time to take your data manipulation skills to the next level by learning how to mutate (or transform) data and arrange it in a specific order.
In this lesson, you'll explore two critical functionalities:
- Mutating data with the `mutate` function.
- Arranging data with the `arrange` function.

We'll use straightforward examples to make these concepts easy to grasp. Let's dive into each of these functionalities.
First, we'll set up an example data frame that we'll use throughout this lesson:
```r
# Load dplyr, which provides mutate() and arrange()
library(dplyr)

# Example data frame
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(85, 95, 78, 92)
)

# Print the example data frame
print(data)

# Output:
#      Name Score
# 1   Alice    85
# 2     Bob    95
# 3 Charlie    78
# 4   David    92
```
This data frame contains the names of four individuals along with their respective scores.
The `mutate` function allows us to add new columns or transform existing ones. For instance, suppose we want to add a new column, `ScorePlus10`, which is each person's score incremented by 10.
```r
library(dplyr)

# Example data frame
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(85, 95, 78, 92)
)

# Add a new column computed from Score
mutated_data <- mutate(data, ScorePlus10 = Score + 10)

# Print the mutated data
print(mutated_data)

# Output:
#      Name Score ScorePlus10
# 1   Alice    85          95
# 2     Bob    95         105
# 3 Charlie    78          88
# 4   David    92         102
```
Here, `mutate` adds a new column called `ScorePlus10` to the `data` data frame, where each entry is the original `Score` plus 10. The result is stored in `mutated_data`; `data` itself is not modified.
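The example above only adds a column, so here is a minimal sketch of the other use mentioned earlier: transforming an existing column. It assumes the same example data frame; the 5-point adjustment and the `ScoreDoubled` column name are made up for illustration.

```r
library(dplyr)

# Same example data frame as above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(85, 95, 78, 92)
)

# Reusing an existing column name replaces that column in the result.
# Expressions are evaluated in order, so ScoreDoubled (a hypothetical
# column name) is computed from the already-updated Score.
adjusted_data <- mutate(
  data,
  Score = Score + 5,
  ScoreDoubled = Score * 2
)

print(adjusted_data)
```

Because `mutate` returns a new data frame, the `Score` values in `data` stay as they were; only `adjusted_data` holds the transformed column.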
The `arrange` function helps us sort the data in a specific order. For example, to sort the data by `Score` in descending order, we can do the following:
```r
library(dplyr)

# Example data frame
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(85, 95, 78, 92)
)

# Add a new column
mutated_data <- mutate(data, ScorePlus10 = Score + 10)

# Arrange data by Score in descending order
arranged_data <- arrange(mutated_data, desc(Score))

# Print the descending arranged data
print(arranged_data)

# Output:
#      Name Score ScorePlus10
# 1     Bob    95         105
# 2   David    92         102
# 3   Alice    85          95
# 4 Charlie    78          88
```
In this code snippet, `arrange` sorts the `mutated_data` data frame by the `Score` column in descending order. Wrapping `Score` in `desc()` is what reverses the sort direction.
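When two rows share the same value in the sort column, `arrange` can take further columns as tie-breakers. The sketch below assumes a hypothetical variant of the example data in which Alice and David have the same score; that data is invented here purely to show the tie.

```r
library(dplyr)

# Hypothetical data with a tied score, made up for this illustration
tied_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(92, 95, 78, 92)
)

# Sort by Score, highest first; rows with equal scores
# are then ordered alphabetically by Name
ranked_data <- arrange(tied_data, desc(Score), Name)

print(ranked_data)
```

Columns are applied left to right, so `Name` only matters for rows whose `Score` values are equal.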
To sort the data by `Score` in ascending order, we can do the following:
```r
library(dplyr)

# Example data frame
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(85, 95, 78, 92)
)

# Add a new column
mutated_data <- mutate(data, ScorePlus10 = Score + 10)

# Arrange data by Score in ascending order
arranged_data_asc <- arrange(mutated_data, Score)

# Print the ascending arranged data
print(arranged_data_asc)

# Output:
#      Name Score ScorePlus10
# 1 Charlie    78          88
# 2   Alice    85          95
# 3   David    92         102
# 4     Bob    95         105
```
Here, `arrange` sorts the `mutated_data` data frame by the `Score` column in ascending order. No helper function is needed, because ascending order is the default behavior of `arrange`.
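The two steps can also be chained rather than stored in separate variables. This is a sketch that assumes the `%>%` pipe operator, which becomes available when `dplyr` is loaded; the variable name `piped_data` is chosen here for illustration.

```r
library(dplyr)

# Same example data frame as above
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(85, 95, 78, 92)
)

# Pass the data through mutate, then through arrange.
# Score is sorted ascending because that is arrange's default.
piped_data <- data %>%
  mutate(ScorePlus10 = Score + 10) %>%
  arrange(Score)

print(piped_data)
```

This should produce the same rows as `arranged_data_asc` above; the pipe only changes how the calls are written, not what they compute.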
Mutating and arranging data are foundational skills in data wrangling.
Mutating Data: This technique allows you to create new variables or transform existing ones based on your needs. It's useful for tasks such as feature engineering in machine learning, where you may need to create new features from raw data.
Arranging Data: Sorting your data helps you see patterns more clearly and make your datasets more readable. For example, arranging sales data from the highest to the lowest can help you immediately spot your top-performing products.
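As a rough sketch of the sales scenario, the example below derives a new column and then sorts by it. The `sales` data frame, its column names, and all of its figures are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical sales figures, invented for illustration only
sales <- data.frame(
  Product = c("Widget", "Gadget", "Gizmo"),
  UnitsSold = c(120, 45, 300),
  UnitPrice = c(2.5, 20, 1.5)
)

# Mutate: derive a Revenue column from the raw columns
sales_with_revenue <- mutate(sales, Revenue = UnitsSold * UnitPrice)

# Arrange: list the highest-revenue products first
sales_ranked <- arrange(sales_with_revenue, desc(Revenue))

print(sales_ranked)
```

With these made-up numbers, the product that sold the fewest units ends up first, which is the kind of pattern that is easy to miss in unsorted raw data.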
By mastering these functions, you'll be better equipped to prepare your data for analysis and reporting, ensuring you draw more meaningful insights from your datasets.
Excited to start mutating and arranging data? Let's jump into the practice section and get hands-on with these powerful techniques.