Lesson 4
Combining and Reshaping Data

Welcome back! You've been doing an excellent job mastering data manipulation using the dplyr package. So far, we've covered selecting, filtering, summarizing, and mutating data. Now, it's time to step up your skills further by combining and reshaping datasets. This lesson will give you powerful tools to handle complex data structures more efficiently.

What You'll Learn

We will dive into two new essential techniques:

  1. Combining Data: Stacking multiple data frames into one cohesive dataset using the bind_rows function from the dplyr package.
  2. Reshaping Data: Changing the layout of your data for easier analysis using functions like gather from the tidyr package.

Let's look at these techniques step-by-step.

Example Data Frames

First, let's create example data frames that we'll use for combining and reshaping.

R
# Example data frames
df1 <- data.frame(ID = 1:2, Name = c("Alice", "Bob"))
df2 <- data.frame(ID = 3:4, Name = c("Charlie", "David"))

# Print the example data frames
print(df1)
print(df2)

# Output of df1:
#   ID  Name
# 1  1 Alice
# 2  2   Bob

# Output of df2:
#   ID    Name
# 1  3 Charlie
# 2  4   David

Here, we have two data frames, df1 and df2, each with an ID column and a Name column.

Combining Data Frames by Rows

We can combine these two data frames into one using the bind_rows function from dplyr.

R
# Load dplyr, which provides bind_rows
library(dplyr)

# Example data frames
df1 <- data.frame(ID = 1:2, Name = c("Alice", "Bob"))
df2 <- data.frame(ID = 3:4, Name = c("Charlie", "David"))

# Combining data frames by rows
combined_data <- bind_rows(df1, df2)

# Print the combined data frame
print(combined_data)

# Output of combined_data:
#   ID    Name
# 1  1   Alice
# 2  2     Bob
# 3  3 Charlie
# 4  4   David

The bind_rows function stacks the rows of df1 and df2 to create a new data frame combined_data.
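
As a small sketch of how bind_rows behaves when the inputs do not line up exactly, the example below uses two hypothetical data frames (df_a and df_b, not part of the lesson data), one of which has an extra Age column. bind_rows matches columns by name and fills the gaps with NA, and its .id argument adds a column recording which input each row came from. The names first, second, and Source are chosen here purely for illustration.

```r
library(dplyr)

# Hypothetical data frames: df_b has an Age column that df_a lacks
df_a <- data.frame(ID = 1:2, Name = c("Alice", "Bob"))
df_b <- data.frame(ID = 3:4, Name = c("Charlie", "David"), Age = c(31, 45))

# Passing a named list lets .id label each row with the name of its source
labelled <- bind_rows(list(first = df_a, second = df_b), .id = "Source")

print(labelled)
```

The rows that came from df_a have NA in the Age column, since df_a has no such column.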

Reshape Data Using Gather

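Now, let's reshape our combined data frame using the gather function from the tidyr package. The gather function converts wide-format data to long-format data by collapsing several columns into two: one that holds the original column names and one that holds their values. Because combined_data has only one column besides ID, the result keeps the same four rows; the column name Name moves into a new Variable column, and the names themselves move into Value. Note that gather still works but has been superseded in tidyr by pivot_longer, which is recommended for new code.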

R
# Load dplyr for bind_rows and tidyr for gather
library(dplyr)
library(tidyr)

# Example data frames
df1 <- data.frame(ID = 1:2, Name = c("Alice", "Bob"))
df2 <- data.frame(ID = 3:4, Name = c("Charlie", "David"))

# Combining data frames by rows
combined_data <- bind_rows(df1, df2)

# Reshape data using gather
gathered_data <- gather(combined_data, key = "Variable", value = "Value", -ID)

# Display the reshaped data
print(gathered_data)

# Output of gathered_data:
#   ID Variable   Value
# 1  1     Name   Alice
# 2  2     Name     Bob
# 3  3     Name Charlie
# 4  4     Name   David

Here, gathered_data will be the reshaped version of combined_data, where the columns (except ID) are gathered into key-value pairs.

The wide and long formats differ as follows:

  • Wide Format: Each variable is in a separate column. For example, in combined_data, we have the columns ID and Name, where each row represents a unique ID and its associated name.
  • Long Format: Each row holds a single value together with the name of the variable it belongs to. The gather function consolidates multiple columns into key-value pairs, which suits tasks such as grouped summaries and plotting.
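
The difference is easier to see with more than one column to gather. The sketch below uses a hypothetical scores_wide data frame (not part of the lesson data) with one column per subject, and gathers the subject columns into Subject and Score pairs.

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per subject
scores_wide <- data.frame(
  ID = 1:2,
  Math = c(90, 75),
  Science = c(85, 80)
)

# Gather every column except ID into Subject/Score pairs
scores_long <- gather(scores_wide, key = "Subject", value = "Score", -ID)

print(scores_long)

# Output of scores_long:
#   ID Subject Score
# 1  1    Math    90
# 2  2    Math    75
# 3  1 Science    85
# 4  2 Science    80
```

Two rows in the wide format become four rows in the long format, one for each combination of student and subject.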

In gather:

  • key is the name of the new column that holds the original column names.
  • value is the name of the new column that holds the values.
  • -column_name tells gather to exclude the column_name column from gathering.
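
The arguments above can be sketched on a hypothetical sales data frame (illustrative only) that has two identifier columns. Both are excluded with a minus sign so that only Q1 and Q2 are gathered. For comparison, the same reshape is written with tidyr's pivot_longer, where names_to and values_to play the roles of key and value.

```r
library(tidyr)

# Hypothetical data: two identifier columns and two measurement columns
sales <- data.frame(
  ID = 1:2,
  Region = c("North", "South"),
  Q1 = c(100, 150),
  Q2 = c(120, 130)
)

# Exclude both ID and Region so that only Q1 and Q2 are gathered
sales_long <- gather(sales, key = "Quarter", value = "Sales", -ID, -Region)

# The same reshape with pivot_longer: names_to ~ key, values_to ~ value
sales_long2 <- pivot_longer(
  sales,
  cols = c(Q1, Q2),
  names_to = "Quarter",
  values_to = "Sales"
)

print(sales_long)
print(sales_long2)
```

Both results contain the same four values, but in a different row order: gather stacks the gathered columns one after another, while pivot_longer works through the input row by row.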

Why It Matters

Combining and reshaping data are fundamental tasks in data analysis. Here’s why they matter:

  • Combining Data: Often, data is stored in different tables or data frames. Combining these various sources into a single dataset lets you perform comprehensive analyses. Whether integrating customer data from different departments or merging quarterly reports, combining data ensures all necessary information is in one place.

  • Reshaping Data: Different analysis tasks may require data in different formats. For example, pivoting long data into a wide format (or vice versa) can make it easier to perform calculations or visualizations. Reshaping data helps tailor your dataset to meet specific analytical requirements.
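
For the opposite direction mentioned above, long to wide, a minimal sketch using tidyr's pivot_wider might look like the following. The sales_long data frame and its column names are hypothetical and serve only to illustrate the idea.

```r
library(tidyr)

# Hypothetical long data: one row per ID and quarter
sales_long <- data.frame(
  ID = c(1, 1, 2, 2),
  Quarter = c("Q1", "Q2", "Q1", "Q2"),
  Sales = c(100, 120, 150, 130)
)

# Spread the Quarter values into their own columns, filled from Sales
sales_wide <- pivot_wider(sales_long, names_from = Quarter, values_from = Sales)

print(sales_wide)
```

The result has one row per ID, with the columns Q1 and Q2 holding the sales figures for each quarter.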

Mastering these techniques will make your data manipulation efforts more flexible and effective. Ready to combine and reshape data? Let’s jump into the practice section and start applying these new skills!