Hello there! Today, our focus is the backbone of R programming: data frames, which store tabular data. By the end of this lesson, you'll be familiar with constructing and manipulating data frames, and you'll understand why they matter in your journey of data exploration.
A data frame in R holds data in a manner similar to how a picture frame holds a photo. It is two-dimensional, with each column holding one variable and each row holding one observation, that is, one value for each column. You can think of it as a set of equal-length vectors arranged side by side like a matrix, except that different columns may hold different data types.
Consider this scenario: you're hosting a party. A matrix can't store both your friends' names (which are characters) and the number of guests each is bringing (which are numbers) without converting everything to a single type; a data frame, however, solves this problem neatly.
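To see the difference, here is a minimal sketch. The names `friend_names`, `guest_counts`, `party_matrix`, and `party_data` are made up for this illustration and are not used elsewhere in the lesson.

```r
# Hypothetical example values for illustration
friend_names <- c("Alice", "Bob", "Charlie")
guest_counts <- c(2, 0, 3)

# A matrix holds a single type, so the numbers are coerced to character
party_matrix <- cbind(friend_names, guest_counts)
class(party_matrix[, "guest_counts"])  # "character"

# A data frame lets each column keep its own type
party_data <- data.frame(Name = friend_names, Guests = guest_counts)
class(party_data$Guests)               # "numeric"
```

Because the matrix version stores the guest counts as text, arithmetic such as `sum()` would fail on that column, while it works as expected on the data frame column.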
We construct data frames using R's `data.frame()` function. Each column consists of a vector of values, and all columns must have the same length. Sticking with our party analogy, we create a data frame:
```r
# Vectors for the party attendees
friends <- c("Alice", "Bob", "Charlie")
attend <- c("Yes", "No", "Yes")
guests <- c(2, 0, 3)

# Construct a data frame
party_df <- data.frame(Friends = friends, Attending = attend, Guests = guests)

# Inspect the data frame
print(party_df)
```
The output is:
```
  Friends Attending Guests
1   Alice       Yes      2
2     Bob        No      0
3 Charlie       Yes      3
```
Providing explicit column names such as `Friends = friends`, `Attending = attend`, and `Guests = guests` makes the data frame easy to understand.
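One way to confirm the column names and types of a data frame is to inspect its structure. The sketch below rebuilds the same `party_df` so it can run on its own, then uses the base R functions `names()`, `dim()`, and `str()`.

```r
# Rebuild the party data frame so this snippet is self-contained
friends <- c("Alice", "Bob", "Charlie")
attend <- c("Yes", "No", "Yes")
guests <- c(2, 0, 3)
party_df <- data.frame(Friends = friends, Attending = attend, Guests = guests)

names(party_df)  # "Friends" "Attending" "Guests"
dim(party_df)    # 3 3 (rows, then columns)
str(party_df)    # prints each column's type and first few values
```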
You can access a data frame's contents by column name, by index, or by a condition. You can also modify values and add columns and rows. For example:
```r
# Access 1st column by index
party_df[[1]]     # [1] "Alice"   "Bob"     "Charlie"

# Access 'Friends' column by name
party_df$Friends  # [1] "Alice"   "Bob"     "Charlie"

# Subset where Attending is 'Yes'
subset(party_df, Attending == 'Yes')
#   Friends Attending Guests
# 1   Alice       Yes      2
# 3 Charlie       Yes      3
```
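Modifying values and adding rows can be sketched as follows. This example works on a separate copy named `rsvp_df` (a name made up for this illustration) so that the lesson's `party_df` stays unchanged, and the new guest "Dana" is also invented.

```r
# A separate copy of the party data, so party_df itself is untouched
rsvp_df <- data.frame(
  Friends = c("Alice", "Bob", "Charlie"),
  Attending = c("Yes", "No", "Yes"),
  Guests = c(2, 0, 3)
)

# Modify values: Bob changes his mind and brings one guest
rsvp_df$Attending[rsvp_df$Friends == "Bob"] <- "Yes"
rsvp_df[rsvp_df$Friends == "Bob", "Guests"] <- 1

# Add a row: bind a one-row data frame with matching column names
new_friend <- data.frame(Friends = "Dana", Attending = "Yes", Guests = 1)
rsvp_df <- rbind(rsvp_df, new_friend)

print(rsvp_df)
```

Note that `rbind()` expects the new row to have the same column names as the existing data frame.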
We'll add another piece of information for our guests: their arrival times.
```r
# Add the new column
party_df$Arrival_Time <- c("6 PM", "N/A", "7 PM")

# The updated data frame
print(party_df)
```
The output is:
```
  Friends Attending Guests Arrival_Time
1   Alice       Yes      2         6 PM
2     Bob        No      0          N/A
3 Charlie       Yes      3         7 PM
```
We can also add columns using the `cbind()` function:
```r
# Add a new column for departure times using cbind
departure_times <- c("10 PM", "9 PM", "11 PM")

# Combine party_df and the new column into a data frame
party_df <- cbind(party_df, Departure_Time = departure_times)

# The updated data frame
print(party_df)
```
The output is:
```
  Friends Attending Guests Arrival_Time Departure_Time
1   Alice       Yes      2         6 PM          10 PM
2     Bob        No      0          N/A           9 PM
3 Charlie       Yes      3         7 PM          11 PM
```
The strength of data frames lies in their flexibility: a single object holds columns of different types, which you can filter, summarize, and compare together. Imagine a data scientist at a sports firm who has details about its athletes, including names and nationalities (categorical data) as well as ages and scores (numerical data). Here, data frames are beneficial. They bring this range of data into one structure, simplifying subsequent analysis.
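As a rough sketch of that scenario, the example below builds a small data frame of athletes and runs a few simple analyses. All names and values are made up for illustration, and the column names are assumptions rather than a real dataset.

```r
# Made-up athlete records, for illustration only
athletes <- data.frame(
  Name = c("Ana", "Ben", "Chloe", "Dev"),
  Nationality = c("Brazil", "Canada", "Brazil", "Canada"),
  Age = c(24, 31, 27, 22),
  Score = c(88.5, 92.0, 79.5, 85.0)
)

# Summarize a numeric column
mean(athletes$Age)  # 26

# Filter rows using a condition on a numeric column
top_scorers <- subset(athletes, Score > 85)

# Average score per nationality, combining categorical and numeric columns
avg_scores <- aggregate(Score ~ Nationality, data = athletes, FUN = mean)
print(avg_scores)
```

Because the categorical and numerical columns live in the same object, grouping scores by nationality takes a single call to `aggregate()`.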
Great work! Today, we focused on understanding R's data frames: their creation, manipulation, and vital role in R. Are you ready for some hands-on practice? Practice cements what we have learned, so roll up your sleeves and prepare for the upcoming tasks. Enjoy your practice!