Lesson 2

Exploring R Data Frames: A Beginner's Guide

Lesson Overview and Goals

Hello there! Today, our focus is the backbone of R programming: data frames, which store tabular data. By the end of this lesson, you'll know how to construct and manipulate data frames, and you'll understand why they matter in your journey of data exploration.

Data Frames: A First Look

A data frame in R holds data much as a picture frame holds a photo: it gives the contents a fixed, rectangular shape. A data frame is two-dimensional, with each column holding one variable and each row holding one observation, that is, one value for every column. You can think of it as a cross between vectors and matrices: it is rectangular like a matrix, but each column is its own vector, so different columns can hold different data types.

Consider this scenario: you're hosting a party. A matrix holds only one data type, so it can't store both your friends' names (which are characters) and a number for each of them, such as how many guests they're bringing, without converting everything to text. A data frame solves this problem neatly.
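
To make the difference concrete, here is a minimal sketch that builds the same party data as a matrix and as a data frame. The variable names and values are illustrative assumptions, and stringsAsFactors = FALSE is set only so the result is the same across R versions.

```r
# Illustrative party data: names are text, guest counts are numbers
friends <- c("Alice", "Bob", "Charlie")
guests <- c(2, 0, 3)

# A matrix allows only one type, so the numbers are converted to text
party_matrix <- cbind(friends, guests)
class(party_matrix[, "guests"])  # "character"

# A data frame keeps each column's own type
party_df <- data.frame(Friends = friends, Guests = guests,
                       stringsAsFactors = FALSE)
class(party_df$Friends)  # "character"
class(party_df$Guests)   # "numeric"
```

Because the guest counts stay numeric in the data frame, a calculation such as sum(party_df$Guests) works directly, whereas the matrix version would first need converting back to numbers.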

Creating a Data Frame

We construct data frames using R's data.frame() function. Each column consists of a vector of values. Sticking with our party analogy, we create a data frame:

R
# Vectors for the party attendees
friends <- c("Alice", "Bob", "Charlie")
attend <- c("Yes", "No", "Yes")
guests <- c(2, 0, 3)

# Construct a data frame
party_df <- data.frame(Friends = friends, Attending = attend, Guests = guests)

# Inspect the data frame
print(party_df)

The output is:

  Friends Attending Guests
1   Alice       Yes      2
2     Bob        No      0
3 Charlie       Yes      3

Providing explicit column names such as Friends=friends, Attending=attend, and Guests=guests makes the data frame easy to understand.
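
As a rough illustration of why explicit names help, the sketch below compares the column names you get with and without them, then runs a couple of quick structural checks. The name unnamed_df is a hypothetical helper introduced only for this comparison.

```r
friends <- c("Alice", "Bob", "Charlie")
attend <- c("Yes", "No", "Yes")
guests <- c(2, 0, 3)

# Without explicit names, columns take the names of the vectors
unnamed_df <- data.frame(friends, attend, guests)
names(unnamed_df)  # "friends" "attend" "guests"

# With explicit names, as in the lesson
party_df <- data.frame(Friends = friends, Attending = attend, Guests = guests)
names(party_df)    # "Friends" "Attending" "Guests"

# Quick structural checks: dimensions and a compact per-column summary
dim(party_df)      # 3 3
str(party_df)
```

Calling str() is a convenient way to confirm that each column has the type you expect before you start working with the data.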

Accessing Data

To access a data frame's contents, R lets you use column names, numeric indices, or logical conditions. Data can also be modified, and additional columns and rows can be added. For example:

R
# Access 1st column by index
party_df[[1]]  # [1] "Alice" "Bob" "Charlie"

# Access 'Friends' column by name
party_df$Friends  # [1] "Alice" "Bob" "Charlie"

# Subset where Attending is 'Yes'
subset(party_df, Attending == 'Yes')
#   Friends Attending Guests
# 1   Alice       Yes      2
# 3 Charlie       Yes      3
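
The paragraph above also mentions modifying data and adding rows. One possible way to do both is sketched here, under the assumption that Bob changes his mind and that a fourth friend, Dana (a made-up name), is invited. The data frame is rebuilt so the snippet runs on its own.

```r
party_df <- data.frame(
  Friends = c("Alice", "Bob", "Charlie"),
  Attending = c("Yes", "No", "Yes"),
  Guests = c(2, 0, 3),
  stringsAsFactors = FALSE  # keep text as character in all R versions
)

# Row 2, all columns
party_df[2, ]

# Single cell: row 3, column "Guests"
party_df[3, "Guests"]  # 3

# Modify values: Bob decides to come and brings one guest
party_df[party_df$Friends == "Bob", "Attending"] <- "Yes"
party_df[party_df$Friends == "Bob", "Guests"] <- 1

# Add a row with rbind(); the new row must use the same column names
new_row <- data.frame(Friends = "Dana", Attending = "Yes", Guests = 1,
                      stringsAsFactors = FALSE)
party_df <- rbind(party_df, new_row)

nrow(party_df)  # 4
```

Selecting the row with a condition (party_df$Friends == "Bob") rather than a fixed row number keeps the update correct even if the row order changes.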

Adding Columns

We'll add another piece of information for our guests: their arrival times.

R
# Add the new column
party_df$Arrival_Time <- c("6 PM", "N/A", "7 PM")

# The updated data frame
print(party_df)

The output is:

  Friends Attending Guests Arrival_Time
1   Alice       Yes      2         6 PM
2     Bob        No      0          N/A
3 Charlie       Yes      3         7 PM

We can also add columns using the cbind function:

R
# Add a new column for departure times using cbind
departure_times <- c("10 PM", "9 PM", "11 PM")

# Combine party_df and the new column into a data frame
party_df <- cbind(party_df, Departure_Time = departure_times)

# The updated data frame
print(party_df)

The output is:

  Friends Attending Guests Arrival_Time Departure_Time
1   Alice       Yes      2         6 PM          10 PM
2     Bob        No      0          N/A           9 PM
3 Charlie       Yes      3         7 PM          11 PM

The Power and Importance of Data Frames

The strength of data frames lies in their ability to hold mixed data types in one structure while keeping each row's values together. Imagine a data scientist at a sports firm who has details about the firm's athletes, including names and nationalities (categorical data) as well as ages and scores (numerical data). A data frame gathers all of these variables into a single table, which simplifies subsequent filtering, summarizing, and analysis.
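
The sports scenario might look something like the sketch below. Every name and number in it is made up for illustration; it only shows how categorical and numerical columns can sit side by side and be analyzed together using base R.

```r
# Made-up athlete records, for illustration only
athletes <- data.frame(
  Name = c("Ana", "Ben", "Chloe", "Dev"),
  Nationality = c("Brazil", "Canada", "Brazil", "India"),
  Age = c(24, 31, 27, 22),
  Score = c(88.5, 92.0, 79.5, 85.0),
  stringsAsFactors = FALSE
)

# Numerical summary of one column
mean(athletes$Score)  # 86.25

# Filter on a numerical condition and keep selected columns
young <- athletes[athletes$Age < 25, c("Name", "Score")]

# Average score per nationality (categorical column used for grouping)
by_nationality <- aggregate(Score ~ Nationality, data = athletes, FUN = mean)
print(by_nationality)
```

Here the categorical Nationality column groups the numerical Score column in a single call, which would be awkward if the two kinds of data lived in separate structures.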

Lesson Summary and Practice

Great work! Today, we focused on understanding R's data frames — their creation, manipulation, and vital role in R. Are you ready for some hands-on practice? Practice cements what we have learned, so roll up your sleeves and prepare for some upcoming tasks. Enjoy your practice!
