Lesson 3
Grouping and Filtering Data Frames in R
Introduction

Today, we are approaching data analysis from a new angle by applying filtering to grouped data frames. We will revisit grouping data frames with the dplyr package and then introduce filtering, illustrating both with examples. By the end of this lesson, you will be able to group data and filter it effectively, using conditions that apply either to individual rows or to whole groups.

Recap of Grouping in dplyr

As a quick recap, dplyr is a widely used R package for data manipulation, and group_by() is one of its core functions. group_by() splits a data frame's rows into groups defined by one or more columns, so that later operations such as summarize() or filter() are applied to each group separately. Here is an example of grouping a sales data frame by the Product column:

```r
library(dplyr)

sales <- data.frame(
  'Product' = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  'Store' = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  'Quantity' = c(20, 30, 40, 50, 60, 70)
)

# Group the rows by product, then compute the mean quantity of each group
grouped <- group_by(sales, Product) %>% summarize(avg_qnt = mean(Quantity))
print(grouped)
```

To show how the grouping works, we also compute the mean quantity for each product with summarize(). The output is:

```
  Product avg_qnt
1 Apple        35
2 Banana       45
3 Pear         55
```
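summarize() can compute several statistics per group in a single call. The sketch below assumes dplyr is installed; the column names total_qnt and n_rows are illustrative choices, not part of the lesson. It adds each product's total quantity, which is the figure the filtering example in the next section compares against 90.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# One row per product: mean, total, and number of rows in each group
# (total_qnt and n_rows are illustrative column names)
totals <- sales %>%
  group_by(Product) %>%
  summarize(
    avg_qnt = mean(Quantity),
    total_qnt = sum(Quantity),
    n_rows = n()
  )
print(totals)
```

The totals come out as 70 for Apple, 90 for Banana, and 110 for Pear, with two rows per product. Note that Banana's total is exactly 90, so it will not pass a strict "greater than 90" test.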
Filtering a Grouped Data Frame

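To filter a grouped data frame in R, we use the filter() function from the dplyr package. When the data is grouped, any summary function inside the condition, such as sum() or mean(), is evaluated separately for each group. For instance, let's keep only the products whose total quantity is greater than 90.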

```r
# Group the rows by product, then keep the groups whose total quantity exceeds 90
sales_grouped <- group_by(sales, Product)
filtered_df <- sales_grouped %>%
  filter(sum(Quantity) > 90)
print(filtered_df)
```

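Because sales_grouped is grouped by Product, sum(Quantity) is computed separately for each product, and filter() keeps every row of any product whose total exceeds 90. Only Pear qualifies, since its combined quantity is 40 + 70 = 110; Apple totals 20 + 50 = 70, and Banana totals exactly 30 + 60 = 90, which is not greater than 90. The output is: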

```
  Product Store  Quantity
1 Pear    Store1       40
2 Pear    Store2       70
```
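The grouping is what turns sum(Quantity) into a per-product total. The sketch below, which assumes dplyr is installed and uses illustrative object names, contrasts the grouped filter with the same condition on the ungrouped data, and also shows a row-level comparison against each product's own mean.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# Grouped: sum(Quantity) is evaluated once per product
by_product <- sales %>%
  group_by(Product) %>%
  filter(sum(Quantity) > 90)

# Ungrouped: sum(Quantity) is the grand total (270), so every row passes
all_rows <- sales %>%
  filter(sum(Quantity) > 90)

# Grouped, row-level test: keep rows above their own product's mean quantity
above_avg <- sales %>%
  group_by(Product) %>%
  filter(Quantity > mean(Quantity)) %>%
  ungroup()

print(by_product)
print(nrow(all_rows))
print(above_avg)
```

The first result holds the two Pear rows, the ungrouped version keeps all six rows, and the per-group mean test keeps the Store2 row of every product (50, 60, and 70). filter() leaves the grouping in place, which is why ungroup() is called when later steps should treat the result as one table again.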
Useful Filtering Operators

When filtering data frames in R using the filter() function, various operators help specify the conditions:

  • Relational Operators:

    • > : Greater than
    • < : Less than
    • >= : Greater than or equal to
    • <= : Less than or equal to
    • == : Equal to
    • != : Not equal to
  • Logical Operators:

    • & : AND, both conditions must be true
    • | : OR, at least one of the conditions must be true
    • ! : NOT, negates the condition

These operators can be combined to build complex filtering criteria. For instance, filtering rows where Quantity is greater than 20 and Store is not 'Store1':

```r
# Keep rows with a quantity above 20 that do not come from Store1
filtered_sales <- filter(sales, Quantity > 20 & Store != 'Store1')
```
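A few more combinations, sketched on the same sales data under the assumption that dplyr is installed (object names are illustrative). filter() also accepts several conditions separated by commas and combines them with AND, and on grouped data a group-level condition can be mixed with a row-level one.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# OR: Apple rows, or any row with a quantity of at least 60
apple_or_big <- filter(sales, Product == 'Apple' | Quantity >= 60)

# NOT: negate a condition (same rows as Store != 'Store1')
not_store1 <- filter(sales, !(Store == 'Store1'))

# Comma-separated conditions are combined with AND
comma_and <- filter(sales, Quantity > 20, Store != 'Store1')

# Group-level and row-level conditions together:
# products whose total exceeds 80, restricted to their Store2 rows
store2_big <- sales %>%
  group_by(Product) %>%
  filter(sum(Quantity) > 80 & Store == 'Store2') %>%
  ungroup()

print(apple_or_big)
print(store2_big)
```

The OR filter returns four rows (quantities 20, 50, 60, 70), the comma form gives the same rows as the & version in the lesson, and the last filter keeps only the Store2 rows of Banana (total 90) and Pear (total 110), since Apple's total of 70 does not exceed 80.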
Lesson Summary

In this lesson, we revisited grouping data frames in R with dplyr's group_by() and combined it with filter(), so that conditions such as sum(Quantity) > 90 are evaluated separately for each group. We also reviewed the relational and logical operators used to build filter conditions. Practice exercises will further solidify these concepts and strengthen your data manipulation skills in R. Let's dive deeper with some hands-on learning!
