Today, we approach data analysis from a new angle by applying filters to grouped data frames. We will revisit grouping data frames with the dplyr package, introduce filtering, and illustrate both with examples. By the end of this lesson, you will be able to group and filter data effectively.
As a quick recap, dplyr is a widely used package for data analysis in R, with functions like group_by() at its core. Data frames are data tables, and group_by() splits their rows into groups so that later operations run once per group. Here is an example of grouping a data frame by the Product column:
```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# Group by Product, then compute the mean Quantity within each group
grouped <- group_by(sales, Product) %>% summarize(avg_qnt = mean(Quantity))
print(grouped)
```
To show how the grouping works, we also calculate the mean quantity for each group using the summarize() function. The output is:
```
  Product avg_qnt
1 Apple        35
2 Banana       45
3 Pear         55
```
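As a further illustration of how grouping drives a summary, here is a minimal sketch that groups the same data by Store instead of Product and computes two values per group. The names by_store, total_qnt, and n_rows are chosen for this example only and are not part of the lesson.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# Group by Store, then compute the total quantity and the row count per store.
# total_qnt and n_rows are illustrative column names.
by_store <- sales %>%
  group_by(Store) %>%
  summarize(total_qnt = sum(Quantity), n_rows = n())

print(by_store)
# Store1: total_qnt = 90,  n_rows = 3
# Store2: total_qnt = 180, n_rows = 3
```

Changing the grouping column changes which rows each summary is computed over, while the summarize() call itself follows the same pattern as before.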
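To filter a grouped data frame in R, we use the filter() function, which is part of the dplyr package. On a grouped data frame, a summary expression such as sum(Quantity) is evaluated once per group, so the condition keeps or drops whole groups. For instance, let's retain products with a total quantity greater than 90.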
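```r
# Group by Product so that sum(Quantity) is computed per product
sales_grouped <- group_by(sales, Product)

# Keep every row of the groups whose total Quantity exceeds 90
filtered_df <- sales_grouped %>%
  filter(sum(Quantity) > 90)
print(filtered_df)
```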
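This command keeps the rows of every group whose summed Quantity exceeds 90. Only pears qualify, as their combined quantity is 40 + 70 = 110. Apples total 20 + 50 = 70 and bananas total 30 + 60 = 90, which is not strictly greater than 90, so both are dropped. The output is: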
```
  Product Store  Quantity
1 Pear    Store1       40
2 Pear    Store2       70
```
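To make the per-group evaluation visible, the sketch below first computes the totals that the filter condition is compared against and then applies the same filter. The final ungroup() call is an optional addition not used in the lesson; it drops the grouping so that later steps operate on the whole table. The names totals and kept are illustrative.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# Totals per product: these are the values compared against 90
totals <- sales %>%
  group_by(Product) %>%
  summarize(total_qnt = sum(Quantity))
print(totals)
# Apple 70, Banana 90, Pear 110

# Same grouped filter as in the lesson, followed by ungroup()
# (an optional extra step, assumed here for illustration)
kept <- sales %>%
  group_by(Product) %>%
  filter(sum(Quantity) > 90) %>%
  ungroup()
print(kept)
```

Unlike summarize(), which returns one row per group, the grouped filter() returns the original rows of the groups that pass the condition.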
When filtering data frames in R using the filter() function, various operators help specify the conditions:
- Relational Operators:
  - >: Greater than
  - <: Less than
  - >=: Greater than or equal to
  - <=: Less than or equal to
  - ==: Equal to
  - !=: Not equal to
- Logical Operators:
  - &: AND, both conditions must be true
  - |: OR, at least one of the conditions must be true
  - !: NOT, negates the condition
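The operators listed above can be tried one at a time on the ungrouped sales data. The following is a small sketch; the variable names are made up for illustration, and the comments state which rows each condition keeps.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# >= : keeps the rows with Quantity 50, 60, and 70
at_least_50 <- filter(sales, Quantity >= 50)

# == with | : keeps all Apple and Pear rows (4 rows)
apple_or_pear <- filter(sales, Product == 'Apple' | Product == 'Pear')

# ! : negates the condition, keeping the three Store2 rows
not_store1 <- filter(sales, !(Store == 'Store1'))

# <= with & and != : keeps only the Apple row at Store1 (Quantity 20)
small_non_banana <- filter(sales, Quantity <= 30 & Product != 'Banana')

print(at_least_50)
print(apple_or_pear)
print(not_store1)
print(small_non_banana)
```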
These operators can be combined to build complex filtering criteria. For instance, the following keeps rows where Quantity is greater than 20 and Store is not 'Store1':
```r
filtered_sales <- filter(sales, Quantity > 20 & Store != 'Store1')
```
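For reference, here is a self-contained sketch of that combined condition together with its result. It also shows an alternative spelling that the lesson does not use: filter() accepts several conditions separated by commas and combines them with AND. The names with_and and with_comma are illustrative.

```r
library(dplyr)

sales <- data.frame(
  Product = c('Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'),
  Store = c('Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'),
  Quantity = c(20, 30, 40, 50, 60, 70)
)

# Both conditions joined explicitly with &
with_and <- filter(sales, Quantity > 20 & Store != 'Store1')

# Same conditions passed as separate arguments, which filter() combines with AND
with_comma <- filter(sales, Quantity > 20, Store != 'Store1')

print(with_and)
# Keeps the three Store2 rows: Apple 50, Banana 60, Pear 70

# Check that both spellings return the same rows
print(isTRUE(all.equal(with_and, with_comma)))
```

The explicit & form is needed when AND is mixed with | inside one expression; the comma form is only a convenience for conditions that must all hold.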
In this lesson, we revisited grouping data frames in R with the dplyr package and introduced filtering, both on grouped data and with combined conditions. We have learned to apply these skills to analyse data effectively. Practice exercises will further solidify these concepts and strengthen your data manipulation skills in R. Let's dive deeper with some hands-on learning!