Welcome, dear learners! Today's focus is on mastering one of R's keystone skills — Boolean selection. This powerful tool in the data manipulation toolbox allows us to filter data, facilitating refined and targeted data wrangling.
Let's dissect what we mean by Boolean selection. In R, data frame elements are typically selected by their index values. When you want to filter rows based on a condition, however, Boolean selection is the tool for the job.
A Boolean vector (R calls it a logical vector), composed of TRUE or FALSE values, determines which rows of a data frame we select: rows whose position holds TRUE are kept, and rows whose position holds FALSE are dropped. As you may have already guessed, these vectors are produced by logical operations on our data.
Consider this elementary example: finding numbers greater than 5 in a vector. Here's how you would accomplish it:

    # Vector of numbers
    numbers <- c(2, 5, 7, 10)

    # Boolean vector for numbers > 5
    numbers_more_than_five <- numbers > 5

    # Print the Boolean vector
    print(numbers_more_than_five)  # [1] FALSE FALSE  TRUE  TRUE

After running this code, we obtain a Boolean vector that indicates which values from numbers exceed 5.
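The Boolean vector becomes useful once it is placed inside square brackets. The following is a minimal sketch that reuses the numbers vector from the example; the name selected is introduced here purely for illustration.

```r
# Vector of numbers, as in the example above
numbers <- c(2, 5, 7, 10)

# Logical (Boolean) vector marking the values greater than 5
numbers_more_than_five <- numbers > 5

# Keep only the elements whose position holds TRUE
# ('selected' is a hypothetical name used for this sketch)
selected <- numbers[numbers_more_than_five]
print(selected)  # [1]  7 10

# The condition can also be written directly inside the brackets
print(numbers[numbers > 5])  # [1]  7 10
```

Storing the Boolean vector in a variable first makes it easy to inspect or reuse; writing the condition inline is shorter when it is needed only once.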
Let's expand this concept with a practical scenario provided by the mtcars dataset. Let's print it:

    print(mtcars)

                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    ...

Our task is to identify the cars that offer more than 20 MPG (miles per gallon) and have 6 or fewer cylinders. Here's how we can execute this operation:

    # Boolean vector for cars with mpg > 20 and cyl <= 6
    high_mpg_low_cyl_cars <- mtcars$mpg > 20 & mtcars$cyl <= 6

    # Filter the mtcars data frame (the trailing comma selects all columns)
    mtcars_filtered <- mtcars[high_mpg_low_cyl_cars, ]

    # Print the filtered data frame
    print(mtcars_filtered)

Voilà! We have successfully filtered the mtcars data frame:

                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    ...

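A quick way to sanity-check a filter like this is to count the TRUE values before subsetting. The sketch below rebuilds the same condition on the built-in mtcars data and also shows the OR (|) and NOT (!) operators; the names mask, n_matches, filtered, either and low_mpg, and the 30 MPG and 300 hp thresholds, are illustrative choices rather than part of the lesson.

```r
# Same condition as in the example above, on the built-in mtcars data frame
mask <- mtcars$mpg > 20 & mtcars$cyl <= 6

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of matching rows
n_matches <- sum(mask)
print(n_matches)

# The filtered data frame has exactly that many rows
filtered <- mtcars[mask, ]
stopifnot(nrow(filtered) == n_matches)

# | means OR: cars that are very economical or very powerful
# (thresholds chosen only for illustration)
either <- mtcars[mtcars$mpg > 30 | mtcars$hp > 300, ]

# ! means NOT: every car that does not exceed 20 MPG
low_mpg <- mtcars[!(mtcars$mpg > 20), ]
```

Note that & and | work element by element on whole vectors; the doubled forms && and || are meant for single TRUE/FALSE values and are not suitable for building row filters.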
Boolean selection can backfire if used carelessly. Two mishaps are typical: a mismatch between the length of the Boolean vector and the number of rows in the data frame, and NA values in the data.
Ensure that the Boolean vector's length equals the data frame's row count: R silently recycles a shorter logical vector, which can select rows you did not intend. Be especially careful with NA values: a comparison involving NA yields NA rather than TRUE or FALSE, and an NA in the index produces a row filled with NA in the result.
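The NA pitfall is easiest to see on a small example. The sketch below uses a hypothetical scores data frame (not part of the lesson's data) and shows two common ways of keeping NA out of the result, using base R's which() and is.na().

```r
# Hypothetical data frame with one missing score (invented for illustration)
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee"),
  score = c(90, NA, 55, 72)
)

# Comparing NA with a number yields NA, not FALSE
passed <- scores$score > 60
print(passed)  # [1]  TRUE    NA FALSE  TRUE

# An NA in the index produces an extra row filled with NA
with_na_row <- scores[passed, ]
print(nrow(with_na_row))  # [1] 3

# Option 1: which() returns only the positions that are TRUE, dropping NA
clean_which <- scores[which(passed), ]

# Option 2: handle NA explicitly inside the condition
clean_explicit <- scores[!is.na(scores$score) & scores$score > 60, ]

# Guard against a length mismatch before indexing
stopifnot(length(passed) == nrow(scores))
```

Both options keep only the rows for Ana and Dee. Which one reads better is a matter of taste; the explicit is.na() form documents the intent in the condition itself.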
Today's journey through the realm of Boolean selection in R has opened new doors in data selection. We've tackled succinct examples, pointed out potential pitfalls, and appreciated the application of this technique in real data frames.
Up next, we have engaging exercises for you to experiment with the Boolean selection concepts that you have just learned. Remember, practice is fundamental to mastering these concepts. So, put on your learning cap and roll up those sleeves!