Lesson 3

Today, we're approaching data analysis from a new angle by applying filtering to grouped DataFrames. We will review DataFrame grouping and introduce filtering, illustrating these concepts with examples. By the end of this lesson, you will be equipped with the necessary skills to effectively **group and filter data**.

As a quick recap, `pandas` is a widely used Python library for data analysis, with the **DataFrame** class at its core. DataFrames are data tables, and you can group their rows using the `groupby()` method. Here is an example of grouping the data within a DataFrame by `'Product'`:

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'],
    'Store': ['Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'],
    'Quantity': [20, 30, 40, 50, 60, 70]
})

grouped = sales.groupby('Product')
print(grouped.get_group('Apple'))  # printing one group as an example
'''Output:
   Product   Store  Quantity
0    Apple  Store1        20
3    Apple  Store2        50
'''
```
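Once the groups exist, an aggregation such as `sum()` collapses each group to a single value. The following is a minimal sketch that rebuilds the same `sales` data; the variable name `totals` is an illustrative choice, not part of the lesson.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'],
    'Store': ['Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'],
    'Quantity': [20, 30, 40, 50, 60, 70]
})

# Select the Quantity column of each Product group and add up its values.
# `totals` is a hypothetical name used only in this sketch.
totals = sales.groupby('Product')['Quantity'].sum()
print(totals)
```

The result is a Series indexed by product, with totals of 70 for Apple, 90 for Banana, and 110 for Pear.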

To filter grouped data, we will need functions. Let's recall how to easily create and use them.

In Python, **lambda** functions are small anonymous functions. They can take any number of arguments but can contain only a single expression.

Consider a situation where we use a function to calculate the total price after adding the sales tax. In a place where the sales tax is 10%, the function to calculate the total cost could look like:

**Regular Function**

```python
def add_sales_tax(amount):
    return amount + (amount * 0.10)

print(add_sales_tax(100))  # 110.0
```

Replacing a regular function with a compact lambda function is handy when the function is simple and not used repeatedly. The syntax for a lambda is `lambda var: expression`, where `var` is the function's input variable and `expression` is what the function returns.

The above `add_sales_tax` function can be replaced with a lambda function as follows:

**Lambda Function**

```python
add_sales_tax = lambda amount: amount + (amount * 0.10)
print(add_sales_tax(100))  # 110.0
```

Lambda functions are handy when used inside other functions or as arguments to functions like `filter()`, `map()`, etc.
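To make this concrete, here is a small sketch that passes lambdas to Python's built-in `map()` and `filter()`. The list `prices` and the threshold of `50` are made-up values for illustration only.

```python
# Hypothetical amounts, chosen only for this example.
prices = [100, 250, 40]

# map() applies the lambda to every element; here it adds a 10% sales tax.
# round() trims floating-point noise to two decimal places.
with_tax = list(map(lambda amount: round(amount * 1.10, 2), prices))

# filter() keeps only the elements for which the lambda returns True.
large = list(filter(lambda amount: amount > 50, prices))

print(with_tax)  # [110.0, 275.0, 44.0]
print(large)     # [100, 250]
```

Note that the lambda passed to `filter()` returns `True` or `False`, which is the kind of function group filtering relies on.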

A Boolean lambda function always returns either `True` or `False`. Let's imagine a case where we want to know whether a number is even or odd. We can easily accomplish this using a Boolean lambda function.

Here's how we can define such a function:

```python
is_even = lambda num: num % 2 == 0
print(is_even(10))  # True
```

The first line of this code creates a Boolean lambda function named `is_even`, and the second line calls it. The function takes a number (named `num`) as an argument, computes the remainder of dividing it by `2`, and then checks whether that remainder is `0`. It returns the condition's value, either `True` or `False`.

Boolean lambda functions are fantastic tools for quickly evaluating a condition. Their applications are broad, especially when you're manipulating data with pandas. They can be used in various ways, including sorting, filtering, and mapping.
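As one possible illustration with an ordinary (ungrouped) DataFrame, a Boolean lambda can be applied to a column with `Series.apply()` to build a mask for row selection. The name `is_large` and the threshold of `50` are assumptions made for this sketch.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'],
    'Store': ['Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'],
    'Quantity': [20, 30, 40, 50, 60, 70]
})

# Hypothetical Boolean lambda: is a single quantity at least 50?
is_large = lambda quantity: quantity >= 50

# apply() calls the lambda on each value and returns a Boolean Series.
mask = sales['Quantity'].apply(is_large)

# Boolean selection keeps only the rows where the mask is True.
large_sales = sales[mask]
print(large_sales)
```

Here the three `Store2` rows (quantities 50, 60, and 70) are kept. For a simple comparison like this one, `sales['Quantity'] >= 50` produces the same mask without a lambda.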

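Boolean selection does not apply to grouped DataFrames. Instead, we use the `filter()` method of the grouped object, which takes a Boolean function as an argument. That function receives each group as a DataFrame and returns `True` if the whole group should be kept. For instance, let's keep products with a total quantity greater than `90`.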

```python
grouped = sales.groupby('Product')
filtered_df = grouped.filter(lambda x: x['Quantity'].sum() > 90)
print(filtered_df)
'''Output:
   Product   Store  Quantity
2     Pear  Store1        40
5     Pear  Store2        70
'''
```

This command returns the rows of every group whose total `Quantity` exceeds `90`. Pears are included, as their total quantity is `40 + 70 = 110`. Apples (`20 + 50 = 70`) and Bananas (`30 + 60 = 90`, which is not greater than `90`) are dropped.
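To see how the decision is made for each group, the sketch below evaluates the same condition group by group and then repeats the filter with `>=` instead of `>`. The names `keep`, `decisions`, and `at_least_90` are hypothetical and used only here.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Pear', 'Apple', 'Banana', 'Pear'],
    'Store': ['Store1', 'Store1', 'Store1', 'Store2', 'Store2', 'Store2'],
    'Quantity': [20, 30, 40, 50, 60, 70]
})
grouped = sales.groupby('Product')

# The same Boolean condition that was passed to filter() in the lesson.
keep = lambda group: group['Quantity'].sum() > 90

# Evaluate the condition once per group to see which groups survive.
decisions = {name: bool(keep(group)) for name, group in grouped}
print(decisions)  # {'Apple': False, 'Banana': False, 'Pear': True}

# With >= the Banana group (total exactly 90) is kept as well.
at_least_90 = grouped.filter(lambda group: group['Quantity'].sum() >= 90)
print(at_least_90)
```

The second result contains the Banana and Pear rows in their original order, which shows that `filter()` keeps or drops whole groups rather than individual rows.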

In summary, we have explored **DataFrame grouping** and **data filtering**, and how to apply these techniques in data analysis. Practice exercises will solidify this knowledge and enhance your confidence. So, let's dive into some hands-on learning!