Lesson 5

Basic Data Analysis

Topic Overview

Hey there! Curious about data's hidden secrets? Today, we dive into Basic Data Analysis, an essential step in understanding any dataset. With the pandas Python library, we'll uncover patterns that guide decision-making in business, science, and daily life. Let's embark on this journey!

Meaning of Basic Data Analysis

Basic Data Analysis is the groundwork for solving any data mystery. It supports understanding and decision-making, whether for a business owner studying customer behavior, a scientist analyzing research data, or a student making sense of study material. With pandas, much of this work takes only a few lines of code.

Using `value_counts()` for Frequency Analysis

First, we use value_counts(), a method that counts how often each unique value appears in a Series. Consider an imaginary dataset of pets.

    import pandas as pd

    # Creating DataFrame
    data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
            'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird']}
    pets_df = pd.DataFrame(data)

Using the value_counts() method, we can count how many times each unique value appears in a Series (a DataFrame column):

    print(pets_df['Type'].value_counts())
    # Output:
    # Dog     3
    # Cat     2
    # Bird    1
    # Name: Type, dtype: int64
    # (pandas 2.0 and later print a 'Type' header line and 'Name: count' instead)

With value_counts(), building the frequency distribution of a Series is straightforward.
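
If you want shares instead of raw counts, value_counts() also accepts a normalize argument. Here is a minimal sketch on the same pets data; the variable names type_counts and type_shares are just illustrative choices, not part of the lesson.

```python
import pandas as pd

# Same toy pets data as in the lesson
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
})

# Absolute counts per pet type, most frequent first
type_counts = pets_df['Type'].value_counts()

# Relative frequencies (shares that sum to 1) instead of raw counts
type_shares = pets_df['Type'].value_counts(normalize=True)

print(type_counts)
print(type_shares)
```

With three dogs out of six pets, the share for 'Dog' comes out as 0.5.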

Grouping and Aggregating with `groupby()` and `agg()` methods

For summarizing data, groupby() and agg() prove useful! Now let’s add weight to the pets in our DataFrame to illustrate these methods:

    import pandas as pd

    # Creating DataFrame
    data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
            'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
            'Weight': [12, 15, 8, 9, 14, 1]}
    pets_df = pd.DataFrame(data)

Now let's group the data by pet type and calculate the mean weight of each group.

    print(pets_df.groupby('Type').agg({'Weight': 'mean'}))

  1. groupby('Type'): Splits the data into groups based on 'Type'.
  2. .agg({'Weight': 'mean'}): Applies the 'mean' function to the 'Weight' column for each group.

The resulting DataFrame shows the average weight for each pet type:

  • Bird: 1.0
  • Cat: 8.5
  • Dog: 13.67
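
The same averages can also be computed by selecting the 'Weight' column from the grouped data and calling mean() on it. This is a small sketch of that alternative, not a replacement for the agg() call; mean_weights is an illustrative name, and the rounding is added only to match the two-decimal values listed above.

```python
import pandas as pd

# Same toy pets data as in the lesson
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Weight': [12, 15, 8, 9, 14, 1],
})

# Split by 'Type', pick the 'Weight' column, and average each group.
# The result is a Series indexed by pet type rather than a DataFrame.
mean_weights = pets_df.groupby('Type')['Weight'].mean().round(2)

print(mean_weights)
```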

Of course, calculating the mean is not the only option. We can also use functions like min, max, and median. We will talk more about using different aggregation functions in the next course.
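
As a quick preview, here is a hedged sketch of passing several function names to agg() at once. It selects the 'Weight' column first so the result has one plain column per function; weight_summary is an illustrative name.

```python
import pandas as pd

# Same toy pets data as in the lesson
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Weight': [12, 15, 8, 9, 14, 1],
})

# One row per pet type, one column per aggregation function
weight_summary = pets_df.groupby('Type')['Weight'].agg(['min', 'max', 'median'])

print(weight_summary)
```

For the dogs (12, 15, and 14), this gives a minimum of 12, a maximum of 15, and a median of 14.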

Sorting DataFrame with `sort_values()`

Lastly, let's sort our data. The sort_values() method sorts a DataFrame by one or more columns.

Let's arrange our pet DataFrame by pet weight.

    sorted_pets_df = pets_df.sort_values('Weight')
    print(sorted_pets_df)
    #       Name  Type  Weight
    # 5   Cooper  Bird       1
    # 2    Bella   Cat       8
    # 3  Charlie   Cat       9
    # 0    Tommy   Dog      12
    # 4     Lucy   Dog      14
    # 1      Rex   Dog      15

We obtained sorted data efficiently with just one simple command!
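
By default, sort_values() sorts in ascending order. The sketch below shows two variations you may find handy: descending order via the ascending argument, and sorting by more than one column. The names heaviest_first and by_type_then_weight are illustrative.

```python
import pandas as pd

# Same toy pets data as in the lesson
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Weight': [12, 15, 8, 9, 14, 1],
})

# Heaviest pet first
heaviest_first = pets_df.sort_values('Weight', ascending=False)

# Alphabetical by type, then heaviest first within each type
by_type_then_weight = pets_df.sort_values(['Type', 'Weight'],
                                          ascending=[True, False])

print(heaviest_first)
print(by_type_then_weight)
```

Note that sort_values() returns a new DataFrame and leaves pets_df itself unchanged.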

Lesson Summary and Upcoming Practice

Great job! You've learned how to execute Basic Data Analysis using pandas functions. We explored value_counts(), groupby(), agg(), and sort_values().

Are these concepts a lot to digest? Don't worry! Exciting upcoming exercises will reinforce these concepts, so let's delve into practice. Remember, each accomplished task boosts your data analysis skills!
