Hey there! Curious about data's hidden secrets? Today, we dive into Basic Data Analysis, an essential step in understanding data. It helps us uncover patterns and guides decision-making in fields such as business, science, and daily life. Our tool for the job is the pandas Python library. Let's embark on this journey!
Basic Data Analysis is the groundwork for solving any data mystery. It supports understanding and decision-making, whether you are a business owner studying customer behavior, a scientist analyzing research data, or a student making sense of study material. With pandas, this process becomes much easier.
First, we use value_counts(), a method that counts how often each distinct value occurs in a Series, such as a single DataFrame column. Consider an imaginary dataset of pets.
```python
import pandas as pd

# Creating DataFrame
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird']}
pets_df = pd.DataFrame(data)
```
Using the value_counts() method, we can count how many times each unique value appears in a Series (a DataFrame column):
```python
print(pets_df['Type'].value_counts())
# Output:
# Dog     3
# Cat     2
# Bird    1
# Name: Type, dtype: int64
# (pandas 2.0 and later print a 'Type' header line and 'Name: count' instead)
```
With value_counts(), establishing the frequency distribution of a Series becomes straightforward.
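Sometimes shares of the total are more useful than raw counts. As a minimal sketch, value_counts() accepts normalize=True for that purpose. The block rebuilds the same pets data so it runs on its own, and the variable name type_shares is only illustrative.

```python
import pandas as pd

# Same illustrative pets data as above
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird']}
pets_df = pd.DataFrame(data)

# normalize=True returns each value's share of the total instead of a raw count
type_shares = pets_df['Type'].value_counts(normalize=True)
print(type_shares)
```

Here dogs make up half of the pets, cats a third, and the single bird a sixth.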
For summarizing data, groupby() and agg() prove useful! Now let's add a weight to each pet in our DataFrame to illustrate these methods:
```python
import pandas as pd

# Creating DataFrame
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
        'Weight': [12, 15, 8, 9, 14, 1]}
pets_df = pd.DataFrame(data)
```
Now group the data by pet type and calculate the mean weight of each group.
```python
print(pets_df.groupby('Type').agg({'Weight': 'mean'}))
```
- groupby('Type'): Splits the data into groups based on 'Type'.
- .agg({'Weight': 'mean'}): Applies the 'mean' function to the 'Weight' column for each group.
The resulting DataFrame shows the average weight for each pet type:
- Bird: 1.0
- Cat: 8.5
- Dog: 13.67
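To see what the split and apply steps are doing, here is a rough sketch that performs the same calculation by hand. It is only an illustration of the idea, not how pandas works internally, and the names manual_means and manual_result are made up for this example.

```python
import pandas as pd

# Same illustrative pets data, including weights
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
        'Weight': [12, 15, 8, 9, 14, 1]}
pets_df = pd.DataFrame(data)

# Split: iterating over a GroupBy object yields (group key, sub-DataFrame) pairs
manual_means = {}
for pet_type, group in pets_df.groupby('Type'):
    # Apply: compute the mean weight within this group only
    manual_means[pet_type] = group['Weight'].mean()

# Combine: collect the per-group results into a single Series
manual_result = pd.Series(manual_means, name='Weight')
print(manual_result.round(2))
```

The loop gives the same averages as the one-line groupby('Type').agg({'Weight': 'mean'}) call, which is why the one-liner is usually preferred.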
Of course, calculating the mean is not the only option. We can use other aggregation functions such as min, max, and median. We will talk more about using different aggregation functions in the next course.
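As a quick preview, and only a minimal sketch, several of these functions can be requested at once by passing a list of names to agg(). The variable name summary is just for illustration.

```python
import pandas as pd

# Same illustrative pets data, including weights
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
        'Weight': [12, 15, 8, 9, 14, 1]}
pets_df = pd.DataFrame(data)

# Select the 'Weight' column of each group, then compute three statistics at once;
# the result has one row per pet type and one column per function
summary = pets_df.groupby('Type')['Weight'].agg(['min', 'max', 'median'])
print(summary)
```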
Lastly, let's sort our data. The sort_values() method sorts a DataFrame by one or more columns.
Let's arrange our pet DataFrame by pet weight.
```python
sorted_pets_df = pets_df.sort_values('Weight')
print(sorted_pets_df)
#       Name  Type  Weight
# 5   Cooper  Bird       1
# 2    Bella   Cat       8
# 3  Charlie   Cat       9
# 0    Tommy   Dog      12
# 4     Lucy   Dog      14
# 1      Rex   Dog      15
```
We obtained sorted data efficiently with just one simple command!
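Sorting by more than one column works the same way. The sketch below, with the illustrative name by_type_then_weight, orders the pets alphabetically by type and then from heaviest to lightest within each type, using the ascending parameter of sort_values().

```python
import pandas as pd

# Same illustrative pets data, including weights
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
        'Weight': [12, 15, 8, 9, 14, 1]}
pets_df = pd.DataFrame(data)

# Sort by 'Type' ascending first, then by 'Weight' descending within each type;
# sort_values() returns a new DataFrame and leaves pets_df unchanged
by_type_then_weight = pets_df.sort_values(['Type', 'Weight'], ascending=[True, False])
print(by_type_then_weight)
```

The resulting order of names is Cooper, Charlie, Bella, Rex, Lucy, Tommy.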
Great job! You've learned how to perform Basic Data Analysis using pandas functions. We explored value_counts(), groupby(), agg(), and sort_values().
Are these concepts a lot to digest? Don't worry! Exciting upcoming exercises will reinforce these concepts, so let's delve into practice. Remember, each accomplished task boosts your data analysis skills!