Welcome to our lesson on Descriptive Statistics! In this lesson, we will learn how to summarize and describe important features of a data set. By the end, you'll know how to calculate measures like the mean and standard deviation, which are crucial for understanding data in machine learning and many other fields.
Descriptive statistics help us get a quick overview of large amounts of data. Imagine summarizing the average test score of a class or the variability in students' heights. Descriptive statistics provide efficient tools for exactly this.
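The mean is the average of a set of numbers. Imagine test scores: 80, 85, 90, 75, and 95. To find the mean:
1. Add up all the scores: $80 + 85 + 90 + 75 + 95 = 425$.
2. Divide the sum by the number of scores: $425 / 5 = 85$.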
The mean score is 85. It gives us the "central" value.
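The same calculation can be sketched in a few lines of plain Python, using only built-ins. The variable names here are our own choices for illustration.

```python
# Test scores from the example above
scores = [80, 85, 90, 75, 95]

# Mean = sum of the values divided by how many values there are
total = sum(scores)          # 425
mean = total / len(scores)   # 425 / 5

print("Mean:", mean)  # Mean: 85.0
```

Python's `/` always returns a float, which is why the result prints as `85.0` rather than `85`.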
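The standard deviation measures how spread out numbers are. For our test scores example (mean = 85):
1. Subtract the mean from each score: $-5, 0, 5, -10, 10$.
2. Square each difference: $25, 0, 25, 100, 100$.
3. Average the squared differences to get the variance: $(25 + 0 + 25 + 100 + 100) / 5 = 250 / 5 = 50$.
4. Take the square root of the variance: $\sqrt{50} \approx 7.07$.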
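A standard deviation of about 7.07 tells us that a typical score lies roughly 7 points from the mean. Strictly speaking, it is the square root of the average squared distance, not the plain average distance (for these scores, the average absolute distance from the mean is 6). A low standard deviation means data points are close to the mean, while a high one indicates they are spread out.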
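Here is a step-by-step sketch of the same calculation in Python. It assumes the population formula (dividing by the number of values, N), which is the version that gives 7.07 for these scores. The standard library's `statistics.pstdev` is used only as a cross-check; the intermediate variable names are our own.

```python
import math
import statistics

# Test scores from the example above
scores = [80, 85, 90, 75, 95]
mean = sum(scores) / len(scores)  # 85.0

# Squared distance of each score from the mean
squared_deviations = [(x - mean) ** 2 for x in scores]  # [25.0, 0.0, 25.0, 100.0, 100.0]

# Population variance: average of the squared deviations (divide by N)
variance = sum(squared_deviations) / len(scores)  # 250.0 / 5 = 50.0

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print("Standard deviation:", round(std_dev, 2))  # Standard deviation: 7.07

# Cross-check against the standard library's population standard deviation
assert math.isclose(std_dev, statistics.pstdev(scores))
```

If you divide by N - 1 instead (the sample standard deviation, available as `statistics.stdev`), the result for these scores is about 7.91, so it is worth knowing which convention a tool uses.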
The median is another measure of central tendency that represents the middle value of a data set when it is arranged in order. It is particularly useful when the data set contains outliers, because the median does not depend on how extreme the largest or smallest values are.
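For example, consider the test scores: 75, 80, 85, 90, and 95. To find the median:
1. Arrange the scores in order (here they already are): 75, 80, 85, 90, 95.
2. Pick the middle value. With 5 scores, it is the 3rd one, so the median is 85.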
If the data set contains an even number of values, the median is the average of the two middle numbers.
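For instance, with scores: 75, 80, 85, and 95:
1. With the scores in order, the two middle values are 80 and 85.
2. Their average is $(80 + 85) / 2 = 82.5$, so the median is 82.5.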
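Both cases, odd and even counts, fit in one small function. This is a minimal sketch: `median` is a hypothetical helper name of our own, it assumes a non-empty list, and in practice `statistics.median` or NumPy's `np.median` already do this for you.

```python
def median(values):
    """Return the middle value of a non-empty list of numbers (hypothetical helper)."""
    ordered = sorted(values)      # step 1: arrange the values in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: take the single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([75, 80, 85, 90, 95]))  # 85
print(median([75, 80, 85, 95]))      # 82.5
```

Sorting first means the function also works when the input is not already in order.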
The median is useful in situations where the mean might be misleading due to outliers or skewed data distributions.
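To see why, compare the mean and median before and after one extreme value is introduced. The value 500 below is a made-up outlier chosen only to illustrate the effect; the mean uses plain Python and the median uses the standard library's `statistics.median`.

```python
import statistics

scores = [75, 80, 85, 90, 95]
# Same scores, but the top score replaced by an extreme value (illustrative only)
with_outlier = [75, 80, 85, 90, 500]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    mean = sum(data) / len(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean}, median = {median}")
# original: mean = 85.0, median = 85
# with outlier: mean = 166.0, median = 85
```

A single extreme score nearly doubles the mean, while the median stays at 85.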
Let's see how to calculate these in Python using the NumPy library.
Here's a code snippet to calculate the mean, standard deviation, and median for a list of data:
```python
# Calculating Mean, Standard Deviation, and Median
import numpy as np

data = [1.2, 2.3, 3.1, 4.5, 5.7]

mean = np.mean(data)      # average of the values
std_dev = np.std(data)    # population standard deviation (divides by N by default)
median = np.median(data)  # middle value once the data is sorted

print("Mean:", mean)
print("Standard Deviation:", std_dev)
print("Median:", median)
```
```text
Mean: 3.36
Standard Deviation: 1.589465318904442
Median: 3.1
```
Note that the printed mean may differ slightly from 3.36 in its last few decimal places because of floating-point rounding error.
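Because of that rounding, it is safer to compare floating-point results with a small tolerance than with `==`. This sketch uses `np.isclose` and the standard library's `math.isclose`, both with their default tolerances, and also cross-checks NumPy's default standard deviation against `statistics.pstdev` (both are population standard deviations).

```python
import math
import statistics

import numpy as np

data = [1.2, 2.3, 3.1, 4.5, 5.7]
mean = np.mean(data)

print(mean)                      # exact trailing digits depend on floating-point rounding
print(np.isclose(mean, 3.36))    # True: equal within a small tolerance
print(math.isclose(mean, 3.36))  # True
print(round(mean, 2))            # 3.36

# np.std divides by N by default, like statistics.pstdev
print(math.isclose(np.std(data), statistics.pstdev(data)))  # True
```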
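Breaking down the snippet that calculates the mean, standard deviation, and median:
- We import the NumPy library for numerical operations.
- We store the data set [1.2, 2.3, 3.1, 4.5, 5.7] in the list data.
- With np.mean(data), NumPy calculates the average of the data points.
- With np.std(data), NumPy calculates how much the data points vary from the mean. By default it divides by the number of values (the population standard deviation), just like the test-score calculation earlier.
- With np.median(data), NumPy finds the middle value of the data set.

In this lesson, we learned about descriptive statistics, focusing on the mean, median, and standard deviation. These are essential tools for summarizing and understanding data sets.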
We also saw how to calculate these values using Python, making it easier to handle large data sets.
Now it's your turn to practice! In the next session, you'll be given data sets to calculate the mean and standard deviation. This will help you reinforce what you've learned and apply these concepts to real data. Let's get started!