Descriptive Statistics

Lesson 2

Lesson Introduction

Welcome to our lesson on Descriptive Statistics! In this lesson, we will learn how to summarize and describe the important features of a data set. By the end, you'll know how to calculate measures like the mean, standard deviation, and median, which are crucial for understanding data in machine learning and many other fields.

Descriptive statistics give us a quick overview of large amounts of data. For example, they let us summarize the average test score of a group of students or the variability in their heights with just a few numbers.

Mean

The mean is the average of a set of numbers. Imagine test scores: 80, 85, 90, 75, and 95. To find the mean:

Add the scores: $80 + 85 + 90 + 75 + 95 = 425$
Divide by the number of scores: $\frac{425}{5} = 85$

The mean score is 85. It gives us the "central" value.
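
The two steps above can be sketched in plain Python. This is a minimal illustration; the variable names are chosen for this example only.

```python
# Test scores from the example above
scores = [80, 85, 90, 75, 95]

# Step 1: add the scores
total = sum(scores)

# Step 2: divide by the number of scores
mean = total / len(scores)

print("Total:", total)
print("Mean:", mean)
```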

Standard Deviation

The standard deviation measures how spread out numbers are. For our test scores example:

Find the mean score: 85.
Subtract the mean from each score and square the result:
- $(80 - 85)^2 = 25$
- $(85 - 85)^2 = 0$
- $(90 - 85)^2 = 25$
- $(75 - 85)^2 = 100$
- $(95 - 85)^2 = 100$
Find the average of these squared differences:
- $\frac{25 + 0 + 25 + 100 + 100}{5} = 50$
Take the square root: $\sqrt{50} \approx 7.07$

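A standard deviation of about 7.07 tells us that the scores typically differ from the mean by roughly 7 points. Because we divided by the number of scores (5), this is the population standard deviation. A low standard deviation means the data points are close to the mean, while a high one indicates they are spread out.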
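
The same steps can be followed one by one in plain Python. This is a sketch of the manual calculation above, not a replacement for a library routine; the variable names are illustrative.

```python
import math

# Test scores from the example above
scores = [80, 85, 90, 75, 95]

# Step 1: find the mean
mean = sum(scores) / len(scores)

# Step 2: subtract the mean from each score and square the result
squared_diffs = [(x - mean) ** 2 for x in scores]

# Step 3: average the squared differences (dividing by the number of scores)
variance = sum(squared_diffs) / len(scores)

# Step 4: take the square root
std_dev = math.sqrt(variance)

print("Squared differences:", squared_diffs)
print("Average squared difference:", variance)
print("Standard deviation:", round(std_dev, 2))
```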

Median

The median is another measure of central tendency. It is the middle value of a data set when the values are arranged in order. It is particularly useful when the data set contains outliers, because a few extreme values have little or no effect on the median.

For example, consider the test scores: 75, 80, 85, 90, and 95. To find the median:

Arrange the scores in order: 75, 80, 85, 90, 95
Find the middle score: 85

If the data set contains an even number of values, the median is the average of the two middle numbers.

For instance, with scores: 75, 80, 85, and 95:

Arrange in order: 75, 80, 85, 95
Find the middle scores: 80 and 85
Calculate their average: $\frac{80 + 85}{2} = 82.5$

The median is useful in situations where the mean might be misleading due to outliers or skewed data distributions.
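
Both cases, and the effect of an outlier, can be sketched with a small helper. The function `median_of` is a hypothetical name introduced here for illustration, it assumes a non-empty list, and the value 950 in the last data set is made up to show an outlier.

```python
def median_of(values):
    """Return the middle value of a non-empty list (illustrative helper)."""
    ordered = sorted(values)      # arrange the values in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [75, 80, 85, 90, 95]
even_scores = [75, 80, 85, 95]
print("Median (odd count):", median_of(odd_scores))
print("Median (even count):", median_of(even_scores))

# Made-up outlier: replace 95 with 950 and compare mean and median
with_outlier = [75, 80, 85, 90, 950]
print("Mean with outlier:", sum(with_outlier) / len(with_outlier))
print("Median with outlier:", median_of(with_outlier))
```

In this sketch the outlier pulls the mean far above most of the scores, while the median stays at the middle score.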

Calculating in Python

Let's see how to calculate these in Python using the NumPy library.

Here's a code snippet to calculate the mean, standard deviation, and median for a list of data:

Python
# Calculating Mean, Standard Deviation, and Median
import numpy as np

data = [1.2, 2.3, 3.1, 4.5, 5.7]

mean = np.mean(data)
std_dev = np.std(data)
median = np.median(data)

print("Mean:", mean)
print("Standard Deviation:", std_dev)
print("Median:", median)

Plain text
Mean: 3.36
Standard Deviation: 1.589465318904442
Median: 3.1

Note that the mean printed on your machine may differ very slightly from 3.36 in the last decimal places. This comes from floating-point rounding, not from a mistake in the calculation.

Import NumPy: We start by importing the NumPy library for numerical operations.
Data Set: We create a list of data points: [1.2, 2.3, 3.1, 4.5, 5.7].
Calculate Mean: Using np.mean(data), NumPy calculates the average of the data points.
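Calculate Standard Deviation: Using np.std(data), NumPy calculates how much the data points vary from the mean. By default it divides by the number of values (the population standard deviation), just like the manual calculation earlier in this lesson.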
Calculate Median: Using np.median(data), NumPy finds the middle value of the data set.
Print Results: We print the calculated mean, standard deviation, and median.
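
As a cross-check that does not depend on NumPy, the test scores from earlier in the lesson can be run through Python's built-in `statistics` module. This is a sketch under one assumption worth stating: `statistics.pstdev` is used because it divides by the number of values, which matches both the manual calculation and the default behavior of `np.std`.

```python
import statistics

# Test scores from the worked examples earlier in the lesson
scores = [80, 85, 90, 75, 95]

mean = statistics.mean(scores)
std_dev = statistics.pstdev(scores)   # population standard deviation
median = statistics.median(scores)

print("Mean:", mean)
print("Standard Deviation:", round(std_dev, 2))
print("Median:", median)
```

The results should agree with the hand calculations: a mean of 85, a standard deviation of about 7.07, and a median of 85.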

Lesson Summary

In this lesson, we learned about descriptive statistics, focusing on the mean, standard deviation, and median. These are essential tools for summarizing and understanding data sets.

We also saw how to calculate these values using Python, making it easier to handle large data sets.

Now it's your turn to practice! In the next session, you'll be given data sets to calculate the mean and standard deviation. This will help you reinforce what you've learned and apply these concepts to real data. Let's get started!
