Unveiling Measures of Centrality: A Descriptive Statistics Journey with Python

Lesson 1

Introduction to Descriptive Statistics and Python

Greetings, data enthusiast! Today, we are diving into descriptive statistics using Python. We'll explore three measures of centrality (mean, median, and mode) using the Python libraries NumPy and SciPy.

Understanding Central Tendency

A measure of central tendency identifies a 'typical' value in a dataset. Our three measures, the mean (average), the median (mid-point), and the mode (most frequently appearing value), each offer a different perspective on centrality. For a set of students' scores, for example, the mean indicates the average performance, the median represents the middle student's performance, and the mode highlights the most common score.

Visualizing Central Tendency

This plot marks a given dataset's mean, its centered location, also known as the 'average'. Imagine a seesaw balancing at its center: the mean of a dataset is the point where it balances out, because the values below the mean are exactly offset by the values above it. It is a crucial statistical concept, and seeing it on a plot helps identify the value our data is centered around.
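The seesaw picture can be checked numerically. The sketch below uses a small made-up dataset (the values are an assumption chosen for easy arithmetic, not data from this lesson) and shows that the distances of the values from the mean cancel out.

```python
import numpy as np

# Hypothetical values chosen for easy arithmetic; not part of the lesson's dataset.
values = np.array([1, 2, 3, 10])

balance_point = values.mean()          # (1 + 2 + 3 + 10) / 4
deviations = values - balance_point    # signed distance of each value from the mean
total_deviation = deviations.sum()     # values below and above the mean cancel out

print("Mean:", balance_point)
print("Deviations:", deviations.tolist())
print("Deviations sum to zero:", bool(np.isclose(total_deviation, 0.0)))
```

The single large value 10 sits far from the mean, and it takes three smaller values on the other side to balance it.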

Setting up the Dataset

Our dataset is a list of individuals' ages: [23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23]. Remember, understanding your data upfront is key to conducting a meaningful analysis.

Computing Mean using Python

Calculating the mean involves adding all numbers together and then dividing by the count. Here's how you compute it in Python:

Python
import numpy as np

data = np.array([23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23])
mean = np.mean(data)  # calculates the mean
print("Mean: ", round(mean, 2))  # Mean:  22.82
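To connect the definition to the code, here is a minimal sketch that computes the same mean by hand with plain Python, adding the values and dividing by the count. The variable names are illustrative.

```python
# Same ages as in the lesson, kept as a plain Python list.
ages = [23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23]

total = sum(ages)            # add all numbers together
count = len(ages)            # how many numbers there are
manual_mean = total / count  # divide the sum by the count

print("Sum:", total, "Count:", count)
print("Manual mean:", round(manual_mean, 2))
```

The result should agree with np.mean on the same data, which is a quick way to sanity-check a library call.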

Computing Median using Python

The median is the 'middle' value in an ordered dataset. This is how it is computed in Python:

Python
import numpy as np

data = np.array([23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23])
median = np.median(data)  # calculates the median
print("Median: ", median)  # Median:  23.0
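As a rough sketch of what "middle value in an ordered dataset" means in practice, the helper below (manual_median is a hypothetical name, not a library function) sorts the values and picks the middle one. When the count is even, it averages the two middle values, which is also the convention np.median follows. It assumes a non-empty list.

```python
def manual_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return float(ordered[mid])
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23]
odd_median = manual_median(ages)                        # 11 values, middle is the 6th
even_median = manual_median([20, 21, 21, 23, 23, 24])   # 6 values, average of 21 and 23

print("Median of ages:", odd_median)
print("Median of even-length list:", even_median)
```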

Computing Mode using Python

The mode represents the most frequently occurring number(s) in a dataset. To compute it, we use the mode function from the scipy library:

Python
import numpy as np
from scipy import stats

data = np.array([23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23])
mode_age = stats.mode(data)  # calculates the mode
print("Mode: ", mode_age.mode)  # Mode:  23

Note that mode_age is a result object rather than a plain number. To retrieve the actual value, we read the .mode attribute of this object, which is why the code prints mode_age.mode. The same object also has a .count attribute holding how many times the mode occurs.

NumPy doesn't have a dedicated function for calculating the mode, so we are using the SciPy library here. We will talk more about this library and its capabilities in future lessons.
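If SciPy is not available, the mode can also be found by counting occurrences. The sketch below is one possible alternative using Counter from Python's standard library. It is not how scipy.stats.mode is implemented, only a way to see the counting idea directly.

```python
from collections import Counter

ages = [23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23]

counts = Counter(ages)  # maps each age to the number of times it appears
most_common_age, frequency = counts.most_common(1)[0]  # the top (value, count) pair

print("Counts:", dict(counts))
print("Mode:", most_common_age, "appears", frequency, "times")
```

Printing the full counts makes it easy to see how close the runner-up values are to the mode.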

Handling Ties in Mode with SciPy

Great job so far! Now let's explore an interesting concept: how the mode function from scipy.stats handles ties or duplicate modes.

So, what's a tie in mode? Imagine we have two or more different numbers appearing the same number of times in our dataset. For instance, consider this dataset: [20, 21, 21, 23, 23, 24]. Here, 21 and 23 both appear twice and are therefore modes.

Let's calculate the mode using scipy.stats:

Python
from scipy import stats
import numpy as np

data = np.array([20, 21, 21, 23, 23, 24])
mode = stats.mode(data)
print("Mode: ", mode.mode)  # Mode: 21

Although 21 and 23 are both modes, our calculation only returned 21. Why is that?

In cases of ties, scipy.stats.mode() returns the smallest value amongst the tied modes. So in this case, it picked 21 over 23 because 21 is the smaller value.
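When every tied mode matters, one option is to count the values yourself and keep all of those that reach the highest count. The following is a minimal sketch using np.unique with return_counts=True. The variable names are illustrative.

```python
import numpy as np

data = np.array([20, 21, 21, 23, 23, 24])

# np.unique returns the sorted distinct values and, on request, their counts.
values, counts = np.unique(data, return_counts=True)

# Keep every value whose count equals the highest count.
all_modes = values[counts == counts.max()]

print("Values:", values.tolist())
print("Counts:", counts.tolist())
print("All modes:", all_modes.tolist())
```

Python's standard library offers a similar result through statistics.multimode, which returns a list of all the most frequent values.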

Choice of Measure of Central Tendency

Your choice of measure of central tendency depends on the nature of your data. For numerical data, the mean is sensitive to outliers, i.e., extreme values, so the median is often the preferable measure when outliers are present. There is no meaningful mode when no value repeats or when all values occur with equal frequency. For nominal categorical data, such as colors or names, the mode is the only meaningful measure.
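To illustrate these guidelines, the sketch below appends one extreme value to the ages dataset and compares how the mean and the median react, then finds the mode of a small categorical list. The outlier 95 and the color list are made-up values used purely for illustration.

```python
from collections import Counter

import numpy as np

ages = np.array([23, 22, 22, 23, 24, 24, 23, 22, 21, 24, 23])
ages_with_outlier = np.append(ages, 95)  # 95 is a hypothetical extreme value

mean_before = float(ages.mean())
mean_after = float(ages_with_outlier.mean())
median_before = float(np.median(ages))
median_after = float(np.median(ages_with_outlier))

print("Mean:  ", round(mean_before, 2), "->", round(mean_after, 2))
print("Median:", median_before, "->", median_after)

# Categorical data cannot be averaged or ordered, but it can be counted.
colors = ["red", "blue", "red", "green"]  # hypothetical categorical data
top_color, top_count = Counter(colors).most_common(1)[0]
print("Most common color:", top_color)
```

A single extreme value pulls the mean up by several years while the median stays where it was, which is why the median is often reported for skewed data.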

Wrapping Up

Kudos! You have mastered the measures of central tendency and have learned how to compute them using Python! Stay tuned for some hands-on exercises for deeper reinforcement. Onward!
