Lesson 6

Welcome to a new lesson! Today, we'll learn about basic *statistical operations* using Python's `NumPy` library, with a little help from `scipy`. These operations, including **mean**, **median**, **mode**, **variance**, and **standard deviation**, are vital tools for understanding and interpreting data. After learning each operation, we'll apply it to a small dataset of grades.

The **mean**, or average, is the sum of all values divided by the number of values. In Python, we use `np.mean(array)` to calculate the mean. The **median** is the middle value of the sorted data. When there is an even number of values, it is the average of the two middle values. It can be calculated using `np.median(array)`. The **mode** is the most frequent value in your dataset, and it can be calculated using the `mode()` function from `scipy`'s `stats` module.

```python
import numpy as np
from scipy import stats

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])
print("Mean:", np.mean(grades))      # Mean: 85.125
print("Median:", np.median(grades))  # Median: 85.5
print("Mode:", stats.mode(grades))   # Mode: ModeResult(mode=80, count=2); exact formatting varies by SciPy/NumPy version
```
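To make the definitions concrete, here is a minimal from-scratch sketch that recomputes the mean and median of the same `grades` array using only summing, sorting, and arithmetic. The helper names `manual_mean` and `manual_median` are illustrative and are not part of NumPy.

```python
import numpy as np

def manual_mean(values):
    # Sum of all values divided by how many values there are
    return sum(values) / len(values)

def manual_median(values):
    # Sort, then take the middle value (odd length)
    # or the average of the two middle values (even length)
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])
print(manual_mean(grades), np.mean(grades))      # 85.125 85.125
print(manual_median(grades), np.median(grades))  # 85.5 85.5
```

The sorted grades are 80, 80, 82, 85, 86, 87, 89, 92. With eight values, the median is the average of 85 and 86.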

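Note that `stats.mode` returns a `ModeResult` object, which is a named tuple holding the mode and its count. If several values tie for the highest count, it still reports only one of them: the smallest. To obtain the mode value itself, select the first item of the result like this (the `mode` attribute works too):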

```python
print("Mode:", stats.mode(grades)[0])  # Mode: 80 (in recent SciPy versions)
```
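To see what happens with a tie, here is a small sketch on made-up data (`scores` is illustrative) in which 1 and 3 both appear twice. `stats.mode` reports a single value through the `mode` and `count` attributes of its result. If you need every tied value, one NumPy-only alternative is to count occurrences with `np.unique(..., return_counts=True)` and keep each value that reaches the maximum count.

```python
import numpy as np
from scipy import stats

# Illustrative data: 1 and 3 both appear twice
scores = np.array([3, 3, 1, 1, 2])

result = stats.mode(scores)
print("Mode:", result.mode)    # a single value: the smallest of the tied values
print("Count:", result.count)  # how many times that value appears

# NumPy-only alternative that keeps every tied value
values, counts = np.unique(scores, return_counts=True)
all_modes = values[counts == counts.max()]
print("All modes:", all_modes)  # All modes: [1 3]
```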

**Variance** measures the spread of data as the average squared deviation from the mean. The **standard deviation** is the square root of the variance, so it is expressed in the same units as the data. Use `np.var(array)` and `np.std(array)` to calculate them, as shown below:

```python
print("Variance:", np.var(grades))            # Variance: 16.109375
print("Standard Deviation:", np.std(grades))  # Standard Deviation: 4.013648...
```
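By default, `np.var` and `np.std` divide by the number of values n. These are the population formulas, and the default corresponds to `ddof=0`. If your data is a sample and you want the sample variance, pass `ddof=1` to divide by n - 1 instead. The minimal sketch below does three things:

* It recomputes the population variance by hand as the mean of the squared deviations from the mean.
* It checks that the standard deviation is the variance's square root.
* It shows the `ddof=1` variant for comparison.

```python
import numpy as np

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])

# Population variance by hand: average of squared deviations from the mean
deviations = grades - grades.mean()
manual_var = np.mean(deviations ** 2)
print(manual_var, np.var(grades))  # 16.109375 16.109375

# Standard deviation is the square root of the variance
print(np.sqrt(manual_var), np.std(grades))

# Sample variance and standard deviation: divide by n - 1 instead of n
sample_var = np.var(grades, ddof=1)
sample_std = np.std(grades, ddof=1)
print(sample_var, sample_std)
```

Here the squared deviations sum to 128.875. The population variance is therefore 128.875 / 8, and the sample variance is 128.875 / 7. Which one to use depends on whether the grades are the whole group of interest or a sample from a larger one.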

Congrats! You've learned basic statistical operations using `NumPy` and applied them to a small dataset of grades. In this lesson, we introduced the **mean**, **median**, **mode**, **variance**, and **standard deviation**. We calculated them using `NumPy` functions and `scipy.stats.mode`. Up next are some exercises to apply these techniques. Let's get practicing!