Welcome to a new lesson! Today, we'll learn about basic statistical operations using Python's NumPy library, with a little help from SciPy. These operations, including mean, median, mode, variance, and standard deviation, are vital tools for understanding and interpreting data. After learning each operation, we'll apply it to a small example dataset of student grades.
The mean, or average, is the sum of all values divided by the number of values. In Python, we use `np.mean(array)` to calculate the mean. The median is the middle value of the sorted data (when there is an even number of values, it is the average of the two middle values), and we calculate it with `np.median(array)`. The mode is the most frequent value in the dataset. NumPy has no dedicated mode function, so we use the `mode()` function from SciPy's `stats` module.
```python
import numpy as np
from scipy import stats

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])
print("Mean:", np.mean(grades))      # Mean: 85.125
print("Median:", np.median(grades))  # Median: 85.5
print("Mode:", stats.mode(grades))   # Mode: ModeResult(mode=80, count=2); exact repr varies by SciPy/NumPy version
```
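To connect the definitions to the function calls, here is a minimal sketch that computes the mean and median by hand and checks them against NumPy. The variable names `manual_mean` and `manual_median` are illustrative only; in practice you would simply call `np.mean` and `np.median`.

```python
import numpy as np

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])

# Mean: sum of all values divided by how many values there are
manual_mean = grades.sum() / grades.size

# Median: sort the data, then take the middle value, or the average
# of the two middle values when the number of values is even
sorted_grades = np.sort(grades)
n = sorted_grades.size
mid = n // 2
if n % 2 == 1:
    manual_median = sorted_grades[mid]
else:
    manual_median = (sorted_grades[mid - 1] + sorted_grades[mid]) / 2

print(manual_mean, np.mean(grades))      # 85.125 85.125
print(manual_median, np.median(grades))  # 85.5 85.5
```

Since `grades` has eight values, the median is the average of the fourth and fifth sorted values, 85 and 86.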
Note that `stats.mode` returns a `ModeResult` object with two fields, `mode` and `count`, rather than a plain number. If several values tie for most frequent, SciPy reports only the smallest of them. To obtain just the mode value, select the first field like this:
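```python
print("Mode:", stats.mode(grades)[0])    # Mode: 80 (older SciPy versions print [80])
print("Mode:", stats.mode(grades).mode)  # the same field, accessed by name
```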
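Because `stats.mode` reports a single value even when there is a tie, you may sometimes want every most-frequent value instead. A minimal NumPy-only sketch uses `np.unique` with `return_counts=True`, which returns the sorted unique values together with how often each occurs. The helper name `all_modes` is hypothetical, not part of NumPy or SciPy.

```python
import numpy as np

def all_modes(values):
    """Return every value that occurs with the highest frequency (hypothetical helper)."""
    unique_values, counts = np.unique(values, return_counts=True)
    # Keep all values whose count equals the maximum count
    return unique_values[counts == counts.max()]

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])
print(all_modes(grades))  # [80]

# A tie: 2 and 3 both appear twice
tied = np.array([1, 2, 2, 3, 3])
print(all_modes(tied))    # [2 3]
```

For `tied`, `stats.mode` would report only 2, the smaller of the tied values, which is why this helper returns an array rather than a single number.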
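Variance measures how spread out the data are: it is the average of the squared deviations from the mean. The standard deviation is the square root of the variance, which puts it back in the same units as the data. Use `np.var(array)` and `np.std(array)` to calculate them as shown below: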
```python
print("Variance:", np.var(grades))            # Variance: 16.109375
print("Standard Deviation:", np.std(grades))  # Standard Deviation: about 4.01365
```
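One detail worth knowing: by default, `np.var` and `np.std` compute the population variance and standard deviation, dividing by the number of values n. The sketch below is a minimal check rather than a replacement for NumPy. It recomputes the variance from its definition, takes the square root to get the standard deviation, and then shows the `ddof=1` argument, which divides by n - 1 to give the sample versions often used when the data are a sample from a larger population.

```python
import numpy as np

grades = np.array([85, 87, 89, 82, 86, 80, 92, 80])

# Population variance: average squared deviation from the mean (divide by n)
deviations = grades - grades.mean()
manual_var = np.sum(deviations ** 2) / grades.size

# Standard deviation is the square root of the variance
manual_std = np.sqrt(manual_var)

print(manual_var, np.var(grades))              # 16.109375 16.109375
print(np.isclose(manual_std, np.std(grades)))  # True

# Sample variance and standard deviation divide by n - 1 instead (ddof=1)
sample_var = np.var(grades, ddof=1)  # 128.875 / 7, about 18.41
sample_std = np.std(grades, ddof=1)  # about 4.29
print(sample_var, sample_std)
```

The squared deviations for `grades` sum to 128.875, so dividing by 8 gives 16.109375 and dividing by 7 gives the slightly larger sample variance.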
Congrats! You've learned basic statistical operations using NumPy and applied them to a small dataset. In this lesson, we introduced the mean, median, mode, variance, and standard deviation and calculated them with NumPy functions, using SciPy's `stats.mode` for the mode. Up next are some exercises to apply these techniques. Let's get practicing!