Mastering Statistical Computations with SciPy

Descriptive and Inferential Statistics with Python, Lesson 4

Introduction and Topic Overview

Welcome! Today, we're going to explore SciPy, a Python library for advanced mathematical and statistical computing that is built on top of NumPy. A key advantage of a tool like SciPy is that it handles problems requiring many calculations, which matters in engineering, data science, and any other discipline that relies heavily on data analysis. By the end of this lesson, you'll have met several useful SciPy features that will serve as additional tools in your data analytics toolbox.

Installing and Importing SciPy

SciPy comes pre-installed in most CodeSignal IDEs; elsewhere, it can be installed with pip install scipy. Let's import the stats module, which provides numerous statistical functions:

```python
from scipy import stats
```

Accessing Distribution Functions in SciPy

In statistics, distribution functions play a crucial role: they tell us how likely each possible outcome of a random event is. For instance, in a dice game, the distribution function tells us the chance of rolling a six. Since we need some data to explore SciPy, let's first look at one way of generating a meaningful data sample, using the numpy.random module:

```python
import numpy as np

# Simulating temperature data for a year in a city
temp_data = np.random.normal(loc=30, scale=10, size=365)
```

In this scenario, we generate an array of 365 values drawn from a normal distribution with mean 30 and standard deviation 10. Note that in numpy.random, loc is the mean and scale is the standard deviation.
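The sample above comes from NumPy, but SciPy also exposes distributions as objects in scipy.stats. As a minimal sketch (the variable names are illustrative, and the parameters 30 and 10 simply mirror the temperature example), the dice question and a few normal-distribution queries could look like this:

```python
from scipy import stats

# A fair six-sided die: randint(low, high) is uniform on low..high-1
die = stats.randint(1, 7)
p_six = die.pmf(6)  # probability of rolling a six

# Normal distribution matching the temperature example (mean 30, std 10)
temp_dist = stats.norm(loc=30, scale=10)
p_below_20 = temp_dist.cdf(20)    # P(temperature <= 20)
p_above_40 = temp_dist.sf(40)     # P(temperature > 40); sf is 1 - cdf
median_temp = temp_dist.ppf(0.5)  # value below which half the temperatures fall

print(f"P(six) = {p_six:.4f}")
print(f"P(temp <= 20) = {p_below_20:.4f}")
print(f"P(temp > 40) = {p_above_40:.4f}")
print(f"Median = {median_temp:.1f}")
```

Here pmf applies to discrete distributions such as the die, while cdf, sf, and ppf answer "how likely is a value below, above, or at a given quantile" for the continuous normal distribution.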

Using Descriptive Statistics Functions in SciPy

SciPy offers more statistical functions than NumPy. We'll explore two: skewness and kurtosis. Skewness measures the asymmetry of a distribution around its mean, while kurtosis gauges how heavy its tails are, that is, how outlier-prone it is. For instance, these metrics could help us understand unusual variations in a city's annual temperature data.

```python
import numpy as np
from scipy import stats

data = np.random.normal(size=1000)

# Compute skewness - a measure of data asymmetry
data_skewness = stats.skew(data)

# Compute kurtosis - a measure of data "tailedness" or outliers
# (by default this is excess kurtosis, so a normal distribution scores about 0)
data_kurtosis = stats.kurtosis(data)

print(f"Skewness: {data_skewness}\nKurtosis: {data_kurtosis}")
```
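To see what these two functions compute, here is a minimal sketch that rebuilds both values from the sample's central moments and compares them with SciPy's results. It assumes the default settings of stats.skew and stats.kurtosis (biased estimators, excess kurtosis); the seed and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility
sample = rng.normal(size=1000)

# Central moments of the sample: m_k = mean((x - mean)^k)
deviations = sample - sample.mean()
m2 = np.mean(deviations**2)
m3 = np.mean(deviations**3)
m4 = np.mean(deviations**4)

manual_skewness = m3 / m2**1.5
manual_excess_kurtosis = m4 / m2**2 - 3  # subtracting 3 puts a normal distribution at 0

skew_matches = np.isclose(manual_skewness, stats.skew(sample))
kurtosis_matches = np.isclose(manual_excess_kurtosis, stats.kurtosis(sample))
print(f"Skewness matches: {skew_matches}, kurtosis matches: {kurtosis_matches}")
```

Skewness is the third central moment scaled by the spread, and kurtosis is the fourth, which is why kurtosis is so sensitive to values far from the mean.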

Interpretation of Skewness

Look at the picture below, which shows asymmetry in statistical distributions. Negative skewness (blue curve) means the left tail is longer or fatter than the right: most values sit toward the high end, and a few unusually low values stretch the distribution to the left. Positive skewness (red curve) means the right tail is longer or fatter: most values sit toward the low end, and a few unusually high values stretch it to the right. A skewness near zero suggests a roughly symmetric distribution. Skewness thus tells us the shape and direction of the spread of our data.
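The sign convention can be checked numerically. The sketch below is illustrative only: it uses an exponential sample as a stand-in for right-skewed data, its mirror image for left-skewed data, and a normal sample as the symmetric case. The sample size and seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for reproducibility

right_skewed = rng.exponential(scale=1.0, size=10_000)  # long right tail
left_skewed = -right_skewed                             # mirror image: long left tail
symmetric = rng.normal(size=10_000)

samples = [
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
    ("symmetric", symmetric),
]
for name, values in samples:
    print(f"{name:>12}: skewness = {stats.skew(values):+.2f}")
```

The right-skewed sample should report a clearly positive value, the mirrored sample the same magnitude with a negative sign, and the normal sample a value close to zero.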

Interpretation of Kurtosis

The next plot gives us insight into the shape of a distribution's tails and peak. The blue curve is a normal distribution, which has an excess kurtosis of 0 (the value stats.kurtosis reports by default) and serves as the baseline: extreme values are rare. The red curve, a Laplace distribution, has a higher excess kurtosis of 3, with a more pronounced or 'pointy' peak and heavier tails, so extreme values occur more often. In data with high kurtosis, exceptional events, such as a black swan event in finance, are more likely than a normal model would suggest.
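As a rough check of the comparison between the two curves, this sketch reads the theoretical excess kurtosis from SciPy's distribution objects and then estimates it from simulated samples. The sample size and seed are arbitrary, and the sample estimates will vary somewhat around the theoretical values.

```python
import numpy as np
from scipy import stats

# Theoretical excess kurtosis, read from the distribution objects
normal_kurt = float(stats.norm.stats(moments="k"))
laplace_kurt = float(stats.laplace.stats(moments="k"))
print(f"Theoretical: normal = {normal_kurt:.1f}, Laplace = {laplace_kurt:.1f}")

# Estimates from simulated data
rng = np.random.default_rng(seed=2)  # arbitrary seed, only for reproducibility
normal_sample = rng.normal(size=100_000)
laplace_sample = rng.laplace(size=100_000)

normal_est = stats.kurtosis(normal_sample)
laplace_est = stats.kurtosis(laplace_sample)
print(f"Estimated:   normal = {normal_est:.2f}, Laplace = {laplace_est:.2f}")
```

Passing fisher=False to stats.kurtosis would report the raw kurtosis instead, which is 3 for a normal distribution.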

Summary and Reflection on Learned Skills

Great job! Today, we became familiar with SciPy and its application in statistical computations. We learned how to access distribution functions in SciPy and what skewness and kurtosis mean. Continue practicing to build confidence in these skills and keep exploring the possibilities in the data world with SciPy. Exciting exercises are on the way! Happy analyzing!
