Welcome to another fascinating session! Today, we will delve into probability distributions and learn how Python expedites the exploration of data patterns. We will examine different types of probability distributions, specifically the Uniform and Normal distributions, and use Python libraries to visualize them.
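Probability measures how likely an event is to occur, on a scale from 0 (impossible) to 1 (certain). If we flip a fair coin, the probability of getting heads is 50%, or 0.5. A probability distribution describes every possible outcome of a random variable together with how likely it is. For discrete outcomes such as coin flips, it assigns a probability to each outcome. For continuous values, it describes a probability density over a range of values.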
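To make the idea of a discrete distribution concrete, here is a minimal sketch. It assumes a fair coin and uses NumPy's random Generator (np.random.default_rng) with an arbitrary seed. The dictionary maps each outcome to its probability, and a simulation of 10,000 flips (a sample size chosen only for illustration) shows the observed share of heads settling near 0.5.

```python
import numpy as np

# A discrete distribution for a fair coin: each outcome mapped to its probability
coin = {"heads": 0.5, "tails": 0.5}

# The probabilities of all possible outcomes must add up to 1
assert abs(sum(coin.values()) - 1.0) < 1e-12

# Simulate 10,000 flips (seed fixed so the run is reproducible)
rng = np.random.default_rng(seed=42)
flips = rng.choice(list(coin.keys()), size=10_000, p=list(coin.values()))

# The observed share of heads should be close to the theoretical 0.5
heads_share = np.mean(flips == "heads")
print(f"Estimated P(heads): {heads_share:.3f}")
```

The more flips you simulate, the closer the estimated share tends to get to the true probability.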
We will use Python's powerful plotting module, matplotlib, to take a quick look at the distributions we study. Visualization is covered in full elsewhere in the course path, so for now you can treat matplotlib as a magic black box that draws our plots. Remember that the focus of this lesson is exploring statistical distributions, so keep your attention on that part.
Consider a scenario in which all outcomes have an equal chance of occurring. This is described by a Uniform Distribution. For instance, if we draw a card from a standard deck and note its suit, the probabilities of drawing a heart, club, diamond, or spade are equal (1/4 each). The suit example is discrete; below we use the continuous version, in which every value within an interval is equally likely. Let's generate and plot a Uniform Distribution using numpy and matplotlib.
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 random numbers uniformly distributed between -1 and 1
uniform_data = np.random.uniform(-1, 1, 1000)

# Plot a histogram of the distribution
plt.hist(uniform_data, bins=20, density=True)
plt.title("Uniform Distribution")
plt.show()
```
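Here, np.random.uniform(-1, 1, 1000) generates 1000 random numbers uniformly distributed between -1 and 1. plt.hist(uniform_data, bins=20, density=True) groups these numbers into 20 bins and draws a histogram; density=True scales the bar heights so that the total area of the bars equals 1, which makes the histogram comparable to a probability density. Finally, plt.show() displays the plot.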
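A continuous uniform distribution on the interval [a, b] has a flat density of 1/(b - a), mean (a + b)/2, and variance (b - a)^2/12. The sketch below checks these values against a simulated sample. Assumptions: it uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.uniform shown above, a larger sample of 100,000 values for stabler estimates, and np.histogram to get the same bar heights plt.hist would draw without plotting anything.

```python
import numpy as np

a, b = -1, 1
rng = np.random.default_rng(seed=0)
uniform_data = rng.uniform(a, b, 100_000)

# Theoretical values for a continuous uniform distribution on [a, b]
expected_density = 1 / (b - a)        # height of the flat density: 0.5
expected_mean = (a + b) / 2           # 0.0
expected_var = (b - a) ** 2 / 12      # 1/3, about 0.333

# np.histogram with density=True computes the bar heights without drawing them
heights, edges = np.histogram(uniform_data, bins=20, density=True)

print(f"Bar heights: min {heights.min():.3f}, max {heights.max():.3f} (expected {expected_density})")
print(f"Sample mean: {uniform_data.mean():.3f} (expected {expected_mean})")
print(f"Sample variance: {uniform_data.var():.3f} (expected {expected_var:.3f})")
```

This is why the normalized histogram above shows bars hovering around a height of 0.5: the interval has width 2, and the total area must be 1.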
Next, we will explore the Normal Distribution, a continuous distribution whose density forms a symmetrical, bell-shaped curve. It is one of the most widely used distributions in statistical analysis. A key characteristic of the Normal Distribution is that it is entirely defined by its mean (center) and standard deviation (spread). Let's simulate and plot a Normal Distribution:
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 values from a Normal Distribution with mean 0 and standard deviation 1
normal_data = np.random.normal(loc=0, scale=1, size=1000)

# Plot a histogram of the distribution
plt.hist(normal_data, bins=20, density=True)
plt.title("Normal Distribution")
plt.show()
```
The function np.random.normal(loc=0, scale=1, size=1000) generates 1000 data points following a Normal Distribution with a mean of 0 (loc) and a standard deviation of 1 (scale). This particular case is known as the standard normal distribution.
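The Normal density is f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)), where mu corresponds to loc and sigma to scale. The following is a minimal sketch, assuming illustrative values mu = 5 and sigma = 2 and NumPy's Generator API. It shows that the sample mean and standard deviation recover the two parameters, checks a hand-written version of the density (the helper normal_pdf is hypothetical, written just for this example) against scipy.stats.norm.pdf, and measures the share of samples within 1, 2, and 3 standard deviations of the mean.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 5.0, 2.0  # illustrative values for loc (mean) and scale (standard deviation)
rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# The two parameters are recovered by the sample mean and standard deviation
print(f"Sample mean: {samples.mean():.3f}, sample std: {samples.std():.3f}")

# Hypothetical helper: the Normal density written out by hand
def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Compare the hand-written density with SciPy's implementation at a few points
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 9)
print(np.allclose(normal_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma)))  # True

# Share of samples within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    share = np.mean(np.abs(samples - mu) < k * sigma)
    print(f"Within {k} std: {share:.3f}")
```

The three shares should come out close to 68%, 95%, and 99.7%, the well-known "empirical rule" for Normal data.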
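We can calculate metrics such as the mean (center), variance (spread, measured in squared units), skewness (asymmetry), and kurtosis (how heavy the tails are compared with a Normal Distribution) to better understand our distributions. Note that SciPy's kurtosis function returns excess kurtosis by default, so a Normal Distribution scores about 0. Let's calculate these in Python: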
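```python
import numpy as np
from scipy.stats import kurtosis, skew

# Same kind of sample as in the previous example: mean 0, standard deviation 1
normal_data = np.random.normal(loc=0, scale=1, size=1000)

# Calculate properties (distinct names avoid shadowing the imported skew/kurtosis functions)
mean = np.mean(normal_data)        # Mean
var = np.var(normal_data)          # Variance
skewness = skew(normal_data)       # Skewness
kurt = kurtosis(normal_data)       # Excess kurtosis (Fisher's definition, about 0 for Normal data)

# Print the results
print(f"Mean: {mean}")
print(f"Variance: {var}")
print(f"Skewness: {skewness}")
print(f"Kurtosis: {kurt}")

# Example output (values vary from run to run because the data are random):
# Mean: -0.005624857802311251
# Variance: 0.9437562020519524
# Skewness: 0.14310121629538694
# Kurtosis: 0.16510716654198143
```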
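Skewness and excess kurtosis can be computed directly from the central moments m_k = mean((x - mean(x))^k): skewness = m3 / m2^(3/2) and excess kurtosis = m4 / m2^2 - 3. The sketch below is a minimal illustration, assuming NumPy's Generator API and a hypothetical helper named moment_stats. It checks these formulas against scipy.stats.skew and scipy.stats.kurtosis with their default settings (bias=True, fisher=True), and compares Normal data with Uniform data.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(seed=2)
normal_data = rng.normal(0, 1, 100_000)
uniform_data = rng.uniform(-1, 1, 100_000)

def moment_stats(x):
    """Hypothetical helper: skewness and excess kurtosis from central moments."""
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment (population variance)
    m3 = np.mean(d ** 3)   # third central moment
    m4 = np.mean(d ** 4)   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

for name, data in [("normal", normal_data), ("uniform", uniform_data)]:
    s, k = moment_stats(data)
    # SciPy's defaults (bias=True, fisher=True) use the same formulas
    assert np.isclose(s, skew(data)) and np.isclose(k, kurtosis(data))
    print(f"{name}: skewness {s:.3f}, excess kurtosis {k:.3f}")
```

Both distributions are symmetric, so their skewness is near 0. Their excess kurtosis differs: about 0 for the Normal Distribution, and about -1.2 for the Uniform Distribution, whose flat shape has no tails at all.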
Well done! You have grasped the concepts of probability and of the Uniform and Normal distributions, and you have learned to simulate, visualize, and interpret these distributions using Python. Now, let's put theory into practice with hands-on exercises, which will strengthen your understanding and your data analytics skill set. Let's keep moving forward!