Lesson 2

Welcome to lesson two on **Hypothesis Testing with Python**! Today's focus is the *Mann-Whitney U test*. We've previously discussed T-tests; now we turn to the Mann-Whitney U test, which compares two independent samples. It helps when data doesn't meet the normality assumption required by T-tests. In this lesson, we explore the ins and outs of the Mann-Whitney U test and apply it to a real-life dataset using `SciPy`.

Let's start with non-parametric tests. These are also called distribution-free tests; they work with data that doesn't fit a normal distribution. We use them when our data is skewed, ordinal, or has outliers. Here, ordinal means a type of data where the order of the data points matters, but not the difference between the data points. For example, the sequence in which participants finish matters in a race, but the exact time difference between each runner does not.
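To make the idea of ordinal data concrete, here is a minimal sketch that turns race finish times into ranks. The finish times are made up for illustration; `rankdata` is SciPy's ranking helper. Notice that the very slow last runner is simply rank 5, so the size of the gap has no effect.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical finish times (in seconds) for five runners
finish_times = np.array([612.4, 598.1, 640.0, 598.9, 1200.5])

# Rank 1 is the fastest runner; tied values (none here) would share an average rank
ranks = rankdata(finish_times)

print(ranks)  # [3. 1. 4. 2. 5.]
```

Rank-based tests work on values like these ranks rather than on the raw times, which is why a single extreme value has limited influence.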

The Mann-Whitney U test compares two independent groups when the dependent variable is ordinal or continuous but doesn't follow a normal distribution. It pools the values from both groups, ranks them together, and then sums the ranks within each group. If the groups' average ranks are similar, the two groups are not significantly different.
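The ranking step can be sketched by hand as follows. This is a rough illustration on two small samples, not a replacement for a library routine; the variable names (`group_a`, `rank_sum_a`, and so on) are chosen here for illustration only.

```python
import numpy as np
from scipy.stats import rankdata

group_a = np.array([5, 22, 15, 18, 12, 17, 14])
group_b = np.array([25, 22, 30, 19, 23])

# Rank all values together; tied values share the average of their ranks
combined = np.concatenate([group_a, group_b])
ranks = rankdata(combined)

# Sum the ranks that belong to each group
n_a, n_b = len(group_a), len(group_b)
rank_sum_a = ranks[:n_a].sum()
rank_sum_b = ranks[n_a:].sum()

# U for the first group: its rank sum minus the smallest rank sum it could have
u_a = rank_sum_a - n_a * (n_a + 1) / 2

print(rank_sum_a, rank_sum_b)  # 29.5 48.5
print(u_a)                     # 1.5
```

Group A holds 7 of the 12 values yet has the smaller rank sum, which already hints that its values tend to be lower than group B's.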

The Mann-Whitney U test provides two values: the `U-statistic` and the `p-value`. The `U-statistic` is derived from the rank sums and measures how much the two groups' values overlap. For samples of sizes `n1` and `n2`, U ranges from 0 to `n1 * n2`. A value near `n1 * n2 / 2` indicates heavy overlap, while a value near either extreme indicates strong separation between the two groups' data. The `p-value` here means the same as in the T-test: if the `p-value` is less than 0.05, we conclude that the difference is statistically significant, meaning it is unlikely to be due to chance alone.
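One way to get a feel for the U-statistic is to look at its range on two completely separated samples. The sketch below uses made-up data and assumes a recent SciPy release, in which `mannwhitneyu` reports U for the first sample passed in (older releases may report the smaller of the two U values instead).

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Made-up samples with no overlap at all
low = np.array([1, 2, 3, 4])
high = np.array([10, 11, 12, 13])

n1, n2 = len(low), len(high)

# Assumption: the returned statistic is U for the first argument
u_low, p = mannwhitneyu(low, high)
u_high, _ = mannwhitneyu(high, low)

print(u_low, u_high)  # the two extremes: 0 and n1 * n2
print(n1 * n2 / 2)    # the value U would sit near if the groups overlapped heavily
```

Swapping the argument order flips U from one extreme to the other, but the two-sided p-value stays the same, so it is the distance from `n1 * n2 / 2` that reflects separation.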

The `mannwhitneyu()` function from `SciPy` runs the U test, taking two data samples as input and returning a test statistic (U) and a p-value (p). Here's what the code looks like:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Define two distinct data samples
data1 = np.array([5, 22, 15, 18, 12, 17, 14])
data2 = np.array([25, 22, 30, 19, 23])

# Perform the Mann-Whitney U test
U, p = mannwhitneyu(data1, data2)

# Print the test statistic and p-value
print(f'U-value: {U}')  # 1.5
print(f'p-value: {p}')  # approx. 0.0117
```

If the p-value is less than 0.05, we consider it evidence against the null hypothesis, leading to its rejection.
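The decision step can be written out explicitly. This is a minimal sketch using the two samples from the example above; the constant name `ALPHA` and the wording of the messages are choices made here for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

ALPHA = 0.05  # significance level used in this lesson

data1 = np.array([5, 22, 15, 18, 12, 17, 14])
data2 = np.array([25, 22, 30, 19, 23])

U, p = mannwhitneyu(data1, data2)

# Compare the p-value with the chosen significance level
if p < ALPHA:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"p = {p:.4f}: {decision}")
```

For these samples the p-value is about 0.0117, so the sketch reports that the null hypothesis is rejected.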

Let's try the Mann-Whitney U test on some actual data. Suppose we have information about time spent on a website by users from two regions, A and B. The goal is to determine whether there's a significant difference in time spent between users from the two regions.

```python
# Import necessary Python libraries
import numpy as np
from scipy.stats import mannwhitneyu

# Data on time spent (in minutes) on the website by users
time_A = np.array([31, 22, 39, 27, 35, 28, 34, 26, 23, 33])
time_B = np.array([26, 25, 30, 28, 29, 28, 27, 30, 27, 28])

# Perform the Mann-Whitney U test
U, p = mannwhitneyu(time_A, time_B)

# Print out the results
print(f'U-value: {U}')  # 60.0
print(f'p-value: {p}')  # approx. 0.47
```

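The p-value is **not** below 0.05, so we fail to reject the null hypothesis: the data don't provide evidence of a significant difference in time spent between users from the two regions.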

Great job! You now understand the Mann-Whitney U test and how to implement it with Python's `SciPy` library. You can now work with datasets that don't follow a normal distribution. Ready for the practice session? It will help cement what you've learned and give you hands-on experience. Remember, practice is vital for mastering these methods. Enjoy the learning journey!