Welcome to the second lesson on Mastering Hypothesis Testing with R! Our focus today is on the Mann-Whitney U test. We've engaged with T-tests previously and have now set our sights on the Mann-Whitney U test: a valuable tool when data do not meet the T-test's normality assumption. In this lesson, we'll unpack the nuances of the Mann-Whitney U test by applying it to a realistic dataset using R's wilcox.test() function.
We'll begin with non-parametric tests. They're also known as distribution-free tests because they work with data that do not follow a normal distribution. We turn to them when our data are skewed, contain outliers, or are ordinal. Ordinal data is a type in which the order of data points matters, but the size of the difference between them does not. For example, the sequence in which runners finish a race matters, but the exact time gap between each runner does not necessarily matter.
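Before reaching for a non-parametric test, it helps to confirm that your data really do depart from normality. Here is a minimal sketch, using hypothetical right-skewed data generated with rexp(), that checks the shape both visually and with the Shapiro-Wilk test:

# Hypothetical right-skewed data, used only for illustration
set.seed(42)
skewed_data <- rexp(30, rate = 0.2)

# Visual check: a histogram of skewed data shows a long right tail
hist(skewed_data, main = 'Distribution check', xlab = 'Value')

# Formal check: a Shapiro-Wilk p-value below 0.05 suggests the data
# deviate from normality, pointing us toward a non-parametric test
shapiro.test(skewed_data)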
The Mann-Whitney U test compares two independent groups when the dependent variable is ordinal, or continuous but not normally distributed. The test pools the values from both groups, ranks them, and sums the ranks within each group; similar rank sums suggest that the two groups do not differ significantly.
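To see this ranking idea concretely, here is a small sketch with made-up numbers that pools two groups, ranks every value, and compares the rank sums:

# Two made-up groups, purely for illustration
group_1 <- c(3, 7, 8)
group_2 <- c(5, 9, 12)

# Rank all values from both groups together
all_ranks <- rank(c(group_1, group_2))

# Split the ranks back into their groups and sum them
rank_sum_1 <- sum(all_ranks[1:length(group_1)])
rank_sum_2 <- sum(all_ranks[(length(group_1) + 1):length(all_ranks)])

# Very different rank sums hint at a difference between the groups
print(c(rank_sum_1, rank_sum_2))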
The Mann-Whitney U test yields two values: the U-statistic and the p-value. The U-statistic summarizes how the ranks are split between the two groups: values near the extremes (close to 0 or to the maximum possible value, the product of the two sample sizes) indicate strong separation between the groups, while values near the middle indicate heavy overlap. The p-value is interpreted the same way as in the T-test: if the p-value is less than 0.05, the difference is statistically significant and unlikely to be due to chance alone.
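If you are curious where the statistic comes from, here is a rough sketch, reusing the made-up groups from above, that derives the U-statistic for the first group from its rank sum; it matches the W value that wilcox.test() reports for that group:

# Illustrative only: computing the U-statistic for the first group by hand
g1 <- c(3, 7, 8)   # hypothetical group 1
g2 <- c(5, 9, 12)  # hypothetical group 2
n1 <- length(g1)

# Rank the pooled data, then take the rank sum of the first group
ranks <- rank(c(g1, g2))
rank_sum_1 <- sum(ranks[1:n1])

# U for group 1: rank sum minus its minimum possible value n1 * (n1 + 1) / 2
U1 <- rank_sum_1 - n1 * (n1 + 1) / 2
print(U1)  # same value as wilcox.test(g1, g2)$statistic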
To perform the U test, we use R's wilcox.test() function. This function takes two data samples as inputs and returns a test statistic (W) and a p-value (p). Check out this code for a better insight:
# Define two distinct data samples
data1 <- c(5, 22, 15, 18, 12, 17, 14)
data2 <- c(25, 24, 30, 19, 23)

# Perform the Mann-Whitney U test
result <- wilcox.test(data1, data2, exact = FALSE)

# Print the test statistic and p-value
print(paste('W-value:', result$statistic))  # 1
print(paste('p-value:', result$p.value))    # ~0.0094
Because the p-value is less than 0.05, this result suggests that we should reject the null hypothesis: the two samples appear to differ significantly.
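If you prefer to make the decision explicit in code, you can compare the stored p-value against your chosen significance level (0.05 here):

# Compare the p-value from the test above against the 0.05 threshold
alpha <- 0.05
if (result$p.value < alpha) {
  print('Reject the null hypothesis: the groups differ significantly.')
} else {
  print('Fail to reject the null hypothesis: no significant difference detected.')
}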
The exact = FALSE parameter in the wilcox.test() function instructs R not to use the exact distribution method for computing the p-value. This is particularly useful with larger samples, where calculating the exact p-value becomes computationally intensive. With exact = FALSE, the function instead approximates the p-value using a normal approximation to the distribution of the test statistic, making the computation more efficient; it also avoids the warning R issues when ties in the data prevent an exact p-value from being computed.
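If you want to see the effect of this setting for yourself, one option is to rerun the test on the first pair of samples (which contain no ties) with and without the exact method and compare the two p-values:

# Small, tie-free samples: the exact distribution can be computed directly
exact_result  <- wilcox.test(data1, data2, exact = TRUE)
approx_result <- wilcox.test(data1, data2, exact = FALSE)

# The p-values will not be identical, but here both lead to the same conclusion
print(exact_result$p.value)
print(approx_result$p.value)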
To illustrate the Mann-Whitney U test with real data, let's assume that we have information about the time users from two regions spent interacting on a website. The goal is to determine if there is a significant difference in user behaviour between the two regions.
# Data on time spent (in minutes) on the website by users
time_A <- c(31, 22, 39, 27, 35, 28, 34, 26, 23, 33)
time_B <- c(26, 25, 30, 28, 29, 28, 27, 30, 27, 28)

# Perform the Mann-Whitney U test
result <- wilcox.test(time_A, time_B, exact = FALSE)

# Print out the results
print(paste('W-value:', result$statistic))  # 60
print(paste('p-value:', result$p.value))    # ~0.47
Because the p-value (about 0.47) is well above 0.05, this result implies that there isn't a significant difference in time spent between the two regions.
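As a quick sanity check alongside the test, it can help to compare the two groups' medians and look at their spread side by side (the labels Region A and Region B below are just placeholders for the two regions):

# Compare the group medians
print(median(time_A))
print(median(time_B))

# A side-by-side boxplot makes the overlap between the regions easy to see
boxplot(time_A, time_B, names = c('Region A', 'Region B'),
        ylab = 'Time on site (minutes)')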
Great job! You've now grasped the fundamentals of the Mann-Whitney U test and how to use R to perform it. You're equipped to work with datasets that don't follow a normal distribution. Ready for the practice session? It will help reinforce your understanding and provide hands-on experience. Remember, practice is essential for mastering new techniques. Enjoy your learning journey!