Lesson 2

Welcome to the second lesson on **Mastering Hypothesis Testing with R**! Our focus today is the *Mann-Whitney U test*. We've engaged with T-tests previously and have now set our sights on the Mann-Whitney U test, a valuable tool when data do not meet the T-test's normality assumption. In this lesson, we'll unpack the nuances of the Mann-Whitney U test by applying it to a realistic dataset using R's `wilcox.test()` function.

We'll begin with non-parametric tests. They're also known as distribution-free tests because they cater to data that does not follow a normal distribution. We resort to them when our data is skewed, is ordinal, or contains outliers. Ordinal data is a particular type in which the order of data points matters, though the differences between them do not. For example, the sequence in which runners finish a race matters, but the exact time difference between each runner does not necessarily matter.
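Before reaching for a non-parametric test, it can help to check whether your data plausibly follows a normal distribution. One common approach, sketched here with made-up numbers, uses R's built-in `shapiro.test()`:

```r
# A hypothetical right-skewed sample with an outlier
skewed <- c(1, 1, 2, 2, 3, 3, 4, 5, 8, 15, 40)

# Shapiro-Wilk normality test: a small p-value suggests the data is not normal
result <- shapiro.test(skewed)
print(result$p.value)

# If p < 0.05, a non-parametric test such as the Mann-Whitney U test
# is a safer choice than a T-test
```

This is just one way to screen for non-normality; visual checks such as histograms or Q-Q plots are often used alongside it.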

The Mann-Whitney U test is used to compare two independent groups when the dependent variable is either ordinal or continuous but does not follow a normal distribution. The test ranks all values from both groups together and then compares the rank sums: if the two groups have similar rank sums, they likely do not differ significantly.
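To see what "ranking the values from both groups" means in practice, here is a small sketch with made-up numbers that pools the two groups, ranks them, and sums the ranks per group:

```r
# Two hypothetical groups
group1 <- c(3, 5, 8)
group2 <- c(6, 9, 12)

# Rank all values together (ties would receive averaged ranks)
pooled_ranks <- rank(c(group1, group2))

# Sum of ranks for each group
rank_sum1 <- sum(pooled_ranks[1:3])  # ranks of group1's values
rank_sum2 <- sum(pooled_ranks[4:6])  # ranks of group2's values

print(rank_sum1)  # 7  (group1 holds ranks 1, 2, 4)
print(rank_sum2)  # 14 (group2 holds ranks 3, 5, 6)
```

Here group2's values sit mostly above group1's, so its rank sum is larger; the Mann-Whitney U test formalizes how surprising such an imbalance would be under the null hypothesis.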

The Mann-Whitney U test yields two values: the `U-statistic` and the `p-value`. The `U-statistic` summarizes how the ranks of one group compare with the ranks of the other: a `U-statistic` far from its expected value under the null hypothesis indicates a greater separation between the data of the two groups. The `p-value` conveys the same information as in the T-test: if the `p-value` is less than 0.05, the difference is statistically significant and unlikely to be due to chance alone.

To perform the U test, we use R's `wilcox.test()` function. This function takes two data samples as inputs and outputs a test statistic (`W`) and a p-value (`p`). Check out this code for a better insight:

```r
# Define two distinct data samples
data1 <- c(5, 22, 15, 18, 12, 17, 14)
data2 <- c(25, 24, 30, 19, 23)

# Perform the Mann-Whitney U test
result <- wilcox.test(data1, data2, exact = FALSE)

# Print the test statistic and p-value
print(paste('W-value:', result$statistic)) # 1
print(paste('p-value:', result$p.value))   # ~0.0094
```
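As a sanity check, the `W` statistic that `wilcox.test()` reports can be reproduced by hand: it is the rank sum of the first group minus the minimum possible rank sum, `n1 * (n1 + 1) / 2`. A sketch using the same two samples:

```r
data1 <- c(5, 22, 15, 18, 12, 17, 14)
data2 <- c(25, 24, 30, 19, 23)

# Rank the pooled values, then keep the ranks belonging to data1
pooled_ranks <- rank(c(data1, data2))
rank_sum1 <- sum(pooled_ranks[seq_along(data1)])

# R's W statistic: rank sum minus the minimum possible rank sum
n1 <- length(data1)
W <- rank_sum1 - n1 * (n1 + 1) / 2
print(W)  # same value as result$statistic from wilcox.test(data1, data2)
```

Seeing the statistic computed from raw ranks makes it clearer why extreme values of `W` signal that one group's values tend to sit above the other's.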

Since the `p-value` is less than 0.05, this result suggests that we should reject the null hypothesis.

The `exact = FALSE` parameter in the `wilcox.test()` function instructs R not to use the exact distribution method for computing the p-value. This is particularly useful when dealing with larger samples, as calculating the exact p-value can become computationally intensive. By setting `exact = FALSE`, the function instead approximates the p-value using a normal approximation, making the computation more efficient for larger datasets.
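To see the two methods side by side, you can compare the exact and approximate p-values on a small sample (made-up numbers; with no ties, R can compute the exact distribution):

```r
# Two small hypothetical samples with no tied values
small1 <- c(1, 4, 6, 9)
small2 <- c(2, 7, 10, 12)

# Exact p-value: feasible for small samples without ties
p_exact <- wilcox.test(small1, small2, exact = TRUE)$p.value

# Normal-approximation p-value: scales to large samples
p_approx <- wilcox.test(small1, small2, exact = FALSE)$p.value

print(p_exact)
print(p_approx)
```

For samples this small the two values can differ noticeably; as sample sizes grow, the approximation converges toward the exact result.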

To illustrate the Mann-Whitney U test with real data, let's assume that we have information about the time users from two regions spent interacting on a website. The goal is to determine if there is a significant difference in user behaviour between the two regions.

```r
# Data on time spent (in minutes) on the website by users
time_A <- c(31, 22, 39, 27, 35, 28, 34, 26, 23, 33)
time_B <- c(26, 25, 30, 28, 29, 28, 27, 30, 27, 28)

# Perform the Mann-Whitney U test
result <- wilcox.test(time_A, time_B, exact = FALSE)

# Print out the results
print(paste('W-value:', result$statistic)) # 60
print(paste('p-value:', result$p.value))   # ~0.47
```

Because the `p-value` is **not** below 0.05, this result implies that there isn't a significant difference in time spent between the two regions.

Great job! You've now grasped the fundamentals of the Mann-Whitney U test and how to use R to perform it. You're equipped to work with datasets that don't follow a normal distribution. Ready for the practice session? It will help reinforce your understanding and provide hands-on experience. Remember, practice is essential for mastering new techniques. Enjoy your learning journey!