Welcome to our exciting lesson! We'll be diving into Hypothesis Testing using R. Even though it might sound intricate, think of it as deciding whether a toy is worth buying based on its reviews. Our focus will be the T-test, which helps us determine if two groups differ significantly.
R has a useful function, `t.test()`, to assist us in conducting these tests quickly and accurately. By the end of this lesson, you'll understand Hypothesis Testing, know what a T-test is, and be able to conduct a T-test using R. So, let's get started!
Imagine owning a café and introducing a new blueberry muffin recipe. You believe this new recipe makes the muffins more popular, increasing their sales. To confirm this, you decide to use hypothesis testing.
Null Hypothesis (H0): The null hypothesis is our initial statement that there is no change, effect, or difference. In this case, it would be, "The new blueberry muffin recipe does not increase sales." It's like saying, "Our change didn’t do anything. Things are the same."
Alternative Hypothesis (HA): The alternative hypothesis is the claim you are trying to find evidence for. It suggests that there is an effect, a change, or a difference. For the café scenario, it would be, "The new blueberry muffin recipe increases sales." This is like saying, "Yes, our change made a difference. Things are not the same."
Think of the null hypothesis as maintaining the status quo or believing nothing has changed until proven otherwise. The alternative hypothesis is your bet against the status quo, proposing that a change has occurred.
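The café hypotheses can be written down in R as a one-sided test. This is a minimal sketch that assumes made-up daily sales figures: the vectors `old_recipe_sales` and `new_recipe_sales` are hypothetical and not data from this lesson. The `alternative = "greater"` argument tells `t.test()` that HA is "the first group's mean is larger than the second's".

```r
# Hypothetical daily muffin sales (illustrative numbers only)
old_recipe_sales <- c(40, 38, 42, 41, 39, 43, 40)
new_recipe_sales <- c(45, 44, 47, 43, 46, 48, 45)

# H0: mean(new) - mean(old) <= 0  (the recipe does not increase sales)
# HA: mean(new) - mean(old) >  0  (the recipe increases sales)
muffin_test <- t.test(new_recipe_sales, old_recipe_sales,
                      alternative = "greater")

print(muffin_test$statistic)
print(muffin_test$p.value)
```

If you leave `alternative` at its default (`"two.sided"`), the test instead asks whether the two means differ in either direction.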
Let's better understand the T-test. It examines whether the mean values of two groups truly differ, or whether the mean of a single group differs from a fixed reference value. This is similar to testing whether two pots of coffee have different temperatures because one is under an AC vent or if it's just a coincidence.
The T-test presents us with two important values: the t-statistic and the p-value. The t-statistic represents the size of the difference relative to the variation in your sample data. In simpler terms, a t-statistic further from zero (in either direction) indicates a larger difference between the means relative to the noise in the data. The p-value is the probability of seeing a difference at least this large if the null hypothesis were true, that is, if only chance were at work. If the p-value is less than 0.05, we usually conclude that the difference is statistically significant and unlikely to be due to chance alone.
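To make these two values concrete, here is a sketch that computes them by hand for a one-sample test and compares the result with `t.test()`. The sample `x` and the reference mean `mu0` are hypothetical values chosen only for illustration.

```r
# Hypothetical sample and reference mean (illustrative values only)
x   <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.7, 5.3)
mu0 <- 5

n       <- length(x)
std_err <- sd(x) / sqrt(n)            # standard error of the mean
t_stat  <- (mean(x) - mu0) / std_err  # difference relative to variation

# Two-sided p-value: probability of a |t| at least this large if H0 were true
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test should agree with the manual calculation
builtin <- t.test(x, mu = mu0)
print(c(manual = t_stat, builtin = unname(builtin$statistic)))
print(c(manual = p_value, builtin = builtin$p.value))
```

The manual and built-in numbers should match, which shows that the t-statistic is simply the observed difference divided by its standard error.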
R has a powerful function, `t.test()`, for Hypothesis Testing.
Here is how one can perform a one-sample T-test in R. We call the function, providing a dataset and a value `mu` to compare the dataset's mean against.
```r
ages <- c(28, 32, 25, 25, 27, 27, 27, 29, 30, 31, 33) # mean = 28.5
H1_test <- t.test(ages, mu = 30)

print(H1_test$statistic) # t-statistic: -1.788854
print(H1_test$p.value) # p-value: 0.1039
```
In this case, we fail to reject the null hypothesis because the p-value is greater than 0.05 (the conventional cutoff). It suggests that statistically, we can't conclude that the mean age of users is different from 30.
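The object returned by `t.test()` holds more than the statistic and the p-value. The sketch below reuses the `ages` vector from the example and reads a few other standard fields of the result. It also illustrates the link between the test and the confidence interval: failing to reject at the 5% level corresponds to 30 lying inside the 95% interval.

```r
ages <- c(28, 32, 25, 25, 27, 27, 27, 29, 30, 31, 33)
H1_test <- t.test(ages, mu = 30)

print(H1_test$estimate)   # sample mean
print(H1_test$parameter)  # degrees of freedom (n - 1)
print(H1_test$conf.int)   # 95% confidence interval for the mean

# Check whether the hypothesized mean falls inside the interval
ci <- H1_test$conf.int
mu_inside_ci <- ci[1] <= 30 && 30 <= ci[2]
print(mu_inside_ci)
```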
Next, let's draw a new dataset from a normal distribution with a mean that differs from 30:
```r
set.seed(1234)
ages <- rnorm(90, mean = 33, sd = 5) # drawn from a population with mean 33
H1_test <- t.test(ages, mu = 30)

print(H1_test$statistic) # t-statistic: 3.519
print(H1_test$p.value) # p-value: 0.0006
```
In this case, the p-value is less than 0.05, leading to the rejection of the null hypothesis. This gives us strong evidence against the null hypothesis: the average age of users differs significantly from 30.
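The reject-or-not decision can be spelled out in code. This is a small sketch under the same setup as the example; the names `alpha` and `decision` are introduced here for illustration. Note that `mean = 33` in `rnorm()` is the population mean, so the sample mean of a particular draw will differ slightly from it.

```r
set.seed(1234)                        # makes the random draw reproducible
ages <- rnorm(90, mean = 33, sd = 5)  # population mean 33, not the sample mean

alpha   <- 0.05                       # significance level used in this lesson
H1_test <- t.test(ages, mu = 30)

print(mean(ages))                     # sample mean of this particular draw

# Compare the p-value with the chosen significance level
decision <- if (H1_test$p.value < alpha) "reject H0" else "fail to reject H0"
print(decision)
```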
Suppose you wish to determine whether two teams in your office work the same hours. After collecting data, use a two-sample T-test.
We will use R's `t.test()` function for the two-sample T-test. Here's an example:
```r
team_A_hours <- c(8.5, 7.5, 8, 8, 8, 8, 8, 8.5, 9)
team_B_hours <- c(9, 8, 9, 9, 9, 9, 9, 9, 9.5)
H1_test <- t.test(team_A_hours, team_B_hours)

print(H1_test$statistic) # t-statistic: -4
print(H1_test$p.value) # p-value: 0.001
```
The p-value is less than 0.05, which allows us to reject the null hypothesis. We can state there's a significant difference between the mean working hours of Team A and Team B based on the statistical evidence.
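One detail worth knowing: when given two samples, `t.test()` runs Welch's T-test by default, which does not assume the two groups have equal variances. The sketch below reuses the team data and compares the default with the classic pooled-variance test, requested through `var.equal = TRUE`. The variable names `welch` and `pooled` are introduced here for illustration.

```r
team_A_hours <- c(8.5, 7.5, 8, 8, 8, 8, 8, 8.5, 9)
team_B_hours <- c(9, 8, 9, 9, 9, 9, 9, 9, 9.5)

# Default: Welch's test (var.equal = FALSE)
welch <- t.test(team_A_hours, team_B_hours)

# Classic Student's test, assuming both teams share the same variance
pooled <- t.test(team_A_hours, team_B_hours, var.equal = TRUE)

print(welch$method)
print(pooled$method)
print(c(welch_df  = unname(welch$parameter),
        pooled_df = unname(pooled$parameter)))
```

Because both teams have the same number of observations here, the two variants produce the same t-statistic and differ only in their degrees of freedom, so the conclusion does not change. With unequal group sizes or very different variances, the two can disagree, and the Welch default is the safer choice.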
Fantastic job! You've learned the basics of Hypothesis Testing, demystified T-tests, and successfully conducted a T-test in R. T-tests serve as a reliable tool in making decisions based on data.
Now, let's get some hands-on experience. The more you practice, the better you'll grasp it. Let's delve deeper into the world of data with R!