Welcome to our lesson on "Hypothesis Testing"! Hypothesis testing is a key tool in statistics for making data-driven decisions. For example, imagine a scientist testing if a new drug is effective. Hypothesis testing helps determine if the observed effect is due to the drug or just chance. By the end of this lesson, you'll understand the basics and be able to conduct a hypothesis test and interpret its results using Python.
Hypothesis testing is like being a detective. You gather data and decide if there's enough evidence to support your claim.
The null hypothesis is the default position that there is no effect or difference. It's what we assume to be true until proven otherwise. For example, if you want to test if a coin is fair:
- $H_0$: The coin is fair (it lands heads 50% of the time).
The alternative hypothesis is what you want to prove. It represents an effect or difference:
- $H_1$: The coin is not fair (it does not land heads 50% of the time).
The significance level ($\alpha$), usually set at 0.05 (5%), is the threshold for rejecting the null hypothesis. If the probability of observing data at least as extreme as yours, given that $H_0$ is true, is less than $\alpha$, you reject $H_0$.
The p-value tells us how likely it is to get results at least as extreme as ours, assuming the null hypothesis is true. A smaller p-value indicates stronger evidence against $H_0$.
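To make the p-value concrete for the coin example, here is a minimal sketch that computes it exactly from binomial counts. The data (60 heads in 100 flips) and the helper name `two_sided_coin_p_value` are hypothetical, chosen only for illustration; this is a direct count of outcomes rather than the t-test used later in the lesson.

```python
from math import comb


def two_sided_coin_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin lands heads 50% of the time."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected number of heads
    # as the one we observed (in either direction).
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    # Under H0 each of the 2**flips head/tail sequences is equally likely.
    return extreme_outcomes / 2 ** flips


# Hypothetical data: 60 heads in 100 flips
p_value = two_sided_coin_p_value(heads=60, flips=100)
print(f"p-value: {p_value:.4f}")
```

With these made-up numbers the p-value lands slightly above 0.05, so at a 5% significance level we would not reject the claim that the coin is fair, even though 60 heads looks lopsided.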
Let's go through the steps involved in hypothesis testing using a simple example.
Let's say you have test scores from a small class and want to test if the class average is different from 70.
- : The mean test score is 70.
- : The mean test score is not 70.
We'll use a common significance level, $\alpha = 0.05$.
Here’s the sample data of test scores: `[71, 72, 69, 74, 73]`.
We will conduct a one-sample t-test to compare the sample mean against the hypothesized population mean (70). The t-test is a statistical test used to compare the mean of a sample to a known value (one-sample t-test) or to compare the means of two samples (two-sample t-test). It helps determine if there is a significant difference between the means, taking into account the sample size and variability.
There are various statistical tests, comparing different values and attributes. In this course we will focus solely on the t-test, as it is enough to grasp the general idea and concepts.
Below is how you can perform a one-sample t-test in Python using the `scipy` library:
```python
# Conducting a t-test
from scipy.stats import ttest_1samp

# Sample data
data = [71, 72, 69, 74, 73]

# Perform one-sample t-test against the null hypothesis mean = 70
t_stat, p_value = ttest_1samp(data, 70)

print("T-statistic:", t_stat)  # T-statistic: 2.092457497388744
print("P-value:", p_value)  # P-value: 0.10453999977837579
```
Let’s interpret the output. When we run the code, we get the t-statistic (about 2.09) and the p-value (about 0.105).
- T-statistic: Measures the size of the difference between your sample mean and the hypothesized population mean relative to the variation in your sample data.
- P-value: Tells how likely it is to observe data at least as extreme as ours if $H_0$ is true.
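To see where these two numbers come from, the following sketch recomputes them by hand for the same test scores. It assumes the standard one-sample formula, t = (sample mean − hypothesized mean) / (sample standard deviation / √n), and a two-sided p-value from the t-distribution with n − 1 degrees of freedom; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t

data = [71, 72, 69, 74, 73]
hypothesized_mean = 70

n = len(data)
sample_mean = mean(data)  # 71.8
sample_sd = stdev(data)  # sample standard deviation (n - 1 in the denominator)
standard_error = sample_sd / sqrt(n)  # typical spread of the sample mean

# Difference from the hypothesized mean, measured in standard errors
t_stat = (sample_mean - hypothesized_mean) / standard_error

# Two-sided p-value: chance of a t value at least this far from 0 in either direction
p_value = 2 * t.sf(abs(t_stat), df=n - 1)

print("T-statistic:", t_stat)
print("P-value:", p_value)
```

The results should agree with the `ttest_1samp` output up to floating-point rounding, which shows that the t-statistic is simply the observed difference scaled by the variability of the sample.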
If the p-value is less than $\alpha$ (0.05), we reject the null hypothesis. Otherwise, we fail to reject $H_0$. In this case, we fail to reject the null hypothesis, as `p_value > 0.05`. This means that we don't have enough evidence to conclude that the mean test score differs from 70.
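The decision rule can also be written as a small helper. The sketch below applies it to the same test scores at several significance levels; the function name `decide` and the list of levels are assumptions made for illustration.

```python
from scipy.stats import ttest_1samp

data = [71, 72, 69, 74, 73]
t_stat, p_value = ttest_1samp(data, 70)


def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# The same data can lead to different decisions at different significance levels
for alpha in (0.01, 0.05, 0.10, 0.15):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

Because the p-value here is about 0.105, only the loosest level in the list (0.15) leads to rejection. This is why the significance level should be chosen before looking at the data rather than adjusted afterward to get the desired answer.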
Congratulations! You've learned the basics of hypothesis testing, a vital method for making data-based decisions. We covered:
- Key terms: null hypothesis ($H_0$), alternative hypothesis ($H_1$), significance level ($\alpha$), and p-value.
Now, it's your turn to apply what you've learned. In the practice session, you will run your own hypothesis tests on different datasets. Experiment with various sample data and significance levels to understand their effects. Happy coding!