Lesson 4
Hypothesis Testing
Lesson Introduction

Welcome to our lesson on "Hypothesis Testing"! Hypothesis testing is a key tool in statistics for making data-driven decisions. For example, imagine a scientist testing if a new drug is effective. Hypothesis testing helps determine if the observed effect is due to the drug or just chance. By the end of this lesson, you'll understand the basics, conduct a hypothesis test, and interpret its results using Python.

What is Hypothesis Testing?

Hypothesis testing is like being a detective. You gather data and decide if there's enough evidence to support your claim.

Null and Alternative Hypotheses

The null hypothesis is the default position that there is no effect or difference. It's what we assume to be true until proven otherwise. For example, if you want to test if a coin is fair:

  • $H_0$: The coin is fair (it lands heads 50% of the time).

The alternative hypothesis is what you want to prove. It represents an effect or difference:

  • $H_A$: The coin is not fair (it does not land heads 50% of the time).

Significance Level and P-value

The significance level ($\alpha$), usually set at 0.05 (5%), is the threshold for rejecting the null hypothesis. If the probability of observing data at least as extreme as yours, given that $H_0$ is true, is less than $\alpha$, you reject $H_0$.

The p-value tells us how likely it is to get results at least as extreme as ours, assuming the null hypothesis is true. A smaller p-value indicates stronger evidence against $H_0$.
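To make "at least as extreme" concrete, here is a minimal sketch for the coin example. The data (60 heads in 100 flips) and the helper name `two_sided_binomial_p_value` are hypothetical, chosen only for illustration. Under $H_0$ every sequence of flips is equally likely, so the p-value can be computed exactly by counting outcomes that fall at least as far from 50% heads as the observed one.

```python
from math import comb


def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin lands heads 50% of the time."""
    observed_gap = abs(heads - flips / 2)
    # Count the sequences whose number of heads is at least as far
    # from flips / 2 as the observed number of heads.
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap
    )
    # Under H0 each of the 2 ** flips sequences is equally likely.
    return extreme_sequences / 2 ** flips


# Hypothetical data: 60 heads in 100 flips
p_value = two_sided_binomial_p_value(60, 100)
alpha = 0.05

print(f"p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these made-up numbers the p-value lands slightly above 0.05, so the test fails to reject $H_0$ even though 60 heads may look suspicious at first glance.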

Step-by-Step Explanation of Hypothesis Testing: Part 1

Let's go through the steps involved in hypothesis testing using a simple example.

Let's say you have test scores from a small class and want to test if the class average is different from 70.

  • $H_0$: The mean test score is 70.
  • $H_A$: The mean test score is not 70.

We'll use a common significance level, $\alpha = 0.05$.

Step-by-Step Explanation of Hypothesis Testing: Part 2

Here’s the sample data of test scores: [71, 72, 69, 74, 73].

We will conduct a one-sample t-test to compare the sample mean against the hypothesized population mean (70). The t-test is a statistical test used to compare the mean of a sample to a known value (one-sample t-test) or to compare the means of two samples (two-sample t-test). It helps determine if there is a significant difference between the means, taking into account the sample size and variability.

Many other statistical tests exist for comparing different quantities. In this course we will focus solely on the t-test, as it is enough to convey the general idea and concepts.

Below is how you can perform a one-sample t-test in Python using the scipy library:

```python
# Conducting a t-test
from scipy.stats import ttest_1samp

# Sample data
data = [71, 72, 69, 74, 73]

# Perform one-sample t-test against the null hypothesis mean = 70
t_stat, p_value = ttest_1samp(data, 70)

print("T-statistic:", t_stat)  # T-statistic: 2.092457497388744
print("P-value:", p_value)  # P-value: 0.10453999977837579
```

Let’s interpret the output. When we run the code, we get the t-statistic and the p-value.

  • T-statistic: Measures how far the sample mean is from the hypothesized mean (70), in units of the standard error, which reflects both the variability and the size of the sample.
  • P-value: Tells how likely it is to observe data at least as extreme as ours if $H_0$ is true.
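As a quick check on what the t-statistic measures, the sketch below recomputes it by hand for the same test scores using only the standard library. The variable names are illustrative. It divides the gap between the sample mean and the hypothesized mean by the standard error, which is the sample standard deviation divided by the square root of the sample size.

```python
from math import sqrt
from statistics import mean, stdev

data = [71, 72, 69, 74, 73]
hypothesized_mean = 70

sample_mean = mean(data)  # 71.8
sample_sd = stdev(data)  # sample standard deviation (divides by n - 1)
standard_error = sample_sd / sqrt(len(data))

# Difference from the hypothesized mean, measured in standard errors
t_stat = (sample_mean - hypothesized_mean) / standard_error

print("T-statistic:", round(t_stat, 4))  # T-statistic: 2.0925
```

The result agrees with the t-statistic reported by `ttest_1samp` above.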

If the p-value is less than $\alpha$ (0.05), we reject the null hypothesis. Otherwise, we fail to reject $H_0$. In this case the p-value (about 0.105) is greater than 0.05, so we fail to reject the null hypothesis. This means that we don't have enough evidence to conclude that the mean test score differs from 70.
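The decision rule itself is a single comparison. The sketch below wraps it in a hypothetical helper, `decide` (not part of scipy), and applies it to the p-value from the test above at a few significance levels to show how the choice of $\alpha$ can change the conclusion.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# P-value reported by ttest_1samp for the test scores in this lesson
p_value = 0.10453999977837579

for alpha in (0.01, 0.05, 0.10, 0.20):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

Only the loosest level tried here (0.20) leads to rejecting $H_0$. This is why $\alpha$ should be fixed before looking at the data rather than adjusted afterwards to get a desired answer.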

Lesson Summary

Congratulations! You've learned the basics of hypothesis testing, a vital method for making data-based decisions. We covered:

  • Key terms: null hypothesis ($H_0$), alternative hypothesis ($H_A$), significance level ($\alpha$), and p-value.

Now, it's your turn to apply what you've learned. In the practice session, you will run your own hypothesis tests on different datasets. Experiment with various sample data and significance levels to understand their effects. Happy coding!
