Learn t-test Math | What Is Hypothesis Testing?

Understanding the math behind the t-test is essential for applying it confidently in real-world A/B testing scenarios. The t-test helps you compare the means of two independent samples to determine if any observed difference is statistically significant, or if it could have occurred by random chance. To do this, you must calculate the t-statistic, which measures how many standard errors the difference in sample means is away from zero under the null hypothesis.

The formula for the t-statistic in the case of two independent samples is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where:

$\bar{x}_1$ and $\bar{x}_2$ are the sample means for group 1 and group 2;
$s_1^2$ and $s_2^2$ are the sample variances;
$n_1$ and $n_2$ are the sample sizes.

The denominator combines the estimated variances from both groups, each divided by its sample size, to give the standard error of the difference in means. This form of the statistic (Welch's t-test) assumes the two samples are independent but does not require them to have equal variances.

To determine the significance of your t-statistic, you also need the degrees of freedom (df), which affect the shape of the t-distribution used to interpret your result. For two samples with possibly unequal variances, the Welch-Satterthwaite equation provides an approximation:

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

This approach is robust even if the sample sizes or variances are not equal, which is common in practical A/B testing.
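
To make the two formulas concrete, here is a small worked example by hand. The summary statistics are hypothetical, chosen only so the arithmetic is easy to follow: $\bar{x}_1 = 22$, $\bar{x}_2 = 25$, $s_1^2 = 4$, $s_2^2 = 9$, $n_1 = 16$, $n_2 = 36$.

```latex
\frac{s_1^2}{n_1} = \frac{4}{16} = 0.25, \qquad \frac{s_2^2}{n_2} = \frac{9}{36} = 0.25

t = \frac{22 - 25}{\sqrt{0.25 + 0.25}} = \frac{-3}{\sqrt{0.5}} \approx -4.243

df = \frac{(0.25 + 0.25)^2}{\frac{0.25^2}{15} + \frac{0.25^2}{35}}
   = \frac{0.25}{0.0625\left(\frac{1}{15} + \frac{1}{35}\right)}
   = 42
```

Under these assumed numbers, the difference in means is about 4.2 standard errors below zero, and the result would be read against a t-distribution with 42 degrees of freedom.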


import numpy as np

# Sample data for two independent groups
group1 = np.array([23, 21, 19, 24, 25, 22])
group2 = np.array([30, 28, 27, 31, 29, 32])

# Calculate sample means
mean1 = np.mean(group1)
mean2 = np.mean(group2)

# Calculate sample variances (ddof=1 for sample variance)
var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)

# Sample sizes
n1 = len(group1)
n2 = len(group2)

# Calculate t-statistic
se = np.sqrt(var1/n1 + var2/n2)
t_statistic = (mean1 - mean2) / se

# Calculate degrees of freedom using Welch-Satterthwaite equation
numerator = (var1/n1 + var2/n2) ** 2
denominator = ((var1/n1)**2) / (n1 - 1) + ((var2/n2)**2) / (n2 - 1)
df = numerator / denominator

print(f"Sample mean 1: {mean1:.2f}")
print(f"Sample mean 2: {mean2:.2f}")
print(f"Sample variance 1: {var1:.2f}")
print(f"Sample variance 2: {var2:.2f}")
print(f"t-statistic: {t_statistic:.3f}")
print(f"Degrees of freedom: {df:.2f}")

After calculating the t-statistic and degrees of freedom, interpret your results as follows:

Compare your t-statistic to critical values from the t-distribution, or calculate a p-value;
If the absolute value of your t-statistic is large (given the degrees of freedom), the observed difference in sample means is unlikely to be due to random chance, so you may reject the null hypothesis;
If the t-statistic is small, the data does not provide strong evidence against the null hypothesis, and you cannot conclude the group means are different.
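
The interpretation steps above can be sketched in code. This is a minimal illustration rather than a definitive procedure: it assumes SciPy is available, reuses the sample data from the earlier listing, and picks a two-sided test at a significance level of `alpha = 0.05`, which is a common convention and not something the lesson prescribes. The final call to `scipy.stats.ttest_ind` with `equal_var=False` is included only as a cross-check of the manual calculation.

```python
import numpy as np
from scipy import stats

# Same sample data as in the earlier listing
group1 = np.array([23, 21, 19, 24, 25, 22])
group2 = np.array([30, 28, 27, 31, 29, 32])

# Recompute the t-statistic and Welch-Satterthwaite degrees of freedom
var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
n1, n2 = len(group1), len(group2)
se = np.sqrt(var1 / n1 + var2 / n2)
t_statistic = (np.mean(group1) - np.mean(group2)) / se
df = (var1 / n1 + var2 / n2) ** 2 / (
    (var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1)
)

# Assumed significance level for a two-sided test
alpha = 0.05

# Two-sided p-value: probability of a |t| at least this large under the null
p_value = 2 * stats.t.sf(abs(t_statistic), df)

# Critical value: |t| must exceed this to reject the null at level alpha
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Cross-check against SciPy's built-in Welch's t-test
result = stats.ttest_ind(group1, group2, equal_var=False)

print(f"t-statistic: {t_statistic:.3f}, df: {df:.2f}")
print(f"Critical value (alpha={alpha}): {t_critical:.3f}")
print(f"p-value: {p_value:.5f}")
print(f"Reject null hypothesis: {p_value < alpha}")
print(f"SciPy t: {result.statistic:.3f}, SciPy p: {result.pvalue:.5f}")
```

Comparing `abs(t_statistic)` with `t_critical` and comparing `p_value` with `alpha` are two views of the same decision, so they always agree for a given test.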

This process is essential for hypothesis testing and supports the reliability of conclusions in A/B testing.
