Learning Statistics with Python

Performing a t-test in Python

To conduct a t-test in Python, all you have to do is specify the alternative hypothesis and indicate whether variances are roughly equal (homogeneous).

The ttest_ind() function from scipy.stats handles the rest. With scipy.stats imported as st, the syntax is:

st.ttest_ind(a, b, equal_var=True, alternative='two-sided')

Parameters:

a — the first sample;
b — the second sample;
equal_var — set to True if variances are approximately equal, and False if they are not;
alternative — the type of alternative hypothesis:
- 'two-sided' — indicates that the means are not equal;
- 'less' — implies that the first mean is less than the second;
- 'greater' — implies that the first mean is greater than the second.

Return values:

statistic — the value of the t statistic;
pvalue — the p-value.
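
As a minimal sketch of how these parameters and return values fit together, the snippet below runs the same test under each of the three alternatives. The two samples are hypothetical values invented for illustration; they are not the course's heights dataset.

```python
import scipy.stats as st

# Hypothetical samples, invented purely for illustration
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7]
group_b = [4.8, 4.7, 5.0, 5.2, 4.9, 5.1, 4.6]

# Run the test once per alternative hypothesis and keep each result
results = {}
for alt in ("two-sided", "less", "greater"):
    res = st.ttest_ind(group_a, group_b, equal_var=True, alternative=alt)
    results[alt] = res
    print(f"{alt:>9}: statistic={res.statistic:.3f}, pvalue={res.pvalue:.4f}")
```

The t statistic is identical in all three runs. Only the p-value changes, because alternative decides which tail (or tails) of the t distribution counts as evidence against the null hypothesis.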

The key output is the p-value. If the p-value is lower than the significance level α (usually 0.05), the t statistic falls within the critical region, so we reject the null hypothesis in favor of the alternative. If the p-value is greater than α, we fail to reject the null hypothesis: the data do not provide enough evidence for the alternative. This is not the same as proving that the means are equal.
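
The decision rule can be sketched as a small helper. The function name decide and its return strings are hypothetical, chosen here only for illustration; they are not part of scipy.

```python
def decide(pvalue, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if pvalue < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.003))             # reject H0
print(decide(0.27))              # fail to reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0 (stricter alpha)
```

The last call shows that the same p-value can lead to a different decision when a stricter α is chosen, which is why α should be fixed before looking at the data.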

Here is an example of applying the t-test to the heights dataset:


import pandas as pd
import scipy.stats as st

# Load the data
male = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/Testing2.0/male.csv').squeeze()
female = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/Testing2.0/female.csv').squeeze()

# Apply t-test
t_stat, pvalue = st.ttest_ind(male, female, equal_var=True, alternative="greater")

# Decide whether to reject the null hypothesis at the 5% significance level
if pvalue > 0.05:
    print("We fail to reject the null hypothesis: no evidence that males are taller")
else:
    print("We reject the null hypothesis: males are taller on average")
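
When the variances cannot be assumed roughly equal, equal_var=False makes ttest_ind() perform Welch's t-test instead of the pooled-variance test. The sketch below compares the two on hypothetical samples with very different spreads and sizes; the numbers are invented for illustration and are unrelated to the heights dataset.

```python
import scipy.stats as st

# Hypothetical samples (illustration only): similar centers, very different spreads
narrow = [10.1, 9.8, 10.3, 10.0, 9.9]
wide = [9.0, 13.5, 10.1, 14.0, 8.5, 12.8, 11.9, 7.9]

# Pooled-variance t-test: assumes both groups share one variance
pooled = st.ttest_ind(narrow, wide, equal_var=True, alternative="two-sided")

# Welch's t-test: estimates each group's variance separately
welch = st.ttest_ind(narrow, wide, equal_var=False, alternative="two-sided")

print(f"pooled: statistic={pooled.statistic:.3f}, pvalue={pooled.pvalue:.4f}")
print(f"welch:  statistic={welch.statistic:.3f}, pvalue={welch.pvalue:.4f}")
```

With unequal sample sizes the two variants give different t statistics and p-values, so the choice of equal_var can affect the conclusion.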
