Data Science Interview Challenge

Challenge 3: Hypothesis Testing

Hypothesis testing is the process of making inferences about a population from sample data. We state a null hypothesis and an alternative, then use the sample to judge whether the data are consistent with the null hypothesis. This lets us draw conclusions about a broader dataset by analyzing a subset.

For instance, if you're studying the impact of a new teaching method in a classroom and observe an improvement in students' grades, can you say that the method is effective, or could the improvement be due to chance? Hypothesis testing gives a principled way to answer that question.
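
To make the teaching-method question concrete, here is a minimal sketch of a permutation test, one simple way to run a hypothesis test. The grades are hypothetical and invented purely for illustration, and the function name `permutation_p_value` is an assumption of this sketch, not part of the course material.

```python
import random
from statistics import mean

# Hypothetical grades, invented purely for illustration.
old_method = [62, 70, 68, 75, 66, 71, 64, 69]
new_method = [74, 79, 72, 81, 77, 70, 83, 76]


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(b) - mean(a) being larger than chance."""
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffling the pooled grades simulates "no effect".
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)


p_value = permutation_p_value(old_method, new_method)
print(f"Estimated one-sided p-value: {p_value:.4f}")
```

A small p-value means that a difference at least as large as the observed one rarely arises when the group labels are shuffled at random, which is evidence against the "no effect" hypothesis.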


Here's the dataset we'll be using in this chapter. Feel free to dive in and explore it before tackling the task.

import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = sns.load_dataset('tips')

# Sample of data
print(data.head())  # use display(data.head()) instead when running in a notebook

# Total bill amounts grouped by smoking status
sns.boxplot(x='smoker', y='total_bill', data=data)
plt.title('Total Bill Amounts Grouped by Smoking Status')
plt.show()

# Number of smokers vs. non-smokers by gender
sns.countplot(x='sex', hue='smoker', data=data)
plt.title('Number of Smokers vs. Non-Smokers by Gender')
plt.show()
Task

In this exercise, using Seaborn's tips dataset, you'll:

  1. Test if there's a significant difference in the total_bill amounts between smokers and non-smokers. Use the Mann–Whitney U test.
  2. Examine the relationship between the sex and smoker columns, determining if these two categorical variables are independent of each other.
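
The Mann–Whitney U test named in step 1 compares how the values of two samples are ordered rather than comparing their means. As a rough sketch of what its statistic measures, the hypothetical helper `mann_whitney_u` below counts the pairs in which a value from the first sample exceeds a value from the second, with ties counting one half. The bill amounts are made up for illustration. A library routine such as `scipy.stats.mannwhitneyu` additionally derives a p-value from the distribution of this statistic under the null hypothesis.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: number of (x, y) pairs where x wins, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical bill amounts, invented for illustration only.
group_a = [12.5, 18.0, 21.3, 30.1]
group_b = [10.2, 15.4, 18.0]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two statistics always add up to the total number of pairs.
print(u_a, u_b, len(group_a) * len(group_b))
```

A U value close to zero or close to the total number of pairs indicates that one sample tends to sit above the other; a value near half the number of pairs indicates heavy overlap.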

Note

In this task, the significance level (alpha) is set at 0.1 rather than the conventional 0.05. The choice of alpha varies with the context, the level of rigor required, and industry practice; commonly used values are 0.01, 0.05, and 0.1.
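
As a small illustration of why the choice of alpha matters, the sketch below applies the three common thresholds to the same p-value. The p-value of 0.07 is hypothetical, chosen only to show that the decision can flip, and `decide` is a helper name introduced for this example.

```python
def decide(p_value, alpha):
    """Return the test decision for a given p-value and significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical p-value, chosen only to show how the decision depends on alpha.
p_value = 0.07

for alpha in (0.01, 0.05, 0.1):
    print(f"alpha={alpha}: {decide(p_value, alpha)}")
```

With this p-value the null hypothesis is rejected only at alpha = 0.1, which is why alpha should be fixed before looking at the results.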

Solution

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu, chi2_contingency

# Load the dataset
data = sns.load_dataset('tips')

# 1. Test whether there is a significant difference in 'total_bill' between smokers and non-smokers using Mann–Whitney test.
smokers = data[data['smoker'] == 'Yes']['total_bill']
non_smokers = data[data['smoker'] == 'No']['total_bill']
u_val, p_val = mannwhitneyu(smokers, non_smokers)
alpha = 0.1
if p_val < alpha:
    print(f"There is a significant difference in 'total_bill' between smokers and non-smokers (p={p_val:.2f}).")
else:
    print(f"There is no significant difference in 'total_bill' between smokers and non-smokers (p={p_val:.2f}).")

# 2. Test whether there is a relationship between 'sex' and 'smoker' using a chi-squared test.
contingency_table = pd.crosstab(data['sex'], data['smoker'])
chi2, p_val2, _, _ = chi2_contingency(contingency_table)
alpha = 0.1
if p_val2 < alpha:
    print(f"There is a significant relationship between 'sex' and 'smoker' (p={p_val2:.2f}).")
else:
    print(f"There is no significant relationship between 'sex' and 'smoker' (p={p_val2:.2f}).")
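
The chi-squared test in step 2 compares the observed counts in the contingency table with the counts expected if the two variables were independent. The following is a minimal sketch of that calculation in plain Python. The 2x2 counts are hypothetical, and `chi_squared_statistic` is a helper introduced here for illustration, not a SciPy function.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic (no continuity correction) and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed_count in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            e = row_totals[i] * col_totals[j] / total
            expected_row.append(e)
            statistic += (observed_count - e) ** 2 / e
        expected.append(expected_row)
    return statistic, expected


# Hypothetical 2x2 counts, invented for illustration only.
observed = [[30, 20], [20, 30]]
stat, expected = chi_squared_statistic(observed)
print(stat, expected)
```

Note that `chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom, as a 2x2 table does, so its statistic will be somewhat smaller than this uncorrected value unless `correction=False` is passed.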

Section 6. Chapter 3
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu, chi2_contingency

# Load the dataset
data = sns.load_dataset('tips')

# 1. Test whether there is a significant difference in 'total_bill' between smokers and non-smokers using Mann–Whitney test.
smokers = data[data['smoker'] == ___]['total_bill']
non_smokers = data[data['smoker'] == ___]['total_bill']
u_val, p_val = ___(smokers, non_smokers)
alpha = 0.1
if ___ alpha:
    print(f"There is a significant difference in 'total_bill' between smokers and non-smokers (p={p_val:.2f}).")
else:
    print(f"There is no significant difference in 'total_bill' between smokers and non-smokers (p={p_val:.2f}).")

# 2. Test whether there is a relationship between 'sex' and 'smoker' using a chi-squared test.
contingency_table = pd.___(data['sex'], data['smoker'])
chi2, p_val2, _, _ = ___(contingency_table)
alpha = 0.1
if ___ alpha:
    print(f"There is a significant relationship between 'sex' and 'smoker' (p={p_val2:.2f}).")
else:
    print(f"There is no significant relationship between 'sex' and 'smoker' (p={p_val2:.2f}).")
