Summary  
This chapter shows how to implement hypothesis testing (two-sample t-tests) and calculate confidence intervals for sample means using Python’s statistical libraries.  

General domain of usage  
Financial analysis

Making informed investment decisions often requires more than observing historical returns: you need to determine whether observed differences are **statistically significant** or could plausibly have occurred by chance. **Hypothesis testing** provides a formal way to test assumptions about financial data, such as whether one asset's mean return truly differs from another's. **Confidence intervals** give a range of plausible values for a true parameter, such as the mean return. Both help investors make **data-driven decisions** and avoid common pitfalls like over-interpreting random fluctuations in returns.

import numpy as np
from scipy import stats

# Simulated daily returns for two assets
asset_a_returns = np.array([0.001, 0.002, -0.001, 0.003, 0.002, 0.000, 0.001])
asset_b_returns = np.array([0.000, 0.001, -0.002, 0.002, 0.001, -0.001, 0.000])

# Perform an independent two-sample t-test
# (equal_var=False selects Welch's t-test, which does not assume equal variances)
t_stat, p_value = stats.ttest_ind(asset_a_returns, asset_b_returns, equal_var=False)

print("t-statistic:", t_stat)
print("p-value:", p_value)

When you run a t-test using `scipy.stats.ttest_ind`, you compare the means of two independent samples, in this case the returns of two different assets. Passing `equal_var=False` selects Welch's t-test, which does not assume the two samples have equal variances. The output includes a **t-statistic**, which measures the difference between the sample means relative to the variability in the data, and a **p-value**, the probability of seeing a difference at least this extreme if the two assets actually had the same mean return. If the p-value is small (commonly below 0.05), you have evidence to reject the null hypothesis that the two assets have the same mean return. Otherwise, the data do not provide enough evidence to claim a difference, which is not the same as showing that the means are equal.
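To make the t-statistic and p-value less of a black box, the sketch below recomputes them by hand for the same simulated returns and compares the result with `scipy.stats.ttest_ind`. The helper name `welch_t_test` is hypothetical and exists only for this illustration; it assumes a two-sided test with unequal variances (Welch's test).

```python
import numpy as np
from scipy import stats

# Same simulated daily returns as in the example above
asset_a_returns = np.array([0.001, 0.002, -0.001, 0.003, 0.002, 0.000, 0.001])
asset_b_returns = np.array([0.000, 0.001, -0.002, 0.002, 0.001, -0.001, 0.000])


def welch_t_test(a, b):
    """Hypothetical helper: two-sided Welch t-test computed step by step."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n)
    se2_a = a.var(ddof=1) / n_a
    se2_b = b.var(ddof=1) / n_b
    # t-statistic: difference in means divided by its standard error
    t_stat = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Two-sided p-value from the t-distribution's survival function
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, dof, p_value


manual_t, manual_dof, manual_p = welch_t_test(asset_a_returns, asset_b_returns)
scipy_t, scipy_p = stats.ttest_ind(asset_a_returns, asset_b_returns, equal_var=False)

print("manual t-statistic:", manual_t, "| scipy:", scipy_t)
print("manual p-value:", manual_p, "| scipy:", scipy_p)
```

The two pairs of numbers should agree up to floating-point rounding, which shows that the p-value depends only on the t-statistic and the degrees of freedom.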

A confidence interval, on the other hand, gives you a range of plausible values for a parameter such as the mean return. For example, a **95% confidence interval** means that, if you repeated your sampling many times and built an interval each time, about 95% of those intervals would contain the true mean. This helps investors understand the uncertainty around their estimates and avoid overconfidence in point values.
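The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the "true" mean, standard deviation, sample size, and number of experiments are assumed values, and returns are assumed to be normally distributed, which real returns often are not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)  # fixed seed so the run is reproducible

# Assumed "true" parameters of a daily return distribution (illustrative values)
true_mean = 0.001
true_std = 0.01
sample_size = 30
n_experiments = 10_000
confidence = 0.95

# One row per repeated experiment, one column per simulated daily return
samples = rng.normal(true_mean, true_std, size=(n_experiments, sample_size))

# Sample mean and standard error for every experiment
sample_means = samples.mean(axis=1)
standard_errors = samples.std(axis=1, ddof=1) / np.sqrt(sample_size)

# Two-sided critical value from the t-distribution with n - 1 degrees of freedom
t_critical = stats.t.ppf((1 + confidence) / 2, df=sample_size - 1)

lower = sample_means - t_critical * standard_errors
upper = sample_means + t_critical * standard_errors

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Under these assumptions the printed share should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure.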

import numpy as np
from scipy import stats

# Simulated daily returns for an asset
returns = np.array([0.001, 0.002, -0.001, 0.003, 0.002, 0.000, 0.001])

# Calculate sample mean and standard error
mean_return = np.mean(returns)
sem = stats.sem(returns)

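# Calculate 95% confidence interval for the mean using the t-distribution
confidence = 0.95
# Margin of error: standard error times the two-sided critical t-value (n - 1 degrees of freedom)
h = sem * stats.t.ppf((1 + confidence) / 2, len(returns) - 1)
lower_bound = mean_return - h
upper_bound = mean_return + h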

print("Mean return:", mean_return)
print("95% confidence interval:", (lower_bound, upper_bound))
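As a cross-check, the same interval can be obtained in one call with `scipy.stats.t.interval`. This is a minimal sketch on the same simulated returns; the confidence level is passed positionally because the keyword name for that argument has differed between SciPy versions.

```python
import numpy as np
from scipy import stats

# Same simulated daily returns as in the example above
returns = np.array([0.001, 0.002, -0.001, 0.003, 0.002, 0.000, 0.001])

mean_return = np.mean(returns)
sem = stats.sem(returns)
dof = len(returns) - 1

# One-call version: t-distribution interval centered on the sample mean,
# scaled by the standard error
lower_bound, upper_bound = stats.t.interval(0.95, dof, loc=mean_return, scale=sem)

# Manual version for comparison: mean plus/minus the margin of error
h = sem * stats.t.ppf(0.975, dof)

print("t.interval:", (lower_bound, upper_bound))
print("manual:    ", (mean_return - h, mean_return + h))
```

Both approaches should print the same bounds up to floating-point rounding. With only seven observations the interval is wide relative to the mean, which is a reminder of how little a short return history pins down.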

What does a p-value indicate in hypothesis testing?

Why might an investor use a confidence interval?

Which scipy function is used for t-tests?

