What is P-value? | Testing of Statistical Hypotheses
Probability Theory Mastering

# What is P-value?

The p-value is a probability used in statistical hypothesis testing. It is the probability of obtaining a test statistic at least as extreme as the one calculated from the sample data, assuming the null hypothesis is true. Comparing the p-value with the significance level therefore tells us whether the value of our criterion fell into the critical region.
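As a minimal sketch of this definition, suppose the test statistic follows the standard normal distribution under the null hypothesis. The observed value `z_observed` and the significance level `alpha` below are made-up numbers chosen for illustration, not values from the course.

```python
from scipy.stats import norm

# Hypothetical observed test statistic; assumed to follow N(0, 1) under the null hypothesis
z_observed = 2.1
alpha = 0.05  # assumed significance level

# Two-sided p-value: probability of a value at least as extreme as z_observed in either tail
p_value = 2 * norm.sf(abs(z_observed))

# Boundary of the two-sided critical region for the same alpha
critical_value = norm.ppf(1 - alpha / 2)

in_critical_region = abs(z_observed) > critical_value
print(f"p-value = {p_value:.4f}, critical value = {critical_value:.4f}")
print("p-value < alpha:", p_value < alpha)
print("statistic in critical region:", in_critical_region)
```

The two printed conditions always agree: the p-value is below `alpha` exactly when the statistic lies in the critical region.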

## Hypothesis testing guideline

Step 1. We have samples and formulations of the main (null) and alternative hypotheses. First, we choose a significance level (the probability of a type I error) that is acceptable to us;

Step 2. We choose the criterion by which we will test the hypothesis. Knowing the distribution of our initial data, we determine how the values of this criterion will be distributed;

Step 3. We compute the value of the criterion (also called the test statistic) for our particular samples, after which we determine the p-value;

Note

If we cannot determine the true distribution of the criterion, we can use an empirical distribution instead. One of the methods for constructing the empirical distribution will be discussed in the penultimate chapter of this section.

Step 4. We reject the main hypothesis if the obtained p-value is less than the significance level. If the p-value is greater than the significance level, we do not reject the main hypothesis: the data do not contradict it, although this does not prove that it is true. If the p-value differs very little from the given significance level, we still reject the main hypothesis, but such a borderline result should be interpreted with caution.

In practice, the corresponding methods have already been implemented for most hypotheses, so we do not need to carry out all the steps by hand: we just obtain the p-value and compare it with the chosen significance level.
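The guideline can be sketched with a one-sample t-test, which is used here only as an illustrative criterion. The sample is synthetic and the hypothesized mean of 0 is an assumption made for the example. The statistic is first computed by hand (Steps 2 and 3) and then obtained from the ready-made `scipy.stats.ttest_1samp` function.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_1samp

# Synthetic sample, generated only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=200)

# Step 1: main hypothesis "population mean = 0" and an assumed significance level
popmean = 0.0
alpha = 0.05

# Steps 2-3 by hand: under the main hypothesis the statistic follows
# Student's t distribution with n - 1 degrees of freedom
n = sample.size
t_manual = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(n))
p_manual = 2 * t_dist.sf(abs(t_manual), df=n - 1)

# The same statistic and p-value from the implemented method
statistic, p_value = ttest_1samp(sample, popmean=popmean)

# Step 4: compare the p-value with the significance level
print(f"manual:  t = {t_manual:.4f}, p = {p_manual:.3g}")
print(f"library: t = {statistic:.4f}, p = {p_value:.3g}")
print("Reject the main hypothesis" if p_value < alpha
      else "Do not reject the main hypothesis")
```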

## Example

Let's look at an example. In Section 3, Chapter 2, we estimated the parameters of the population from the samples, making an assumption about the population's distribution. Let's now check whether our data is normally or exponentially distributed with the found parameters.
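The original listing is not available here, so the following is a sketch of what such a check might look like. The two samples are synthetic stand-ins for the course datasets, and the parameter estimates (`mu_hat`, `sigma_hat`, `scale_hat`) are hypothetical names computed from the samples themselves.

```python
import numpy as np
from scipy.stats import kstest, norm, expon

# Synthetic stand-ins for the course datasets (an assumption for illustration)
rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=2.0, scale=1.5, size=500)
exp_sample = rng.exponential(scale=3.0, size=500)

alpha = 0.05  # significance level

# Parameter estimates obtained from the samples
mu_hat, sigma_hat = normal_sample.mean(), normal_sample.std(ddof=1)
scale_hat = exp_sample.mean()

# Kolmogorov-Smirnov test: data first, CDF of the hypothesized distribution second
stat_norm, p_value_norm = kstest(normal_sample, norm(loc=mu_hat, scale=sigma_hat).cdf)
stat_exp, p_value_exp = kstest(exp_sample, expon(scale=scale_hat).cdf)

for name, p_value in [("normal", p_value_norm), ("exponential", p_value_exp)]:
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"{name}: p-value = {p_value:.4f} -> {decision} the main hypothesis")
```

Because the parameters are estimated from the same data that is being tested, the Kolmogorov-Smirnov p-value is only approximate in this setting.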

In the code above we:

1. Imported the necessary datasets and specified the significance level `alpha`;
2. Used the Kolmogorov-Smirnov criterion to check the hypothesis about the distribution of our samples:
   - used the `kstest` function to get the criterion value and the p-value;
   - passed our data as the first argument of `kstest` and the CDF of the normal/exponential distribution with the specified parameters as the second argument;
3. Compared `p_value` with `alpha` to decide whether to reject the main hypothesis.

Note

There are many statistical tests for checking the distribution of samples. The most popular are the Shapiro-Wilk test (`scipy.stats.shapiro`), the Anderson-Darling test (`scipy.stats.anderson`), and the chi-squared goodness-of-fit test (`scipy.stats.chisquare`).
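As a brief sketch of one of these alternatives, the Shapiro-Wilk test checks the main hypothesis that a sample comes from a normal distribution. The two samples below are synthetic and serve only as an illustration.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic samples, generated only for illustration
rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

alpha = 0.05  # assumed significance level

# shapiro returns the test statistic and the p-value
stat_normal, p_normal = shapiro(normal_sample)
stat_skewed, p_skewed = shapiro(skewed_sample)

for name, p_value in [("normal sample", p_normal), ("exponential sample", p_skewed)]:
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"{name}: p-value = {p_value:.4g} -> {decision} normality")
```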
