# What is a P-value?

**The p-value** is a probability used in statistical hypothesis testing. It is the probability of obtaining a test statistic at least as extreme as the one calculated from the sample data, assuming the null hypothesis is true. Comparing the p-value with the significance level therefore tells us whether the value of our criterion fell into the critical region.
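To make "at least as extreme" concrete, here is a minimal sketch. It assumes a hypothetical test statistic that follows the standard normal distribution under the null hypothesis; the value `z_observed` is made up for illustration. Which tail (or both) counts as "extreme" depends on the alternative hypothesis.

```python
from scipy import stats

# Hypothetical observed value of a test statistic that is N(0, 1) under H0
z_observed = 1.96

# Right-tailed alternative: P(Z >= z_observed)
p_right = stats.norm.sf(z_observed)
# Left-tailed alternative: P(Z <= z_observed)
p_left = stats.norm.cdf(z_observed)
# Two-sided alternative: P(|Z| >= |z_observed|)
p_two_sided = 2 * stats.norm.sf(abs(z_observed))

print(f"right-tailed p-value: {p_right:.4f}")
print(f"left-tailed p-value:  {p_left:.4f}")
print(f"two-sided p-value:    {p_two_sided:.4f}")
```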

## Hypothesis testing guideline

**Step 1**. We have samples and formulations of the main (null) and alternative hypotheses. First, we choose a significance level (the probability of a type I error) that we are willing to accept;

**Step 2**. We choose the **criterion** by which we will test the hypothesis. Knowing the distribution of our initial data, we determine how the values of this criterion will be distributed;

**Step 3**. We compute the value of the criterion (also called the **test statistic**) for our particular samples and then determine the p-value;

Note

If we cannot determine the real distribution of the criterion, we can use the empirical distribution instead. One of the methods for constructing the empirical distribution will be discussed in the penultimate chapter of this section.

**Step 4**. We **reject** the main hypothesis if the obtained p-value is **less than the significance level**. If the p-value is **greater than the significance level**, we **do not reject** the main hypothesis: the data do not contradict it, although this does not prove that it is true. If the p-value differs very little from the given significance level, we still reject the main hypothesis.
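The four steps can be sketched by hand for one simple case. This is only an illustration under stated assumptions: a two-sided one-sample z-test with a known population standard deviation, synthetic data, and a hypothetical helper named `z_test` that is not part of any library.

```python
import numpy as np
from scipy import stats


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mean == mu0, with known sigma.

    Hypothetical helper written for illustration only.
    """
    sample = np.asarray(sample, dtype=float)
    # Step 2: under H0 the criterion below follows N(0, 1)
    # Step 3: compute the test statistic and its p-value
    z = (sample.mean() - mu0) / (sigma / np.sqrt(sample.size))
    p_value = 2 * stats.norm.sf(abs(z))
    # Step 4: compare the p-value with the significance level
    reject = bool(p_value < alpha)
    return z, p_value, reject


# Step 1: synthetic sample (assumption), hypotheses, significance level
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=100)

z, p_value, reject = z_test(sample, mu0=0.0, sigma=1.0, alpha=0.05)
print(f"z = {z:.3f}, p-value = {p_value:.4g}")
print("reject H0" if reject else "do not reject H0")
```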

In practice, the corresponding methods have already been implemented for most hypothesis tests, so we do not need to carry out all the steps by hand: we just get the p-value and compare it with the chosen significance level.
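As an illustration of relying on a ready-made method, the sketch below uses `scipy.stats.ttest_1samp`, which returns both the test statistic and the p-value. The sample and the hypothesized mean are synthetic assumptions, not data from this course.

```python
import numpy as np
from scipy import stats

alpha = 0.05  # chosen significance level

# Synthetic sample (assumption)
rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=50)

# H0: the population mean equals 10; H1: it does not
result = stats.ttest_1samp(sample, popmean=10.0)
print(f"statistic = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")

if result.pvalue < alpha:
    print("reject H0")
else:
    print("do not reject H0")
```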

## Example

Let's look at an example. In Section 3 Chapter 2, we estimated the parameters of the population based on the samples, making an **assumption about the population's distribution**. Let's now check whether our data is normally or exponentially distributed with the found parameters.

In the code for this example, we:

- Imported the necessary datasets and specified the significance level `alpha`;
- Used the Kolmogorov-Smirnov criterion to check the hypothesis about the distribution of our samples:
  - used the `kstest` function to get the criterion value and the p-value;
  - used our data as the first argument of `kstest` and the CDF of the normal/exponential distribution with the specified parameters as the second argument;
- Compared `p_value` with `alpha` to reject or not reject the main hypothesis.
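The steps listed above can be sketched as follows. This is not the original course code: the samples are synthetic stand-ins for the course datasets, the parameters are re-estimated here from those samples, and `check_distribution` is a hypothetical helper. Note also that when the parameters are estimated from the same sample that is being tested, the standard Kolmogorov-Smirnov p-value is only approximate and tends to be conservative.

```python
import numpy as np
from scipy import stats

alpha = 0.05  # significance level

# Synthetic stand-ins for the course samples (assumption)
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=5.0, scale=2.0, size=500)
expon_sample = rng.exponential(scale=3.0, size=500)


def check_distribution(sample, cdf, label, alpha=alpha):
    """Run the Kolmogorov-Smirnov test of `sample` against `cdf` (hypothetical helper)."""
    result = stats.kstest(sample, cdf)
    verdict = "reject H0" if result.pvalue < alpha else "do not reject H0"
    print(f"{label}: statistic={result.statistic:.4f}, "
          f"p-value={result.pvalue:.4g} -> {verdict}")
    return result.statistic, result.pvalue


# H0: the first sample is normal with the estimated mean and standard deviation
mu, sigma = normal_sample.mean(), normal_sample.std(ddof=1)
_, p_norm = check_distribution(
    normal_sample, stats.norm(loc=mu, scale=sigma).cdf, "normal vs normal")

# H0: the second sample is exponential with the estimated scale (mean)
scale = expon_sample.mean()
_, p_expon = check_distribution(
    expon_sample, stats.expon(scale=scale).cdf, "exponential vs exponential")

# A deliberately wrong hypothesis: exponential data tested against a normal CDF
wrong_cdf = stats.norm(loc=expon_sample.mean(), scale=expon_sample.std(ddof=1)).cdf
_, p_mismatch = check_distribution(expon_sample, wrong_cdf, "exponential vs normal")
```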

Note

There are many statistical tests for checking the distribution of samples. The most popular are the Shapiro-Wilk test (`scipy.stats.shapiro`), the Anderson-Darling test (`scipy.stats.anderson`), and the chi-squared goodness-of-fit test (`scipy.stats.chisquare`).
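As a brief illustration of one of these alternatives, the sketch below applies the Shapiro-Wilk normality test through `scipy.stats.shapiro`. The skewed sample is synthetic (an assumption), chosen so that the departure from normality is obvious.

```python
import numpy as np
from scipy import stats

alpha = 0.05  # significance level

# Synthetic, clearly non-normal sample (assumption)
rng = np.random.default_rng(1)
skewed_sample = rng.exponential(scale=1.0, size=200)

# H0: the sample comes from a normal distribution
result = stats.shapiro(skewed_sample)
print(f"statistic = {result.statistic:.4f}, p-value = {result.pvalue:.4g}")
print("reject H0" if result.pvalue < alpha else "do not reject H0")
```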
