Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Testing the Hypothesis of Independence of Two Random Variables | Testing of Statistical Hypotheses
course content

Course Content

Probability Theory Mastering

Testing the Hypothesis of Independence of Two Random VariablesTesting the Hypothesis of Independence of Two Random Variables

In real-life tasks, it is often needed to analyze the dependence between different features. For example:

  1. Gender and political party affiliation: We can test whether there is a relationship between gender and political party affiliation;
  2. Education level and job satisfaction: We can test whether there is a relationship between education level and job satisfaction;
  3. Age and voting behavior: We can test whether there is a relationship between age and voting behavior;
  4. Income level and preferred mode of transportation: We can test whether there is a relationship between income level and preferred mode of transportation.

But how can we prove that the variables are independent if we are not dealing with the entire population but only with small samples of the corresponding variables? For this, we can use the chi-square independence criterion.

Hypothesis formulation

We can use this criterion to test the following hypothesis:
Main hypothesis: corresponding random variables are independent of each other.
Alternative hypothesis: there are some relationships between the considered random variables

Contingency table

To use the chi-square independence test we have to provide some data preprocessing - create a contingency table. A contingency table, also known as a cross-tabulation table, is a table used to summarize the categorical data from two or more variables. The table presents the joint distribution of the variables, including the frequency or count of each combination of categories for the variables. Let's look at the example:

The contingency matrix for continuous random variables is built a little differently. We first split our values into several discrete subsets and only then build the contingency matrix, for example:

Chi-square independence criterion in Python

Finally, let's use the chi-square independence criterion to check independence on a real dataset.

Can we create contingency table for continuous variables?

Select the correct answer

Everything was clear?

Section 4. Chapter 6
course content

Course Content

Probability Theory Mastering

Testing the Hypothesis of Independence of Two Random VariablesTesting the Hypothesis of Independence of Two Random Variables

In real-life tasks, it is often needed to analyze the dependence between different features. For example:

  1. Gender and political party affiliation: We can test whether there is a relationship between gender and political party affiliation;
  2. Education level and job satisfaction: We can test whether there is a relationship between education level and job satisfaction;
  3. Age and voting behavior: We can test whether there is a relationship between age and voting behavior;
  4. Income level and preferred mode of transportation: We can test whether there is a relationship between income level and preferred mode of transportation.

But how can we prove that the variables are independent if we are not dealing with the entire population but only with small samples of the corresponding variables? For this, we can use the chi-square independence criterion.

Hypothesis formulation

We can use this criterion to test the following hypothesis:
Main hypothesis: corresponding random variables are independent of each other.
Alternative hypothesis: there are some relationships between the considered random variables

Contingency table

To use the chi-square independence test we have to provide some data preprocessing - create a contingency table. A contingency table, also known as a cross-tabulation table, is a table used to summarize the categorical data from two or more variables. The table presents the joint distribution of the variables, including the frequency or count of each combination of categories for the variables. Let's look at the example:

The contingency matrix for continuous random variables is built a little differently. We first split our values into several discrete subsets and only then build the contingency matrix, for example:

Chi-square independence criterion in Python

Finally, let's use the chi-square independence criterion to check independence on a real dataset.

Can we create contingency table for continuous variables?

Select the correct answer

Everything was clear?

Section 4. Chapter 6
some-alt