Characteristics of Random Variables | Additional Statements From The Probability Theory
Probability Theory Mastering

Characteristics of Random Variables

Characteristics of random variables are important because they provide a formal way to describe and analyse the behaviour of uncertain events and outcomes in a probabilistic framework. They allow us to quantify and measure the uncertainty, variability, and central tendency of random variables, which are essential for making informed decisions and drawing meaningful conclusions from data.

The probability distribution of a random variable

The probability distribution of a random variable describes how probability is spread over the values in its domain. It can be represented using the Probability Mass Function (PMF) for discrete random variables or the Probability Density Function (PDF) for continuous random variables. We considered the PMF and PDF in the previous chapter.

Let's look at the PDF of some continuous distributions:
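As a minimal sketch of what such densities look like numerically, the snippet below evaluates the PDFs of three distributions on a small grid using scipy.stats. The choice of distributions and their parameters is an assumption made for illustration only.

```python
import numpy as np
from scipy import stats

# Illustrative choice of distributions (an assumption, not taken from the lesson)
distributions = {
    "normal": stats.norm(loc=0, scale=1),
    "exponential": stats.expon(scale=1),
    "uniform": stats.uniform(loc=0, scale=1),  # uniform on [0, 1]
}

# Grid of points: -3, -2, -1, 0, 1, 2, 3
x = np.linspace(-3, 3, 7)

# Evaluate each PDF on the grid
pdf_values = {name: dist.pdf(x) for name, dist in distributions.items()}

for name, values in pdf_values.items():
    print(name, np.round(values, 3))
```

Note that a PDF value is a density, not a probability: it is zero outside the support of the distribution and may exceed 1 for narrow distributions.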

Expected value

The expected value, also known as the mean or average, of a random variable is a measure of the central tendency of the random variable. It represents the weighted average of all possible values of the random variable, weighted by their respective probabilities.

Assume that X is a discrete random variable whose PMF is given by the following table:

Values        x_1   x_2   x_3   ...   x_N
Probability   p_1   p_2   p_3   ...   p_N

We can calculate the expectation as follows:

$$E[X] = \sum_{i=1}^{N} x_i p_i$$
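As a minimal sketch, the expectation of a discrete random variable is the sum of each value multiplied by its probability. The fair six-sided die used here is a hypothetical example, not data from the lesson.

```python
# Hypothetical PMF of a fair six-sided die (illustrative values)
values = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# A valid PMF must sum to 1
assert abs(sum(probabilities) - 1.0) < 1e-12

# E[X] = x_1 * p_1 + x_2 * p_2 + ... + x_N * p_N
expectation = sum(x * p for x, p in zip(values, probabilities))

print(expectation)  # approximately 3.5
```

The result, about 3.5, is not itself a possible outcome of the die, which shows that the expected value is a long-run average rather than a "most likely" value.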

Now let's calculate the expectation for a continuous random variable X. Assuming that f(x) is the PDF of this variable, we can calculate the expectation as follows:

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$
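For a continuous random variable, the expectation is the integral of x multiplied by the PDF over the whole real line. The sketch below approximates that integral numerically with scipy.integrate.quad; the normal distribution with mean 2 and standard deviation 3 is an assumed example.

```python
import numpy as np
from scipy import integrate, stats

# Assumed example: normal distribution with mean 2 and standard deviation 3
dist = stats.norm(loc=2, scale=3)

# E[X] is the integral of x * f(x) over (-inf, inf);
# quad returns the estimate and an estimate of the absolute error
expectation, abs_error = integrate.quad(
    lambda x: x * dist.pdf(x), -np.inf, np.inf
)

print(expectation)  # numerical estimate of the integral
print(dist.mean())  # mean reported by scipy for comparison
```

The two printed values should agree up to numerical integration error.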

Let's look at the PDF plot of normal distribution with different means:

Variance

The variance of a random variable is a measure of the dispersion or spread of the values of the random variable around its expected value. It quantifies the variability or uncertainty associated with the random variable. To calculate the variance, we can use the following formula:

$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$
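As a minimal sketch, the variance can be computed from its usual definition, Var(X) = E[(X - E[X])^2], and checked against the equivalent form E[X^2] - (E[X])^2. The fair six-sided die is again a hypothetical example.

```python
# Hypothetical PMF of a fair six-sided die (illustrative values)
values = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Expected value E[X]
mean = sum(x * p for x, p in zip(values, probabilities))

# Definition: Var(X) = E[(X - E[X])^2]
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))

# Equivalent shortcut: Var(X) = E[X^2] - (E[X])^2
second_moment = sum(x ** 2 * p for x, p in zip(values, probabilities))
variance_shortcut = second_moment - mean ** 2

print(variance, variance_shortcut)  # both approximately 35/12
```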

Let's look at the PDF plot of the normal distribution with different variances and fixed mean:

The square root of the variance is called the standard deviation. Using the standard deviation instead of the variance can be advantageous for two reasons:

  1. When the variance is greater than 1, we work with smaller absolute values (a variance of 225, for example, corresponds to a standard deviation of only 15, which is more convenient in calculations);
  2. The standard deviation is measured in the same units as the data, which can be important in certain cases (if, for example, we work with length in meters, then the variance will be measured in square meters, while the standard deviation is still in meters).

    Note

    The scale keyword of the scipy.stats.norm class represents the standard deviation of the normal distribution.
    The loc keyword of the scipy.stats.norm class represents the mean of the normal distribution.
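A minimal sketch of how loc and scale map to the mean and standard deviation. The values 10 and 15 are assumptions chosen so that the variance matches the 225 used in the list above.

```python
from scipy import stats

# loc is the mean, scale is the standard deviation (not the variance)
dist = stats.norm(loc=10, scale=15)

mean = dist.mean()
std = dist.std()
variance = dist.var()

print(mean, std, variance)  # variance equals std squared
```

Passing the variance as scale is a common mistake; to build a normal distribution from a known variance, take its square root first.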

Median

The median is a measure of central tendency that represents the middle value of a dataset arranged in ascending or descending order. For a random variable, it is the value that splits the probability mass into two equal halves.
We can calculate the median of a random variable X as follows:

  1. Determine the CDF of X;
  2. Find the value y such that CDF(y) = 0.5;
  3. This value y is the median of the random variable X.
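The steps above can be sketched numerically for a continuous distribution: take the CDF, then solve CDF(y) = 0.5 with a root finder. The exponential distribution with scale 2 and the search interval [0, 100] are assumptions for illustration.

```python
from scipy import optimize, stats

# Assumed example: exponential distribution with scale 2
dist = stats.expon(scale=2)

# Step 1: the CDF is available as dist.cdf
# Step 2: find y such that CDF(y) - 0.5 = 0 on a bracketing interval
median = optimize.brentq(lambda y: dist.cdf(y) - 0.5, 0.0, 100.0)

# Step 3: y is the median; compare with scipy's inverse CDF and median
print(median)
print(dist.ppf(0.5))
print(dist.median())
```

In practice the inverse CDF, dist.ppf(0.5), gives the same value directly; the root finder is shown only to mirror the three steps.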

It's important to understand that the expected value and the median are two different characteristics: the expected value is the weighted average of all possible values of the random variable, where the weights are the probabilities of those values occurring; the median, on the other hand, is the value that splits the distribution into two halves of equal probability.
For random variables with skewed distributions, this difference is most pronounced.
Let's look at the example below:

We see that the expected value is shifted in the direction of the tail of the distribution. The expected value is more strongly affected by outliers and anomalies than the median, which makes it a less robust summary for real-world data analysis.
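The shift toward the tail can be checked numerically. The sketch below uses a log-normal distribution with shape parameter s = 1 as an assumed example of a right-skewed distribution; it is not necessarily the distribution used in the lesson's example.

```python
from scipy import stats

# Assumed example: right-skewed log-normal distribution
# (s is the shape parameter; scale defaults to 1)
dist = stats.lognorm(s=1.0)

mean = dist.mean()
median = dist.median()

print(mean)    # pulled toward the long right tail
print(median)  # stays at the point that halves the probability mass
```

For this distribution the mean is larger than the median, because the long right tail contributes large values to the weighted average without moving the midpoint of the probability mass.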

What characteristic should we use if we want to know the spread of a random variable around its mean value and at the same time we want to measure this spread in the same units in which the random variable is represented?
