# Method of Moments. Maximum Likelihood Estimation

Let's consider in more detail what general population parameters are and how we can estimate them.

## Method of moments

We use our samples and apply a **specific function** to them to estimate the parameter we're interested in. However, we can't just pick any function; we need to find the right one that gives us the **most accurate estimate**.

In mathematical statistics, there are two common methods for this. The first is called the **method of moments**, which we've discussed before. This method relies on the fact that certain characteristics of a random variable, like its mean or variance, are **directly related to the parameters** we want to estimate.

For instance, the Gaussian distribution is completely determined by its mean and variance. So, by calculating the mean and variance from our samples, we can estimate the parameters of the distribution.

## Example

Assume that we have an exponential general population and want to estimate the lambda parameter of this distribution. The mean of an exponential distribution equals 1/lambda, so we can estimate lambda as one divided by the sample mean.
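The idea can be sketched in a few lines. This is a minimal illustration rather than the course's original code: the true value `true_lambda = 2.0`, the sample size, and the seed are assumptions chosen for the demo, and the samples are simulated with Python's standard library.

```python
import random
import statistics

random.seed(42)  # fixed seed so the demo is reproducible

# Assumed "true" parameter, used only to simulate an exponential population
true_lambda = 2.0
samples = [random.expovariate(true_lambda) for _ in range(10_000)]

# Method of moments: E[X] = 1 / lambda, so lambda is estimated by 1 / sample mean
sample_mean = statistics.fmean(samples)
lambda_hat = 1.0 / sample_mean

print(f"True lambda: {true_lambda}")
print(f"Method-of-moments estimate: {lambda_hat:.3f}")
```

With a sample of this size, the estimate should land close to the value used for the simulation.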

If we need to estimate more than one parameter, the mean alone is not enough: we also need the variance or higher-order moments. For the Gaussian distribution, for example, the sample mean estimates the mean parameter and the sample variance estimates the variance parameter.
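A possible sketch for the Gaussian case is shown below. The values `true_mean = 5.0` and `true_std = 2.0` are assumptions used only to simulate data; the estimates themselves use just the first two sample moments.

```python
import random
import statistics

random.seed(0)  # fixed seed so the demo is reproducible

# Assumed "true" parameters, used only to simulate a Gaussian population
true_mean, true_std = 5.0, 2.0
samples = [random.gauss(true_mean, true_std) for _ in range(10_000)]

# First moment: the sample mean estimates the mean parameter
mean_hat = statistics.fmean(samples)

# Second central moment: the sample variance (dividing by n) estimates the variance
var_hat = statistics.pvariance(samples, mean_hat)
std_hat = var_hat ** 0.5

print(f"Estimated mean: {mean_hat:.3f}")
print(f"Estimated standard deviation: {std_hat:.3f}")
```

Here `statistics.pvariance` divides by the number of samples, which matches the plain second central moment; `statistics.variance` would divide by n - 1 instead.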

## Maximum likelihood estimation

The method of moments is quite simple to interpret; however, the **properties of the estimates obtained using this method may not always satisfy us** (we will talk about the properties of the estimates in the following chapters). That's why we will consider another method: **maximum likelihood estimation**.

The maximum likelihood method is based on maximizing the **likelihood function**. This function is constructed as the joint density (or, for discrete data, the joint probability mass) function of the vector consisting of all our samples, viewed as a function of the unknown parameters.

Since the samples come from the same general population independently, we can combine their distributions into one **joint distribution** by **multiplying the distributions** of each individual sample. This gives us the likelihood function, which we then maximize to find the best parameters.

In simpler terms, we're trying to find the parameters that make our observed samples **most likely to occur**.

Working directly with the likelihood function can be complex, so it's often easier to use the **negative log-likelihood**. Taking the logarithm turns the product of probabilities into a sum, which simplifies the calculations. Because the logarithm is monotonically increasing, maximizing the likelihood is the same as minimizing the negative log-likelihood.
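The construction described above can be written compactly. The notation is an assumption introduced here for illustration: $x_1, \dots, x_n$ are the samples, $\theta$ is the unknown parameter (or vector of parameters), and $f(x; \theta)$ is the density or probability mass function of the general population.

$$
L(\theta) = \prod_{i=1}^{n} f(x_i; \theta), \qquad
-\log L(\theta) = -\sum_{i=1}^{n} \log f(x_i; \theta)
$$

$$
\hat{\theta} = \arg\max_{\theta} L(\theta) = \arg\min_{\theta} \bigl( -\log L(\theta) \bigr)
$$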

## Example

Let's use maximum likelihood to estimate the parameters of a Gaussian distribution:
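A minimal sketch of such an estimate with `scipy.stats.norm` might look like this. The simulated parameters (`loc=5.0`, `scale=2.0`), the sample size, and the seed are assumptions made for the demo and are not taken from the original lesson.

```python
from scipy.stats import norm

# Simulate samples from a Gaussian population with assumed parameters
samples = norm.rvs(loc=5.0, scale=2.0, size=10_000, random_state=42)

# .fit() returns the maximum likelihood estimates of loc (mean) and scale (std)
mu_hat, sigma_hat = norm.fit(samples)

print(f"MLE of the mean: {mu_hat:.3f}")
print(f"MLE of the standard deviation: {sigma_hat:.3f}")
```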

In the code above, we use the `.fit()` method of `norm` to get the maximum likelihood estimates of the parameters. You can apply this method to any continuous distribution available in the `scipy` library.

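**Note:** In some cases, the estimate using the method of moments and the maximum likelihood estimate may coincide. For the Gaussian distribution, for example, both methods give the sample mean and the sample variance computed with division by n.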

We can't use the `.fit()` method for some distributions; the discrete distributions in `scipy.stats`, for example, do not provide it. In that case, we have to construct the likelihood function manually and optimize it ourselves. Let's look at an example:
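One way to do this is sketched below for a Poisson population. The choice of the Poisson distribution, the simulated rate `mu=4.0`, the search bounds, and the helper name `negative_log_likelihood` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Simulate samples from a Poisson population with an assumed rate
samples = poisson.rvs(mu=4.0, size=5_000, random_state=42)

def negative_log_likelihood(mu):
    # Sum of log-PMF values over all samples, with the sign flipped
    # so that minimizing this function maximizes the likelihood
    return -np.sum(poisson.logpmf(samples, mu))

# Search for the rate that minimizes the negative log-likelihood
result = minimize_scalar(
    negative_log_likelihood, bounds=(1e-6, 50.0), method="bounded"
)
mu_hat = result.x

print(f"MLE of the Poisson rate: {mu_hat:.3f}")
print(f"Sample mean for comparison: {samples.mean():.3f}")
```

For the Poisson distribution the maximum likelihood estimate of the rate is known to equal the sample mean, so the two printed values should agree up to the optimizer's tolerance.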

In the code above, we manually created the log-likelihood function using the `.logpmf()` method, which calculates the logarithm of the PMF at each point we need.
