Finding the Parameters

Logistic Regression only requires the computer to learn the best parameters β. For that, we need to define what "best parameters" means. Let's recall how the model works: it predicts p, the probability of belonging to class 1:

p = \sigma(z) = \sigma(\beta_0 + \beta_1 x_1 + ...)

Where

\sigma(z) = \frac{1}{1 + e^{-z}}

A model with good parameters is one that predicts a high p (close to 1) for instances that actually belong to class 1 and a low p (close to 0) for instances whose actual class is 0.
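To make this concrete, here is a minimal sketch of how such a prediction is computed. The parameter values and features are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters: beta_0 (intercept), beta_1, beta_2
beta = np.array([-1.0, 2.0, 0.5])
# One instance; the leading 1 multiplies the intercept beta_0
x = np.array([1.0, 0.8, -0.3])

z = beta @ x    # beta_0 + beta_1*x_1 + beta_2*x_2 = 0.45
p = sigmoid(z)  # predicted probability of class 1, ~0.61
print(p)
```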

To measure how good or bad the model is, we use a cost function. In linear regression, we used MSE (mean squared error) as the cost function. This time, a different function is used:
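\text{Binary Cross-Entropy Loss} = -\left(y \log(p) + (1 - y) \log(1 - p)\right)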

Here p represents the probability of belonging to class 1, as predicted by the model, while y denotes the actual target value.

This function not only penalizes incorrect predictions but also accounts for the model's confidence in its predictions. When the predicted p closely matches y (the actual target), the loss remains small, indicating that the model confidently selected the correct class. Conversely, if the prediction is incorrect, the loss grows without bound as the model's confidence in the incorrect class grows.
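You can verify this behavior numerically. Here is a minimal sketch (assuming NumPy) that evaluates the loss for an instance whose actual class is 1, at several confidence levels:

```python
import numpy as np

def binary_cross_entropy(y, p):
    # Loss for a single instance: -(y*log(p) + (1-y)*log(1-p))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = 1  # actual class
for p in [0.99, 0.9, 0.6, 0.1, 0.01]:
    print(f"p = {p:.2f} -> loss = {binary_cross_entropy(y, p):.3f}")

# p = 0.99 -> loss = 0.010   (confident and correct: tiny loss)
# p = 0.01 -> loss = 4.605   (confident and wrong: huge loss)
```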

In the context of binary classification with a sigmoid function, the cost function used is specifically called binary cross-entropy loss, which was shown above. It's important to note that there is also a general form known as cross-entropy loss (or categorical cross-entropy) used for multi-class classification problems.

The categorical cross-entropy loss for a single training instance is calculated as follows:

\text{Categorical Cross-Entropy Loss} = -\sum_{i=1}^{C} y_i \log(p_i)

Where

  • C is the number of classes;
  • y_i is the actual target value (1 if the class is the correct class, 0 otherwise);
  • p_i is the predicted probability of the instance belonging to class i.
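As a quick check, here is a minimal sketch (assuming NumPy) computing this loss for a single instance with three classes; the numbers are made up:

```python
import numpy as np

def categorical_cross_entropy(y, p):
    # -sum_i y_i * log(p_i); y is one-hot, p holds class probabilities
    return -np.sum(y * np.log(p))

y = np.array([0, 1, 0])        # the correct class is class 1 (one-hot)
p = np.array([0.2, 0.7, 0.1])  # model's predicted probabilities
print(categorical_cross_entropy(y, p))  # -log(0.7) ≈ 0.357
```

Because y is one-hot, only the term for the correct class survives the sum, so the loss reduces to minus the log of the probability the model assigned to the correct class.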

We calculate the loss function for each training instance and take the average. This average is called the cost function. Logistic Regression finds the parameters β that minimize the cost function.
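Here is a minimal end-to-end sketch of that idea, assuming scikit-learn is available; the tiny dataset is made up. Note that scikit-learn's LogisticRegression adds L2 regularization to the cost by default:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Made-up dataset: one feature, binary target
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fitting finds the parameters beta that minimize the (regularized) cost
model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)  # learned beta_0 and beta_1

# The cost: average binary cross-entropy over the training instances
p = model.predict_proba(X)[:, 1]
print(log_loss(y, p))
```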

