Finding the Parameters

Logistic Regression only requires the computer to learn the best parameters β. For that, we need to define what "best parameters" means. Let's recall how the model works: it predicts p, the probability of belonging to class 1:

p = \sigma(z) = \sigma(\beta_0 + \beta_1 x_1 + ...)

Where

\sigma(z) = \frac{1}{1 + e^{-z}}

A model with good parameters is one that predicts a high p (close to 1) for instances that actually belong to class 1 and a low p (close to 0) for instances whose actual class is 0.
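To make this concrete, here is a minimal sketch of how such a prediction is computed. The parameter values and features are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters: beta_0 (intercept), beta_1, beta_2
beta = np.array([-1.0, 2.0, 0.5])
# One instance; the leading 1 multiplies the intercept beta_0
x = np.array([1.0, 0.8, -0.3])

z = beta @ x    # beta_0 + beta_1*x_1 + beta_2*x_2 = 0.45
p = sigmoid(z)  # predicted probability of class 1, ~0.61
print(p)
```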

To measure how good or bad the model is, we use a cost function. In linear regression, we used MSE (mean squared error) as the cost function. This time, a different function is used:
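\text{Binary Cross-Entropy Loss} = -\left(y \log(p) + (1 - y) \log(1 - p)\right)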

Here p represents the probability of belonging to class 1, as predicted by the model, while y denotes the actual target value.

This function not only penalizes incorrect predictions but also accounts for the model's confidence in its predictions. When the predicted p closely matches y (the actual target), the loss remains small, indicating that the model confidently selected the correct class. Conversely, if the prediction is incorrect, the loss grows without bound as the model's confidence in the incorrect class grows.
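You can verify this behavior numerically. Here is a minimal sketch (assuming NumPy) that evaluates the loss for an instance whose actual class is 1, at several confidence levels:

```python
import numpy as np

def binary_cross_entropy(y, p):
    # Loss for a single instance: -(y*log(p) + (1-y)*log(1-p))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = 1  # actual class
for p in [0.99, 0.9, 0.6, 0.1, 0.01]:
    print(f"p = {p:.2f} -> loss = {binary_cross_entropy(y, p):.3f}")

# p = 0.99 -> loss = 0.010   (confident and correct: tiny loss)
# p = 0.01 -> loss = 4.605   (confident and wrong: huge loss)
```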

In the context of binary classification with a sigmoid function, the cost function used is specifically called binary cross-entropy loss, which was shown above. It's important to note that there is also a general form known as cross-entropy loss (or categorical cross-entropy) used for multi-class classification problems.

The categorical cross-entropy loss for a single training instance is calculated as follows:

\text{Categorical Cross-Entropy Loss} = -\sum_{i=1}^{C} y_i \log(p_i)

Where

  • C is the number of classes;
  • y_i is the actual target value (1 if the class is the correct class, 0 otherwise);
  • p_i is the predicted probability of the instance belonging to class i.
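As a quick check, here is a minimal sketch (assuming NumPy) computing this loss for a single instance with three classes; the numbers are made up:

```python
import numpy as np

def categorical_cross_entropy(y, p):
    # -sum_i y_i * log(p_i); y is one-hot, p holds class probabilities
    return -np.sum(y * np.log(p))

y = np.array([0, 1, 0])        # the correct class is class 1 (one-hot)
p = np.array([0.2, 0.7, 0.1])  # model's predicted probabilities
print(categorical_cross_entropy(y, p))  # -log(0.7) ≈ 0.357
```

Because y is one-hot, only the term for the correct class survives the sum, so the loss reduces to minus the log of the probability the model assigned to the correct class.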

We calculate the loss function for each training instance and take the average. This average is called the cost function. Logistic Regression finds the parameters β that minimize the cost function.
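Here is a minimal end-to-end sketch of that idea, assuming scikit-learn is available; the tiny dataset is made up. Note that scikit-learn's LogisticRegression adds L2 regularization to the cost by default:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Made-up dataset: one feature, binary target
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fitting finds the parameters beta that minimize the (regularized) cost
model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)  # learned beta_0 and beta_1

# The cost: average binary cross-entropy over the training instances
p = model.predict_proba(X)[:, 1]
print(log_loss(y, p))
```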

