Finding the Parameters
To train Logistic Regression, the computer only needs to learn the best parameters β. For that, we need to define what "best parameters" means. Recall how the model works: it predicts p, the probability that an instance belongs to class 1.
A model with good parameters predicts a high p (close to 1) for instances that are actually of class 1 and a low p (close to 0) for instances that are actually of class 0.
To measure how good or bad the model is, we use a cost function. In linear regression, we used SSR (the sum of squared residuals) as the cost function. This time, a different function is used:
$$\text{cost}(p, y) = -\bigl(y \log p + (1 - y) \log(1 - p)\bigr)$$
Here p is the probability of belonging to class 1, predicted by the model, and y is the actual target value.
This function not only penalizes incorrect predictions but also accounts for how confident the model was in its prediction.
As you can see from the image above, if the value of p is close to y (the actual target), the cost is relatively small. This means the model confidently chose the correct class.
But if the prediction is incorrect, the cost grows sharply as the model's confidence in the wrong class increases, and it grows without bound as that confidence approaches certainty.
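The effect of confidence on the cost can be checked numerically. The sketch below assumes the standard per-instance form of this cost, -log(p) when y is 1 and -log(1 - p) when y is 0; the function name `instance_cost` is made up for illustration.

```python
import math


def instance_cost(p: float, y: int) -> float:
    """Cost for one instance: -log(p) if y == 1, otherwise -log(1 - p).

    Assumes 0 < p < 1, so the logarithm is always defined.
    """
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


# The actual class is 1. As p drops toward 0, the model becomes more
# confident in the wrong class and the cost rises faster and faster.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p}: cost = {instance_cost(p, 1):.3f}")
```

With y = 1, the cost is about 0.105 at p = 0.9 but about 4.605 at p = 0.01, so a confident wrong prediction costs far more than a hesitant one.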
We calculate the cost function for each training instance and take the average. This cost function is called Cross-Entropy Loss. So Logistic Regression just finds the parameters β that minimize Cross-Entropy Loss.
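To make the averaging and the minimization concrete, here is a minimal sketch in plain Python. It assumes a single feature, a sigmoid model with parameters β0 and β1, and plain gradient descent as the optimizer. Gradient descent is only one common choice and is not necessarily what a given library uses. The toy data and the names `predict`, `cross_entropy`, and `fit` are made up for illustration.

```python
import math


def predict(beta, x):
    """Predicted probability of class 1: sigmoid(beta0 + beta1 * x)."""
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))


def cross_entropy(beta, xs, ys):
    """Average the per-instance cost over all training instances."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(beta, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(xs)


def fit(xs, ys, lr=0.1, steps=2000):
    """Minimize the average cost with plain gradient descent (an assumption)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = predict((b0, b1), x) - y  # derivative of the cost w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Made-up toy data: larger x tends to mean class 1, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

start = (0.0, 0.0)
fitted = fit(xs, ys)
print("loss before training:", cross_entropy(start, xs, ys))
print("loss after training: ", cross_entropy(fitted, xs, ys))
```

With both parameters at zero, the model predicts p = 0.5 for every instance, so the starting loss is log 2 (about 0.693). Training should move the parameters to values that give a lower average loss.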