Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Aprenda Logistic Regression for Classification | Statistical Modeling in R
R for Data Scientists

bookLogistic Regression for Classification

In many real-world data science problems, you need to predict whether an event will happen or not, such as whether a customer will buy a product, if an email is spam, or whether a patient has a disease. These are examples of binary outcomes, where the response variable takes on only two possible values, often coded as 0 (no) and 1 (yes). Logistic regression is a statistical modeling technique designed specifically for such situations, allowing you to model the probability of a "yes" outcome based on one or more predictor variables.

123456789
# Load data data("mtcars") mtcars$am_factor <- factor(mtcars$am, labels = c("Automatic", "Manual")) # Fit logistic regression: predict transmission type based on mpg and hp logit_model <- glm(am_factor ~ mpg + hp, data = mtcars, family = binomial) # Display model summary summary(logit_model)
copy

The glm() function fits generalized linear models, including logistic regression. Here, you specify the formula am_factor ~ mpg + hp, meaning you want to predict the transmission type (am_factor) using miles per gallon (mpg) and horsepower (hp). The family = binomial argument tells R to use the logistic regression model, suitable for binary outcomes.

In the model summary, the coefficients represent the effect of each predictor on the log-odds of the outcome being "Manual" versus "Automatic." For example, the coefficient for mpg (1.0556) means that for each additional mile per gallon, the log-odds of having a manual transmission increases by about 1.06, holding horsepower constant. The standard error and z value help you assess the reliability of each coefficient, and the Pr(>|z|) column provides the p-value for statistical significance. Lower deviance values and the AIC (Akaike Information Criterion) help compare model fit.

Note
Note
  • The coefficients in logistic regression are in terms of log-odds; to interpret them as odds ratios, exponentiate the values.
  • If you see convergence warnings, check for perfect separation or highly correlated predictors.
  • Always ensure your response variable is a factor with two levels; otherwise, R may not fit the model as intended.
question mark

Which statement best describes the main use of logistic regression in R?

Select the correct answer

Tudo estava claro?

Como podemos melhorá-lo?

Obrigado pelo seu feedback!

Seção 3. Capítulo 2

Pergunte à IA

expand

Pergunte à IA

ChatGPT

Pergunte o que quiser ou experimente uma das perguntas sugeridas para iniciar nosso bate-papo

bookLogistic Regression for Classification

Deslize para mostrar o menu

In many real-world data science problems, you need to predict whether an event will happen or not, such as whether a customer will buy a product, if an email is spam, or whether a patient has a disease. These are examples of binary outcomes, where the response variable takes on only two possible values, often coded as 0 (no) and 1 (yes). Logistic regression is a statistical modeling technique designed specifically for such situations, allowing you to model the probability of a "yes" outcome based on one or more predictor variables.

123456789
# Load data data("mtcars") mtcars$am_factor <- factor(mtcars$am, labels = c("Automatic", "Manual")) # Fit logistic regression: predict transmission type based on mpg and hp logit_model <- glm(am_factor ~ mpg + hp, data = mtcars, family = binomial) # Display model summary summary(logit_model)
copy

The glm() function fits generalized linear models, including logistic regression. Here, you specify the formula am_factor ~ mpg + hp, meaning you want to predict the transmission type (am_factor) using miles per gallon (mpg) and horsepower (hp). The family = binomial argument tells R to use the logistic regression model, suitable for binary outcomes.

In the model summary, the coefficients represent the effect of each predictor on the log-odds of the outcome being "Manual" versus "Automatic." For example, the coefficient for mpg (1.0556) means that for each additional mile per gallon, the log-odds of having a manual transmission increases by about 1.06, holding horsepower constant. The standard error and z value help you assess the reliability of each coefficient, and the Pr(>|z|) column provides the p-value for statistical significance. Lower deviance values and the AIC (Akaike Information Criterion) help compare model fit.

Note
Note
  • The coefficients in logistic regression are in terms of log-odds; to interpret them as odds ratios, exponentiate the values.
  • If you see convergence warnings, check for perfect separation or highly correlated predictors.
  • Always ensure your response variable is a factor with two levels; otherwise, R may not fit the model as intended.
question mark

Which statement best describes the main use of logistic regression in R?

Select the correct answer

Tudo estava claro?

Como podemos melhorá-lo?

Obrigado pelo seu feedback!

Seção 3. Capítulo 2
some-alt