Logistic Regression for Classification
In many real-world data science problems, you need to predict whether an event will happen, such as whether a customer will buy a product, whether an email is spam, or whether a patient has a disease. These are binary outcomes: the response variable takes only two possible values, often coded as 0 (no) and 1 (yes). Logistic regression is a statistical modeling technique designed for exactly this situation: it models the probability of a "yes" outcome as a function of one or more predictor variables.
```r
# Load data
data("mtcars")
mtcars$am_factor <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Fit logistic regression: predict transmission type from mpg and hp
logit_model <- glm(am_factor ~ mpg + hp, data = mtcars, family = binomial)

# Display model summary
summary(logit_model)
```
The glm() function fits generalized linear models, including logistic regression. Here, you specify the formula am_factor ~ mpg + hp, meaning you want to predict the transmission type (am_factor) using miles per gallon (mpg) and horsepower (hp). The family = binomial argument tells R to use the logistic regression model, suitable for binary outcomes.
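The direction of the coefficients depends on which factor level R treats as the "event": for a two-level factor response, glm() models the probability of the second level. A quick check of the level order (a small sketch using the same data preparation as above) confirms which outcome the model predicts:

```r
data("mtcars")
mtcars$am_factor <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# The second level is the modeled event, so here the model predicts P(Manual)
levels(mtcars$am_factor)

# contrasts() shows the 0/1 dummy coding used internally
contrasts(mtcars$am_factor)
```

If the levels were reversed, every coefficient would flip sign, so it is worth checking this before interpreting the summary.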
In the model summary, the coefficients represent the effect of each predictor on the log-odds of the outcome being "Manual" versus "Automatic." For example, the coefficient for mpg (1.0556) means that for each additional mile per gallon, the log-odds of having a manual transmission increase by about 1.06, holding horsepower constant. The standard error and z value help you assess the reliability of each coefficient, and the Pr(>|z|) column gives the p-value for testing whether that coefficient differs from zero. The residual deviance and the AIC (Akaike Information Criterion) summarize model fit; when comparing models on the same data, lower values indicate a better fit.
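Because log-odds are hard to read directly, a common follow-up step is to exponentiate the coefficients to obtain odds ratios. A short sketch, refitting the same model as above (confint.default() gives Wald-type intervals, which is a simplification):

```r
data("mtcars")
mtcars$am_factor <- factor(mtcars$am, labels = c("Automatic", "Manual"))
logit_model <- glm(am_factor ~ mpg + hp, data = mtcars, family = binomial)

# Odds ratios: exp() of the log-odds coefficients
exp(coef(logit_model))

# Wald confidence intervals, transformed to the odds-ratio scale
exp(confint.default(logit_model))
```

An odds ratio above 1 means the predictor increases the odds of a manual transmission; below 1 means it decreases them.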
- The coefficients in logistic regression are in terms of log-odds; to interpret them as odds ratios, exponentiate the values.
- If you see convergence warnings, check for perfect separation or highly correlated predictors.
- Always ensure your response variable is a factor with two levels; otherwise, R may not fit the model as intended.
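Once fitted, the model can score new observations. A minimal sketch, reusing the model from above on two hypothetical cars (the mpg and hp values here are invented for illustration):

```r
data("mtcars")
mtcars$am_factor <- factor(mtcars$am, labels = c("Automatic", "Manual"))
logit_model <- glm(am_factor ~ mpg + hp, data = mtcars, family = binomial)

# Hypothetical new cars to score
new_cars <- data.frame(mpg = c(15, 30), hp = c(200, 90))

# type = "response" returns predicted probabilities P(Manual),
# rather than the default log-odds scale
probs <- predict(logit_model, newdata = new_cars, type = "response")
probs

# Convert probabilities to class labels with a 0.5 cutoff
ifelse(probs > 0.5, "Manual", "Automatic")
```

The 0.5 cutoff is conventional but not mandatory; in applications with asymmetric costs, you may choose a different threshold.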