Fitting Linear Regression Models
When working with data, you often want to predict a numeric value using other variables. For instance, you might want to estimate a person's weight based on their height, or predict house prices from features like square footage and number of bedrooms. Linear regression is a fundamental statistical modeling technique that allows you to quantify the relationship between a numeric outcome (the dependent variable) and one or more predictors (the independent variables). In R, the lm() function is the standard tool for fitting linear regression models.
# Create a data frame with height and weight
df <- data.frame(
  height = c(60, 62, 65, 68, 70, 72),
  weight = c(115, 120, 135, 150, 165, 180)
)

# Fit a linear regression model
lm_model <- lm(weight ~ height, data = df)

# Inspect the model
summary(lm_model)
The formula interface in lm() uses the syntax outcome ~ predictor1 + predictor2, where the variable to be predicted is on the left of the tilde (~), and the predictors are on the right. In the example, weight ~ height means you are modeling weight as a function of height. The data argument specifies which data frame contains these variables, making your code concise and readable.
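For instance, a model with two predictors could look like the following. This is a hypothetical sketch: the houses data frame and its column names are invented purely to illustrate the syntax.

# Hypothetical data frame for a two-predictor model
houses <- data.frame(
  price    = c(200, 250, 180, 320, 275),
  sqft     = c(1400, 1800, 1200, 2400, 2000),
  bedrooms = c(3, 3, 2, 4, 3)
)

# Model price as a function of both square footage and bedroom count
price_model <- lm(price ~ sqft + bedrooms, data = houses)
summary(price_model)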
The returned object from lm() is a model object containing all details about the fitted regression, including the estimated coefficients, model diagnostics, and residuals. You can extract information from this object using functions like summary(), coef(), and residuals().
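Continuing with the weight-height model fitted above, you can pull out individual components directly; fitted() is one more accessor worth knowing alongside the ones just mentioned.

coef(lm_model)       # estimated intercept and slope
residuals(lm_model)  # differences between observed and fitted weights
fitted(lm_model)     # predicted weights at the observed heights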
- Factor variables are automatically handled as categorical predictors, but you should check that categorical data are correctly coded as factors (as the sketch after this list illustrates);
- Missing data in predictors or the outcome will cause those rows to be dropped from the model, which can affect your results;
- Coefficients represent the effect of a one-unit increase in the predictor, holding other variables constant, but be careful when interpreting coefficients for categorical predictors or when predictors are correlated.
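To make the first two points concrete, here is a small illustrative sketch; the homes data frame is invented for demonstration. The factor is expanded into dummy variables, and the row with the missing outcome is silently dropped under the default na.action = na.omit:

# Invented data: one numeric predictor, one factor, one missing outcome
homes <- data.frame(
  price    = c(200, 250, 180, 320, NA, 275),
  sqft     = c(1400, 1800, 1200, 2400, 1600, 2000),
  location = factor(c("suburb", "city", "suburb", "city", "suburb", "city"))
)

# lm() creates a dummy variable for 'location' automatically;
# the row with NA in price is dropped before fitting
fit <- lm(price ~ sqft + location, data = homes)

nobs(fit)        # 5 observations used, not 6
fit$na.action    # identifies the dropped row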