Understanding Lag Variables

Lag variables are essential tools in time series analysis for capturing temporal dependencies. A lag variable is a shifted copy of the original time series, where each value is the observation from a previous time step. Mathematically, given a time series $y_t$, the lag-1 variable is $y_{t-1}$, the lag-2 variable is $y_{t-2}$, and so on. Including lagged values as features gives your model information about past observations, allowing it to learn patterns such as trends, cycles, or seasonality. This is particularly useful in forecasting, where future values often depend on historical data. Without lag variables, a model may ignore valuable temporal structure, leading to poor predictions.


import pandas as pd

# Create a simple univariate time series
data = {
    "value": [10, 12, 13, 15, 14, 16, 18, 17, 19, 20]
}
df = pd.DataFrame(data)

# Generate lag features: lag_1 and lag_2
# shift(k) moves values down by k rows, so the first k rows of lag_k are NaN
df["lag_1"] = df["value"].shift(1)
df["lag_2"] = df["value"].shift(2)

print(df)
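
Because the first rows of each lag column have no earlier observation to draw from, they come out as missing values. The sketch below reuses the same data to show one possible way of handling this: it wraps the shifting in a small helper and drops the incomplete rows. The helper name `add_lags` and the choice to drop rows (rather than fill them) are assumptions made for illustration, not part of the lesson.

```python
import pandas as pd

def add_lags(series, lags):
    """Hypothetical helper: return the series plus one shifted column per lag."""
    frame = pd.DataFrame({"value": series})
    for k in lags:
        # lag_k at row t holds the value observed at row t - k
        frame[f"lag_{k}"] = series.shift(k)
    return frame

series = pd.Series([10, 12, 13, 15, 14, 16, 18, 17, 19, 20], name="value")
features = add_lags(series, [1, 2])

# Count the missing values that shifting introduced in each column
print(features.isna().sum())

# Keep only rows where every lag is available, then renumber the index
clean = features.dropna().reset_index(drop=True)
print(clean)
```

Dropping rows costs as many observations as the largest lag, so on short series it may be worth weighing that loss against the benefit of the extra feature.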
