Handling Missing Values in Temporal Features
When you create lag or rolling window features from time series data, missing values appear at the start of the new columns. A lag feature shifts the series forward in time by k rows, so the first k rows have no earlier observation to copy and are left as NaN. Rolling statistics such as moving averages need a full window of observations before they can produce a value, so with the pandas defaults a window of size w leaves the first w - 1 rows missing. These gaps are a natural result of how temporal features are constructed, not a sign of bad data. The two simplest ways to handle them are to drop the affected rows or to fill them with a constant: dropping shortens the dataset slightly, while filling with a value such as 0 keeps every row but introduces values that were never observed.
import pandas as pd

# Example time series data
data = {
    "date": pd.date_range(start="2023-01-01", periods=6, freq="D"),
    "value": [10, 12, 13, 15, 14, 16]
}
df = pd.DataFrame(data)
df.set_index("date", inplace=True)

# Create a lag feature and a 3-day rolling mean
df["lag_1"] = df["value"].shift(1)
df["rolling_3"] = df["value"].rolling(window=3).mean()

# Handling missing values: Option 1 - Drop rows with missing values
dropped_df = df.dropna()

# Handling missing values: Option 2 - Fill missing values with a constant (e.g., 0)
filled_df = df.fillna(0)

print("Original DataFrame with missing values:")
print(df)

print("\nAfter dropping missing values:")
print(dropped_df)

print("\nAfter filling missing values with 0:")
print(filled_df)
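As a quick check on where the gaps come from, the following minimal sketch counts the missing values each feature produces and trims exactly that many leading rows. It reuses the lesson's toy series; the lag size `lag_k = 2` and the names `features`, `warmup`, and `trimmed` are assumptions introduced only for illustration.

```python
import pandas as pd

# Same toy series as in the lesson example
values = pd.Series(
    [10, 12, 13, 15, 14, 16],
    index=pd.date_range(start="2023-01-01", periods=6, freq="D"),
    name="value",
)

lag_k = 2    # hypothetical lag size, chosen for illustration
window = 3   # rolling window size, as in the lesson example

features = pd.DataFrame({
    "value": values,
    f"lag_{lag_k}": values.shift(lag_k),
    f"rolling_{window}": values.rolling(window=window).mean(),
})

# Count missing values per column: a lag of k leaves k NaNs,
# a rolling window of size w leaves w - 1 NaNs (pandas defaults)
nan_counts = features.isna().sum()
print(nan_counts)

# The number of leading rows to discard is the largest gap
# across all temporal features
warmup = max(lag_k, window - 1)
trimmed = features.iloc[warmup:]
print(trimmed)
```

For this series, where the only gaps are the leading ones, slicing off the first `warmup` rows gives the same result as calling `dropna()`. If the original series had gaps elsewhere, `dropna()` would remove those rows too, so the two approaches would differ.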
Section 1. Chapter 7