Handling Missing Values in Temporal Features
When you create lag or rolling window features from time series data, missing values appear at the beginning of the dataset. A lag feature moves each observation to a later row, so the first k rows of a lag-k column have no earlier value to draw from. Similarly, rolling statistics such as moving averages need a full window of prior observations, so by default a window of size w leaves the first w - 1 rows empty. These gaps are a natural result of the way temporal features are constructed.
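To make the size of these gaps concrete, here is a minimal sketch that counts the missing values each feature introduces. The toy series and the names `lag_k` and `window` are illustrative assumptions, not part of the lesson's dataset.

```python
import pandas as pd

# Toy series; the values are arbitrary and only serve the illustration
s = pd.Series([10, 12, 13, 15, 14, 16])

lag_k = 2    # hypothetical lag size
window = 3   # hypothetical rolling window size

lagged = s.shift(lag_k)                   # each value moves lag_k rows later
rolled = s.rolling(window=window).mean()  # needs `window` observations by default

lag_missing = int(lagged.isna().sum())
roll_missing = int(rolled.isna().sum())

print(f"NaNs from lag {lag_k}: {lag_missing}")
print(f"NaNs from rolling window {window}: {roll_missing}")
```

With these settings the lag should leave `lag_k` leading gaps and the rolling mean `window - 1`, so you can estimate in advance how many rows a given feature set will cost you.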
import pandas as pd

# Example time series data
data = {
    "date": pd.date_range(start="2023-01-01", periods=6, freq="D"),
    "value": [10, 12, 13, 15, 14, 16]
}
df = pd.DataFrame(data)
df.set_index("date", inplace=True)

# Create a lag feature and a 3-day rolling mean
df["lag_1"] = df["value"].shift(1)
df["rolling_3"] = df["value"].rolling(window=3).mean()

# Handling missing values: Option 1 - Drop rows with missing values
dropped_df = df.dropna()

# Handling missing values: Option 2 - Fill missing values with a constant (e.g., 0)
filled_df = df.fillna(0)

print("Original DataFrame with missing values:")
print(df)
print("\nAfter dropping missing values:")
print(dropped_df)
print("\nAfter filling missing values with 0:")
print(filled_df)
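Dropping rows discards observations, and filling with 0 can place a misleading value next to real ones (for example, a lag of 0 beside a value of 10). One possible alternative, not part of the example above, is the `min_periods` parameter of pandas `rolling`, which lets the window compute over however many observations are available so far. A minimal sketch, assuming the same toy data; the column name `rolling_3_partial` is made up for illustration:

```python
import pandas as pd

# Same toy data as the lesson's example, rebuilt so the block is self-contained
df = pd.DataFrame(
    {"value": [10, 12, 13, 15, 14, 16]},
    index=pd.date_range(start="2023-01-01", periods=6, freq="D"),
)

# min_periods=1 lets the window emit a mean as soon as one observation exists
df["rolling_3_partial"] = df["value"].rolling(window=3, min_periods=1).mean()

# The lag has no predecessor for the first row, so that gap remains
df["lag_1"] = df["value"].shift(1)

remaining_nans = df.isna().sum()
print(df)
print("\nMissing values per column:")
print(remaining_nans)
```

The early rolling values here are averages over fewer points than the later ones, so they are noisier. Whether that trade-off beats dropping or filling depends on the model and on how much early data you can afford to lose.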