Handling Missing Values in Temporal Features
When you create lag or rolling window features from time series data, missing values (NaN) typically appear at the beginning of your data. A lag feature shifts the original series forward in time, so each row takes its value from an earlier row; the first k rows of a lag-k feature have no earlier row to draw from and are left empty. Similarly, rolling statistics such as moving averages need a full window of prior observations to compute a value, so with pandas' default settings a window of size w leaves the first w - 1 rows missing. These gaps are a natural consequence of how temporal features are constructed, not a defect in the data.
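To see how the size of these gaps depends on the lag and the window, here is a small sketch on a hypothetical 10-day series. The helper `leading_nans` is an illustrative function written for this example, not part of pandas.

```python
import pandas as pd

# Hypothetical example series: 10 daily observations
s = pd.Series(
    range(10),
    index=pd.date_range("2023-01-01", periods=10, freq="D"),
    dtype=float,
)

def leading_nans(feature: pd.Series) -> int:
    """Count how many consecutive NaN values appear at the start of a series."""
    count = 0
    for value in feature:
        if pd.isna(value):
            count += 1
        else:
            break
    return count

# A lag of k leaves k leading NaNs
for k in (1, 3):
    lagged = s.shift(k)
    print(f"lag {k}: {leading_nans(lagged)} leading NaNs")

# A rolling window of w (default min_periods) leaves w - 1 leading NaNs
for w in (3, 5):
    rolled = s.rolling(window=w).mean()
    print(f"rolling window {w}: {leading_nans(rolled)} leading NaNs")
```

Running it reports 1 and 3 leading NaNs for lags 1 and 3, and 2 and 4 for windows 3 and 5.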
```python
import pandas as pd

# Example time series data
data = {
    "date": pd.date_range(start="2023-01-01", periods=6, freq="D"),
    "value": [10, 12, 13, 15, 14, 16]
}
df = pd.DataFrame(data)
df.set_index("date", inplace=True)

# Create a lag feature and a 3-day rolling mean
df["lag_1"] = df["value"].shift(1)
df["rolling_3"] = df["value"].rolling(window=3).mean()

# Handling missing values: Option 1 - Drop rows with missing values
dropped_df = df.dropna()

# Handling missing values: Option 2 - Fill missing values with a constant (e.g., 0)
filled_df = df.fillna(0)

print("Original DataFrame with missing values:")
print(df)
print("\nAfter dropping missing values:")
print(dropped_df)
print("\nAfter filling missing values with 0:")
print(filled_df)
```
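The example above either drops or fills every missing value at once. The following minimal sketch reuses the same toy data to make the trade-off concrete: since `dropna()` removes a row if any column is NaN, the number of rows lost equals the longest warm-up period among the features, while `fillna(0)` keeps every row but inserts values the series never took. The variable names `lag`, `window`, and `expected_dropped` are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same toy data as the example above: six daily observations
df = pd.DataFrame(
    {"value": [10, 12, 13, 15, 14, 16]},
    index=pd.date_range(start="2023-01-01", periods=6, freq="D"),
)

lag, window = 1, 3
df[f"lag_{lag}"] = df["value"].shift(lag)
df[f"rolling_{window}"] = df["value"].rolling(window=window).mean()

# A lag of k leaves k leading NaNs; a full window of w leaves w - 1.
# dropna() removes a row if ANY column is NaN, so the longest warm-up wins.
expected_dropped = max(lag, window - 1)
dropped_df = df.dropna()
print("rows before:", len(df))
print("rows after dropna:", len(dropped_df))
print("expected rows dropped:", expected_dropped)

# fillna(0) keeps every row, but the filled lag is an artificial value
filled_df = df.fillna(0)
print("first lag_1 after fillna(0):", filled_df["lag_1"].iloc[0])
print("observed value range:", df["value"].min(), "to", df["value"].max())
```

With `lag = 1` and `window = 3`, `dropna()` removes the first two rows, and the filled `lag_1` of 0.0 sits well below the observed range of 10 to 16, which a model may learn as a misleading signal.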
Section 1. Chapter 7