
Preventing Data Leakage in Feature Engineering

Data leakage is a critical concern when engineering features for time series data. In this context, it means unintentionally using information from the future, relative to the point in time being predicted, when constructing features. A feature built from data that would not have been available at prediction time makes the model look better during training and testing than it will once deployed. Common sources of leakage include future target values, rolling statistics whose windows reach into the future, and aggregates computed without respecting temporal order. Features must reflect only information available up to the current time step if the resulting predictive model is to be trustworthy.
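As a minimal sketch of the "aggregating without respecting temporal order" case, the snippet below contrasts a mean computed over the whole series with an expanding mean of strictly earlier observations. It assumes pandas; the series `y` and the names `leaky_mean` and `safe_mean` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy series: six daily observations of the target.
y = pd.Series(
    [10.0, 12.0, 11.0, 15.0, 14.0, 18.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
    name="y",
)

# Leaky: the mean over the whole series uses observations that lie
# in the future of every row except the last one.
leaky_mean = pd.Series(y.mean(), index=y.index)

# Leak-free: expanding mean of strictly earlier observations.
# shift(1) moves each value one step later, so row t only sees y[0..t-1].
safe_mean = y.shift(1).expanding().mean()

print(pd.DataFrame({"y": y, "leaky_mean": leaky_mean, "safe_mean": safe_mean}))
```

The first row of `safe_mean` is missing because no earlier observation exists; that gap is the honest cost of using only the past.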

To avoid data leakage, build every feature from past and present data only, never from the future. For lag features and rolling window statistics, this means the window ends at the current observation and extends backward. When the feature is derived from the target itself, the window must end at the previous observation, because the current target value is exactly what the model is predicting. Review each feature engineering step to confirm that no future data is accessed, even indirectly. Be cautious with functions or libraries that may default to centered or forward-looking windows, as these can inadvertently introduce leakage.
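The guidelines above can be sketched with pandas as follows. This is an illustration under stated assumptions, not a prescribed recipe: the series `y` and the column names are hypothetical, and the features are assumed to be derived from the target, so each one is shifted by one step before any window is applied.

```python
import pandas as pd

# Hypothetical target series with a default integer index.
y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], name="y")

features = pd.DataFrame({
    # Lag feature: the value observed one step earlier.
    "lag_1": y.shift(1),
    # Trailing rolling mean of the three observations before the current row.
    # pandas windows are trailing by default (center=False).
    "roll_mean_3": y.shift(1).rolling(window=3).mean(),
    # Leaky counterpart, shown only for contrast: a centered window of
    # size 3 includes one observation from the future of each row.
    "roll_mean_3_centered": y.rolling(window=3, center=True).mean(),
})

print(features)
```

In this sketch, `roll_mean_3` at row 3 averages rows 0 to 2, while `roll_mean_3_centered` at row 1 already averages rows 0 to 2 and therefore looks one step ahead.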

Suppose you are engineering features for a time series forecasting task. You create a rolling mean feature using a window of size 5, but you accidentally center the window, so it includes 2 observations from the future relative to each time point. What type of issue does this introduce?
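The scenario in this question can be reproduced numerically. The sketch below alters only the values after a chosen position `t` and checks whether the feature at `t` changes. The helper `depends_on_future` and the toy series are hypothetical, introduced here purely for illustration.

```python
import numpy as np
import pandas as pd


def depends_on_future(make_feature, series, t):
    """Return True if the feature at position t changes when only
    the values after position t are altered."""
    baseline = make_feature(series).iloc[t]
    perturbed = series.copy()
    perturbed.iloc[t + 1:] += 1000.0  # change the future only
    changed = make_feature(perturbed).iloc[t]
    return not np.isclose(changed, baseline, equal_nan=True)


# Hypothetical data: 20 evenly spaced observations.
y = pd.Series(np.arange(20, dtype=float))


def centered(s):
    # Window of size 5 centered on each row: 2 past, current, 2 future.
    return s.rolling(window=5, center=True).mean()


def trailing(s):
    # Window of size 5 ending at each row: current and 4 past.
    return s.rolling(window=5).mean()


leaks_centered = depends_on_future(centered, y, t=10)
leaks_trailing = depends_on_future(trailing, y, t=10)
print(leaks_centered, leaks_trailing)
```

A check of this kind can be run against any feature function as a quick guard against windows that reach past the current time step.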


Section 1. Chapter 8
