Engineering Temporal Features

Preventing Data Leakage in Feature Engineering


Data leakage is a critical concern when engineering features for time series data. In this context, data leakage means unintentionally using information from the future, relative to the point in time being predicted, when constructing features. It happens when a feature is built from data that would not have been available at prediction time, and it produces overly optimistic performance estimates during validation and testing that do not hold up once the model is deployed. Common sources of leakage include mistakenly including future target values, using rolling statistics that reach into the future, and aggregating data without respecting temporal order. Ensuring that each feature reflects only information available up to the current time step is essential for building trustworthy predictive models.
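One way to make this definition concrete is a truncation check: compute a feature on the full series, then recompute it with everything after time t hidden, and compare the two values at t. The sketch below uses only the standard library; the helper names (`trailing_mean`, `centered_mean`, `leaks_future`) and the sample series are illustrative assumptions, not part of the lesson.

```python
from typing import Callable, List, Optional, Sequence

Feature = Callable[[Sequence[float]], List[Optional[float]]]


def trailing_mean(values: Sequence[float], window: int = 3) -> List[Optional[float]]:
    """Mean of the `window` observations ending at t (past and present only)."""
    out: List[Optional[float]] = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[t - window + 1 : t + 1]
            out.append(sum(chunk) / window)
    return out


def centered_mean(values: Sequence[float], window: int = 3) -> List[Optional[float]]:
    """Mean of a window centered on t; it reaches `window // 2` steps ahead.

    Assumes an odd window size.
    """
    half = window // 2
    out: List[Optional[float]] = []
    for t in range(len(values)):
        if t - half < 0 or t + half >= len(values):
            out.append(None)  # window would run off either end
        else:
            chunk = values[t - half : t + half + 1]
            out.append(sum(chunk) / window)
    return out


def leaks_future(feature: Feature, values: Sequence[float]) -> bool:
    """True if any feature value at t changes once data after t is hidden."""
    full = feature(values)
    for t in range(len(values)):
        truncated = feature(values[: t + 1])  # only data up to and including t
        if truncated[t] != full[t]:
            return True
    return False


# Hypothetical sample data.
series = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]

print(leaks_future(trailing_mean, series))  # False
print(leaks_future(centered_mean, series))  # True
```

A check like this only exercises the data it is given, so passing it is evidence of a leak-free feature rather than proof.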

To avoid data leakage, follow strict guidelines when constructing features for time series problems. Generate features using only past and present data, and never incorporate information from the future. For lag features and rolling window statistics, this means the window should cover only the current and previous observations. When the statistic is computed from the target series itself, the window should also exclude the current observation, because the current value is the quantity being predicted. Review every feature engineering step to confirm that no future data is accessed, even indirectly. Be cautious with functions or library options that use centered or forward-looking windows, as these can inadvertently introduce leakage.
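As a minimal sketch of these guidelines, assuming pandas is the library in use (the lesson does not name one), the example below builds a lag feature and a past-only rolling mean, then adds a centered rolling mean for contrast. The column names and sample values are hypothetical. In pandas, `rolling` uses `center=False` unless told otherwise, so the centered window here has to be requested explicitly.

```python
import pandas as pd

# Hypothetical daily series; "y" is the target being forecast.
df = pd.DataFrame(
    {"y": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0]},
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

# Lag feature: the previous day's value, known at prediction time.
df["y_lag1"] = df["y"].shift(1)

# Past-only rolling mean: shift(1) drops y[t] from the window, so the
# feature at t averages y[t-3], y[t-2], y[t-1].
df["y_roll3_past"] = df["y"].shift(1).rolling(window=3).mean()

# Leaky, shown for contrast only: a centered window averages
# y[t-1], y[t], y[t+1], which includes one future observation.
df["y_roll3_centered"] = df["y"].rolling(window=3, center=True).mean()

print(df)
```

The first three rows of `y_roll3_past` are missing because there is not yet enough history. Dropping or imputing those rows is a separate modeling decision.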


Suppose you are engineering features for a time series forecasting task. You create a rolling mean feature using a window of size 5, but you accidentally center the window, so it includes 2 observations from the future relative to each time point. What type of issue does this introduce?



Section 1. Chapter 8
