Stationarity | Time Series Data Processing
Stationarity

A key step in time series preprocessing is converting a non-stationary series into a stationary one, that is, a series whose statistical properties (mean, variance, autocorrelation) do not change over time. This is done by removing the trend, seasonality, and other factors that make those properties shift over time. A stationary series is usually easier to analyze and forecast than a non-stationary one. Common methods for making data stationary include:

Differencing

Differencing - computing the difference between each value of the time series and the value immediately before it. How do you choose the order of differencing? If the first differences still do not fluctuate around a constant mean with constant variance, compute the second differences, i.e. difference the first-differenced series again. Repeat this until the series is stationary.

You can also plot the differenced series and check whether its mean and variance look constant over time; this shows whether the series has been differenced enough.
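A plot of rolling statistics makes this visual check concrete. The sketch below computes the rolling mean and standard deviation that such a plot would show and summarizes how much the rolling mean moves; the window of 50 points, the helper name `rolling_summary`, and the synthetic random walk are assumptions for illustration. With matplotlib installed, calling `summary.plot()` on the returned DataFrame draws the corresponding chart.

```python
import numpy as np
import pandas as pd


def rolling_summary(series, window=50):
    """Hypothetical helper: rolling mean and standard deviation over a fixed window."""
    return pd.DataFrame({
        "rolling_mean": series.rolling(window).mean(),
        "rolling_std": series.rolling(window).std(),
    }).dropna()


rng = np.random.default_rng(0)
walk = pd.Series(np.cumsum(rng.normal(size=1000)))  # random walk: non-stationary
diffed = walk.diff().dropna()                        # first differences

# How much the rolling mean itself wanders: small for a stationary series
mean_drift = {}
for name, s in [("original", walk), ("differenced", diffed)]:
    summary = rolling_summary(s)
    mean_drift[name] = summary["rolling_mean"].std()
    print(f"{name}: std of rolling mean = {mean_drift[name]:.3f}, "
          f"std of rolling std = {summary['rolling_std'].std():.3f}")
```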

Decomposition

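Decomposition - breaking down the time series into its trend, seasonal, and residual (random noise) components. Once the trend and seasonal components are removed, the remaining residual is the part expected to be stationary.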

Box-Cox transformation

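Box-Cox transformation is a family of power transformations that generalizes the natural logarithm transform (it reduces to the logarithm when λ = 0). It brings non-normal data closer to a normal distribution and stabilizes variance that grows with the level of the series. It requires strictly positive values.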
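The transformation is y = (x^λ - 1) / λ for λ ≠ 0 and y = log(x) for λ = 0. SciPy implements it as `scipy.stats.boxcox`, which estimates λ by maximum likelihood when none is given, and `scipy.special.inv_boxcox` undoes it. The sketch below assumes a synthetic positive series whose spread grows with its level; it is an illustration, not part of the course dataset.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Synthetic positive series with multiplicative noise (spread grows with level)
rng = np.random.default_rng(7)
t = np.arange(1, 301)
prices = np.exp(0.01 * t + rng.normal(scale=0.2, size=t.size))

# Without lmbda, boxcox returns the transformed data and the fitted lambda
transformed, fitted_lambda = stats.boxcox(prices)
print(f"Estimated lambda: {fitted_lambda:.3f}")

# lambda = 0 is the natural logarithm (only the array is returned when lmbda is given)
log_version = stats.boxcox(prices, lmbda=0)

# The same result from the formula for lambda != 0
manual = (prices ** fitted_lambda - 1) / fitted_lambda

# Invert the transform, e.g. to map forecasts back to the original scale
restored = inv_boxcox(transformed, fitted_lambda)
```

Box-Cox mainly addresses changing variance; a trend usually remains and is typically handled afterwards, for example by differencing.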

Outlier removal

Outlier removal - detecting and removing outliers from a non-stationary time series. Isolated extreme values distort the estimated mean and variance, so removing them can help make the series closer to stationary.
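The lesson does not name a specific detection rule, so the sketch below swaps in one common choice, a Hampel-style filter: a point is flagged when it lies more than `n_sigmas` robust standard deviations (1.4826 × the median absolute deviation) from the median of its centered window. The helper name `replace_outliers`, the window of 11 points, the threshold of 3, and the synthetic data with two injected spikes are all assumptions.

```python
import numpy as np
import pandas as pd


def replace_outliers(series, window=11, n_sigmas=3.0):
    """Hypothetical Hampel-style filter: flag points far from the rolling median."""
    rolling = series.rolling(window, center=True, min_periods=1)
    median = rolling.median()
    # Median absolute deviation within each window
    mad = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    # 1.4826 * MAD approximates the standard deviation for normally distributed data
    is_outlier = (series - median).abs() > n_sigmas * 1.4826 * mad
    # Replace rather than drop, so the time index stays regularly spaced
    cleaned = series.where(~is_outlier, median)
    return cleaned, is_outlier


rng = np.random.default_rng(3)
t = np.arange(200)
signal = pd.Series(np.sin(t / 10) + rng.normal(scale=0.2, size=t.size))
signal.iloc[[50, 120]] += 5.0  # inject two artificial spikes

cleaned, is_outlier = replace_outliers(signal)
print(f"Flagged points: {list(is_outlier[is_outlier].index)}")
```

Replacing flagged values with the local median, instead of deleting them, is a design choice for time series: most downstream tools, including differencing and `seasonal_decompose`, assume evenly spaced observations.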

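The example below applies two of these methods, decomposition and differencing, to a diamond price series and checks each result with the augmented Dickey-Fuller (ADF) test. The test's null hypothesis is that the series has a unit root (is non-stationary), so a p-value below the chosen significance level (commonly 0.05) indicates the series can be treated as stationary: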

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Read the dataset (first column is used as a date index)
dataset = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/df_diamond_data.csv', index_col=0, parse_dates=True)

# Time series decomposition (additive model, seasonal period of 365 observations)
decomposition = seasonal_decompose(dataset['diamond price'], model='additive', period=365)

# Dickey-Fuller test on the residual component (trend and seasonality removed)
adf_result = adfuller(decomposition.resid.dropna())
print(f'ADF Statistic: {adf_result[0]:.3f}')
print(f'p-value: {adf_result[1]:.3f}')

# Differencing: subtract each value's predecessor
dataset_diff = dataset['diamond price'].diff().dropna()

# Dickey-Fuller test on the differenced series
adf_result = adfuller(dataset_diff)
print(f'ADF Statistic: {adf_result[0]:.3f}')
print(f'p-value: {adf_result[1]:.3f}')
```

The plots below show the original dataset (first) and the series after differencing has been applied (second).

Which of the following methods can be used to convert a non-stationary time series into a stationary one?

Section 4. Chapter 3