Learn Stationarity | Time Series Data Processing

A key step in preparing time series data is converting a non-stationary series into a stationary one by removing the trend, seasonality, and other factors that make its statistical properties (such as mean and variance) change over time. A stationary series is usually easier to analyze and to model than a non-stationary one. Several methods can be used to make a series stationary:

Differencing

Differencing replaces each value of the time series with the difference between it and the previous value. How do you choose the order of differencing? If the first differences do not fluctuate around a constant mean with a constant variance, take the second differences, that is, the differences of the first differences. Repeat this until the series is stationary, keeping in mind that each extra round of differencing discards one observation.

You can also plot the differenced series and check whether its mean and variance look constant over time to judge whether the series has been differenced enough.
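As a minimal sketch of first and second differencing, the snippet below uses a synthetic series with a quadratic trend. The data and variable names are assumptions made for illustration and are not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic series with a quadratic trend (illustrative data only)
t = np.arange(50)
series = pd.Series(0.5 * t**2 + 3.0 * t + 10.0)

# First differences: y[t] - y[t-1]; a linear trend still remains
first_diff = series.diff().dropna()

# Second differences: differences of the first differences; the trend is gone
second_diff = first_diff.diff().dropna()

print(first_diff.head(3).tolist())   # [3.5, 4.5, 5.5]
print(second_diff.head(3).tolist())  # [1.0, 1.0, 1.0]
```

Here the first differences still grow steadily, so one round is not enough, while the second differences are constant.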

Decomposition

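Decomposition splits the time series into trend, seasonal, and residual (random noise) components. Once the trend and seasonal components are removed, the remaining residual is often much closer to stationary.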

Box-Cox transformation

The Box-Cox transformation is a family of power transforms that includes the natural logarithm as a special case. It makes non-normal data more nearly normal and can help stabilize the variance. It requires strictly positive values.
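For a parameter λ, the Box-Cox transform maps x to (x^λ - 1) / λ when λ is not zero and to ln(x) when λ is zero. The sketch below uses `scipy.stats.boxcox` on synthetic, right-skewed data to illustrate both cases. SciPy is not used elsewhere in this lesson, so treat it as one possible tool rather than the lesson's own method.

```python
import numpy as np
from scipy import stats

# Strictly positive, right-skewed sample (illustrative data only)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# With lambda fixed at 0, Box-Cox is exactly the natural logarithm
log_case = stats.boxcox(x, lmbda=0)
print(np.allclose(log_case, np.log(x)))

# With no lambda given, SciPy estimates it by maximum likelihood
transformed, fitted_lambda = stats.boxcox(x)

# The transformed sample should be far less skewed than the original
print(abs(stats.skew(transformed)) < abs(stats.skew(x)))
```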

Outlier removal

Outlier removal drops or replaces extreme values in the time series. Because outliers distort the estimated mean and variance, handling them can make these statistics more stable over time.
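The lesson does not name a specific outlier rule, so the sketch below assumes one common choice, the 1.5 × IQR rule, and fills flagged points by linear interpolation so that the time index stays evenly spaced. The data and thresholds are illustrative assumptions.

```python
import pandas as pd

# Small illustrative series with one obvious spike at position 4
series = pd.Series([10.2, 11.0, 10.5, 11.5, 95.0, 10.8, 11.2, 10.9, 11.1, 10.7])

# Flag points outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
q1, q3 = series.quantile(0.25), series.quantile(0.75)
iqr = q3 - q1
is_outlier = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)

# Replace flagged points with NaN, then interpolate between their neighbours
cleaned = series.mask(is_outlier).interpolate()

print(int(is_outlier.sum()))     # number of flagged points
print(round(cleaned.iloc[4], 2))  # value that replaced the spike
```

Interpolating instead of dropping rows keeps the series length unchanged, which matters for methods that rely on a fixed period.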

The example below applies two of these methods, decomposition and differencing, and checks each result with the augmented Dickey-Fuller (ADF) test:


import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Read the dataset; the first column is parsed as a datetime index
dataset = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/df_diamond_data.csv', index_col=0, parse_dates=True)
prices = dataset['diamond price']

# Additive decomposition into trend, seasonal, and residual components
decomposition = seasonal_decompose(prices, model='additive', period=365)

# ADF test on the residuals (NaN values at the edges are dropped)
adf_resid = adfuller(decomposition.resid.dropna())
print(f'Residuals - ADF Statistic: {adf_resid[0]:.3f}')
print(f'Residuals - p-value: {adf_resid[1]:.3f}')

# First-order differencing of the original series
dataset_diff = prices.diff().dropna()

# ADF test on the differenced series
adf_diff = adfuller(dataset_diff)
print(f'Differenced - ADF Statistic: {adf_diff[0]:.3f}')
print(f'Differenced - p-value: {adf_diff[1]:.3f}')

The plots below show the original dataset (first) and the series after differencing (second).


Section 4. Chapter 3

