Comparing ARIMA-Based Models | Advanced ARIMA Techniques and Model Selection

When you work with time series data, choosing the right forecasting model is crucial for achieving accurate predictions. As you have learned, ARIMA models are powerful tools for modeling non-seasonal time series, while SARIMA extends ARIMA with seasonal terms so it can handle seasonality. Auto ARIMA, in turn, automates the search for the model orders. To determine which model works best for your dataset, you need to compare them systematically on data the models have not seen.

A common approach is cross-validation, where you repeatedly split your time series into training and testing sets, fit models to the training data, and evaluate their performance on the test data. However, because time series observations are ordered, a random split would let a model train on observations that come after the ones it is tested on. Use rolling-origin or expanding-window validation instead: both respect the temporal order by always training on the past and testing on the observations that follow.
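
The splitting scheme described above can be sketched in a few lines. This is a minimal illustration, not a library API: the function name `expanding_window_splits` and its parameters `initial_train`, `horizon`, and `step` are assumptions made up for this example.

```python
import numpy as np

def expanding_window_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_idx, test_idx) pairs that respect temporal order.

    Hypothetical helper: the training window always starts at 0 and grows
    by `step` observations per fold; the test window is the next `horizon`
    observations after the training window.
    """
    step = horizon if step is None else step
    train_end = initial_train
    while train_end + horizon <= n_obs:
        train_idx = np.arange(0, train_end)
        test_idx = np.arange(train_end, train_end + horizon)
        yield train_idx, test_idx
        train_end += step

# 60 monthly observations, first fold trains on 36, each fold tests on 12
splits = list(expanding_window_splits(n_obs=60, initial_train=36, horizon=12))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Each fold would fit every candidate model on `train_idx` and score it on `test_idx`; averaging the scores over folds gives a more stable comparison than a single split. scikit-learn's `TimeSeriesSplit` provides similar behavior if you prefer an existing implementation.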

Another essential strategy is metric-based selection. Here, you fit candidate models (such as ARIMA, SARIMA, and Auto ARIMA) to your training data, generate forecasts for the held-out period, and then compare their accuracy using quantitative metrics. The most widely used metrics include Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Both are expressed in the units of the series; RMSE squares the errors before averaging, so it penalizes large misses more heavily than MAE. Lower values indicate better forecasting performance, and comparing them across models helps you select the one that generalizes best to unseen data.
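
To make the difference between the two metrics concrete, here is a small sketch with toy numbers (the arrays are invented for illustration, not taken from any dataset). The helper names `mae` and `rmse` are assumptions for this example; they compute the standard definitions directly with NumPy.

```python
import numpy as np

def mae(actual, forecast):
    """Mean Absolute Error: average size of the errors."""
    errors = np.asarray(actual) - np.asarray(forecast)
    return float(np.mean(np.abs(errors)))

def rmse(actual, forecast):
    """Root Mean Squared Error: square root of the average squared error."""
    errors = np.asarray(actual) - np.asarray(forecast)
    return float(np.sqrt(np.mean(errors ** 2)))

actual = np.array([10.0, 12.0, 14.0, 16.0])
steady = np.array([11.0, 13.0, 15.0, 17.0])  # off by 1 at every step
spiky = np.array([10.0, 12.0, 14.0, 20.0])   # perfect except one miss of 4

print("steady:", mae(actual, steady), rmse(actual, steady))
print("spiky: ", mae(actual, spiky), rmse(actual, spiky))
```

Both forecasts have the same MAE, but the forecast with one large miss has double the RMSE. If occasional large errors are especially costly in your application, RMSE is the more informative of the two.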


import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Suppress convergence and deprecation warnings to keep the output readable
warnings.filterwarnings("ignore")

# Generate 60 months of synthetic data: linear trend + yearly seasonality + noise
np.random.seed(42)
periods = 60
time = np.arange(periods)
seasonal = 10 + 3 * np.sin(2 * np.pi * time / 12)
trend = 0.3 * time
noise = np.random.normal(scale=2, size=periods)
data = seasonal + trend + noise
# "MS" (month start) is accepted by both older and newer pandas releases
ts = pd.Series(data, index=pd.date_range("2020-01-01", periods=periods, freq="MS"))

# Hold out the last 12 months as the test set (split in temporal order)
train = ts.iloc[:48]
test = ts.iloc[48:]

# Fit ARIMA model (no seasonal order)
arima_model = ARIMA(train, order=(2, 1, 2)).fit()
arima_forecast = arima_model.forecast(steps=len(test))

# Fit SARIMA model (same non-seasonal order plus a 12-month seasonal order)
sarima_model = SARIMAX(train, order=(2, 1, 2), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
sarima_forecast = sarima_model.forecast(steps=len(test))

# Evaluate forecasts on the held-out test set
arima_mae = mean_absolute_error(test, arima_forecast)
arima_rmse = np.sqrt(mean_squared_error(test, arima_forecast))
sarima_mae = mean_absolute_error(test, sarima_forecast)
sarima_rmse = np.sqrt(mean_squared_error(test, sarima_forecast))

print(f"ARIMA   MAE: {arima_mae:.2f}  RMSE: {arima_rmse:.2f}")
print(f"SARIMA  MAE: {sarima_mae:.2f}  RMSE: {sarima_rmse:.2f}")

# Compare visually
plt.figure(figsize=(10, 5))
plt.plot(ts, label="Actual")
plt.plot(test.index, arima_forecast, label=f"ARIMA Forecast (MAE: {arima_mae:.2f}  RMSE: {arima_rmse:.2f})", color="orange")
plt.plot(test.index, sarima_forecast, label=f"SARIMA Forecast (MAE: {sarima_mae:.2f}  RMSE: {sarima_rmse:.2f})", color="green")
plt.legend()
plt.title("ARIMA vs SARIMA Forecast Comparison")
plt.show()

After running the comparison, you can interpret the results by looking at the MAE and RMSE values for each model. The model with the lowest error metrics is generally preferred, especially if the difference is substantial. However, you should also consider the complexity of the model and whether it captures the underlying structure of the data, such as seasonality: when two models score almost the same, the simpler one is usually the safer choice. If SARIMA achieves lower errors than ARIMA on data with seasonal patterns, it suggests that modeling seasonality improved forecast accuracy. Keep in mind that a single train/test split is only one sample of performance, so a small gap between models may not hold on other periods.
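
One way to turn "lowest error, but also consider complexity" into a rule is sketched below. This is a hypothetical heuristic, not a standard procedure: the function `pick_model`, the 5% `tolerance`, and the RMSE values are all placeholders chosen for illustration, not results from the comparison above. The parameter counts assume each AR, MA, seasonal AR, and seasonal MA coefficient plus the noise variance counts as one parameter.

```python
def pick_model(results, tolerance=0.05):
    """Return the name of the simplest model that is close to the best RMSE.

    results: list of (name, n_params, rmse) tuples.
    tolerance: relative RMSE gap within which models count as equivalent
               (hypothetical threshold; tune it for your problem).
    """
    best_rmse = min(rmse for _, _, rmse in results)
    close_enough = [m for m in results if m[2] <= best_rmse * (1 + tolerance)]
    # Among near-equivalent models, prefer fewer parameters, then lower RMSE
    return min(close_enough, key=lambda m: (m[1], m[2]))[0]

# Placeholder metrics: clear win for the seasonal model
clear_gap = [("ARIMA(2,1,2)", 5, 3.10), ("SARIMA(2,1,2)(1,1,1,12)", 7, 2.20)]
# Placeholder metrics: models are nearly tied
near_tie = [("ARIMA(2,1,2)", 5, 2.25), ("SARIMA(2,1,2)(1,1,1,12)", 7, 2.20)]

print(pick_model(clear_gap))
print(pick_model(near_tie))
```

With a substantial gap the more accurate model is chosen; with a near tie the rule falls back to the model with fewer parameters.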
