Comparing ARIMA-Based Models
When you work with time series data, choosing the right forecasting model is crucial for achieving accurate predictions. As you have learned, ARIMA models are powerful tools for modeling non-seasonal time series, while SARIMA extends ARIMA's capabilities to handle seasonality. Additionally, Auto ARIMA automates the process of parameter selection. To determine which model works best for your dataset, you need to compare them using systematic strategies.
A common approach is to use cross-validation, where you repeatedly split your time series into training and testing sets, fit models to the training data, and evaluate their performance on the test data. However, because time series data is ordered, randomly shuffled splits would let a model train on observations that come after its test period. Use rolling-origin evaluation instead, with either an expanding or a sliding training window, so that every test set lies strictly after the data used to fit the model.
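The splitting scheme described above can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the helper names (`expanding_window_splits`, `rolling_origin_mae`, `naive_forecaster`) and the toy series are assumptions made for this example, and the "model" is a stand-in that simply repeats the last observed value.

```python
from statistics import mean


def expanding_window_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_end, test_end) index pairs for rolling-origin evaluation.

    The training window always starts at index 0 and grows by `step` each
    fold, so every test block lies strictly after its training data.
    """
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield train_end, train_end + horizon
        train_end += step


def rolling_origin_mae(series, forecaster, initial_train, horizon, step=1):
    """Average the per-fold MAE of `forecaster` over all expanding-window folds."""
    fold_errors = []
    for train_end, test_end in expanding_window_splits(
        len(series), initial_train, horizon, step
    ):
        train, test = series[:train_end], series[train_end:test_end]
        forecast = forecaster(train, horizon)
        fold_errors.append(mean(abs(a - f) for a, f in zip(test, forecast)))
    return mean(fold_errors)


def naive_forecaster(train, horizon):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon


# Toy series invented for illustration only
series = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21, 20, 22]
folds = list(expanding_window_splits(len(series), initial_train=6, horizon=2, step=2))
print(folds)
print(rolling_origin_mae(series, naive_forecaster, initial_train=6, horizon=2, step=2))
```

To evaluate an ARIMA model this way, you could pass a forecaster that fits `ARIMA(train, order=...)` on each fold and returns `.forecast(steps=horizon)`. The model is refit from scratch on every fold, so the cost grows with the number of folds.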
Another essential strategy is metric-based selection. Here, you fit candidate models (such as ARIMA, SARIMA, and Auto ARIMA) to your training data, generate forecasts for the held-out test period, and then compare their accuracy using quantitative metrics. The most widely used metrics include Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Both are expressed in the same units as the series, and lower values indicate better forecasting performance. Comparing these values across models helps you select the one that generalizes best to unseen data.
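To make the two metrics concrete, here is a small sketch that computes them directly from their definitions. The function names and the numbers are invented for illustration. The two hypothetical forecasts have the same MAE, but the one with a single large miss has a higher RMSE, because squaring weights large errors more heavily.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error: square root of the average squared error."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


# Illustrative values, not taken from any real model
actual = [100, 102, 105, 103]
model_a = [101, 101, 104, 104]  # four small errors of size 1
model_b = [100, 102, 105, 107]  # three exact hits and one error of size 4

print(mae(actual, model_a), rmse(actual, model_a))
print(mae(actual, model_b), rmse(actual, model_b))
```

In practice you would usually call `mean_absolute_error` and `mean_squared_error` from `sklearn.metrics`. The hand-written versions here only show what those calls compute.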
```python
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Suppress optimization logs and warnings
warnings.filterwarnings("ignore")

# Generate synthetic monthly data with seasonality
np.random.seed(42)
periods = 60
time = np.arange(periods)
seasonal = 10 + 3 * np.sin(2 * np.pi * time / 12)
trend = 0.3 * time
noise = np.random.normal(scale=2, size=periods)
data = seasonal + trend + noise
# "MS" (month start) is used because the "M" alias is deprecated in recent pandas
ts = pd.Series(data, index=pd.date_range("2020-01-01", periods=periods, freq="MS"))

# Split into train and test sets (first 48 months vs. last 12 months)
train = ts.iloc[:48]
test = ts.iloc[48:]

# Fit ARIMA model (no seasonal order)
arima_model = ARIMA(train, order=(2, 1, 2)).fit()
arima_forecast = arima_model.forecast(steps=len(test))

# Fit SARIMA model (with seasonal order, period 12)
sarima_model = SARIMAX(train, order=(2, 1, 2), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
sarima_forecast = sarima_model.forecast(steps=len(test))

# Evaluate forecasts on the held-out test set
arima_mae = mean_absolute_error(test, arima_forecast)
arima_rmse = np.sqrt(mean_squared_error(test, arima_forecast))
sarima_mae = mean_absolute_error(test, sarima_forecast)
sarima_rmse = np.sqrt(mean_squared_error(test, sarima_forecast))

# Compare visually
plt.figure(figsize=(10, 5))
plt.plot(ts, label="Actual")
plt.plot(test.index, arima_forecast, label=f"ARIMA Forecast (MAE: {arima_mae:.2f} RMSE: {arima_rmse:.2f})", color="orange")
plt.plot(test.index, sarima_forecast, label=f"SARIMA Forecast (MAE: {sarima_mae:.2f} RMSE: {sarima_rmse:.2f})", color="green")
plt.legend()
plt.title("ARIMA vs SARIMA Forecast Comparison")
plt.show()
```
After running the comparison, you can interpret the results by looking at the MAE and RMSE values for each model. The model with the lowest error metrics is generally preferred, especially if the difference is substantial. However, you should also consider the complexity of the model and whether it captures the underlying structure of the data, such as seasonality. If SARIMA achieves lower errors than ARIMA on data with seasonal patterns, it suggests that modeling seasonality improved forecast accuracy.
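One way to weigh accuracy against complexity is a simple tie-breaking rule: if two models are nearly equally accurate, prefer the one with fewer parameters. The sketch below is only one possible heuristic, not a standard procedure. The function name `pick_model`, the 5% tolerance, and the RMSE values are all assumptions invented for illustration. The parameter counts are the AR and MA coefficient counts implied by each order, excluding the noise variance.

```python
def pick_model(results, tolerance=0.05):
    """Return the name of the simplest model whose RMSE is near the best.

    results:   list of (name, rmse, n_params) tuples
    tolerance: relative RMSE gap treated as "not substantial" (an assumption)
    """
    best_rmse = min(rmse for _, rmse, _ in results)
    # Keep every model whose RMSE is within the tolerance of the best one
    close = [row for row in results if row[1] <= best_rmse * (1 + tolerance)]
    # Among those, prefer fewer parameters, then lower RMSE
    return min(close, key=lambda row: (row[2], row[1]))[0]


# Made-up scores: a substantial gap, so the more accurate model wins
clear_gap = [("ARIMA(2,1,2)", 3.10, 4), ("SARIMA(2,1,2)(1,1,1,12)", 2.05, 6)]
# Made-up scores: a near tie, so the simpler model wins
near_tie = [("ARIMA(2,1,2)", 2.10, 4), ("SARIMA(2,1,2)(1,1,1,12)", 2.05, 6)]

print(pick_model(clear_gap))
print(pick_model(near_tie))
```

The tolerance is a judgment call. A rule like this does not replace checking whether the chosen model actually captures structure such as seasonality.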