Forecasting Data through a trained SARIMA model.

Introduction

I was tasked with forecasting a specific request for the coming quarters using machine learning. Because demand is seasonal in my case, I decided to use a SARIMA model.

ARIMA (AutoRegressive Integrated Moving Average) and SARIMA (Seasonal ARIMA) are models used to analyze and forecast time series data. ARIMA forecasts future values from a combination of past values, past forecast errors, and the trend in the data.

Both models therefore help to understand and forecast a time series based on past observations, trends, and prediction errors. SARIMA extends ARIMA by also modeling seasonal patterns.
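For reference, a SARIMA model is usually written as SARIMA(p, d, q)(P, D, Q)s. In standard backshift-operator notation (the symbols follow the common textbook convention and are not specific to this project), the model reads:

```latex
\Phi_P(B^s)\,\phi_p(B)\,(1 - B)^d\,(1 - B^s)^D\, y_t \;=\; \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t
```

Here B is the backshift operator (B y_t = y_{t-1}), \phi_p and \theta_q are the non-seasonal autoregressive and moving-average polynomials of orders p and q, \Phi_P and \Theta_Q are their seasonal counterparts, d and D are the non-seasonal and seasonal differencing orders, and s is the season length (for example, 12 for monthly data with a yearly cycle). Setting P = D = Q = 0 recovers a plain ARIMA(p, d, q) model.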

Procedure

The first step is to import the data. I suppress warnings beforehand because the grid search function I use later to find the best SARIMA parameters floods the output with convergence warnings. After this, I import Pandas and Matplotlib to plot the data and check for any seasonality or trends.
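A minimal sketch of this step could look like the following. The file name, the column names (`date`, `installs`), and the assumption that the data is monthly with month-start dates are all hypothetical, since the real data set cannot be shared.

```python
import warnings

import matplotlib.pyplot as plt
import pandas as pd

# Silence the convergence warnings that the later grid search emits.
warnings.filterwarnings("ignore")


def load_series(path, date_col="date", value_col="installs"):
    """Read a CSV into a time series indexed by date.

    Column names are hypothetical placeholders for the real data set.
    """
    df = pd.read_csv(path, parse_dates=[date_col])
    series = df.set_index(date_col)[value_col].sort_index()
    # Assumption: one observation per month, dated on the first of the month.
    return series.asfreq("MS")


def plot_series(series):
    """Plot the raw series to look for trend and seasonality by eye."""
    ax = series.plot(figsize=(10, 4), title="Monthly installs")
    ax.set_xlabel("Date")
    ax.set_ylabel("Installs")
    return ax


# Example usage (hypothetical file name):
# series = load_series("installs.csv")
# plot_series(series)
# plt.show()
```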

The plot clearly shows trends in the data over time and, since demand is also seasonal, a SARIMA model is a better fit than a plain ARIMA model (a trend alone could be handled by ARIMA's differencing term). The next step is therefore to find the best SARIMA parameters using a grid search function.

Now we can use the best parameters to fit the SARIMA model. After this, I also plot the residuals and the fitted values for visual inspection. Both are centered around the red dashed line at zero, which means the residuals have a mean of approximately zero. There is still some seasonality in the residuals, but these results are satisfactory for my case, so I'll continue with the model.

The next step is to load the data set once again. Given the results above, I apply first-order differencing and then define the training and testing datasets for the model. Once this is done, I fit the SARIMA model with the parameters obtained earlier, forecast the next two quarters, and save the forecast to a CSV file.

Results

Due to the nature of the data I used in this project, I cannot share the information I obtained. However, I will compare the real values for the first month of Q3 (July) with my projected values and then calculate the Mean Absolute Percentage Error (MAPE).

The MAPE for the forecasted installs in July is approximately 7.08%. This means that, on average, the forecasted values were about 7.08% off from the real values, which corresponds to an accuracy of roughly 93% for this model.
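For completeness, MAPE is the mean of the absolute errors expressed as a percentage of the real values. A small sketch of the calculation follows. The numbers in the usage example are made up purely for illustration and are not the project's data.

```python
import numpy as np


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)


# Illustrative values only: errors of 10% and 5% average out to 7.5%.
example_mape = mape([100, 200], [90, 210])
print(f"MAPE: {example_mape:.2f}%  (accuracy: {100 - example_mape:.2f}%)")
```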