Abstract
Time series forecasting is an essential branch of forecasting with applications
in finance, economics, supply chain management, energy, and healthcare.
Forecasting is the foundation of an organization's planning: it keeps plans
consistent with goals and objectives and helps manage risk. Until the late 20th
century, techniques such as ARIMA and exponential smoothing were the only
commonly used methods for time series forecasting. More recently, hybrid
methods such as Facebook's Prophet and deep learning methods have shown the
potential to improve forecasting performance. This paper analyzes and contrasts
three families of approaches: traditional methods, the Prophet forecasting
method, and deep learning models, along with their advantages, disadvantages,
and usage examples. The aim is to present a concise overview of the current
state of the art in time series forecasting and to offer pointers on what to
look for when working on problems in this field.
Keywords: Time series forecasting, ARIMA, Exponential smoothing, Prophet, Deep
learning, LSTM, CNN, Hybrid models
1. Introduction
A time series is a sequence of observations ordered in time, and forecasting
attempts to predict its future values. In practical terms, forecasting has
value in every domain, from retail trade to manufacturing, energy production,
financial services, and transport [1]. Forecasts are crucial for demand
planning, inventory control, dynamic pricing, scheduling, failure prediction,
and anomaly detection. Traditional techniques such as the autoregressive
integrated moving average (ARIMA) and exponential smoothing have been in use
for a long time [2]. This is due to their efficiency in estimating linear
trends and seasonality, but they are less effective at representing nonlinear
patterns and interactions between variables.
Recently, the focus has shifted to machine learning techniques such as long
short-term memory (LSTM) networks and temporal convolutional networks (TCNs).
Deep learning has shown excellent results in sequence modelling problems by
learning features and structure from raw time series data in an end-to-end
manner [3]. Hybrid procedures that blend statistical methodology with machine
learning, such as Facebook's Prophet, have also emerged.
This paper assesses the conventional approaches, the Prophet procedure, and
deep learning strategies for time series forecasting. Section 2 provides an
overview of traditional methods, Section 3 discusses the Prophet model, and
Section 4 focuses on deep learning approaches. Section 5 presents a comparative
analysis and directions for future research, and Section 6 concludes.
2. Traditional Approaches
2.1. ARIMA
The ARIMA model is a popular parametric model for forecasting univariate time
series and has been widely adopted in practice. An ARIMA model combines three
components: an autoregressive (AR) part, differencing (I, for "integrated") to
make the data stationary, and a moving average (MA) part that models the
dependence on past forecast errors [4]. The model is written ARIMA(p, d, q),
where p is the order of the autoregressive part, d is the order of
differencing, and q is the order of the moving average part.
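The roles of d and p can be illustrated with a minimal sketch, assuming the simplest useful case, ARIMA(1, 1, 0): the series is differenced once, an AR(1) model is fitted to the differences by least squares, and the forecast is obtained by undoing the differencing. The function names are hypothetical and the moving average part is omitted; this is not a substitute for a full ARIMA implementation.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x if var_x else 0.0  # constant input: no AR signal
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series, steps=1):
    """Forecast with ARIMA(1, 1, 0): an AR(1) model on the first differences."""
    diffs = difference(series, d=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predict the next difference
        level += last_diff               # undo the differencing
        forecasts.append(level)
    return forecasts
```

For a perfectly linear series such as 1, 2, ..., 6, the differences are constant, so the sketch simply continues the line.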
ARIMA has been used in many research works and has produced successful
short-term forecasts in several fields. For instance, [5] applied ARIMA to
forecast electricity demand in China, accounting for holiday shifts and
temperature. They found that naive and seasonal methods were inferior to ARIMA,
which achieved a mean absolute percentage error (MAPE) of about 3%. Likewise,
Sharma and Ghosh [6] used ARIMA to forecast stock prices while considering the
effect of macroeconomic variables. The model reached an accuracy of 85%,
providing insights useful for investment decisions.
However, ARIMA has limitations. The main issue is its assumption of a linear
relationship between past and future values, which does not hold for many time
series. ARIMA also requires the data to be differenced or transformed, because
the series must be stationary. Finally, ARIMA struggles to capture long-term
patterns and finer details that are obscured by noise [7].
2.2. Exponential Smoothing
The second classical method for forecasting univariate time series is
exponential smoothing. It assigns exponentially decreasing weights to past
observations as they recede in time, so that recent observations carry more
importance [8]. The most basic form is simple exponential smoothing (SES),
which is appropriate for data without trend or seasonal patterns. More
elaborate variants are Holt's linear trend method, which adds a trend
component, and the Holt-Winters seasonal method for seasonal time series [9].
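The weighting scheme can be made concrete with a short sketch of SES and Holt's linear trend method. The function names, the choice of the first observation as the initial level, and the first difference as the initial trend are assumptions made for illustration; other initializations are common.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the flat SES forecast after processing the whole series.

    Each update blends the newest observation (weight alpha) with the previous
    level (weight 1 - alpha), so older observations decay geometrically.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def holt_linear(series, alpha, beta, steps=1):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]
```

With alpha close to 1 the forecast tracks the latest observation; with alpha close to 0 it changes slowly and averages over a long history.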
Exponential smoothing has been applied effectively in many fields. In [10],
the Holt-Winters method was used to forecast wind power generation based on
daily and weekly patterns. The model achieved a normalized root mean square
error (NRMSE) of roughly 8%, outperforming both the persistence baseline and
ARIMA models. In another study, Choi and Lim [11] forecast the demand for
aviation spare parts using exponential smoothing, taking intermittent demand
patterns into account. The model proved useful because it produced viable and
reliable forecasts for inventory management.
The advantages of exponential smoothing include computational ease, support
for seasonality, interpretability, and accuracy. The smoothing parameters are
easily estimated with standard optimization methods [12]. Nevertheless, like
any other linear model, exponential smoothing may fail to capture nonlinear
relationships within the series. It also assumes an a priori structure for the
trend and seasonality components underlying the time series.
2.3. Theta Method
The Theta method was proposed by Assimakopoulos and Nikolopoulos [13] and is
one of the simplest yet most practical methods for forecasting a univariate
time series. The original time series is decomposed into two or more theta
lines, which are modified versions of the original data with altered local
curvature. The theta lines are extrapolated separately using simple methods
such as linear regression, and the results are then combined to provide the
final forecast.
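A minimal sketch of the decomposition follows, assuming the classic two-line formulation: the theta = 0 line is the fitted linear trend and the theta = 2 line doubles the deviations from that trend. Extrapolating the theta = 2 line with simple exponential smoothing and averaging the two forecasts with equal weights are assumptions of this sketch, as are the function names.

```python
def linear_trend(series):
    """Least-squares fit of y[t] = a + b * t for t = 0, 1, ..., n - 1."""
    n = len(series)
    mean_t, mean_y = (n - 1) / 2, sum(series) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series)) / sum(
        (t - mean_t) ** 2 for t in range(n)
    )
    return mean_y - slope * mean_t, slope


def theta_line(series, theta):
    """Theta line: theta * y[t] + (1 - theta) * (fitted trend at t)."""
    a, b = linear_trend(series)
    return [theta * y + (1 - theta) * (a + b * t) for t, y in enumerate(series)]


def theta_forecast(series, steps=1, alpha=0.5):
    """Average the extrapolations of the theta = 0 and theta = 2 lines."""
    n = len(series)
    a, b = linear_trend(series)
    # theta = 0 line: extend the regression line beyond the last observation.
    trend_part = [a + b * (n - 1 + h) for h in range(1, steps + 1)]
    # theta = 2 line: flat forecast from simple exponential smoothing.
    line2 = theta_line(series, 2.0)
    level = line2[0]
    for z in line2[1:]:
        level = alpha * z + (1 - alpha) * level
    return [(t + level) / 2 for t in trend_part]
```

Averaging the theta = 0 and theta = 2 lines point by point reproduces the original series, which is why the pair can be read as a decomposition.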
The Theta method has been widely adopted since it showed remarkable accuracy in
the M3 forecasting competition [14] compared with more sophisticated models. In
another application, [15] used the Theta method to forecast daily electricity
demand in Italy, taking into account temperature and calendar effects. The
resulting model performed well and was comparable to the ARIMA and exponential
smoothing methods.
The Theta method has several benefits. It is robust to model misspecification
and can cope with both stationary and non-stationary time series [16]. It is
also simple to understand and implement, which makes it well suited for
adoption and benchmarking. However, the Theta method may only partially capture
complex seasonality, since it is built on the premise of constant trends in the
theta lines.
Figure 1: Comparison of ARIMA, Exponential Smoothing, and Theta Method.
3. Hybrid Approach: Prophet
3.1. Overview
Prophet is a freely available open-source forecasting procedure developed by
Facebook [13]. It is designed to work effectively with time series that exhibit
seasonality, irregularities, and gaps. Forecasts come from a decomposable
additive time series model fitted in a Bayesian framework, which enables
Prophet to estimate its parameters automatically [14]. The model consists of
three main components: trend, seasonality, and holidays.
The trend component captures non-periodic changes in the series and is
described by a linear or logistic growth curve. Seasonality is modelled with
Fourier series [15]; depending on the data frequency, Prophet can combine
daily, weekly, and yearly seasonal patterns. Autocorrelation and partial
autocorrelation plots, together with the estimated residual standard error, can
be used for model diagnosis. The holiday component captures the impact of
irregular events such as holidays and promotions through a list of predefined
dates and the days around them that are affected [16].
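The additive structure can be sketched as trend plus Fourier seasonality plus holiday effects. The code below is an illustration of that structure only, not Prophet's implementation or API: the linear trend, the function names, the coefficient layout (sine and cosine pairs), and the dictionary of holiday effects are all assumptions.

```python
import math


def fourier_features(t, period, order):
    """Sine/cosine pairs for harmonics 1..order of a seasonal period."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features


def seasonal_component(t, period, coefficients):
    """Weighted sum of Fourier features; coefficients are [sin1, cos1, sin2, ...]."""
    order = len(coefficients) // 2
    return sum(c * f for c, f in zip(coefficients, fourier_features(t, period, order)))


def additive_forecast(t, intercept, slope, period, coefficients, holiday_effects):
    """Trend + seasonality + holiday effect at time t (all parameters assumed known)."""
    trend = intercept + slope * t
    seasonality = seasonal_component(t, period, coefficients)
    holiday = holiday_effects.get(t, 0.0)  # zero on ordinary days
    return trend + seasonality + holiday
```

A higher Fourier order lets the seasonal curve change more quickly within a period, at the cost of more coefficients to estimate.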
Prophet has found wide acceptance in practice due to its simplicity, its
robustness to incomplete data and outliers, and its capacity to handle multiple
seasonality [17]. It has been applied in a broad range of areas, such as retail
and energy demand forecasting and social media engagement forecasting.
3.2. Recent advancements
More recent studies have extended and refined the Prophet procedure.
Montero-Manso et al. [18] proposed a meta-learning approach that automatically
selects the best forecasting model, with Prophet among the candidates, based on
the characteristics of each time series. They showed that the meta-learning
framework achieves higher predictive accuracy than individual models on diverse
time series.
Livera et al. [19] generalized Prophet by developing a Bayesian hierarchical
model that accommodates multiple related time series. At both the aggregate and
the lower levels, the hierarchical Prophet was more accurate than modelling
each series independently.
Güngör et al. [20] proposed a combination of Prophet with machine-learning
feature engineering and model stacking. Prophet was used to build the
forecasts, and gradient boosting machines (GBMs) were used to combine the
Prophet output with the other engineered features. Their experiments on retail
sales data supported the hypothesis that the hybrid model would produce better
results than Prophet or GBM used independently.
3.3. Prophet-Boost
Prophet-Boost is a relatively recent development that aims to improve on
Prophet by combining it with gradient boosting [21]. It draws on the advantages
of both approaches: Prophet estimates the trend and seasonal components, while
gradient boosting models the structure that remains in the residuals.
The Prophet-Boost algorithm carries out the following steps. The Prophet model
is first fitted to the time series, and the residuals are calculated. The
residuals are then used as the dependent variable in a second model based on
gradient boosting, which predicts them from additional inputs and lagged
values. The final forecast is the sum of the Prophet forecast and the residuals
predicted by the gradient boosting model.
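The two-stage procedure can be sketched with simple stand-ins. In the code below a least-squares linear trend stands in for Prophet, and a very small gradient boosting loop over one-split regression stumps stands in for a gradient boosting library; both substitutions, the single exogenous feature x, and all function names are assumptions made so the sketch stays self-contained.

```python
def fit_linear_trend(t, y):
    """Stand-in for the base model: least-squares line y = a + b * t."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    b = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y)) / sum(
        (ti - mean_t) ** 2 for ti in t
    )
    a = mean_y - b * mean_t
    return lambda ti: a + b * ti


def fit_stump(x, target):
    """Best one-split regression stump: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, target) if xi <= threshold]
        right = [r for xi, r in zip(x, target) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # x takes a single value: fall back to the overall mean
        mean = sum(target) / len(target)
        return x[0], mean, mean
    return best[1:]


def fit_boosted_residuals(x, residuals, rounds=20, learning_rate=0.5):
    """Squared-error boosting: each stump fits what is still unexplained."""
    stumps, fitted = [], [0.0] * len(x)
    for _ in range(rounds):
        remaining = [r - f for r, f in zip(residuals, fitted)]
        threshold, left_mean, right_mean = fit_stump(x, remaining)
        stumps.append((threshold, left_mean, right_mean))
        fitted = [
            f + learning_rate * (left_mean if xi <= threshold else right_mean)
            for f, xi in zip(fitted, x)
        ]
    return lambda xi: sum(
        learning_rate * (lm if xi <= thr else rm) for thr, lm, rm in stumps
    )


def fit_two_stage(t, x, y):
    """Base model first, boosting on its residuals second, sum of both as forecast."""
    base = fit_linear_trend(t, y)
    residuals = [yi - base(ti) for ti, yi in zip(t, y)]
    correction = fit_boosted_residuals(x, residuals)
    return lambda ti, xi: base(ti) + correction(xi)
```

The design keeps the base model untouched: extra features enter only through the residual model, so the trend and seasonal estimates remain interpretable on their own.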
Prophet-Boost has performed well both in forecasting competitions and in
practical applications. For example, [22] used Prophet-Boost to predict Chinese
electricity demand with the help of weather and economic variables. The model
outperformed standalone Prophet and gradient boosting models, with a MAPE of
2.5%.
Hence Prophet-Boost, a fusion of Prophet and boosting, retains Prophet's
strengths while adding the ability of gradient boosting to capture patterns
that are difficult to model explicitly. It makes it possible to incorporate
extra features and prior knowledge into the forecasting process without
substantially changing the model architecture.
Figure 2: Prophet Forecast with Trend, Seasonality, and Holidays.
4. Deep Learning Approaches
4.1. Long Short-Term Memory (LSTM) Networks
Long short-term memory (LSTM) networks are a kind of recurrent neural network
(RNN) designed to mitigate the vanishing gradient problem and capture long-term
dependencies [21]. LSTMs address this limitation of standard RNNs by
incorporating memory cells and gating mechanisms [22]. Memory cells retain
information over long sequences, while the gates control what flows into and
out of the cells.
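The interplay of memory cell and gates can be seen in a single LSTM step. The sketch below uses scalar inputs and states for readability, whereas practical LSTMs use vectors and weight matrices; the parameter layout, a dictionary mapping each gate name to (input weight, recurrent weight, bias), is an assumption of this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for scalar input x, hidden state h and cell state c."""

    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell to expose
    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # updated hidden state
    return h, c
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is how information can persist across long sequences.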
LSTMs have shown considerable potential in several time series forecasting
scenarios. Sagheer and Kotb [23] applied LSTMs in the oil industry, forecasting
petroleum production with exogenous variables such as oil prices and rig
counts. LSTM outperformed ARIMA and feedforward neural networks, with an RMSE
of 0.023. Similarly, Chimmula and Zhang [24] employed LSTMs to predict the
spread of COVID-19, considering factors such as population density and
mobility. The model provided reliable long-term projections to guide the
management of the pandemic.
Recent developments in LSTMs include attention mechanisms and hybrid
architectures that combine LSTMs with other RNNs. Qin et al. [25] proposed
DA-LSTM, which uses input attention and temporal attention to identify the
input features and time steps of the input sequence that are most useful to the
model. When tested on traffic flow and electricity consumption datasets,
DA-LSTM achieved lower error than standard LSTM and attention-based recurrent
neural networks.
4.2. Temporal Convolutional Networks (TCNs)
Temporal Convolutional Networks (TCNs) are deep learning architectures in which
convolutions are applied along the temporal dimension [26]. The model uses
dilated causal convolutions, which allow it to capture long-range dependencies
while keeping the kernel size fixed: the receptive field grows with the
dilation factors of the stacked layers. Key characteristics of TCNs are their
ability to perform computations in parallel across time steps, their stable
training, and their effectiveness on variable-length sequences.
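A dilated causal convolution can be written in a few lines. The sketch below is a plain-Python illustration with zero padding on the left; the function names are hypothetical, and the receptive-field formula assumes one convolution per layer with the same kernel size throughout.

```python
def causal_dilated_conv(series, kernel, dilation=1):
    """out[t] = sum_k kernel[k] * series[t - k * dilation], zero-padded on the left.

    The output at time t only uses inputs at time t or earlier (causality).
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * series[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past time steps visible to the top layer of a stack."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel of size 2 and dilations 1, 2, 4, 8, four layers already cover 16 time steps, whereas four undilated layers would cover only 5.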
A considerable number of studies have applied TCNs to time series forecasting
and found them effective. In [27], a conditional TCN (cTCN) was used for
probabilistic forecasting of broadcast TV. The cTCN incorporated exogenous
variables and produced predictive distributions. It showed higher accuracy and
a better capacity to quantify uncertainty than GARCH and LSTM models. Li et
al. [28] also used TCNs to predict solar irradiance, where both temporal and
spatial correlations are present. Their TCN model outperformed LSTM and
ConvLSTM in terms of RMSE and MAE on both training and test data.
Recent techniques relating to TCNs include architectural changes and transfer
learning. TCNs were improved by adding residual connections, creating the
ResTCN described in [29]. The results indicated that the ResTCN had superior
performance and a shorter convergence time than the conventional TCN on traffic
flow and weather prediction. Further, domain adaptation and few-shot learning
have been investigated with TCNs that are fine-tuned from pre-trained models
for a target domain [30].
Figure 3: LSTM, TCN, and Transformer Architectures.
5. Comparative Analysis and Future Directions
Comparing the traditional approaches, Prophet, and the deep learning models
brings out their respective benefits and drawbacks. Linear approaches such as
ARIMA and exponential smoothing are easy to understand and explain and work
well for short-term forecasts, but they are rudimentary in what they can
model. They struggle with nonlinear relationships, non-stationary data with
multiple seasonalities, and long-term dependencies.
Prophet addresses some limitations of traditional methods: it handles multiple
seasonality and irregular events and can tune its hyperparameters
automatically. It is a versatile tool that makes it easy to build models and
produce forecasts. Still, Prophet assumes predetermined trend and seasonality
components and might miss distinct nonlinearities in highly volatile time
series data.
The recurrent and convolutional models, namely LSTMs and TCNs, are effective
at identifying long-range dependencies and nonlinear relationships. They can
learn hierarchical representations directly from raw time series data. LSTMs
are well suited to describing sequential patterns, and TCNs are characterized
by their computation speed and stability. Still, deep learning models depend on
large quantities of training data and can be computationally expensive. They
are also less interpretable than traditional methods.
5.1. Future research directions in time series forecasting include:
6. Conclusion
Time series forecasting is a crucial area of study with far-reaching impact
across a wide range of fields. This paper has compared traditional methods, the
Prophet procedure, and deep learning models for time series forecasting. Linear
methods such as ARIMA and exponential smoothing are well suited to short-term
forecasting, as they were designed to capture linear trends and short-range
temporal structure. Prophet addresses seasonality and irregular events with a
framework that is both adjustable and easy to use. LSTMs and TCNs are deep
learning models recognized for capturing long-range temporal dependencies and
nonlinear relationships.
Each model has its own advantages and disadvantages, and the choice of model
depends on the characteristics of the time series and the purpose of the
forecast. Open gaps remain: there is little work on hybrid models that combine
univariate and multivariate approaches, few works focus on improving model
interpretability, the application of transfer learning is limited, and
probabilistic forecasting, which addresses uncertainty quantification, is still
largely in its infancy for univariate forecasting.
As time series data become increasingly important and multivariate,
improvements in forecasting methods will remain instrumental to decision-making
across industries. This paper has aimed to introduce the topic briefly, present
state-of-the-art techniques in time series forecasting, and offer guidance for
model selection and future research.
7. References