Different models extract and learn different features from the same dataset: some are better at predicting highly seasonal data, others at predicting extreme values. We present a general framework for forecasting using a meta-model. We train Prophet, an additive regression model, on the original dataset, then use its fitted values, together with the raw data, to train an LSTM model.
Details about the Talk
- Title: Building a Meta-Forecasting Model with Prophet and LSTM for Time Series Forecasting
- Type of Talk: Informative / Idea sharing
- Audience: Data Scientists / Machine Learning Engineers / Anyone interested in Forecasting
- Duration: 10 min
- Audience Level: Intermediate
- Takeaway: The machine learning community is always discovering new ways to train models and improve their accuracy. This talk introduces the application of ensemble learning and model stacking to the forecasting problem.
Statistical models have been widely used for time series forecasting, and methods such as ARIMA and SARIMA have proven effective in many applications. In practice, while modelling time series data one faces the important question of how to choose the 'best model' among a variety of candidates. Depending on the underlying mechanism of the model and the training data, different models often learn different features, so each model can explain a different perspective of the data. Model A might extract a specific feature that Model B misses, and vice versa.
Our idea is to combine the principles of ensembling and transfer learning to forecast the time series. We call this model a 'Meta Forecasting Model'. The technique consists of the following steps:
- We fit a Prophet model on the raw time series, adding custom seasonalities and tuning it to make its predictions as accurate as possible.
- Our aim is then to use the values fitted by the Prophet model to improve the training of our neural network. Prophet has learned the multiple seasonalities present in the data, corrected anomalous trends, learned the impact of holidays, and reconstructed a time series that is free of outliers.
- All this information is stored in the fitted values of the earlier model: they are a smoothed version of the original data, shaped by the model during the training procedure. In other words, we can treat these values as a kind of augmented data source for the original training set.
- Our strategy applies a two-step training procedure. We start by feeding our LSTM autoencoder the fitted values produced by Prophet, training it for multi-step-ahead forecasts that project 148 hours into the future.
- We then conclude the training with the raw data, which in our case is the same data we used earlier to fit Prophet. With our neural network we can also combine external data sources, for example weather conditions, if we think that set of external parameters might affect the KPI.
- The idea behind this approach is that our neural network can learn from two different but similar data sources and thus perform better on our test data.
- One caveat of this approach is that, when performing multi-step training, we have to take care of the catastrophic forgetting problem. Catastrophic forgetting affects many models and algorithms: when trained on one task and then on a second task, many machine learning models "forget" how to perform the first. To avoid this problem, the structure of the entire network has to be properly tuned to yield a genuine performance benefit. With this in mind, we hold out the final part of our training data as a validation set.
- At its core, the network is very simple. It consists of a seq2seq LSTM layer that predicts the values of the KPI N steps ahead in the future. The training procedure is carried out using keras-hypetune, a framework that provides hyperparameter optimization of neural network structures in a very intuitive way.
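The two-step procedure above can be sketched end to end. The snippet below is a minimal, illustrative stand-in: a moving average plays the role of Prophet's smoothed fitted values and a plain linear autoregressor plays the role of the LSTM, but the shape of the training loop is the same — pre-train on the smoothed fit, fine-tune on the raw data at a lower learning rate, and hold out a validation tail to watch for catastrophic forgetting.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic noisy daily-cycle series; a moving average stands in for
# Prophet's smoothed fitted values (purely illustrative, not the real models).
t = np.arange(600)
raw = np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, t.size)
smooth = np.convolve(raw, np.ones(5) / 5, mode="same")

def windows(series, n_in=24):
    # Sliding windows of n_in past values, each predicting the next value.
    X = np.stack([series[i:i + n_in] for i in range(series.size - n_in)])
    return X, series[n_in:]

def sgd_step(w, X, y, lr):
    # One full-batch gradient step of a linear autoregressor on squared error.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

X_s, y_s = windows(smooth)
X_r, y_r = windows(raw)

# Hold out the tail of the raw data as validation, as described above.
split = int(0.8 * len(X_r))

w = np.zeros(24)
for _ in range(300):                     # stage 1: pre-train on the smoothed series
    w = sgd_step(w, X_s, y_s, lr=0.01)
for _ in range(100):                     # stage 2: fine-tune on raw data, lower LR
    w = sgd_step(w, X_r[:split], y_r[:split], lr=0.002)

val_mse = np.mean((X_r[split:] @ w - y_r[split:]) ** 2)
```

The lower learning rate in the second stage is one simple way to limit how far fine-tuning drifts from what was learned in the first stage; the validation tail then tells us whether that compromise actually paid off.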
One caveat of this technique is that the utility of the meta-model is strongly tied to the quality of the first model: if it captures the underlying pattern poorly, it is unlikely to benefit the second model. Hence every effort must be made to ensure that the model at the beginning of the chain performs well on its own.
In our solution, we propose using Prophet as the first model in the chain. The reason behind this choice is that Prophet provides a stable forecast and is designed to deal with country-specific public holidays, missing observations and large outliers. It is also designed to cope with time series that undergo trend changes, such as a product launch or, in the telecom case, an infrastructure upgrade or a change in cell configuration, for which we can manually input the changepoints. These effects might not be well captured by other approaches, making Prophet an ideal choice as the first model in our ensemble.
The second model is an LSTM. We chose the Long Short-Term Memory (LSTM) technique for its end-to-end modelling capabilities, its ability to forecast over longer time horizons and its automatic feature extraction. Furthermore, the gates inside an LSTM boost its capability to capture non-linear relationships. While modelling time series, some factors have a non-linear impact on demand, and by using an LSTM the model can learn the non-linear relationships present in the data, leading to better forecasts.
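A minimal Keras sketch of such a seq2seq LSTM, assuming an input window of 72 hours and the 148-step horizon mentioned earlier (the window length and layer sizes are illustrative assumptions):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_in, n_out = 72, 148  # assumed input window; 148-hour forecast horizon

# Seq2seq-style LSTM: encode the input window into a context vector,
# repeat it across the output horizon, and decode one value per future step.
model = keras.Sequential([
    layers.Input(shape=(n_in, 1)),
    layers.LSTM(64),                         # encoder
    layers.RepeatVector(n_out),              # broadcast context to the horizon
    layers.LSTM(64, return_sequences=True),  # decoder
    layers.TimeDistributed(layers.Dense(1)), # one prediction per future step
])
model.compile(optimizer="adam", loss="mse")

# Dummy batch just to show the input/output shapes.
X = np.random.rand(8, n_in, 1).astype("float32")
y_out = model.predict(X, verbose=0)  # shape: (8, 148, 1)
```

In the two-step procedure, `model.fit` would first be called on windows drawn from Prophet's fitted values and then on windows of the raw series; the hyperparameters (units, window length) are what keras-hypetune would search over.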
By using the meta-model we are able to improve forecasting performance. The meta-model comprising Prophet and LSTM improves the generalizability and robustness of the forecast, achieving an RMSE (Root Mean Square Error) 13% lower than that of a vanilla LSTM model. Due to cumulative learning, the model is able to learn both linear and non-linear dependencies in the data. Moreover, we are able to combine the merits of Prophet and LSTM to make a long-term forecast that is accurate and reliable.
Because machine learning models are inherently data-dependent, the choice of models used in the meta-model may vary with the requirements and the nature of the data they are trained on. The idea is to build a meta, or composite, forecasting model comprising multiple diverse machine learning forecasting models, so that the features missed by one model can be learned by another model in the chain, making the entire system robust.