Time series data is ubiquitous, and time series statistical models should be included in any data scientist's toolkit. This tutorial covers the mathematical formulation, statistical foundation, and practical considerations of one of the most important classes of time series models: the Autoregressive Integrated Moving Average with Explanatory Variables model and its seasonal counterpart.
Time series data is ubiquitous, both within and outside the field of data science: weekly initial unemployment claims, tick-level stock prices, weekly company sales, and the daily number of steps recorded by a wearable, to name just a few. Some of the most important and commonly used techniques for analyzing time series data were developed in the field of statistics. For this reason, time series statistical models should be included in any data scientist's toolkit.
This 120-minute tutorial covers the mathematical formulation, statistical foundation, and practical considerations of one of the most important classes of time series models: Autoregressive Integrated Moving Average with Explanatory Variables (ARIMAX) models and their seasonal counterpart (SARIMAX). Specific topics include:
• Common use cases of SARIMAX
• The entire class of SARIMAX models, which includes Autoregressive (AR) models, Moving Average (MA) models, mixed Autoregressive Moving Average (ARMA) models, Autoregressive Integrated Moving Average (ARIMA) models, these models with explanatory variables (e.g. ARIMAX), and these models with seasonal components and explanatory variables (SARIMAX)
• Mathematical formulation
• Underlying assumptions of this class of models
• Implementation of these models in Python and R, comparing and contrasting the two on simulated and real-world time series data, including:
  o Exploratory time series data analysis using histograms, kernel density plots, time series plots, scatterplot matrices, autocorrelation plots (i.e. correlograms), and partial autocorrelation plots
  o Statistical estimation and the options available for it in Python and R
  o Simulation of these models
  o Order selection (using the celebrated Box-Jenkins approach)
  o Assumption testing and model evaluation
  o Forecasting
This tutorial is suitable for data scientists who have a working knowledge of the classical linear regression model, including its mathematical formulation and underlying statistical assumptions, and of the practical implementation of regression models using Python or R.