First, some terminology. Data is autocorrelated when there are similarities between an observation and previous observations. Seasonality is when the similarities occur at regular intervals. Trend is a long-term upward or downward movement. And data is stationary when its statistical properties, like mean and variance, do not change over time.
I'll generate data with these characteristics to use for the rest of the post:
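A sketch of how such a series might be generated (the weekly period, trend slope, and noise level are my own choices, not the post's original code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed for reproducibility (an assumption)
n = 365  # one year of daily observations

trend = 0.05 * np.arange(n)                              # long-term upward movement
seasonality = 5 * np.sin(2 * np.pi * np.arange(n) / 7)   # weekly cycle
noise = rng.normal(scale=1.0, size=n)                    # stationary random component

series = pd.Series(
    trend + seasonality + noise,
    index=pd.date_range("2020-01-01", periods=n, freq="D"),
)
```

Summing the three components gives a series that is autocorrelated, seasonal, trending, and therefore not stationary.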
Detecting stationarity
While time series data is usually not stationary, stationarity matters because most statistical models and tests assume it. The Augmented Dickey-Fuller (ADF) test can be used on normally distributed data to detect stationarity. Its null hypothesis is that the data is not stationary, so you are looking to reject it with a certain level of confidence. There are other (non-parametric) stationarity tests without the normality assumption, but they are beyond the scope of this post.
Transformations
By applying different transformations to our data we can make non-stationary data stationary. One approach is to subtract the rolling mean, or a weighted rolling mean that favors more recent observations, from the data. Another approach is differencing: subtract from each observation the value from some fixed period ago, like a week or a month.
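Both transformations take a few lines with pandas (the series below is my own stand-in for the generated data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=365, freq="D")
# Non-stationary series: trend plus a weekly cycle plus noise (assumed shape)
series = pd.Series(
    0.05 * np.arange(365)
    + 5 * np.sin(2 * np.pi * np.arange(365) / 7)
    + rng.normal(size=365),
    index=idx,
)

# Subtract the rolling mean, or an exponentially weighted mean that
# favors recent observations
detrended = series - series.rolling(window=7).mean()
ewm_detrended = series - series.ewm(span=7).mean()

# Differencing: subtract the value from one week ago
weekly_diff = series.diff(7)
```

Differencing by the seasonal period removes both the trend and the weekly cycle here, so the result varies far less than the original series.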
Forecasting
Special care must be taken when splitting time series data into a training and a test set. The order must be preserved; the data cannot be reshuffled. For cross-validation it is also important to evaluate the model only on future observations, so a variation of k-fold is needed.
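One such variation is scikit-learn's TimeSeriesSplit, which trains on an expanding window and always tests on the observations that immediately follow it; a minimal sketch:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 ordered observations

# Each fold trains on an expanding prefix and tests on the observations
# that come right after it -- the model never sees the future.
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    print(f"train={train_idx[0]}..{train_idx[-1]}  test={test_idx[0]}..{test_idx[-1]}")
```

In every fold the earliest test index comes after the latest training index, which is exactly the property plain k-fold would violate.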
SARIMA
Seasonal autoregressive integrated moving average (SARIMA) is a model that can be fitted to time series data, typically via a Kalman filter. It accounts for seasonality and trend by differencing the data. However, it is a linear model, so each observation needs to be a linear combination of past observations. A log or square root transform, for example, might help make the time series linear.
RNN
A recurrent neural network (RNN) with long short-term memory (LSTM) cells is an alternative to SARIMA for modeling time series data. At the cost of added complexity, it can handle non-linear data or data that isn't normally distributed.
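A bare-bones Keras sketch of this approach, framing forecasting as predicting the next value from a sliding window of previous values (the window size, layer sizes, and epoch count are arbitrary choices of mine):

```python
import numpy as np
from tensorflow import keras

def make_windows(series, window):
    """Turn a 1-D series into (samples, timesteps, features) windows
    plus next-step targets -- my framing, not the post's original code."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

rng = np.random.default_rng(0)
series = np.sin(np.arange(300) * 2 * np.pi / 7) + 0.1 * rng.normal(size=300)
X, y = make_windows(series, window=14)

model = keras.Sequential([
    keras.layers.Input(shape=(14, 1)),
    keras.layers.LSTM(32),    # capacity chosen arbitrarily
    keras.layers.Dense(1),    # single-step-ahead prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)  # a few epochs, just to demonstrate

predictions = model.predict(X[-5:], verbose=0)
```

Unlike SARIMA, nothing here assumes the next value is a linear combination of past ones; the LSTM learns whatever mapping the data supports.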
I didn't put much effort into tuning these models or engineering additional features, and they aren't perfect, but we can start to get a feel for how they work. The SARIMA model looks underfit, though it nicely ignored the randomness in the data. The RNN model clearly overfits the data, and more work would be needed to get a smoother curve.
This was my first attempt at working with SARIMAX and RNNs so any feedback is appreciated.