AUTHORS: Chenghao Liu, Quang Pham, Doyen Sahoo, Donald Rose
TL;DR: Nonstationary data, whose statistical properties change over time, can make time series forecasting difficult. Despite the recent success of deep learning for time series forecasting, these methods do not scale to applications where data arrives sequentially in a stream. We developed FSNet (Fast and Slow Learning Network), a new method for deep time-series forecasting that learns forecasting models on the fly in a nonstationary environment and handles the concept drift arising from its dynamics. Empirical studies on real and synthetic datasets validate FSNet's efficacy and robustness.
Before diving into our main discussion, let’s review the important concepts that are at the core of our work. In this post, our main focus is on the problem of online and deep time-series forecasting. (For a more detailed exposition on time-series forecasting and its use cases, please check out our previous post on ETSformer.)
A time series is a sequence of observations made over a period of time. Time series forecasting (predicting future values given historical records) plays a key role in many real-world problems, such as weather forecasting, energy consumption planning, system tracking and monitoring, and more.
While traditional methods rely on domain expertise to capture temporal patterns, deep learning (DL) learns them in a data-driven way. Recently, with increasing data availability and computational resources, we have witnessed notable achievements in leveraging DL techniques for time series forecasting, because DL provides clear advantages: compared to traditional forecasting methods, DL models alleviate the need for manual feature engineering and model design, and can learn hierarchical representations and more complex dependencies.
In many real-world applications, live time series data grows and evolves rapidly. This requires the forecasting model to update itself in a timely manner to avoid the concept drift issue.
However, deep learning models follow the traditional batch learning paradigm, which requires re-training on the entire dataset whenever new training samples arrive. This is a major issue: such an inefficient approach is neither scalable nor practical for learning from continuous data streams.
Figure 1. An overview of the online learning framework. Instead of re-training from scratch at every time step where we receive new data points, online learning frameworks are designed to continuously update the model in an incremental way.
Unlike traditional offline learning paradigms, online learning is designed to learn models incrementally from data that arrives sequentially. Models can be updated instantly and efficiently via the online learner when new training data arrives, overcoming the drawbacks of traditional batch learning.
For example, in our cloud monitoring system, the forecasting model predicts CPU and memory usage for the next 24 hours. Such predictions can help decision makers dynamically allocate cloud resources in advance, to ensure high availability to customers while reducing the operational cost. If we observe new customer behaviors, the deployed forecasting model is inevitably required to adapt to this changing environment. Fortunately, with the help of online learning, the model can automatically and efficiently adapt to this new change – without the high cost (in both time and space) of offline re-training.
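To make the contrast with batch re-training concrete, here is a minimal sketch of the predict-then-update loop that online learning follows. It is not FSNet itself: the one-weight linear forecaster, squared loss, and step size are illustrative placeholders.

```python
# Minimal online (predict-then-update) loop for a 1-D linear forecaster.
# Illustrative only: a single weight, squared loss, and a plain SGD step.

def online_forecast(stream, lr=0.1):
    """Process (x, y) pairs one at a time: predict, record loss, update."""
    w = 0.0
    losses = []
    for x, y in stream:
        y_hat = w * x            # predict before seeing the label
        err = y_hat - y
        losses.append(err ** 2)  # evaluate on the incoming sample
        w -= lr * err * x        # incremental SGD update, no re-training
    return w, losses

# Example stream that follows y = 2x: the weight should approach 2.
stream = [(x, 2.0 * x) for x in [1.0, 0.5, 1.5, 1.0, 0.8] * 20]
w, losses = online_forecast(stream)
```

Each incoming sample costs one prediction and one small update, so the model tracks the stream without ever revisiting old data.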
Now that we know the benefit of online learning for time series forecasting, you may wonder: can we make small changes to the optimizer of deep forecasting models to support online updates?
The answer is not so simple. We think training deep forecasters online remains challenging for two major reasons:
- Deep models are sample-inefficient: when trained on only one sample at a time, as in the streaming setting, they converge slowly and cannot exploit patterns that emerge only over many observations.
- When concept drift occurs, deep models need many update steps to adapt to the new distribution, while also risking forgetting the old patterns they may need again later.
The upshot: online time-series forecasting with deep models presents a promising yet challenging problem. Can these challenges be overcome? Read on to find out (hint: yes!).
To address the above limitations, we developed FSNet (Fast and Slow Learning Network) - a new approach designed to forecast time series on the fly and handle nonstationary time series data.
Here are some of the main features and contributions of our FSNet framework:
- A reformulation of online time-series forecasting as an online, task-free continual learning problem.
- A per-layer adapter that supports fast adaptation to concept drifts by monitoring each layer's gradients.
- A sparse associative memory that stores how the model adapted to past patterns, so this experience can be recalled when similar patterns recur.
One of the key insights/innovations in our approach is to reformulate online time series forecasting as an online, task-free, continual learning problem. Continual learning aims to balance the following two objectives:
- quickly adapting to new patterns as the data distribution changes; and
- retaining the knowledge of previously seen patterns so it can be reused when they recur.
We found that these two objectives closely match the aforementioned challenges of online forecasting with deep models, so we developed an efficient online time series forecasting framework inspired by Complementary Learning Systems (CLS) theory, a neuroscience framework for continual learning. CLS theory suggests that humans learn continually thanks to the interactions between the hippocampus, which rapidly captures new experiences, and the neocortex. The hippocampus interacts with the neocortex to consolidate, recall, and update these experiences into more general representations, which support generalization to new experiences.
Motivated by this fast-and-slow learning of the CLS theory in humans, FSNet applies it to Machine Learning – enhancing deep neural networks with a complementary component to support fast learning and adaptation for online time-series forecasting.
Our new framework employs two important elements that warrant special attention:
- a per-layer adapter, which monitors each layer's recent gradients and generates fast, targeted updates for that layer; and
- a sparse associative memory, which stores how the model adapted to patterns encountered in the past, so that this experience can be retrieved when similar patterns recur.
Consequently, the adapter can model the change of temporal patterns to facilitate learning under concept drift, while its interactions with the associative memory allow the model to quickly remember and continue improving on recurring patterns.
Note that FSNet does not explicitly detect concept drifts. Instead, it always improves the learning of current samples, whether they come from the current fixed distribution, a gradually changing distribution, or even an abruptly changing one.
Figure 2 gives an overview of FSNet's components. FSNet enables fast adaptation to abrupt changes through the per-layer adapter, and facilitates learning of recurring patterns through sparse associative memory interactions.
Figure 2. An overview of FSNet. (a) A standard TCN backbone (green) with (b) dilated convolution stacks (blue). (c) A stack of convolution filters (yellow). Each convolution filter in FSNet is equipped with an adapter and associative memory to facilitate fast adaptation to both old and new patterns by monitoring the backbone's gradient EMA.
Let’s consider the online learning setting when FSNet encounters new data points.
Slow learning refers to the standard weight update of the neural network, shown by the convolution filters in Figure 2(c) and the arrow on the right side. As discussed earlier, standard neural networks converge slowly when updated with only one sample at a time (the online streaming-data scenario).
Fast learning refers to the adapter-plus-memory module, shown in Figure 2(c) by the blue arrows on the left side, which directly generates the update rule for the base model's parameters.
Recent works have demonstrated a shallow-to-deep principle: shallower networks can adapt to changes in data streams more quickly, and learn more efficiently with limited data, than deeper ones. It is therefore beneficial to start learning in such scenarios with a shallow network and gradually increase its depth.
Motivated by this, we propose to monitor and modify each layer independently so that it learns the current loss better. Specifically, we equip each layer with an adapter that maps the layer's recent gradients to a smaller, more compact set of transformation parameters used to adapt that layer.
In online training, because time series data is noisy and nonstationary, the gradient of a single sample can fluctuate wildly and inject noise into the adaptation coefficients. We therefore use an Exponential Moving Average (EMA) of the backbone's gradients to smooth out the noise of online training and to capture the temporal information in the time series.
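The gradient-smoothing idea can be sketched in a few lines of NumPy. This is not FSNet's actual adapter: the flattened gradient vector, the decay value, and the linear map `W_adapt` (standing in for the paper's chunking-plus-adapter transformation) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
grad_dim, n_coeffs = 64, 4            # flattened layer gradient -> 4 coefficients
W_adapt = rng.normal(scale=0.01, size=(n_coeffs, grad_dim))  # hypothetical map

ema = np.zeros(grad_dim)
gamma = 0.9                           # EMA decay: larger = smoother, slower

def adapt_step(grad):
    """Update the gradient EMA and map it to compact adaptation coefficients."""
    global ema
    ema = gamma * ema + (1.0 - gamma) * grad
    return W_adapt @ ema              # e.g. per-filter scale/shift coefficients

# Feed noisy gradients around a fixed direction: the EMA suppresses the noise.
true_dir = rng.normal(size=grad_dim)
for _ in range(200):
    coeffs = adapt_step(true_dir + rng.normal(scale=2.0, size=grad_dim))
```

After many steps, the EMA sits much closer to the underlying gradient direction than any single noisy sample does, which is what keeps the adaptation coefficients stable.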
In time series, old patterns may reappear, and leveraging past experience can improve learning outcomes. We therefore believe it is necessary to learn from repeating events to further support fast adaptation.
In FSNet, we use meta information to represent how the model adapted to a particular pattern in the past; storing and retrieving the appropriate meta information facilitates learning the corresponding pattern when it reappears in the future.
In particular, we implement an associative memory to store the meta information for the adaptation of repeating events encountered during learning. Since interacting with the memory at every step is expensive and susceptible to noise, we propose to trigger this interaction only when a substantial representation change happens. When a memory interaction is triggered, the adapter queries and retrieves the most similar transformations in the past via an attention read operation.
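A sparse, attention-based memory read can be sketched as follows. This is only a schematic of the idea, not FSNet's exact mechanism: the slot count, the cosine-similarity trigger, and the use of the raw slots as both keys and values are simplifying assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class AssociativeMemory:
    """Fixed-size memory of past adaptation vectors with a gated read."""

    def __init__(self, n_slots=8, dim=4, trigger=0.7):
        rng = np.random.default_rng(1)
        self.slots = rng.normal(size=(n_slots, dim))  # stored past transformations
        self.trigger = trigger                        # cosine-similarity threshold

    def maybe_read(self, query, reference):
        """Attend over memory only if `query` has drifted from `reference`."""
        cos = query @ reference / (
            np.linalg.norm(query) * np.linalg.norm(reference) + 1e-8)
        if cos >= self.trigger:                 # representation stable: skip read
            return None
        attn = softmax(self.slots @ query)      # similarity-based attention weights
        return attn @ self.slots                # weighted recall of stored info

mem = AssociativeMemory()
ref = np.ones(4)
stable = mem.maybe_read(np.ones(4), ref)       # no drift, read is skipped
recalled = mem.maybe_read(-np.ones(4), ref)    # large drift triggers a read
```

Gating the read this way keeps the per-step cost low and avoids querying the memory with noisy, transient representations.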
Now that we have described each component of FSNet, let’s see how it holds up in some experiments on both synthetic data and real-world data for online time-series forecasting.
We are happy to report that FSNet achieves significant improvements over typical baselines on both synthetic and real-world datasets. It can deal with various types of concept drift, and achieves faster and better convergence.
Figure 3. Evolution of the cumulative loss during training (smaller is better).
Figure 3 provides some details, showing the convergence behavior of the considered methods. Interestingly, the sharp peaks in the loss curves suggest that concept drifts occur in most datasets. Moreover, such drifts appear at an early stage of learning, mostly within the first 40% of the data, while the remaining data are quite stationary.
This result suggests that traditional batch training, which tests the model only on the last data segment, is often too optimistic. We observed promising results for FSNet on most datasets, with significant improvements over the baselines.
In addition, we find the ECL and Traffic datasets more challenging, since they include missing values and their values can vary significantly within and across dimensions. This result sheds light on the remaining challenges of online time-series forecasting; addressing them could further improve performance.
Combining online learning and deep learning into a new method is a promising yet challenging direction for time series forecasting. FSNet pursues it by augmenting a neural network backbone with two key components:
- a per-layer adapter for fast adaptation to concept drifts; and
- a sparse associative memory for recalling and reusing knowledge of recurring patterns.
So, FSNet should have a positive impact on the field of online deep time-series forecasting: it handles various types of concept drift, converges faster and to better solutions than standard baselines, and updates efficiently without costly offline re-training.
Taking a step back to consider the big picture, it is possible that, as important as improving time series forecasting is, the FSNet research may ultimately have an even wider impact, and not just in one but in two fields of science: machine learning, where fast-and-slow learning may benefit other online and continual learning problems, and neuroscience, where FSNet offers a computational illustration of CLS theory in action.
Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (see links below). Connect with us on social media and our website to get regular updates on this and other research projects.
Chenghao Liu is a Senior Applied Scientist at Salesforce Research Asia, working on AIOps research, including time series forecasting, anomaly detection, and causal machine learning.
Quang Pham was an intern at Salesforce Research Asia, working on online time-series forecasting, and is currently a Ph.D. candidate at Singapore Management University’s School of Computing and Information Systems. His research interests include continual learning and deep learning.
Doyen Sahoo is a Senior Manager, AI Research at Salesforce Research Asia. Doyen leads several projects pertaining to AI for IT Operations or AIOps, working on both fundamental and applied research in the areas of Time-Series Intelligence, Causal Analysis, Log Analysis, End-to-end AIOps (Detection, Causation, Remediation), and Capacity Planning, among others.
Donald Rose is a Technical Writer at Salesforce AI Research, specializing in content creation and editing. He works on multiple projects — including blog posts, video scripts, newsletters, media/PR material, social media, and writing workshops. His passions include helping researchers transform their work into publications geared towards a wider audience, leveraging existing content in multiple media modes, and writing think pieces about AI.