TL;DR: We developed a new time-series forecasting model called ETSformer that leverages the power of two frameworks. By combining the classical intuition of seasonal-trend decomposition and exponential smoothing with modern transformers – as well as introducing novel exponential smoothing and frequency attention mechanisms – ETSformer achieves state-of-the-art performance.
Before diving into our main discussion, let’s review a couple of important concepts that are at the core of our work. (For a review of other key terms, see our Glossary.)
Time-series forecasting is concerned with the prediction of the future based on historical information, specifically for numerical data collected in a temporal or sequentially ordered manner. The accurate forecasting of such data yields many benefits across a variety of domains. For example, in e-commerce, the accurate forecasting of sales demand allows companies to optimize supply chain decisions, and create better pricing strategies.
AIOps is another important domain where forecasting plays an essential role. IT operations generate immense amounts of time-series data, and analyzing this trove of information with the help of artificial intelligence and machine learning can greatly boost operational efficiency.
Two AIOps tasks that benefit from more accurate forecasting are anomaly alerts and capacity planning.
Exponential smoothing is a family of methods motivated by the idea that forecasts are a weighted average of past data (observations), and the weights decay (decrease) exponentially as we go further into the past (older data).
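To make this concrete, here is a minimal sketch (our own illustration, not code from the paper) of simple exponential smoothing in numpy: the one-step-ahead forecast is a weighted average of past observations whose weights decay exponentially into the past. The smoothing parameter `alpha` and the example series are assumptions chosen for illustration.

```python
import numpy as np

def simple_exponential_smoothing(y, alpha=0.5):
    """One-step-ahead forecast as an exponentially weighted average of past observations.

    y: 1-D array of historical observations (oldest first).
    alpha: smoothing parameter in (0, 1); larger values weight recent data more heavily.
    """
    t = len(y)
    # Weight for the observation i steps back is alpha * (1 - alpha)**i,
    # so weights decay exponentially as we move further into the past.
    weights = alpha * (1.0 - alpha) ** np.arange(t)[::-1]
    weights[0] += (1.0 - alpha) ** t  # remaining weight mass goes to the oldest point
    return float(np.dot(weights, y))

# Example: the forecast is dominated by the most recent observations.
history = np.array([10.0, 12.0, 11.0, 13.0, 14.0])
print(simple_exponential_smoothing(history, alpha=0.5))
```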
Time-series data can contain a wide variety of patterns, of which trend and seasonality are two distinctive categories or components that many real-world time series exhibit. It is often helpful to split time-series data into such constituents, and this is known as time-series decomposition.
Decomposition and exponentially weighted decay are both examples of incorporating prior knowledge of time-series structure into forecasting models, and the benefits of doing so are clear, given the popularity and forecasting prowess of these classical methods.
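As a rough illustration of seasonal-trend decomposition (a simplified sketch, not ETSformer's mechanism), the snippet below splits a series into trend and seasonal parts using a centered moving average and per-position seasonal means; the period length and synthetic series are assumptions for illustration.

```python
import numpy as np

def decompose(y, period=12):
    """Very simple additive seasonal-trend decomposition.

    y: 1-D array of observations; period: assumed seasonal cycle length.
    Returns (trend, seasonal, residual) arrays of the same length as y.
    """
    # Trend: centered moving average over one full seasonal cycle.
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")
    # Seasonal: average detrended value at each position within the cycle.
    detrended = y - trend
    seasonal_pattern = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.resize(seasonal_pattern, len(y))
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Example on a synthetic series with a linear trend plus a yearly-style cycle.
t = np.arange(120)
series = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * np.random.randn(120)
trend, seasonal, residual = decompose(series, period=12)
```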
Now that we know why time-series forecasting is so important, you may be wondering: how do we actually forecast the future? In the age of big data, where we have access to copious amounts of time-series metrics (for example, minute-level measurements from a data center across the span of a year), simple statistical models no longer cut it. Instead, we look towards powerful machine learning and deep learning models, which can ingest these large amounts of data, learn the salient patterns, and make accurate long-term forecasts.
However, time-series data is usually noisy and non-stationary (its statistical properties fluctuate over time). In addition, existing approaches tend to be too general: even when general-purpose machine learning methods incorporate some form of prior knowledge, it is rarely specialized to time-series structures such as trend and seasonality. This can lead to suboptimal modeling of temporal patterns and inaccurate long-term forecasts.
To address the limitations of existing methods, we propose a new method for time-series forecasting called ETSformer. You could think of our approach as “exponential smoothing transformers” – our model is essentially a transformer, extended with extra capabilities designed to tailor it to processing time-series information. Inspired by classical exponential smoothing methods, ETSformer combines their power with that of transformers to achieve state-of-the-art performance.
Since our new approach combines elements of two powerful techniques, the name embodies this fruitful combination: “ETS” comes from the extension of exponential smoothing methods to consider state space models (Error, Trend, and Seasonal) – and can also be thought of as an abbreviation for ExponenTial Smoothing – while “former” comes from transformer.
Figure 1 provides a visual overview of how our new approach, ETSformer, generates its forecasts:
Figure 1. An overview of how ETSformer generates forecasts: first, decomposition (red down-arrow) of the input data into seasonal and trend components; then extrapolation of these two components into the future; and finally composition (red up-arrow), which recombines them into the final forecast over the horizon.
Our system’s architecture is essentially a transformer, consisting of an encoder and a decoder, each of which plays a key role in the three main steps shown in Figure 1: decomposition, extrapolation, and composition.
Figure 2: How ETSformer’s components operate. A lookback window (graph, bottom-center) is processed by the encoder-decoder transformer architecture (dark gray box) to produce a forecast (graph, upper right). The encoder comprises multiple layers; each performs seasonality, growth, and level extraction via our novel Frequency Attention, Exponential Smoothing Attention, and Level modules. The decoder comprises multiple G+S stacks; each performs extrapolation on the seasonality and growth components, via the Frequency Attention and Growth Damping modules (lower right).
The encoder performs seasonality, growth, and level extraction on the lookback window, via the Frequency Attention, Exponential Smoothing Attention, and Level modules.
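To give a flavor of the frequency-attention idea (this is our own simplified sketch, not the paper's implementation), the snippet below keeps only the K Fourier components of a lookback window with the largest amplitudes and reconstructs a smooth seasonal signal from them; the value of `k` and the example window are illustrative assumptions.

```python
import numpy as np

def frequency_attention(x, k=3):
    """Keep the K largest-amplitude Fourier components of a lookback window.

    A simplified stand-in for frequency attention: the dominant frequencies
    are treated as the seasonal pattern, and everything else is discarded as noise.
    """
    spectrum = np.fft.rfft(x)
    amplitudes = np.abs(spectrum)
    top_k = np.argsort(amplitudes)[-k:]      # indices of the K strongest frequencies
    filtered = np.zeros_like(spectrum)
    filtered[top_k] = spectrum[top_k]
    # Inverse transform reconstructs the estimated seasonal component.
    return np.fft.irfft(filtered, n=len(x))

# Example: recover a clean periodic pattern from a noisy lookback window.
t = np.arange(96)
lookback = np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(96)
seasonal_estimate = frequency_attention(lookback, k=3)
```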
The decoder then extrapolates the extracted seasonality and growth components into the future, via the Frequency Attention and Growth Damping modules, and recombines them with the level to compose the final forecast.
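The growth-damping step can be illustrated with a small sketch (again a simplification under our own assumptions, not the released code): the latest estimated growth is extrapolated over the forecast horizon with an exponentially decaying damping factor, so the trend gradually flattens rather than growing without bound. The `level`, `growth`, and `phi` values below are purely illustrative.

```python
import numpy as np

def damped_trend_forecast(level, growth, horizon, phi=0.9):
    """Extrapolate a trend with damping.

    level:  last estimated level of the series.
    growth: last estimated per-step growth.
    phi:    damping factor in (0, 1]; phi < 1 flattens the trend over time.
    """
    steps = np.arange(1, horizon + 1)
    # Cumulative damped growth: phi + phi**2 + ... + phi**h for horizon step h.
    damped = np.cumsum(phi ** steps)
    return level + growth * damped

# Example: a 10-step-ahead trend forecast that gradually levels off.
print(damped_trend_forecast(level=100.0, growth=2.0, horizon=10, phi=0.9))
```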
Combining two different approaches into a new method that yields fresh insights or benefits is a time-honored technique in science, but does it pay off in this particular domain? Now that we’ve described the components of ETSformer and how they fit together, we’d like to show that our approach is not just a good idea in theory. Does combining classical exponential smoothing techniques with a transformer architecture actually deliver measurably better forecasts?
We’re happy to report that the answer is Yes! ETSformer achieves state-of-the-art performance on six real-world time-series datasets from a range of application domains – including traffic forecasting, weather forecasting, and financial time-series forecasting. Our method beats competing baselines in 22 out of 24 settings (based on the MSE – mean squared error – metric) across these datasets and across different forecast horizons (how far ahead into the future the model forecasts). See our research paper for a more detailed explanation of our empirical results and comparisons with competing baselines.
Another positive result: ETSformer produces interpretable decompositions of the forecasted quantities – exhibiting a clear trend and seasonal pattern rather than noisy, inaccurate ones. As shown in Figure 3, given a time series whose true underlying seasonality and trend are known (we have this information for synthetic data), ETSformer reconstructs these underlying components better than a competing method. It forecasts interpretable level, trend (level + growth), and seasonal components, as seen in how closely its trend and seasonal components track the ground-truth patterns. In contrast, the competing approach, Autoformer, struggles to disambiguate between trend and seasonality.
Figure 3. Visualization of time-series forecasts on a synthetic dataset by ETSformer, compared to a baseline approach (Autoformer) and ground truth. Top: seasonality & trend (the two components combined, non-decomposed). Middle: trend component (decomposed). Bottom: seasonal component (decomposed). In all three cases, ETSformer matches ground truth better than Autoformer.
ETSformer's state-of-the-art performance provides evidence that combining ETS techniques with a transformer-based architecture can yield real-world benefits. Combining other classical methods with the power of transformers might bring equally good (perhaps even greater) benefits, and would seem like a fruitful avenue to explore in future research.
It's also important to note that ETSformer generates forecasts based on a composition of interpretable time-series components. This means we can visualize each component individually, and understand how seasonality and trend affects the forecasts. This interpretability is a key feature because, in general, we want the decisions or results of AI systems to be explainable to the greatest extent possible.
Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (see links below). Connect with us on social media and our website to get regular updates on this and other research projects.
Gerald Woo is a Ph.D. Candidate in the Industrial PhD Program at Salesforce Research. His research focuses on time-series modeling with deep learning.
Chenghao Liu is a Senior Applied Scientist at Salesforce Research Asia, working on AIOps research, including time series forecasting, anomaly detection, and causal machine learning.
Donald Rose is a Technical Writer at Salesforce AI Research. He earned his Ph.D. in Computer Science at UC Irvine, and specializes in content creation for multiple projects — such as blog posts, video scripts, newsletters, media/PR material, tutorials, and workshops. He enjoys helping researchers transform their work into publications geared to a wider audience and writing think pieces about AI.