AUTHORS: Huan Wang, Aadyot Bhatnagar, Doyen Sahoo, Wenzhuo Yang, Steven Hoi, Caiming Xiong, Donald Rose
TL;DR: Time series data is a critical source of insights for many applications, including IT Operations, Quality Management, Financial Analytics, and Inventory & Sales Management. While a variety of dedicated packages and software exist, engineers and researchers still face several daunting challenges when they try to experiment with or benchmark time-series analysis algorithms. The steep learning curve for disparate programming interfaces for different models - as well as the process of selecting and training a model, data compatibility requirements, and intricate evaluation metrics - limit the accessibility of such packages for a broad audience of potential users.
To address these issues, and combine several key functions into a single tool, we developed Merlion: a Python library for time series intelligence. Merlion provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance. It supports various time series learning tasks, including forecasting, anomaly detection, and change-point detection for both univariate and multivariate time series. This library helps solve a range of problems by providing engineers and researchers a one-stop solution to rapidly develop models for their specific time series needs, and benchmark them across multiple time-series datasets. Instead of having to learn and deploy multiple tools, you can do it all within a single, powerful framework.
Before we dive into how Merlion works, let’s give some background and context by briefly explaining some key concepts related to what Merlion does, and why they are important.
If one or more variables or data points (e.g., periodic measurements) are changing in value over time, you have a time series; if you want to study the causes or trends of these data changes over time, welcome to the world of time series analysis. Time series analysis involves answering key questions as one strives to understand the data changing over time, such as:
Some key terms to know:
The analysis process ideally starts by determining how each variable normally changes over time (the normal pattern), so that any anomaly (a deviation from the normal pattern) can be more easily detected.
An anomaly is something out of the ordinary — a change in data readings that indicates a system is currently in an abnormal state (or about to go into it), or being influenced by a different process or outside factor.
The ability to spot an anomaly in time series data is very important in many different contexts. Examples include:
On a graph of data, an anomaly could be a sudden spike away from a trend line and then a rapid return to that trend line. However, not all spikes or deviations from a trend line are anomalies. A spike might not be indicating any behavior that is abnormal or generated by a separate process that should be uncovered. In other words, it may be normal for data to deviate from a trend line; it depends on the dataset. For example, a regularly occurring spike in some data value may be a normal feature of a system, and nothing to be concerned about.
So how can you tell if a spike is normal, or abnormal? Something to be ignored, or something to take action on? How do you spot the anomaly in the first place? Anomaly detection can be like finding a needle in a haystack -- a difficult task, especially for humans — which is why AI and machine learning are ideal for tackling this task.
Forecasting is important, especially for businesses, in order to determine how a known quantity, or variable, will change in the future. For example, businesses track their sales data, and it is crucial to be able to forecast what future sales will be, in order to take corrective action if one’s model is forecasting a dip in sales (or forecasting a change in a factor that can influence sales — e.g., weather forecasts can influence expected crop yield if you sell farm produce).
In short, forecasting lets you predict what the future will hold (at least for some data), understand the causes behind the forecast, and take corrective action now in an attempt to change that forecast for the better (that is, prevent a negative outcome in the future from occurring, or reduce the negative effect -- or make a predicted positive effect even greater).
In machine learning, a model will feature both parameters (which are determined during learning) andhyperparameters, which are set before learning even begins. Hyperparameters aretunable parameters whose values are used to control the learning process, and can directly influence how successful the model training process is.
Examples of hyperparameters include: the learning rate, the topology and size of a neural network, the k in k-nearest neighbors, the number of decision tree branches, and mini-batch size.
Analogy: if modeling test taking, one parameter would be test scores (determined during testing); hyperparameters would be the number of questions and how much time is given to finish the test (both of which are decided before the test begins).
In Merlion, automated machine learning (AutoML) is used for automated hyperparameter tuning and model selection. In other words, AutoML automates some aspects of machine learning, making life easier for researchers (one reason why Merlion is in high demand on GitHub).
Merlion enables easy-to-use ensembles that combine the outputs of multiple models to achieve more robust performance.
Benchmarking is measuring the performance of a system (for example, the models you have rapidly developed using Merlion), along one or more metrics. With Merlion, you can benchmark models across multiple time series datasets.
One of the most important tasks that organizations need to succeed at is predicting system availability. Accurately predicting the health of our systems, and identifying potential issues with those systems, is vitally important at Salesforce. Our company must run 24/7; constant uptime is in our DNA, an essential part of our brand. Hence, it’s crucial for us to predict when any of our systems might go down. Not taking steps to predict downtimes would expose the company to undue risk.
To help the company accurately predict system availability, Salesforce employs the twin time-tested time-series techniques of forecasting and anomaly detection. For example, one key to maintaining system availability is being able to detect and forecast anomalies in real-time metrics such as CPU utilization, average paging time, and request rate.
In general, anomaly detection and forecasting offer a number of business benefits, including:
Given the main problem to be solved (accurately predicting system availability and potential future outages or other negative events), organizations need to determine which method(s) should be used to solve it. In this case, the problem falls under the domain of AIOps -- and, more specifically, time series analysis.
AIOps: Applying AI to Improve IT Operations
Artificial Intelligence for Operations, or AIOps, could be thought of as the practice of applying analytics and machine learning to big data in order to automate and improve IT operations; in other words, improving the operational efficiency of a company, using AI tools and techniques. AIOps employs a range of techniques to get results, one of the most important being time series analysis.
Time Series Analysis: Studying Variables that Change over Time, to Predict Future Values
System uptime depends on certain variables being in an acceptable range at any given time — a time series task. Hence, time series analysis is crucial to predicting system availability. Since it is so important, time series analytics (one of several techniques utilized in AIOps) has been widely adopted in Salesforce’s products.
While we have identified time series analysis as key to solving the problem of accurately predicting system availability, a number of subproblems arise when employing time series techniques in the real world. Let’s look at some examples.
Practitioners want to try a variety of algorithms, but in order to use them, a significant amount of effort is required just to understand the interface for each one. It is also challenging to conform diverse datasets to a standard format for ease of benchmarking algorithms. In addition, time-series applications generally require extensive pre-processing (resampling, alignment, aggregation, normalization, etc.) or post-processing (thresholding, alert suppressions, and normalization), which are also expensive and time-consuming.
Time-series literature provides abundant metrics to evaluate the performance of models, and many of them are applicable in different application scenarios. However, some metrics are tricky to implement, and many academic evaluations may not even be applicable to real-world industrial application scenarios.
Models used in time-series forecasting and anomaly detection often require expert knowledge of complex hyperparameters in order to use them effectively. Furthermore, different models have different pros and cons, and sometimes there is a need to combine multiple models together - yet many AI/ML tools are not designed to deal with such ensemble models.
While working with different product teams, we found ourselves facing common issues across various projects. To better understand the difficulties faced in these industrial applications, let’s look at some of the steps involved in a typical application scenario:
Although many tools have been created for time-series analytics, it still takes a lot of background knowledge and substantial effort to build up a proper benchmarking environment that is compatible with most of the popular algorithms.
If only there was a way to make time series analysis easier to use, while retaining its power -- combining several standard machine learning methods into a single, accessible-to-all tool.
Wait — there is!
In order to address the issues discussed above, we collaborated with various product teams on building a tool that would be easy to use and combine several AI/ML methods in one, in order to help Salesforce achieve two broad goals:
In other words, we wanted an intelligent tool that would apply the techniques of AIOps and time series analysis to help increase upside potential (make normal operations function even better), while also forecasting downside scenarios where something might go wrong in order to help keep systems running and in a good state (reduce risk - avoid bad outcomes).
The result is the Merlion Repository: an easy-to-use machine learning library for time-series forecasting and anomaly detection. The goal of the Merlion repo is to provide a standardized experimentation platform that is accessible to anyone interested in time-series analytics. Through our Merlion repo, we simplify some of the most time-consuming and difficult time-series tasks so one can start experimenting on time series quickly and easily - often by writing just a few lines of code.
While one of the goals of the open source Merlion project is to help any organization benefit from its powerful features, we didn’t just develop this framework for the research community at large; Salesforce uses and benefits from Merlion as well. The tool has helped the company in multiple areas, and we are confident this positive outcome will be repeated at other organizations who apply this multi-function tool to their own set of problems.
Here are just a couple of examples of how Merlion benefits Salesforce:
Application 1: Improving the Tool that Improves Performance of the Salesforce Platform
Warden AIOps is an Application Performance Management (APM) platform used by developers to address performance issues on the Salesforce platform. The overall goal of Warden AIOps is to detect and fix any issues that are negatively impacting performance before they affect our customers. Benefits include improving performance and availability for Salesforce customers and reducing fatigue for operators.
Two ways in which Merlion helps Warden AIOps:
Application 2: Proactive Throttle Prediction Improves Customer Experience
When a machine’s load reaches a certain limit, we must throttle some customers’ apps to reduce the load. Currently, customers get notified only after their apps are throttled. In this ongoing new effort, with the help of Merlion, we are forecasting possible resource overflow ahead of time (before throttling is enforced), so customers can be notified early.
We’ve seen that Merlion is helping Salesforce with crucial tasks like anomaly detection and forecasting. But how does the tool actually work? Let’s look beneath the hood, to see how Merlion is structured and what its main components are.
Merlion is an end-to-end tool, designed to let users handle all of the primary tasks in the machine learning pipeline, from start to finish, as shown by its five-layer modular architecture:
For any tool to be successful (that is, likely to get adopted and used to solve real-world problems), it must not only be powerful but relatively easy to use and understand as well. This is one aspect of Merlion that makes it stand out. Not only is it powerful, but it’s designed with some key features to help make it accessible to anyone interested in time series analytics:
One can start experimenting on time series by writing just a few lines of code. For example, one can train a default model for anomaly detection and make predictions in just 10 lines of code.
Users can import many different time-series datasets -- with just a single line of code.
Through consistent APIs, users may try out different algorithms while keeping their experimentation script unchanged. For example, the model initializations are almost the same for all models.
Users can add data pre-processing transforms such as difference, exponential moving average, moving percentile, and lag transforms to an anomaly detection model.
No single model can perform well across all time series and use cases, so it’s important to provide users the flexibility to choose from a broad suite of heterogenous models. Merlion does just that, implementing many diverse models for both anomaly detection and forecasting.
The algorithms that Merlion currently supports for anomaly detection include isolation forest, random cut forest (by AWS), spectral residual (by Microsoft), dynamic baseline, ZMS, variational autoencoder, deep auto-encoding Gaussian mixture model, deep point anomaly detector, LSTM-encoder-decoder-based anomaly detector, and simple statistical threshold. We also support forecast-based anomaly detectors.
For forecasting, Merlion supports ARIMA, SARIMA, Prophet (by Facebook), ETS, vector AR, random forest, gradient boosted tree, and LSTM, as well as our own homegrown MSES smoother.
Merlion users have the ability to specify post-processing rules in the model configuration. For example, using Aggregated Alarms as the post rule for anomaly detection, this rule could be set to only fire an alarm if the raw anomaly score exceeds a specified threshold (e.g., 4-sigma, or 4 standard deviations from the mean), and suppress all subsequent alarms for a user-specified period after the first alarm (e.g., two hours). The purpose of alert suppression is to avoid generating alerts too often, which could lead to alert fatigue (example: customers who receive 10 alerts in 1 minute might start ignoring them).
By default, calibration is also enabled for all models - that is, the anomaly scores returned can be interpreted as z-scores (standard deviation units). In our current example, we set a detection threshold of 4 to specify that we would only like to generate an alarm for a 4-sigma or greater event.
One of Merlion's key features is an evaluation pipeline that simulates the live deployment of a model on historical data. This enables you to compare models on the datasets relevant to them, under conditions they may encounter in a production environment. Our evaluation pipeline proceeds as follows:
Merlion has two quick benchmark scripts for anomaly detection and forecasting, respectively. Users can evaluate any model on any dataset.
Merlion users can also easily construct ensemble models from existing models. An ensemble model combines more than one model in order to improve predictive power (the ability to predict an outcome).
One example of using ensembles to solve data science problems is the random forest algorithm, which utilizes multiple CART models (the algorithm creates multiple CART trees and combines the predictions).
The table below provides a visual overview of how Merlion's features compare to other libraries for time series anomaly detection and/or forecasting. Note how all 8 boxes (the entire set of 8 desired features listed in the far left column) are checked for Merlion, whereas none of the other tools (listed across the top of the table) have all of the boxes checked.
In short, if you want to reap the full benefits of employing all 8 of these important features in a single tool, Merlion is the only library to choose (and its popularity on GitHub shows that many people agree).
Researchers and engineers alike can rely on Merlion. By combining multiple useful functions in a single easy-to-use tool, Merlion provides a complete end-to-end solution for a wide range of machine learning time series tasks.
The Merlion framework provides several key benefits, including:
Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (links below). Connect with us on social media and our mailing list to get regular updates on this and other research projects.
examples, and the guided walkthrough here.
Huan Wang is a Research Director at Salesforce Research. He earned his Ph.D. degree in Computer Science from Yale, and received the best paper award at the Conference on Learning Theory (COLT) in 2012. At Salesforce, he works on deep learning theory, reinforcement learning, time series analytics, operational and data intelligence. Previously, he was a senior applied scientist at Microsoft AI Research, a research scientist at Yahoo, an adjunct professor at NYU’s School of Engineering teaching machine learning, and an adjunct professor at Baruch College teaching algorithm design.
Aadyot Bhatnagar is a Senior Research Engineer at Salesforce Research. He has broad research interests in machine learning, with prior works in speech, NLP, and computer vision, and he enjoys bridging the gap between research and production. Aadyot is the lead developer of the Merlion Repo.
Doyen Sahoo is a Senior Research Scientist at Salesforce Research Asia, working on AIOps research and development for enhancing operational efficiency at Salesforce. His research interests include machine learning, both fundamental and applied, online learning, and computer vision.
Wenzhuo Yang is a Senior Applied Researcher at Salesforce Research Asia, working on AIOps research and applied machine learning research, including causal machine learning, explainable AI, and recommender systems.
Steven C.H. Hoi is Managing Director of Salesforce Research Asia and oversees Salesforce's AI research and development activities in APAC. His research interests include machine learning and a broad range of AI applications.
Caiming Xiong, a VP / Managing Director of AI Research at Salesforce, leads the effort to build state-of-the-art AI technologies, publish in top academic conferences, innovate, collaborate, and embed our work across Salesforce clouds to accelerate the building of AI products.
Donald Rose is a Technical Writer at Salesforce AI Research. Specializing in content creation and editing, Dr. Rose works on multiple projects, including blog posts, video scripts, news articles, media/PR material, social media, writing workshops, and more. He also helps researchers transform their work into publications geared towards a wider audience.
The Merlion repository is the outcome of a collaboration between Salesforce Research and several Salesforce product teams, including Monitoring Cloud, Warden AI, and Service Protection. Here is the full list of authors: Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, and Huan Wang.
We would also like to thank Denise Perez, Donald Rose, Gang Wu, Feihong Wu, Vera Serdiukova, Zachary Taschdjian, and MJ Jones for their help in setting up the webpage and UX designs.
Prophet: https://facebook.github.io/prophet/. Sean J. Taylor and Benjamin Letham, Forecasting at Scale.
Random Cut Forest: https://github.com/aws/random-cut-forest-by-aws. Guha, S., Mishra, N., Roy, G., & Schrijvers, O. (2016, June). Robust random cut forest-based anomaly detection on streams. In International conference on machine learning (pp. 2712-2721).
Isolation Forest: F. Liu, K. Ting, and Z. Zhou. 2008 Eighth IEEE International Conference on Data Mining, page 413--422. IEEE, (2008).