Actions Speak Louder Than Words: Introducing xLAM, Salesforce’s family of Large Action Models

06 Sep 2024

Introduction

Imagine asking your CRM system, "Can you write an email to a customer who’s having trouble logging into their account?" Today, Large Language Models (LLMs) can handle such requests with ease: drafting emails, designing visuals, summarizing information, and even writing code. Expectations of CRM systems have risen sharply, and LLMs are becoming increasingly versatile. But a new wave is on the horizon, one that promises even more autonomy and efficiency.

Enter Large Action Models (LAMs). Unlike LLMs, which excel at generating text and responses, LAMs go a step further by proactively managing entire workflows without needing explicit instructions for each task. Imagine a CRM system that not only knows what you want, but also anticipates your needs, automates processes, and makes informed decisions on your behalf. This is where LAMs shine, drawing on their roots in fields like robotics, autonomous vehicles, and gaming AI, where understanding context and making real-time decisions are crucial.

At Salesforce AI Research, we're at the forefront of this development. Recognizing the rapid pace of advancement in AI, we've introduced xLAM, our family of in-house Large Action Models, built for function calling, reasoning, and planning. These models are intended to streamline the integration of AI into your workflows, reducing the complexity often associated with LLMs. Before we explore xLAM, let's first look at what Large Action Models are and how they work.

Large Action Models vs. Large Language Models

As many of us know, Large Language Models are designed to understand and generate human-like text. They’re trained on vast datasets and can perform a wide range of language-related tasks. Think of an LLM as a top chef who devises mouthwatering recipes and offers detailed instructions on how to create a gourmet meal.

Large Action Models, on the other hand, are designed to make decisions and perform actions in various environments. Think of a sous chef who not only helps with the recipe, but also handles the cooking, chopping, and mixing, making sure the dish is prepared exactly as it's needed. In the realm of AI, Large Action Models (LAMs) are a specialized subset of Large Language Models (LLMs) designed primarily for generating actions, commonly through function calling. These models are making waves in CRM, where understanding the context and making appropriate decisions on behalf of the company is crucial. In CRM terms, LAMs go beyond understanding and generating content: they handle the nitty-gritty of execution. They automate workflows, manage tasks, and ensure everything runs smoothly. If the LLM provides the recipe, the LAM makes sure the ingredients are chopped, mixed, and cooked to perfection, delivering results without you needing to lift a finger.

Meet xLAM, Our Family of Large Action Models

xLAM, Salesforce AI Research’s family of in-house Large Action Models, has been something of a breakthrough over the past month. The family has swiftly risen toward the top of the rankings, reaching #2 on the Berkeley Function Calling Leaderboard V1 (cutoff date 08/12/2024) and surpassing even some variants of GPT-4. The effectiveness of xLAM-1B hinges on the quality and variety of its training data. The APIGen pipeline uses 3,673 executable APIs spanning 21 categories, and each data point goes through a three-step verification process (format checks, actual function executions, and semantic verification) to ensure data integrity and relevance. Let's look more closely at the specific models within the xLAM family and their capabilities and applications.
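As a brief aside before the model lineup, the three verification stages mentioned above can be illustrated with a toy filter. This is only a minimal sketch under stated assumptions, not the APIGen implementation: the JSON call format, the `get_weather` API, the `API_REGISTRY` table, and the keyword-based semantic check are all hypothetical stand-ins chosen for illustration.

```python
import json
from typing import Any, Optional, Tuple


def get_weather(city: str) -> dict:
    """Hypothetical executable API used only for this illustration."""
    if not city.strip():
        raise ValueError("city must be non-empty")
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry: function name -> (callable, required argument names).
API_REGISTRY = {"get_weather": (get_weather, {"city"})}


def format_check(raw: str) -> Optional[dict]:
    """Stage 1: the sample must be valid JSON naming a known API with the expected arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in API_REGISTRY:
        return None
    if not isinstance(args, dict) or set(args) != API_REGISTRY[name][1]:
        return None
    return call


def execution_check(call: dict) -> Tuple[bool, Any]:
    """Stage 2: actually run the function and reject samples that raise."""
    func = API_REGISTRY[call["name"]][0]
    try:
        return True, func(**call["arguments"])
    except Exception:
        return False, None


def semantic_check(query: str, call: dict) -> bool:
    """Stage 3 (toy stand-in): every argument value must appear in the query text."""
    lowered = query.lower()
    return all(str(value).lower() in lowered for value in call["arguments"].values())


def verify_sample(query: str, raw: str) -> bool:
    """Keep a (query, call) sample only if it passes all three stages in order."""
    call = format_check(raw)
    if call is None:
        return False
    executed, _result = execution_check(call)
    return executed and semantic_check(query, call)
```

Running the stages in this order means the cheapest check rejects malformed samples first, and the function is only executed for calls that are already well formed. A real semantic check would need something far stronger than keyword matching.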

Tiny (xLAM-1B)

Known as our “Tiny Giant”, this compact version of xLAM has 1B parameters. Given its small size, it is the most suitable choice for on-device applications where larger models are impractical.

Small (xLAM-7B)

A 7B model designed for swift academic exploration with limited GPU resources.

Medium (xLAM-8x7B)

An 8x7B mixture-of-experts model, ideal for industrial applications striving for a balanced combination of latency, resource consumption, and performance.

Large (xLAM-8x22B)

A large mixture-of-experts model for those who have ample computational resources and want the best performance.

Large Action Models: The Power of Function Calling

As we mentioned above, LAMs are designed primarily for generating actions, commonly through function calling. A vivid example of this is seen in how these models can enhance the efficiency of sales representatives. Consider a scenario where a sales rep might need to cancel an order. Instead of navigating through multiple systems, the rep could simply ask an AI copilot to handle the task. The AI, powered by a LAM, understands the request, determines that the order management system (OMS) is the relevant application, and executes the necessary function to cancel the order.
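The cancel-order scenario can be sketched in a few lines. This is a minimal illustration under stated assumptions, not xLAM's actual interface: the JSON call format, the `cancel_order` and `get_order_status` functions, the in-memory `ORDERS` table, and the `fake_model` stub (a regular expression standing in for the LAM) are all hypothetical.

```python
import json
import re

# Hypothetical in-memory order management system (OMS).
ORDERS = {"A-1001": "open", "A-1002": "shipped", "A-1003": "open"}


def cancel_order(order_id: str) -> str:
    """Cancel an order if it is still open."""
    status = ORDERS.get(order_id)
    if status != "open":
        return f"Order {order_id} cannot be cancelled (status: {status})"
    ORDERS[order_id] = "cancelled"
    return f"Order {order_id} cancelled"


def get_order_status(order_id: str) -> str:
    """Look up the current status of an order."""
    return f"Order {order_id} is {ORDERS.get(order_id, 'unknown')}"


# Functions the copilot is allowed to call, keyed by name.
TOOLS = {"cancel_order": cancel_order, "get_order_status": get_order_status}


def fake_model(request: str) -> str:
    """Stand-in for a LAM: maps a request to a JSON function call.

    A real model would choose the function and arguments itself; this stub
    only recognizes cancellation requests that contain an order ID.
    """
    match = re.search(r"[A-Z]-\d+", request)
    if match and "cancel" in request.lower():
        return json.dumps({"name": "cancel_order", "arguments": {"order_id": match.group()}})
    return json.dumps({"name": "unknown", "arguments": {}})


def dispatch(model_output: str) -> str:
    """Parse the model's function call and execute it against the tool registry."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Model requested an unregistered function: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    print(dispatch(fake_model("Please cancel order A-1001 for the customer.")))
```

Keeping the callable functions in an explicit registry means the model can only trigger actions that have been deliberately exposed to it, which matters when a call updates critical data.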

Our xLAM model series is specifically optimized for function calling and takes this capability to the next level. On the Berkeley Function Calling Leaderboard V1, xLAM-7B(fc) outperforms major models including OpenAI’s GPT-4 and Claude-3-Opus despite being significantly smaller and more cost-effective. In a domain where accuracy and trust are paramount, given that these transactions often involve updating critical data, the xLAM model series not only streamlines operations but also executes actions with precision, showcasing the potential of LAMs in real-world applications.

xLAM in a Multi-Agent World

Function calling lays the groundwork for robust autonomous agent systems, marking a significant advance in how AI can streamline and enhance business operations. The power of xLAM is not limited to a single agent: xLAM can drive the decision making and actions of many collaborative agents. With many highly specialized agents powered by xLAM, increasingly complex tasks can be completed autonomously.

To learn more about this cutting-edge direction, sign up for our Dreamforce session.

Conclusion

In conclusion, the evolution from Large Language Models (LLMs) to Large Action Models (LAMs) marks a significant leap in AI capabilities within CRM systems. Salesforce AI Research's introduction of xLAM, a pioneering family of Large Action Models, underscores this shift towards more autonomous, efficient, and context-aware AI tools. By handling complex workflows and decision-making processes, xLAM not only enhances operational efficiency but also redefines how users interact with CRM systems. As we continue to explore and expand the potential of xLAM, the future of CRM looks more intuitive and automated, with the promise of transforming business operations and customer relationships.

Acknowledgments

Full Author List: Jianguo Zhang∗, Tian Lan∗, Ming Zhu∗, Zuxin Liu∗, Thai Hoang∗, Shirley Kokane†, Weiran Yao†, Juntao Tan, Akshara Prabhakar, Zhiwei Liu, Haolin Chen, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong