I’ve written a lot in recent months about what I call Large Action Models, or LAMs: a more active, autonomous variation on LLMs that doesn’t merely generate content like text or images but accomplishes entire tasks and even participates in workflows, whether alongside people or on its own. This year, at Dreamforce 2023, that vision took a big step towards reality with the introduction of Einstein Copilot, Salesforce’s conversational AI assistant being rolled out across the entire Salesforce platform and ready to integrate into just about everything customers do.
Out of the box, it’s hard not to be impressed by Einstein Copilot. It’s built from the ground up to drive productivity in a safe way by assisting users across workflows of almost every kind. It handles questions posed in natural language and provides relevant and trustworthy answers drawn from secure, proprietary company data. It’s a clear picture of where I believe AI is going in the enterprise: a single, trusted interface, designed around everyday human interactions, capable of helping across a wide range of tasks. It presents the power of AI in a way that ensures the technology fits the needs of the business, rather than the other way around, and I have no doubt it’ll change the way customers work. And LAMs, as their flexibility and capabilities evolve, will take this already powerful foundation to the next level.
So what’s next?
Much of the recent conversation in generative AI has revolved around the size and architecture of the models that power LLMs and LAMs alike. And as companies like OpenAI continue to push the limits on scale, with parameter counts well into the hundreds of billions, it’s not hard to conclude that bigger is always better. Indeed, large models often do boast performance that would be difficult or impossible to achieve any other way, and impressive, often uncannily sophisticated behavior continues to emerge as model sizes increase, suggesting significant benefits await from the strategy of more and more scale. Still, there’s a lot more to the story.
For all the headlines it generates, the pursuit of ever-larger models is far from a perfect strategy. Most obviously, today’s biggest models suffer from eye-watering compute costs, keeping them well out of reach for many businesses. And even those who can afford to deploy them must accept that the high-quality output they promise can be achingly slow to generate. Additionally, some of the biggest problems we still face in terms of trust, safety, toxicity, and ownership claims such as copyright stem from the massive, globally sourced datasets these hyperscale models depend on.
These downsides make smaller models increasingly attractive in a number of domains. They’re of course comparatively cost-effective and can be tuned to run at blazing speeds. Today’s purpose-built LLMs can even be run entirely at the edge in some cases, including on an end user’s mobile device. And because they require less data to train, customers can play a more active, curatorial role in preparing their datasets, allowing for great strides to be made in terms of the quality, safety, and even legal status of the content those datasets include.
Perhaps most profound is the fact that even the quality of their output can compete impressively with that of their bigger cousins by focusing on narrower domains. Remember, after all, that models like ChatGPT are essentially designed to be everything to everyone: helping with homework and dinner recipes, answering questions about science, technology, history, and pop culture, and, of course, rewriting Macbeth in the style of Jay-Z. In contrast, generative AI for the enterprise can and should focus on far smaller, more relevant problem domains. This is as clear a win-win as one can imagine: it means a lower barrier to entry without compromising on output quality.
But even small models can deliver big solutions—we just have to think about scale differently. Instead of making models themselves bigger, what happens when multiple models, each designed with a specific goal and trained on a manageably curated, well-vetted, and proprietary dataset, are woven together in service of a single, higher-level goal? What if AI agents like Einstein Copilot could be combined—orchestrated—just as multiple humans can work as a team to do more than they could as individuals? Consider a restaurant, for example—an organization that’s only possible because a team works together, each member with their own skills and focus area: servers taking orders, chefs preparing food, a receptionist fielding reservations and orders, a driver making deliveries. What might it be like for LAMs to organize in a similar fashion?
This idea of orchestration is something I’ve been thinking about a lot lately, and I see it as one of the most exciting, yet also practical, techniques for bringing about a future of useful, autonomous agents in a safe and productive way. Best of all, orchestration means that even the most ambitious solutions can remain transparent and knowable to the people who create them and work alongside them. Remember, the scale in this case comes not from ever-larger neural networks, and all the mystery that lies within them, but from separate, clearly defined components organized in ways meaningful to humans. For instance, instead of training one giant model to record customer meeting notes, draw inferences from the results, update CRM records accordingly, and then send out follow-up messages all on its own, each of these tasks could be assigned to an individually trained model. In fact, having spent much of my research career in robotics, I can’t help but look even further over the horizon to imagine such orchestration happening in real-world spaces, with physically embodied models working together to solve tasks of all kinds, alongside humans in factories, offices, hospitals, and maybe even restaurants. But as lofty as that sounds (and it’s a long-term vision, admittedly), the present-day potential of orchestration is already enormous.
So let’s talk about the benefits. For one thing, orchestration spares us the difficulty of assembling a dataset big enough to turn a single model into such a flexible, domain-spanning agent, along with the risk that comes from throwing such large quantities of widely varying data into a single training set. Additionally, each model can be further fine-tuned with reinforcement learning from human feedback (RLHF). The result is a system in which each component (a separate LAM, like Einstein Copilot) is hyper-specialized for a crucial but manageable step in a larger task.
And when something does go wrong, whether during debugging or in production, problems can be traced more easily to a single, purpose-built model, allowing them to be understood and solved with far greater confidence. Even serious faults can be handled in a more robust, modular fashion: with multiple models working together, failures are more likely to be contained and easily isolated, with far greater opportunities for continuity when individual components fail.
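To make the idea a little more concrete, here is a minimal sketch of what orchestrating the meeting-notes example might look like. It is an illustration under stated assumptions, not a description of how Einstein Copilot or any Salesforce product works: every name in it (`Step`, `orchestrate`, `take_notes`, and so on) is hypothetical, and each "model" is a plain Python function standing in for a purpose-built LAM. The CRM step is made to fail on purpose to show how a failure can stay contained to one component while the rest of the workflow continues.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]  # shared state passed from step to step


@dataclass
class Step:
    """One specialized component (hypothetical stand-in for a small model)."""
    name: str
    run: Callable[[Context], Context]
    critical: bool = True  # if False, a failure is recorded and the workflow continues


@dataclass
class Result:
    context: Context
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def orchestrate(steps: List[Step], context: Context) -> Result:
    """Run each step in order, recording which steps succeeded or failed."""
    result = Result(context=dict(context))
    for step in steps:
        try:
            result.context.update(step.run(result.context))
            result.completed.append(step.name)
        except Exception:
            result.failed.append(step.name)  # the fault is attributed to one named step
            if step.critical:
                break  # later steps depend on this one, so stop here
    return result


# Stand-ins for individually trained models (assumptions for illustration only).
def take_notes(ctx: Context) -> Context:
    return {"notes": f"Summary of: {ctx['transcript']}"}


def draw_inferences(ctx: Context) -> Context:
    wants_pricing = "pricing" in str(ctx["notes"])
    return {"next_action": "send pricing" if wants_pricing else "none"}


def update_crm(ctx: Context) -> Context:
    raise RuntimeError("CRM unavailable")  # simulated outage


def send_follow_up(ctx: Context) -> Context:
    return {"email": f"Follow-up: {ctx['next_action']}"}


steps = [
    Step("notes", take_notes),
    Step("inferences", draw_inferences),
    Step("crm_update", update_crm, critical=False),
    Step("follow_up", send_follow_up),
]

result = orchestrate(steps, {"transcript": "Customer asked about pricing tiers."})
print(result.completed)  # ['notes', 'inferences', 'follow_up']
print(result.failed)     # ['crm_update']
```

Because each step is named and runs behind its own boundary, the failed list points directly at the component that needs attention, and marking a step as non-critical is one simple way to express that the rest of the workflow can carry on without it.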
More importantly, orchestration elevates the creation of enterprise AI models from a purely technical task into one that models a business process in terms human stakeholders can understand. Just as any good manager knows instinctively how to break a problem down for a team of people to face, experts in AI orchestration may soon boast similar instincts for breaking a problem down for a collection of purpose-built models. An especially exciting aspect of this vision is that it points towards a new kind of skill, what one might even call an emerging art, that I look forward to seeing develop in enterprises. Experts in LAM orchestration will think at a high level, focusing squarely on the needs of their enterprise as a business, not merely a technology platform, and using that insight to break large, meaningful tasks (the kind that deliver real, measurable value) into a sequence of smaller ones that a “team” of LAMs can solve together.
Their work will intersect with infrastructure, ensuring these teams of models are deployed safely and efficiently; data science, working to collect unique datasets that solve smaller, less ambiguous problems; and human interface design, in the hopes that the result will work gracefully with people and respect existing workflows. In other words, orchestration experts may become the new face of enterprise AI: less focused on the nuts and bolts of neural networks, and more on ways to build powerful, robust systems of which those networks are only one component among many.
In fact, it’s my ultimate hope that this skill will be neither rare nor exclusive, but commonplace, turning the orchestration of LAMs into powerful, personalized solutions that play a growing role in our professional lives. The barrier may be lowered even further as marketplaces emerge to bring orchestrated Copilot-like LAM solutions to the world, delivering the power of generative AI at an amazing scale, all through plug-and-play simplicity. Some will use such marketplace solutions directly, making the power of LAM orchestration an off-the-shelf possibility. Others will treat them as modules to be combined with others (perhaps a blend of additional purchases and custom creations of their own) to compose solutions at whatever scale they need, from the casual and compact to the sprawling and ambitious. But in all cases, what excites me most is the idea of generative AI being shaped less by an elite niche of technology experts and more by the creativity and vision of professionals in every field.
This is my vision for the future of work, in fact: a world in which AI supports human skill at ever-larger scales by enabling us to think at ever-higher levels, simplifying everything we do while preserving the creativity, style, and perspective that make us unique.
The road toward any new vision is usually an incremental one, and LAMs are no exception. But if recent years are any indication, each step will prove nevertheless to be transformative all on its own. From their earliest incarnations, LLMs showed a rare potential for disruption and innovation—the kind we only see once or twice in a generation—and the pace has only increased since. Assistive agents like Einstein Copilot raise the bar even higher, with intuitive interfaces, robust trust and safety features, and seamless integration into traditional workflows. And as such agents are connected in more and more sophisticated ways—orchestration, as I like to call it—I believe the possibilities will simply boggle the mind. These are truly exciting times, and there’s nowhere I’d rather spend them than Salesforce Research.
Learn more about autonomous agents: https://developer.salesforce.com/blogs/2023/10/an-introduction-to-autonomous-agents
Special thanks to Alex Michael, Peter Schwartz, and Sanjna Parulekar for their contributions to the writing of this piece.