TL;DR: Deep reinforcement learning (RL), a powerful framework for training AI agents, can be slow because it requires repeated interaction with a simulation of the environment. Our original WarpDrive accelerates multi-agent deep RL on NVIDIA GPUs, enabling 10-100x speedups compared to alternative CPU+GPU implementations of multi-agent simulations. However, users had to write simulation environments in NVIDIA CUDA C, which requires C expertise for coding and debugging. Solution: our newly released WarpDrive v2 supports running simulations in Numba, a just-in-time compiler for Python and NumPy code. Users can now implement simulations much faster, using higher-level coding for rapid prototyping, while keeping the flexibility to use CUDA C for the highest performance.
Multi-agent systems, particularly those with multiple interacting AI agents, are a frontier for AI research and applications. They are key to solving many engineering and scientific challenges in economics, self-driving cars, robotics, and many other fields.
Deep reinforcement learning (RL) is a powerful learning framework to train AI agents. For example, deep RL agents have mastered Starcraft [1], successfully trained robotic arms [2], and effectively recommended economic policies [3,4].
Multi-agent deep RL (MADRL) experiments can take days or even weeks, especially when a large number of agents must be trained, since MADRL requires repeatedly running multi-agent simulations and training agent models. This is slow because MADRL implementations often pair CPU-based simulations with GPU-based deep learning models, which introduces many performance bottlenecks: CPUs parallelize poorly across agents and environments, and data must be transferred back and forth between the CPU and the GPU.
To accelerate MADRL research and engineering, we built WarpDrive – an open-source framework for end-to-end MADRL on one or more GPUs. As its Star-Trek-inspired name implies, speed is the “Prime Directive” for WarpDrive, and it succeeds with flying colors.
WarpDrive achieves impressive acceleration: its design enables on the order of 10-100x speedups compared to alternative CPU+GPU implementations of multi-agent simulations. In one example – a simple Tag environment – WarpDrive achieves orders-of-magnitude faster multi-agent RL training with 2,000 environments and 1,000 agents [5, 6]. In the COVID-19 economic simulation [4] with 60 environments, WarpDrive achieves a 24x throughput boost running on a single NVIDIA A100 Tensor Core GPU versus an n1-standard-16 machine (16 CPUs) [7].
Want to build your own fast RL workflows? WarpDrive enables you to quickly develop new MADRL projects, providing the tools and workflow objects to do so. For your own simulation, you need to implement the simulation step function in CUDA C.
The original version of WarpDrive did speed up MADRL, letting users address the “slow RL” problem by writing a CUDA C version of their simulation environment. However, CUDA programming relies on C expertise that not all developers have.
Is there a way to make programming simulations in WarpDrive easier and/or faster? Can we let users code in a higher-level language to create these simulations? Yes and Yes – thanks to our new version of WarpDrive!
We’ve released the second major version of WarpDrive, WarpDrive v2, which supports running simulations in Numba. Numba is a Python package that just-in-time compiles Python and NumPy code into fast machine code, including CUDA kernels, so users can now take advantage of the power of Python and NumPy to build their simulations.
Adding Numba to WarpDrive brings significant benefits:
WarpDrive v2 provides a single API for both backends – CUDA C and Numba. All of WarpDrive's functions work with both Numba and CUDA, and users can switch back and forth between the two as they build simulations.
To turn on the Numba backend, users just need to specify env_backend="numba". WarpDrive will then automatically link the simulation built by the user in Numba with the Numba-compatible core residing inside WarpDrive to complete the end-to-end simulation and training, as in the example shown below.
# Module paths follow the open-source WarpDrive repository; they may differ slightly across versions.
from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

# Wrap the Python/Numba simulation and select the Numba backend.
env_wrapper = EnvWrapper(
    env_obj=TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"],
    env_backend="numba",
)
# run_config and policy_tag_to_agent_id_map are defined as in the WarpDrive tutorials.
trainer = Trainer(env_wrapper, run_config, policy_tag_to_agent_id_map)
trainer.train()
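What does the Numba side of a simulation look like? Below is a minimal, hypothetical sketch of a GPU step kernel written with Numba's CUDA support. The kernel name, arguments, and update rule are illustrative only and are not WarpDrive's required interface; the key idea, following WarpDrive's design, is that each thread block handles one environment replica and each thread handles one agent, so many environments and agents step in parallel on the GPU.

from numba import cuda

@cuda.jit
def step_kernel(loc_x, loc_y, actions, rewards):
    # WarpDrive-style mapping: one block per environment replica, one thread per agent.
    env_id = cuda.blockIdx.x
    agent_id = cuda.threadIdx.x
    # Hypothetical dynamics: move each agent by its chosen (dx, dy) action.
    loc_x[env_id, agent_id] += actions[env_id, agent_id, 0]
    loc_y[env_id, agent_id] += actions[env_id, agent_id, 1]
    # Hypothetical reward: a small penalty per step.
    rewards[env_id, agent_id] = -0.01

A kernel like this would be launched with one block per environment and one thread per agent, for example step_kernel[num_envs, num_agents](loc_x, loc_y, actions, rewards), where the arguments are Numba device arrays holding the batched simulation state.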
We encourage you to visit the following webpages to explore some tutorials, examples, code, and related documentation:
You may be wondering: if users can code in either Numba or CUDA, which is better, and when? Is it an advantage in WarpDrive v2 that users can "go back and forth" between CUDA C and Numba? Are there cases where a user might want to code in lower-level CUDA C instead of Numba, or would everyone always use Numba since it involves higher-level programming?
In general, we would say yes: the ability to go back and forth between the two (having the option of using either CUDA C or Numba) is an advantage. Numba's higher-level programming makes development and experimentation faster, while CUDA C typically delivers the best runtime performance.
So, if you’re developing a complicated simulation, and experimenting with different simulation designs, it’s probably best to start with Numba. But if you’ve moved from the development and experimentation phase to the production phase, and this simulation will be used again and again, consider converting to CUDA C in the end, to ensure it’s optimized for speed.
In short, Numba is generally faster for development and experimentation, while CUDA C provides better performance, so "back and forth" seems like an ideal feedback loop, although it depends on many factors. The above “Numba or CUDA” debate is meant to stimulate thinking about the various factors, not to be the final word; each tool has its advantages, and can be the best choice for a given problem setup. The important thing is that we provide CUDA C and Numba to cover both low-level and high-level programming; for many users, high-level programming is more familiar and will speed development, while others will prefer CUDA C to make their code run as fast as possible. With WarpDrive v2, users can enjoy the best of both worlds.
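As a concrete illustration of that loop, switching a project from the Numba prototype to the CUDA C backend is, assuming a CUDA C version of the step function has also been written and registered, essentially a one-argument change when constructing the environment wrapper. The backend name "pycuda" used below is how recent WarpDrive releases refer to the CUDA C backend, but check the version you are running.

# Same workflow as the Numba example above; only the backend selection changes.
env_wrapper = EnvWrapper(
    env_obj=TagContinuous(**run_config["env"]),
    num_envs=run_config["trainer"]["num_envs"],
    env_backend="pycuda",  # CUDA C backend; use "numba" while prototyping
)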
The release of WarpDrive v2 is significant because it improves on the previous version in several ways: it adds a Numba backend alongside CUDA C, exposes both through a single API, and lets users prototype simulations quickly in Python while retaining the option to convert to CUDA C for maximum speed.
Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (links below). Connect with us on social media and our website to get regular updates on this and other research projects.
Tian Lan is a Senior Research Scientist at Salesforce Research, working for both the AI Operations team and the AI Economist team. For AI Economist, his main focus is on building and scaling multi-agent reinforcement learning systems. He has extensive experience building up large-scale and massively parallel computational simulation platforms for academia, automated trading, and the high-tech industry.
Stephan Zheng (www.stephanzheng.com) leads the AI Economist team at Salesforce Research, working on deep reinforcement learning and AI simulations to design economic policies.
Donald Rose is a Technical Writer at Salesforce AI Research, specializing in content creation and editing for blog posts, video scripts, newsletters, media/PR material, workshops, and more.
This release was made possible in part by NVIDIA, and we wish to thank them for their significant help and support. Special thanks to Brenton Chu for being the first to write a Numba simulation, which motivated us to update the entire codebase and make it compatible with Numba. We thank our partners for their generous support and can’t wait to see what you create with this release!