Aligning Diffusion Models to Human Preferences
TLDR Learning from human preferences, specifically Reinforcement Learning from Human Feedback (RLHF), has been a key component in the recent development of large language models such as ChatGPT and Llama2. Until recently, however, the impact of human feedback training on text-to-image models was much more limited. In this work we introduce Diffusion-DPO, a method that adapts Direct Preference Optimization (DPO) to diffusion models, aligning text-to-image models directly to human preferences.
08 Jan 2024 • Bram Wallace • #reinforcement-learning