Aligning Diffusion Models to Human Preferences
TLDR Learning from human preferences, specifically Reinforcement Learning from Human Feedback (RLHF), has been a key component in the recent development of large language models such as ChatGPT and Llama2. Until recently, however, the impact of human feedback training on text-to-image models was much more limited. In this work we introduce Diffusion-DPO, a method that adapts Direct Preference Optimization (DPO) to diffusion models, aligning text-to-image models directly to human preferences.
08 Jan 2024 • Bram Wallace • #reinforcement-learning