HIVE: Harnessing Human Feedback for Instructional Visual Editing

HIVE was accepted to CVPR 2024. Other authors include: Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong. We have seen the success of ChatGPT, which incorporates human feedback to align text generated by large language models with human preferences. Is it possible to align

17 Jun 2024

BannerGen: A Library for Multi-Modality Banner Generation

Background: Graphic layout designs serve as the foundation of communication between media designers and their target audience. They play a pivotal role in organizing various visual elements, including rendered text, logos, product images, calls to action (such as buttons), and background textures/images. The arrangement of these elements is the

06 Dec 2023

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

Other authors include: Can Qin, Stefano Ermon, Yun Fu. GlueGen was accepted by ICCV. Text-to-image synthesis has made remarkable progress in generating lifelike images from textual prompts. However, a significant challenge remains: how can we seamlessly integrate powerful pre-trained text encoders into

29 Sep 2023

A Leap Forward in 3D Understanding: The ULIP and ULIP-2

TL;DR: Imagine a world where machines comprehend 3D objects just as humans do. The ULIP (CVPR 2023) and ULIP-2 projects, backed by Salesforce AI, are working toward this goal by advancing 3D understanding. ULIP pre-trains models with 3D point clouds, images, and texts, aligning them into a unified representation

23 May 2023