BannerGen: A Library for Multi-Modality Banner Generation

Background: Graphic layout designs serve as the foundation of communication between media designers and their target audience. They play a pivotal role in organizing various visual elements, including rendered text, logos, product images, calls to action (such as buttons), and background textures/images. The arrangement of these elements is the

06 Dec 2023

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

Other authors include: Can Qin, Stefano Ermon, Yun Fu. GlueGen was accepted by ICCV. In the rapidly advancing field of text-to-image synthesis, progress in generating lifelike images from textual prompts has been remarkable. However, a significant challenge remains: how can we seamlessly integrate powerful pre-trained text encoders into

29 Sep 2023

Mask-free OVIS: An Open-Vocabulary Instance Segmentation Mask Generator

Authors: Vibashan Vishnukumar Sharmini, Ning Yu, Ran Xu. Have you ever wondered how long it takes a human annotator to annotate a dataset like COCO? MORE THAN A YEAR. Not to mention, even training a detection model on this dataset would only equip it to detect those specific 80

16 Jun 2023

A Leap Forward in 3D Understanding: The ULIP and ULIP-2

TL;DR: Imagine a world where machines comprehend 3D objects just as humans do. The ULIP (CVPR 2023) and ULIP-2 projects, backed by Salesforce AI, are making this a reality by revolutionizing 3D understanding. ULIP uniquely pre-trains models with 3D point clouds, images, and text, aligning them into a unified representation

23 May 2023