Junnan Li - Salesforce AI

CodeT5+: Open Code Large Language Models

TL;DR: CodeT5+ is a new family of open code large language models (LLMs) with improved model architectures and training techniques. CodeT5+ achieves state-of-the-art performance among open-source LLMs on many challenging code intelligence tasks, including zero-shot evaluation on the code generation benchmark HumanEval. Background: Code LLMs Large language

20 May 2023 • #codet5+

BLIP-2: Scalable Pre-training of Multimodal Foundation Models for the World's First Open-source Multimodal Chatbot

17 Mar 2023

Meet LAVIS: A One-stop Library for Language-Vision AI Research and Applications

TL;DR: LAVIS (short for LAnguage-VISion) is an open-source deep learning library for language-vision research and applications, offering comprehensive support for a wide range of tasks, datasets, and state-of-the-art models. Featuring a unified interface and modular design, it’s easy to use off-the-shelf and to extend with new capabilities. With

20 Sep 2022 • #LAVIS

ALPRO: Understanding Video and Language by Aligning Visual Regions and Text Entities

TL;DR: We propose ALPRO, a new video-and-language representation learning framework which achieves state-of-the-art performance on video-text retrieval and video question answering by learning fine-grained alignment between video regions and textual entities via entity prompts. For more background (a review of key concepts used in this post), please see the

31 May 2022 • #ALPRO

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

TL;DR: BLIP is a new pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks. Background For a review of some terms and definitions used in this blog, see our Appendix. Vision and language, two of the most fundamental methods

23 Feb 2022 • #BLIP

Align before Fuse (ALBEF): Advancing Vision-language Understanding with Contrastive Learning

TL;DR: We propose a new vision-language representation learning framework which achieves state-of-the-art performance by first aligning the unimodal representations before fusing them. Vision and language are two of the most fundamental channels for humans to perceive the world. It has been a long-standing goal in AI to build

19 Jul 2021 • #vision and language

CoMatch: Advancing Semi-supervised Learning with Contrastive Graph Regularization

TL;DR: We propose a new semi-supervised learning method which achieves state-of-the-art performance by learning jointly-evolved class probabilities and image representations. What are the existing semi-supervised learning methods? Semi-supervised learning aims to leverage a small amount of labeled data and a large amount of unlabeled data. As a long-standing and widely-studied topic

23 Nov 2020

MoPro: Webly Supervised Learning with Momentum Prototypes

TL;DR: We propose a new webly-supervised learning method which achieves state-of-the-art representation learning performance by training on large amounts of freely available noisy web images. Deep neural networks are known to be hungry for labeled data. Current state-of-the-art CNNs are trained with supervised learning on datasets such as

17 Sep 2020 • #webly supervised learning

Prototypical Contrastive Learning: Pushing the Frontiers of Unsupervised Learning

Prototypical Contrastive Learning unifies clustering and contrastive self-supervised learning to push the frontiers of unsupervised learning.

15 May 2020 • #artificial intelligence