Meet LAVIS: A One-stop Library for Language-Vision AI Research and Applications

TL;DR: LAVIS (short for LAnguage-VISion) is an open-source deep learning library for language-vision research and applications, offering comprehensive support for a wide range of tasks, datasets, and state-of-the-art models. Featuring a unified interface and modular design, it’s easy to use off-the-shelf and to extend with new capabilities. With

20 Sep 2022 • #LAVIS

ALPRO: Understanding Video and Language by Aligning Visual Regions and Text Entities

TL;DR: We propose ALPRO, a new video-and-language representation learning framework that achieves state-of-the-art performance on video-text retrieval and video question answering by learning fine-grained alignment between video regions and textual entities via entity prompts. For more background (a review of key concepts used in this post), please see the

31 May 2022 • #ALPRO