BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

TL;DR: BLIP is a new pre-training framework for unified vision-language understanding and generation that achieves state-of-the-art results on a wide range of vision-language tasks.

Background

For a review of some terms and definitions used in this blog, see our Appendix.

Vision and language, two of the most fundamental methods…

23 Feb 2022 • Junnan Li • #BLIP