BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
TL;DR: BLIP is a new pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks.

Background

For a review of some terms and definitions used in this blog, see our Appendix.

Vision and language, two of the most fundamental methods
23 Feb 2022 • Junnan Li • #BLIP