CASTing Your Model: Learning to Localize Improves Self-Supervised Representations

TL;DR: We find that current self-supervised learning approaches suffer from poor visual grounding and receive an improper supervisory signal when trained on complex scene images. We introduce CAST to improve visual grounding during pretraining and show that it yields significantly better transferable features.

09 Dec 2020