CASTing Your Model: Learning to Localize Improves Self-Supervised Representations

> TL;DR: We find that current self-supervised learning approaches suffer from poor visual grounding and receive an improper supervisory signal when trained on complex scene images. We introduce CAST to improve visual grounding during pretraining and show that it yields significantly better transferable features.

Self-supervised learning and its grounding problem

09 Dec 2020