MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
We are excited to open-source MINT-1T, the first trillion-token multimodal interleaved dataset and a valuable resource for the community to study and build large multimodal models.
24 Jul 2024