5 articles in total
2023
Paper Notes: CoCa and VideoCoCa
Paper Notes: STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Paper Notes: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Paper Notes: mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Paper Notes: X-VLM: Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts