168 articles in total
2023
Paper Notes: FILIP Fine-grained Interactive Language-Image Pre-training
Paper Notes: LiT Zero-Shot Transfer with Locked-image text Tuning
Paper Notes: Image Captioners Are Scalable Vision Learners Too
Paper Notes: Towards Diverse Paragraph Captioning for Untrimmed Videos
Observations on the Development of Multimodal Language Models
Paper Notes: Vision Transformers are Parameter-Efficient Audio-Visual Learners
Paper Notes: Two Papers on Audio-Visual Localization
Paper Notes: CoCa and VideoCoCa
Paper Notes: STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Paper Notes: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video