Category - Paper Notes - Kamino's Blog

Paper Notes: LiT: Zero-Shot Transfer with Locked-image text Tuning 06-24

Paper Notes: Image Captioners Are Scalable Vision Learners Too 06-24

Paper Notes: Towards Diverse Paragraph Captioning for Untrimmed Videos 06-06

Observations on the Development of Multimodal Language Models 05-08

Paper Notes: Vision Transformers are Parameter-Efficient Audio-Visual Learners 04-28

Paper Notes: Two Papers on Audio-Visual Localization 03-27

Paper Notes: CoCa and VideoCoCa 03-21

Paper Notes: STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training 03-20

Paper Notes: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video 03-18

Paper Notes: Self-critical Sequence Training for Image Captioning 03-16