Study Notes (43)
Paper Notes (39)
Paper Notes: Towards Diverse Paragraph Captioning for Untrimmed Videos
Observations on the Development of Multimodal Language Models
Paper Notes: Vision Transformers are Parameter-Efficient Audio-Visual Learners
Paper Notes: Two Papers on Audio-Visual Localization
Paper Notes: CoCa and VideoCoCa
Paper Notes: STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Paper Notes: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Paper Notes: Self-critical Sequence Training for Image Captioning
Paper Notes: Two Papers Analyzing Multi-Head Attention
Paper Notes: XCLIP Expanding Language-Image Pretrained Models for General Video Recognition