174 articles in total
2023
Observations on the Development of Multimodal Language Models
Paper Notes: Vision Transformers are Parameter-Efficient Audio-Visual Learners
Paper Notes: Two Papers on Audio-Visual Localization
Paper Notes: CoCa and VideoCoCa
Paper Notes: STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Paper Notes: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
How CIDEr, a Common Image Captioning Metric, Works
Paper Notes: Self-critical Sequence Training for Image Captioning
Study Notes: The Gumbel-Softmax Distribution
Paper Notes: Two Papers Analyzing Multi-Head Attention