169 articles in total
2023
Paper notes: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
How CIDEr, a common image captioning metric, works
Paper notes: Self-critical Sequence Training for Image Captioning
Study notes: The Gumbel-Softmax distribution
Paper notes: Two papers analyzing multi-head attention
Paper notes: XCLIP Expanding Language-Image Pretrained Models for General Video Recognition
Paper notes: BLIP-2 Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Paper notes: Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration
Paper notes: mPLUG Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Paper notes: X-VLM Multi-Grained Vision Language Pre-Training Aligning Texts with Visual Concepts