MSVD Script Tutorial - Search News

BiSeR-LMA: A Bidirectional Semantic Reasoning and Large Model Enhancement Approach for Text ...

Abstract: Video, as an information carrier, provides a vast amount of important information to people. Therefore, the method of obtaining video becomes particularly important, which drives the ...

IEEE

DRGI: Disentangled Representation Graph Infomax for Video Retrieval

Abstract: Vision-language models pretrained on image-text pairs have demonstrated strong performance in text-to-video retrieval through contrastive learning. However, videos contain much richer ...

GitHub

Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts

ViTiS consists of a frozen video encoder, a visual mapping network, a frozen text embedding layer, a frozen language model and a frozen classifier head. Given input video frames and text, video ...


