Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval

lib:422bb62e3ec3d501 (v1.0.0)

Vote to reproduce this paper and share portable workflows   1 
Authors: Niluthpol Chowdhury Mithun,Juncheng Li,Florian Metze,Amit K. Roy-Chowdhury
Where published: ICMR 2018 6
Document:  PDF  DOI 
Artifact development version: GitHub
Abstract URL: https://dl.acm.org/citation.cfm?id=3206064


Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications. While there are a number of recent successes in developing effective image-text retrieval methods by learning joint representations, the video-text retrieval task, in contrast, has not been explored to its fullest extent. In this paper, we study how to effectively utilize available multi-modal cues from videos for the cross-modal video-text retrieval task. Based on our analysis, we propose a novel framework that simultaneously utilizes multimodal features (different visual characteristics, audio inputs, and text) by a fusion strategy for efficient retrieval. Furthermore, we explore several loss functions in training the joint embedding and propose a modified pairwise ranking loss for the retrieval task. Experiments on MSVD and MSR-VTT datasets demonstrate that our method achieves significant performance gain compared to the state-of-the-art approaches.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!