Authors: Peratham Wiriyathammabhum, Abhinav Shrivastava, Vlad I. Morariu, Larry S. Davis
Where published: WS 2019
ArXiv: 1904.03885
Abstract URL: http://arxiv.org/abs/1904.03885v1
This paper presents a new task, the grounding of spatio-temporal identifying
descriptions in videos. Previous work suggests potential bias in existing
datasets and emphasizes the need for a new data creation schema to better model
linguistic structure. We introduce a new data collection scheme based on
grammatical constraints for surface realization to enable us to investigate the
problem of grounding spatio-temporal identifying descriptions in videos. We
then propose a two-stream modular attention network that learns and grounds
spatio-temporal identifying descriptions based on appearance and motion. We
show that the motion modules help ground motion-related words and also improve
learning in the appearance modules, since the modular design resolves task
interference between modules. Finally, we pose a future challenge: building a
system robust enough to replace ground-truth visual annotations with an
automatic video object detector and automatic temporal event localization.
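The abstract does not spell out the two-stream architecture, but the core idea of modular attention over appearance and motion can be sketched as follows. This is a minimal illustration with random weights, not the paper's implementation: the dimensions, the per-module attention queries (`q_app`, `q_mot`), and the gating weights are all hypothetical stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions: T words in the description, D-dim embeddings,
# R candidate spatio-temporal regions (tubes) to ground the description in.
T, D, R = 6, 16, 4

word_emb = rng.normal(size=(T, D))     # word embeddings of the description
appear_feat = rng.normal(size=(R, D))  # appearance features per candidate
motion_feat = rng.normal(size=(R, D))  # motion features per candidate

def module_score(words, feats, query):
    """One module: attend over words, then match the attended phrase
    embedding against each candidate's visual features."""
    attn = softmax(words @ query)      # (T,) word-attention weights
    phrase = attn @ words              # (D,) attended phrase embedding
    return feats @ phrase              # (R,) matching score per candidate

# Stand-ins for each module's learned attention parameters.
q_app = rng.normal(size=D)
q_mot = rng.normal(size=D)

s_app = module_score(word_emb, appear_feat, q_app)  # appearance stream
s_mot = module_score(word_emb, motion_feat, q_mot)  # motion stream

# Hypothetical gate combining the two streams; in a trained model this
# would be predicted from the sentence.
w = softmax(rng.normal(size=2))
scores = w[0] * s_app + w[1] * s_mot
pred = int(np.argmax(scores))  # index of the grounded candidate
```

Keeping the two streams as separate modules with their own word attention is what lets motion-related words be handled by the motion stream without interfering with appearance matching, which is the task-interference point the abstract makes.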