Authors: AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo
ArXiv: 1910.06961
Abstract URL: https://arxiv.org/abs/1910.06961v1
Video understanding is a challenging problem with great impact on the abilities of autonomous agents working in the real world. Yet, solutions so far have been computationally intensive, with the fastest algorithms running for more than half a second per video snippet on powerful GPUs. We propose a novel approach to video architecture learning, Tiny Video Networks, which automatically designs highly efficient models for video understanding. The resulting tiny video models achieve competitive performance while running in as little as 37 milliseconds per video on a CPU and 10 milliseconds on a standard GPU.