Authors: Baoyuan Wang, Noranart Vesdapunt, Utkarsh Sinha, Lei Zhang
ArXiv: 1803.07212
Abstract URL: http://arxiv.org/abs/1803.07212v1
We present an automatic moment capture system that runs in real time on
mobile cameras. The system is designed to run in the viewfinder mode and
capture a burst sequence of frames before and after the shutter is pressed. For
each frame, the system predicts a "goodness" score in real time, from which
the best moment in the burst can be selected immediately after the shutter is
released, without any user intervention. To solve this problem, we develop a
highly efficient deep neural network ranking model, which implicitly learns a
"latent relative attribute" space to capture subtle visual differences within a
sequence of burst images. The overall goodness is then computed as a linear
aggregation of the goodness scores of all the latent attributes. The latent relative
attributes and the aggregation function can be seamlessly integrated in one
fully convolutional network and trained in an end-to-end fashion. To obtain a
compact model that can run in real time on mobile devices, we have explored
and evaluated a wide range of network design choices, taking into account the
constraints of model size, computational cost, and accuracy. Extensive studies
show that the best frame predicted by our model matches the users' top-1 choice
(out of 11 frames on average) in $64.1\%$ of cases and one of their top-3
choices in $86.2\%$ of cases. Moreover, the model (only 0.47 MB) runs in real
time on mobile devices, e.g., taking only 13 ms per frame on an iPhone 7.
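The scoring, selection, and evaluation steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-frame latent-attribute scores are taken as given numbers (in the paper they come from the fully convolutional network). The weights, the function names, and the pairwise hinge loss are hypothetical: the abstract does not specify the ranking loss, so the hinge form is only an assumed example of a ranking objective.

```python
from typing import Sequence


def aggregate_goodness(attr_scores: Sequence[float],
                       weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Overall goodness as a linear aggregation of latent-attribute scores."""
    if len(attr_scores) != len(weights):
        raise ValueError("need exactly one weight per latent attribute")
    return sum(w * a for w, a in zip(weights, attr_scores)) + bias


def pairwise_hinge_loss(g_better: float, g_worse: float,
                        margin: float = 1.0) -> float:
    """Assumed ranking loss: zero once the preferred frame outscores
    the other frame by at least `margin`, linear penalty otherwise."""
    return max(0.0, margin - (g_better - g_worse))


def select_best_frame(burst_attr_scores: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      bias: float = 0.0) -> int:
    """Index of the burst frame with the highest aggregated goodness."""
    scores = [aggregate_goodness(a, weights, bias) for a in burst_attr_scores]
    return max(range(len(scores)), key=scores.__getitem__)


def top_k_hit(predicted: int, user_ranking: Sequence[int], k: int) -> bool:
    """True if the predicted frame is among the user's k preferred frames."""
    return predicted in user_ranking[:k]


if __name__ == "__main__":
    # Hypothetical burst of 3 frames, each with 3 latent-attribute scores.
    burst = [[0.2, 0.1, 0.5], [0.9, 0.4, 0.3], [0.6, 0.8, 0.1]]
    weights = [0.5, 0.3, 0.2]  # hypothetical learned aggregation weights
    best = select_best_frame(burst, weights)
    ranking = [2, 1, 0]  # hypothetical user preference, best first
    print(best, top_k_hit(best, ranking, 1), top_k_hit(best, ranking, 3))
```

With these made-up numbers the aggregated scores are about 0.23, 0.63, and 0.56, so frame 1 is selected; it misses the user's top-1 choice (frame 2) but counts as a top-3 hit, which is how the top-1 and top-3 percentages above would be tallied over many bursts. A linear aggregation keeps the final step cheap: inside a network it amounts to a single weighted sum over the attribute channels.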