Authors: Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele
ArXiv: 1710.03224
Abstract URL: http://arxiv.org/abs/1710.03224v2
People nowadays share large parts of their personal lives through social
media. Being able to automatically recognise people in personal photos may
greatly enhance user convenience by easing photo album organisation. For the human
identification task, however, the traditional focus of computer vision has been
face recognition and pedestrian re-identification. Person recognition in social
media photos sets new challenges for computer vision, including non-cooperative
subjects (e.g. backward viewpoints, unusual poses) and great changes in
appearance. To tackle this problem, we build a simple person recognition
framework that leverages convnet features from multiple image regions (head,
body, etc.). We propose new recognition scenarios that focus on the time and
appearance gap between training and testing samples. We present an in-depth
analysis of the importance of different features according to time and
viewpoint generalisability. In the process, we verify that our simple approach
achieves state-of-the-art results on the PIPA benchmark, arguably the
largest social media based benchmark for person recognition to date with
diverse poses, viewpoints, social groups, and events.
Compared to the conference version of the paper, this paper additionally
presents (1) an analysis of a face recogniser (DeepID2+), (2) a new method, naeil2,
that combines the conference version method naeil and DeepID2+ to achieve
state-of-the-art results even compared to post-conference works, (3) a discussion of
related work since the conference version, (4) additional analysis including
the head viewpoint-wise breakdown of performance, and (5) results on the
open-world setup.
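
The multi-region idea in the abstract (features from head, body, etc., combined for recognition) can be illustrated with a minimal sketch. The abstract does not say how the region features are combined or which classifier is used, so everything below is an assumption made for illustration: per-region vectors are L2-normalised and concatenated, and a nearest-centroid classifier stands in for whatever classifier the paper actually trains. The names `REGIONS`, `combine_regions`, `fit_centroids`, and `predict` are hypothetical, and the tiny hand-written vectors stand in for real convnet features.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical region names, taken from the examples given in the abstract.
REGIONS = ("head", "body")


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_regions(features: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate L2-normalised per-region features in a fixed region order.

    Assumption: simple concatenation; the abstract does not specify this.
    """
    combined: List[float] = []
    for region in REGIONS:
        combined.extend(l2_normalise(features[region]))
    return combined


def fit_centroids(samples: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Compute one mean vector per identity (a stand-in classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the identity whose centroid is closest in squared Euclidean distance."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))


# Toy training set: one sample per identity, with made-up 2-D region features.
train = [
    combine_regions({"head": [1.0, 0.0], "body": [0.0, 1.0]}),
    combine_regions({"head": [0.0, 1.0], "body": [1.0, 0.0]}),
]
centroids = fit_centroids(train, ["alice", "bob"])

# A query whose head and body features both resemble the first identity.
query = combine_regions({"head": [0.9, 0.1], "body": [0.2, 0.8]})
predicted = predict(centroids, query)
```

Normalising each region before concatenation keeps one region from dominating the distance purely because of its feature scale; this is a design choice of the sketch, not a claim about the paper's method.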