Authors: Guangrun Wang, Liang Lin, Shengyong Ding, Ya Li, Qing Wang
ArXiv: 1604.04377
Abstract URL: http://arxiv.org/abs/1604.04377v1
The past decade has witnessed the rapid development of feature representation
learning and distance metric learning, yet the two steps are usually
studied separately. To explore their interaction, this work proposes an
end-to-end learning framework called DARI, i.e. Distance metric And
Representation Integration, and validates the effectiveness of DARI in the
challenging task of person verification. Given training images annotated
with labels, we first produce a large number of triplet units, each
containing three images, i.e. one person and its matched and mismatched
references. For each triplet unit, the objective is to maximize the disparity
between the distance of the mismatched pair and that of the matched pair. We solve this objective by building a
deep architecture of convolutional neural networks. In particular, the
Mahalanobis distance matrix is naturally factorized as one top fully-connected
layer that is seamlessly integrated with other bottom layers representing the
image feature. The image feature and the distance metric can thus be
optimized simultaneously via one-shot backward propagation. On several
public datasets, DARI shows very promising performance in re-identifying
individuals across cameras under various challenges, and outperforms other
state-of-the-art approaches.
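
The factorization mentioned in the abstract can be illustrated with a small NumPy sketch. If the Mahalanobis matrix is written as M = W^T W, then the distance (x - y)^T M (x - y) equals ||W(x - y)||^2, so W can act as the weight of a linear (fully-connected) layer applied on top of the features. The hinge form with a margin used for the triplet term below is a common formulation assumed here for illustration, and is not necessarily the exact objective used by DARI. The names `mahalanobis_sq`, `triplet_hinge_loss`, and `margin` are likewise hypothetical, and random vectors stand in for CNN features.

```python
import numpy as np


def mahalanobis_sq(x, y, W):
    """Squared Mahalanobis distance with M = W^T W, computed as ||W (x - y)||^2."""
    d = W @ (x - y)
    return float(d @ d)


def triplet_hinge_loss(anchor, positive, negative, W, margin=1.0):
    """Assumed hinge form: zero once the mismatched distance exceeds
    the matched distance by at least `margin`."""
    d_pos = mahalanobis_sq(anchor, positive, W)  # matched pair
    d_neg = mahalanobis_sq(anchor, negative, W)  # mismatched pair
    return max(0.0, margin + d_pos - d_neg)


# Random stand-ins for image features (a real system would use CNN outputs).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))  # plays the role of the top linear layer
anchor, positive, negative = rng.standard_normal((3, 8))

# The explicit form (x - y)^T M (x - y) agrees with the factored form.
M = W.T @ W
diff = anchor - positive
explicit = float(diff @ M @ diff)
factored = mahalanobis_sq(anchor, positive, W)
print(np.isclose(explicit, factored))
print(triplet_hinge_loss(anchor, positive, negative, W))
```

Because the distance is expressed through W rather than through M directly, M stays positive semidefinite by construction, and in a deep-learning framework the gradient of the loss would flow through W into the feature layers in a single backward pass.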