Authors: Chen Wei, Lingxi Xie, Xutong Ren, Yingda Xia, Chi Su, Jiaying Liu, Qi Tian, Alan L. Yuille
Where published: CVPR 2019
ArXiv: 1812.00329
Abstract URL: http://arxiv.org/abs/1812.00329v1
Learning visual features from unlabeled image data is an important yet
challenging task, which is often achieved by training a model on some
annotation-free information. We consider spatial contexts, for which we solve
so-called jigsaw puzzles, i.e., each image is cut into a grid of patches that are then
shuffled, and the goal is to recover the correct configuration. Existing
approaches formulated it as a classification task by defining a fixed mapping
from a small subset of configurations to a class set, but these approaches
ignore the underlying relationship between different configurations and also
limit their application to more complex scenarios. This paper presents a novel
approach which applies to jigsaw puzzles with an arbitrary grid size and
dimensionality. We provide a fundamental and generalized principle: weaker
cues are easier to learn in an unsupervised manner and also transfer
better. In the context of puzzle recognition, we adopt an iterative approach that,
instead of solving the puzzle all at once, adjusts the order of the patches in
each step until convergence. In each step, we combine both unary and binary
features on each patch into a cost function judging the correctness of the
current configuration. Our approach, by taking the similarity between puzzles into
consideration, learns visual knowledge in a more reasonable way. We
verify the effectiveness of our approach in two aspects. First, it is able to
solve arbitrarily complex puzzles, including high-dimensional puzzles, that
prior methods have difficulty handling. Second, it serves as a reliable way of
network initialization, which leads to better transfer performance on several
visual recognition tasks including image classification, object detection, and
semantic segmentation.
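The iterative idea in the abstract can be illustrated with a minimal sketch: adjust the patch order step by step, scoring each configuration by a cost that sums unary terms (patch vs. position) and binary terms (adjacent patch pairs). This is not the authors' implementation. Here the costs are hand-built toy tables instead of network outputs, and each step is assumed to be a greedy search over swaps of two patches. The names `config_cost`, `solve_iteratively`, `unary`, `right`, and `below` are hypothetical.

```python
from itertools import combinations

# A configuration is a list `perm` where perm[pos] is the id of the patch
# placed at grid position `pos` (row-major order on a rows x cols grid).


def config_cost(perm, unary, right, below, rows, cols):
    """Total cost = sum of unary terms + sum of binary terms over adjacent pairs.

    unary[p][pos]: cost of placing patch p at position pos (toy table).
    right[a][b]:   cost of patch a sitting directly left of patch b.
    below[a][b]:   cost of patch a sitting directly above patch b.
    """
    cost = sum(unary[p][pos] for pos, p in enumerate(perm))
    for r in range(rows):
        for c in range(cols):
            pos = r * cols + c
            if c + 1 < cols:  # horizontal neighbour exists
                cost += right[perm[pos]][perm[pos + 1]]
            if r + 1 < rows:  # vertical neighbour exists
                cost += below[perm[pos]][perm[pos + cols]]
    return cost


def solve_iteratively(perm, unary, right, below, rows, cols, max_steps=100):
    """Assumed update rule (greedy local search): at each step apply the single
    swap of two patches that lowers the cost most; stop when no swap lowers it
    (convergence) or after max_steps steps."""
    perm = list(perm)
    current = config_cost(perm, unary, right, below, rows, cols)
    for _ in range(max_steps):
        best_cost, best_pair = current, None
        for i, j in combinations(range(len(perm)), 2):
            perm[i], perm[j] = perm[j], perm[i]  # trial swap
            c = config_cost(perm, unary, right, below, rows, cols)
            perm[i], perm[j] = perm[j], perm[i]  # undo the trial swap
            if c < best_cost:
                best_cost, best_pair = c, (i, j)
        if best_pair is None:
            break  # converged: no single swap lowers the cost
        i, j = best_pair
        perm[i], perm[j] = perm[j], perm[i]
        current = best_cost
    return perm, current


# Toy 3x3 puzzle whose correct configuration is the identity [0, 1, ..., 8].
rows, cols = 3, 3
n = rows * cols
unary = [[0 if p == pos else 10 for pos in range(n)] for p in range(n)]
right = [[0 if b == a + 1 and a % cols != cols - 1 else 1 for b in range(n)]
         for a in range(n)]
below = [[0 if b == a + cols else 1 for b in range(n)] for a in range(n)]

scrambled = [4, 0, 8, 2, 7, 1, 6, 3, 5]
solution, cost = solve_iteratively(scrambled, unary, right, below, rows, cols)
print(solution, cost)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8] 0
```

In this toy setup the unary weight (10) exceeds the largest change in binary cost that one swap can cause on a 2D grid (at most 8 affected adjacent pairs, each costing 0 or 1), so every unsolved configuration has an improving swap and the search reaches the identity. With learned costs, a greedy swap search can stall in a local minimum. A higher-dimensional puzzle would add one binary table per extra axis.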