Authors: David Novotny, Diane Larlus, Andrea Vedaldi
Where published:
ICCV 2017
ArXiv: 1705.03951
Abstract URL: http://arxiv.org/abs/1705.03951v2
Abstract:
Traditional approaches for learning 3D object categories use either synthetic
data or manual supervision. In this paper, we propose a method which does not
require manual annotations and is instead cued by observing objects from a
moving vantage point. Our system builds on two innovations: a Siamese viewpoint
factorization network that robustly aligns different videos together without
explicitly comparing 3D shapes; and a 3D shape completion network that can
extract the full shape of an object from partial observations. We also
demonstrate the benefits of configuring networks to perform probabilistic
predictions, as well as of geometry-aware data augmentation schemes. We obtain
state-of-the-art results on publicly available benchmarks.