Authors: Jonathan Tremblay, Thang To, Stan Birchfield
ArXiv: 1804.06534
Abstract URL: http://arxiv.org/abs/1804.06534v2
We present a new dataset, called Falling Things (FAT), for advancing the
state of the art in object detection and 3D pose estimation in the context of
robotics. By synthetically combining object models and backgrounds of complex
composition and high graphical quality, we are able to generate photorealistic
images with accurate 3D pose annotations for all objects in all images. Our
dataset contains 60k annotated images of 21 household objects taken from the
YCB dataset. For each image, we provide the 3D poses, per-pixel class
segmentation, and 2D/3D bounding box coordinates for all objects. To facilitate
testing different input modalities, we provide mono and stereo RGB images,
along with registered dense depth images. We describe in detail the generation
process and statistical analysis of the data.
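
To illustrate how the listed annotations relate to one another, here is a minimal sketch that derives a 2D bounding box from a 3D pose by projecting the corners of an object's cuboid through a pinhole camera. It does not use the FAT file format or tooling: the function names, the cuboid dimensions, the pose, and the intrinsics (fx, fy, cx, cy) are all hypothetical values chosen for the example, and the dataset's own 2D boxes may be computed differently (for instance, from visible pixels).

```python
from itertools import product


def cuboid_corners(dims):
    """Return the 8 corners of a cuboid centred at the object origin."""
    dx, dy, dz = (d / 2.0 for d in dims)
    return [(sx * dx, sy * dy, sz * dz)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def transform(point, rotation, translation):
    """Apply a rigid transform (3x3 rotation, 3-vector translation)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (z > 0) to pixels."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def bbox_2d_from_pose(dims, rotation, translation, fx, fy, cx, cy):
    """Box (xmin, ymin, xmax, ymax) around the projected cuboid corners."""
    pixels = [
        project(transform(c, rotation, translation), fx, fy, cx, cy)
        for c in cuboid_corners(dims)
    ]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical values, chosen only for illustration (not from the dataset):
# a 20 cm cube, unrotated, one metre in front of the camera.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
box = bbox_2d_from_pose(
    dims=(0.2, 0.2, 0.2),
    rotation=IDENTITY,
    translation=(0.0, 0.0, 1.0),
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
print(box)
```

The sketch assumes every corner lies in front of the camera; a point with z <= 0 would need to be clipped before projection.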