Authors: Tianchen Wang, Jinjun Xiong, Xiaowei Xu, Yiyu Shi
ArXiv: 1903.07663
Abstract URL: http://arxiv.org/abs/1903.07663v1
Recently developed convolutional neural networks (CNNs) achieve accuracy
comparable to that of humans in computer vision tasks such as image
recognition, object detection, and tracking. Most of these networks, however,
process a single image frame at a time and may not fully exploit the temporal
and contextual correlation typically present in multiple channels of the same
image or in adjacent frames of a video, which limits the achievable
throughput. This limitation stems from the fact that existing CNNs operate on
deterministic numbers. In this paper, we propose a
novel statistical convolutional neural network (SCNN), which extends existing
CNN architectures but operates directly on correlated distributions rather than
deterministic numbers. By introducing a parameterized canonical model of
correlated data and defining the corresponding operations required for CNN
training and inference, we show that SCNN can process multiple frames of
correlated images effectively, achieving a significant speedup over existing
CNN models. We use CNN-based video object detection as an example to
illustrate the usefulness of the proposed SCNN as a general network model.
Experimental results show that even a non-optimized implementation of SCNN can
achieve a 178% speedup over existing CNNs with slight accuracy degradation.
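
The abstract does not spell out the canonical model, so the following is only a
minimal sketch of the general idea, assuming a first-order linear form in which
each value is a mean plus sensitivities to a few shared standard-normal sources
and one private standard-normal source. The class name `Canonical`, its fields,
and the helper functions are hypothetical and are not taken from the paper. The
sketch shows that a weighted sum, the core operation of a convolution, can be
applied to such correlated values and still return a value of the same form.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Canonical:
    """Assumed form: mean + sum_i shared[i] * xi_i + indep * r,
    where xi_i are shared N(0, 1) sources and r is a private N(0, 1) source."""
    mean: float
    shared: Tuple[float, ...]
    indep: float = 0.0

    def variance(self) -> float:
        # All sources are unit-variance and mutually independent.
        return sum(c * c for c in self.shared) + self.indep ** 2

    def scale(self, w: float) -> "Canonical":
        # Multiplying by a deterministic weight scales every coefficient.
        return Canonical(w * self.mean,
                         tuple(w * c for c in self.shared),
                         abs(w) * self.indep)

    def add(self, other: "Canonical") -> "Canonical":
        # Assumes the two private sources are independent of each other,
        # so this must not be used to add a value to itself.
        if len(self.shared) != len(other.shared):
            raise ValueError("operands must use the same shared sources")
        return Canonical(self.mean + other.mean,
                         tuple(a + b for a, b in zip(self.shared, other.shared)),
                         math.hypot(self.indep, other.indep))


def covariance(x: Canonical, y: Canonical) -> float:
    # Only the shared sources contribute to correlation between two values.
    return sum(a * b for a, b in zip(x.shared, y.shared))


def weighted_sum(weights: Sequence[float],
                 inputs: Sequence[Canonical],
                 bias: float = 0.0) -> Canonical:
    """Dot product of deterministic weights with canonical-form inputs."""
    if not inputs or len(weights) != len(inputs):
        raise ValueError("weights and inputs must be non-empty and equal length")
    acc = inputs[0].scale(weights[0])
    for w, x in zip(weights[1:], inputs[1:]):
        acc = acc.add(x.scale(w))
    return Canonical(acc.mean + bias, acc.shared, acc.indep)


# Two correlated "pixels" that share the first source.
x = Canonical(mean=1.0, shared=(0.3, 0.0), indep=0.4)
y = Canonical(mean=2.0, shared=(0.3, 0.1), indep=0.0)
out = weighted_sum([2.0, -1.0], [x, y], bias=0.5)
```

Because the result is again a `Canonical` value, linear layers can be chained
without sampling individual frames. Nonlinear operations such as ReLU or max
pooling would need an approximation to stay in this form, which this sketch
does not attempt.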