Authors: Van-Thanh Hoang, Kang-Hyun Jo
ArXiv: 1811.07083
Abstract URL: http://arxiv.org/abs/1811.07083v1
Convolutional neural networks (CNNs) have shown remarkable performance in
various computer vision tasks in recent years. However, the increasing model
size has raised challenges in adopting them in real-time applications as well
as mobile and embedded vision applications. Many works try to build networks that
are as small as possible while still achieving acceptable performance. The
state-of-the-art architecture, MobileNets, uses Depthwise Separable Convolution
(DWConvolution) in place of standard convolution to reduce the size of
networks. This paper describes an improved version of MobileNet, called Pyramid
Mobile Network. Instead of using just a $3\times 3$ kernel size for
DWConvolution as in MobileNet, the proposed network uses a pyramid kernel
size to capture more spatial information. The proposed architecture is
evaluated on two highly competitive object recognition benchmark datasets
(CIFAR-10, CIFAR-100). The experiments demonstrate that the proposed network
achieves better performance compared with MobileNet as well as other
state-of-the-art networks. Additionally, it is more flexible in fine-tuning the
trade-off between accuracy, latency and model size than MobileNets.
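
To make the size argument concrete, the sketch below counts the weights of a standard convolution, a depthwise separable convolution, and one possible pyramid variant. The first two formulas are the standard ones (bias terms ignored). The pyramid function is only an illustration: the abstract does not say how the kernel sizes are combined, so the even channel split across kernel sizes, the function names, and the example layer shape (96 input channels, 192 output channels) are all assumptions, not the authors' design.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def pyramid_depthwise_separable_params(c_in, c_out, kernel_sizes):
    """Hypothetical pyramid variant: input channels are split evenly into
    groups and each group gets its own depthwise kernel size.
    ASSUMPTION: this split is illustrative and not taken from the paper."""
    n = len(kernel_sizes)
    if n == 0 or c_in % n != 0:
        raise ValueError("c_in must be divisible by the number of kernel sizes")
    group = c_in // n
    depthwise = sum(k * k * group for k in kernel_sizes)
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out = 96, 192  # hypothetical layer shape
    print("standard 3x3:       ", standard_conv_params(c_in, c_out, 3))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, 3))
    print("pyramid (3, 5, 7):  ",
          pyramid_depthwise_separable_params(c_in, c_out, (3, 5, 7)))
```

Under these assumptions the depthwise term stays small next to the pointwise term, so adding larger kernels costs comparatively little, which is consistent with the abstract's claim that the kernel pyramid gives an extra knob for trading accuracy against model size.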