Authors: Jing Zhang, Yuchao Dai, Fatih Porikli, Mingyi He
ArXiv: 1708.04366
Abstract URL: http://arxiv.org/abs/1708.04366v1
Deep learning architectures have brought profound progress in visual saliency
detection. However, three major challenges still hinder detection performance
for scenes with complex compositions, multiple salient objects, and salient
objects of diverse scales. First, the output maps of existing methods remain
low in spatial resolution because of stride and pooling operations, which
blurs edges. Second, networks often neglect descriptive statistical and
handcrafted priors that have the potential to complement saliency detection
results. Third, deep features at different layers remain largely unexploited
and have yet to be fused effectively to handle multi-scale salient objects.
In this paper, we tackle these issues with a new fully
convolutional neural network that jointly learns salient edges and saliency
labels in an end-to-end fashion. Our framework first employs convolutional
layers that reformulate the detection task as a dense labeling problem, then
integrates handcrafted saliency features in a hierarchical manner into lower
and higher levels of the deep network to leverage available information for
multi-scale response, and finally refines the saliency map through dilated
convolutions by imposing context. In this way, the salient edge priors are
efficiently incorporated and the output resolution is significantly improved
while keeping the memory requirements low, leading to cleaner and sharper
object boundaries. Extensive experimental analyses on ten benchmarks
demonstrate that our framework achieves consistently superior performance and
is more robust on complex scenes than very recent state-of-the-art
approaches.
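
The refinement step described above relies on dilated convolutions, which enlarge the context each output position sees without any stride or pooling, so the output keeps the input resolution. The following is a minimal one-dimensional sketch of that general idea in plain Python. It is not the authors' network: the function names `dilated_conv1d` and `receptive_field`, the zero padding, and the toy signal are all assumptions made for illustration.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel application."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 1D dilated convolution (cross-correlation form).

    Hypothetical helper for illustration: the output has the same length as
    the input, so resolution is preserved while the context grows with
    `dilation`.
    """
    span = receptive_field(len(kernel), dilation)
    pad = span // 2
    padded = [0.0] * pad + [float(x) for x in signal] + [0.0] * pad
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            acc += w * padded[i + j * dilation]
        out.append(acc)
    return out


# Toy input: the same 3-tap kernel sees 3 samples at dilation 1
# and 5 samples at dilation 2, with no loss of output resolution.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1.0, 1.0, 1.0]
dense = dilated_conv1d(signal, kernel, dilation=1)
dilated = dilated_conv1d(signal, kernel, dilation=2)
```

Both `dense` and `dilated` have the same length as `signal`; only the span of input samples contributing to each output changes. In a two-dimensional network the same principle applies per spatial axis.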