This paper studies the image-based geo-localization (IBL) problem using
ground-to-aerial cross-view matching. The goal is to predict the spatial
location of a ground-level query image by matching it to a large geotagged
aerial image database (e.g., satellite imagery). This is a challenging task due
to the drastic differences in their viewpoints and visual appearances. Existing
deep learning methods for this problem focus on maximizing the feature
similarity between spatially close-by image pairs, while minimizing the
similarity of pairs that are far apart. They do so by learning deep feature
embeddings based on the visual appearance of the ground and aerial images.
However, in everyday life,
humans commonly use {\em orientation} information as an important cue for the
task of spatial localization. Inspired by this insight, this paper proposes a
novel method which endows deep neural networks with the `commonsense' of
orientation. Given a ground-level spherical panoramic image as query input (and
a large georeferenced satellite image database), we design a Siamese network
which explicitly encodes the orientation (i.e., spherical directions) of each
pixel of the images. Our method significantly boosts the discriminative power
of the learned deep features, leading to much higher recall and precision
than all previous methods. Our network is also more compact, using only
1/5 of the parameters of the previously best-performing network. To
evaluate the generalization of our method, we also created a large-scale
cross-view localization benchmark containing 100K geotagged ground-aerial pairs
covering a city. Our code and datasets are available at
\url{https://github.com/Liumouliu/OriCNN}.
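
To make the per-pixel orientation encoding above concrete, the following is a
minimal sketch (under our own assumptions, not the released implementation at
the URL above; the function names and the azimuth/altitude and azimuth/range
parameterizations are illustrative): an equirectangular ground panorama is
augmented with two extra channels giving each pixel's azimuth and altitude,
and a square aerial patch with each pixel's azimuth and radial distance from
the patch centre, before both are fed to the two branches of a Siamese network.

\begin{verbatim}
import numpy as np

def ground_orientation_map(H, W):
    # Equirectangular panorama: azimuth sweeps 0..2*pi across columns,
    # altitude sweeps +pi/2 (up) .. -pi/2 (down) across rows.
    az = np.tile(np.linspace(0.0, 2.0 * np.pi, W, endpoint=False), (H, 1))
    alt = np.tile(np.linspace(np.pi / 2, -np.pi / 2, H)[:, None], (1, W))
    return np.stack([az, alt], axis=-1)           # shape (H, W, 2)

def aerial_orientation_map(S):
    # Square aerial/satellite patch: azimuth and normalised radial
    # distance of each pixel with respect to the patch centre.
    ys, xs = np.mgrid[0:S, 0:S]
    dx = xs - (S - 1) / 2.0
    dy = (S - 1) / 2.0 - ys                       # +y points "north"
    az = np.mod(np.arctan2(dx, dy), 2.0 * np.pi)  # 0 = north, clockwise
    rng = np.sqrt(dx ** 2 + dy ** 2) / (S / 2.0)
    return np.stack([az, rng], axis=-1)           # shape (S, S, 2)

# Example: concatenate the orientation maps with the RGB images before
# passing them to the two network branches (names here are illustrative).
# ground_input = np.concatenate(
#     [ground_rgb, ground_orientation_map(H, W)], axis=-1)
# aerial_input = np.concatenate(
#     [aerial_rgb, aerial_orientation_map(S)], axis=-1)
\end{verbatim}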