Authors: Rushil Anirudh, Vinay Venkataraman, Karthikeyan Natesan Ramamurthy, Pavan Turaga
ArXiv: 1605.08912
Abstract URL: http://arxiv.org/abs/1605.08912v1
Topological data analysis is becoming a popular way to study high-dimensional
feature spaces without any contextual clues or assumptions. This paper concerns
itself with one popular topological feature, which is the number of
$d$-dimensional holes in the dataset, also known as the Betti-$d$ number. The
persistence of the Betti numbers over various scales is encoded into a
persistence diagram (PD), which indicates the birth and death times of these
holes as scale varies. A common way to compare PDs is by a point-to-point
matching, which is given by the $n$-Wasserstein metric. However, a big drawback
of this approach is the need to solve for correspondences between points before
computing the distance; for $n$ points, the complexity grows according to
$\mathcal{O}(n^3)$. Instead, we propose to use an entirely new framework
built on Riemannian geometry that models PDs as 2D probability density
functions, represented in the square-root framework on a Hilbert
sphere. The resulting space is much more intuitive, with closed-form expressions
for common operations. The distance metric is 1) correspondence-free and also
2) independent of the number of points in the dataset. The complexity of
computing distance between PDs now grows according to $\mathcal{O}(K^2)$, for a
$K \times K$ discretization of $[0,1]^2$. This also enables the use of existing
machinery in differential geometry for statistical analysis of PDs, such as
computing means and geodesics, and classification. We report competitive
results with the Wasserstein metric, at a much lower computational load,
indicating the favorable properties of the proposed approach.
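
The square-root representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the grid size `K`, and the function names `pd_to_sqrt_density`, `sphere_distance`, and `geodesic` are all assumptions made for illustration. The sketch smooths each PD into a discrete density on a $K \times K$ grid over $[0,1]^2$, takes the pointwise square root so that the result has unit norm, and then uses the standard arc-length distance and great-circle geodesic on the unit sphere.

```python
import numpy as np

def pd_to_sqrt_density(points, K=50, sigma=0.05):
    """Map a persistence diagram to a square-root density on a K x K grid.

    points: iterable of (birth, death) pairs, assumed scaled to [0, 1]^2.
    The Gaussian kernel and its bandwidth sigma are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    if points.shape[0] == 0:
        raise ValueError("persistence diagram must contain at least one point")
    grid = np.linspace(0.0, 1.0, K)
    xx, yy = np.meshgrid(grid, grid, indexing="ij")
    density = np.zeros((K, K))
    for b, d in points:
        # Place one Gaussian bump at each (birth, death) point.
        density += np.exp(-((xx - b) ** 2 + (yy - d) ** 2) / (2.0 * sigma ** 2))
    density /= density.sum()      # discrete pdf: entries sum to 1
    return np.sqrt(density)       # squared entries sum to 1: a point on the unit sphere

def sphere_distance(psi1, psi2):
    """Arc length between two unit-norm arrays; costs O(K^2) for K x K inputs."""
    inner = np.clip(np.sum(psi1 * psi2), -1.0, 1.0)
    return float(np.arccos(inner))

def geodesic(psi1, psi2, t):
    """Point at fraction t in [0, 1] along the great circle from psi1 to psi2."""
    theta = sphere_distance(psi1, psi2)
    if theta < 1e-12:
        return psi1.copy()
    return (np.sin((1.0 - t) * theta) * psi1 + np.sin(t * theta) * psi2) / np.sin(theta)

# Two toy diagrams with different numbers of points; no matching is needed.
pd_a = [(0.1, 0.4), (0.2, 0.9)]
pd_b = [(0.15, 0.5), (0.3, 0.6), (0.5, 0.7)]
psi_a = pd_to_sqrt_density(pd_a)
psi_b = pd_to_sqrt_density(pd_b)
print("distance:", sphere_distance(psi_a, psi_b))
midpoint = geodesic(psi_a, psi_b, 0.5)
```

In this sketch the cost of a distance evaluation depends only on the grid size, not on how many points each diagram contains, which mirrors the two properties claimed in the abstract.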