A Riemannian Framework for Statistical Analysis of Topological Persistence Diagrams


Authors: Rushil Anirudh, Vinay Venkataraman, Karthikeyan Natesan Ramamurthy, Pavan Turaga
ArXiv: 1605.08912
Abstract URL: http://arxiv.org/abs/1605.08912v1


Topological data analysis is becoming a popular way to study high-dimensional feature spaces without any contextual clues or assumptions. This paper concerns itself with one popular topological feature, the number of $d$-dimensional holes in the dataset, also known as the Betti-$d$ number. The persistence of the Betti numbers over various scales is encoded into a persistence diagram (PD), which indicates the birth and death times of these holes as the scale varies. A common way to compare PDs is by point-to-point matching, which is given by the $n$-Wasserstein metric. However, a big drawback of this approach is the need to solve for the correspondence between points before computing the distance; for $n$ points, the complexity grows according to $\mathcal{O}(n^3)$. Instead, we propose an entirely new framework built on Riemannian geometry that models PDs as 2D probability density functions, represented in the square-root framework on a Hilbert sphere. The resulting space is much more intuitive, with closed-form expressions for common operations. The distance metric is 1) correspondence-free and 2) independent of the number of points in the dataset. The complexity of computing the distance between PDs now grows according to $\mathcal{O}(K^2)$, for a $K \times K$ discretization of $[0,1]^2$. This also enables the use of existing machinery in differential geometry for statistical analysis of PDs, such as computing the mean, geodesics, classification etc. We report results competitive with the Wasserstein metric at a much lower computational load, indicating the favorable properties of the proposed approach.
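
To make the pipeline in the abstract concrete, here is a minimal Python sketch of the general idea: turn a PD into a discrete 2D density on a $K \times K$ grid over $[0,1]^2$, take its pointwise square root so it lies on the unit sphere, and measure the arc length between two such points. It is not the authors' implementation. The Gaussian kernel, the bandwidth `sigma`, the default `K`, and the function names `pd_to_sqrt_density` and `geodesic_distance` are all assumptions introduced for illustration, as is the premise that birth and death values are already rescaled to $[0,1]$.

```python
import math


def pd_to_sqrt_density(points, K=20, sigma=0.1):
    """Map a persistence diagram to a unit vector (square-root density).

    points: list of (birth, death) pairs, assumed rescaled to [0, 1]^2.
    K, sigma: grid resolution and Gaussian bandwidth (illustrative defaults,
    not values taken from the paper).
    """
    # Cell centers of a K x K discretization of [0, 1]^2.
    centers = [(i + 0.5) / K for i in range(K)]
    density = []
    for x in centers:
        for y in centers:
            # Sum of isotropic Gaussian bumps, one per diagram point.
            value = sum(
                math.exp(-((x - b) ** 2 + (y - d) ** 2) / (2 * sigma ** 2))
                for b, d in points
            )
            density.append(value)
    total = sum(density)
    if total == 0.0:
        raise ValueError("empty diagram or bandwidth too small for this grid")
    # Normalize to a discrete pdf, then take the pointwise square root,
    # so the resulting vector has unit Euclidean norm.
    return [math.sqrt(v / total) for v in density]


def geodesic_distance(psi1, psi2):
    """Arc length between two unit vectors on the sphere."""
    inner = sum(a * b for a, b in zip(psi1, psi2))
    # Clamp to guard against rounding slightly outside [0, 1].
    return math.acos(max(0.0, min(1.0, inner)))


# Two toy diagrams with different numbers of points; no matching is needed.
pd_a = [(0.1, 0.5), (0.2, 0.9)]
pd_b = [(0.1, 0.4), (0.3, 0.6), (0.5, 0.7)]
psi_a = pd_to_sqrt_density(pd_a)
psi_b = pd_to_sqrt_density(pd_b)
print(geodesic_distance(psi_a, psi_b))
```

In this sketch the distance step is a single inner product over $K^2$ grid cells, which is where the $\mathcal{O}(K^2)$ cost comes from; building each density costs additional work proportional to the number of diagram points, but that is done once per diagram rather than once per pair.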
