Authors: Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, Andrew Bray
ArXiv: 1903.00534
Abstract URL: http://arxiv.org/abs/1903.00534v1
Hypothesis testing is one of the most common types of data analysis and forms
the backbone of scientific research in many disciplines. Analysis of variance
(ANOVA) in particular is used to detect dependence between a categorical and a
numerical variable. Here we show how one can carry out this hypothesis test
under the restrictions of differential privacy. We show that the $F$-statistic,
the optimal test statistic in the public setting, is no longer optimal in the
private setting, and we develop a new test statistic $F_1$ with much higher
statistical power. We show how to rigorously compute a reference distribution
for the $F_1$ statistic and give an algorithm that outputs accurate $p$-values.
We implement our test and experimentally optimize several parameters. We then
compare our test to the only previous work on private ANOVA testing, using the
same effect size as that work. We see an order of magnitude improvement, with
our test requiring only 7% as much data to detect the effect.
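
As an illustration of the statistics discussed above, the following is a minimal Python sketch, not the authors' implementation. `f_statistic` is the classical one-way ANOVA $F$-statistic. The abstract does not define $F_1$; the absolute-deviation form in `f1_statistic` is an assumption made here for illustration. `noisy_f1` is likewise a hypothetical helper: it applies the standard Laplace mechanism to the two sums, splits the privacy budget `epsilon` evenly between them, and takes the sensitivity bounds `sens_sa` and `sens_se` as caller-supplied parameters, since their correct values depend on how the data are bounded and are not stated in the abstract.

```python
import numpy as np


def _split_by_group(values, groups):
    """Return the data as a float array plus one sub-array per category."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    return values, [values[groups == g] for g in np.unique(groups)]


def f_statistic(values, groups):
    """Classical one-way ANOVA F-statistic (squared deviations)."""
    values, parts = _split_by_group(values, groups)
    n, k = values.size, len(parts)
    grand_mean = values.mean()
    # Between-group and within-group sums of squares.
    ssa = sum(p.size * (p.mean() - grand_mean) ** 2 for p in parts)
    sse = sum(((p - p.mean()) ** 2).sum() for p in parts)
    return (ssa / (k - 1)) / (sse / (n - k))


def _abs_sums(values, groups):
    """Between-group and within-group sums of absolute deviations."""
    values, parts = _split_by_group(values, groups)
    grand_mean = values.mean()
    sa = sum(p.size * abs(p.mean() - grand_mean) for p in parts)
    se = sum(np.abs(p - p.mean()).sum() for p in parts)
    return sa, se, values.size, len(parts)


def f1_statistic(values, groups):
    """Absolute-deviation analogue of F (assumed form of F_1)."""
    sa, se, n, k = _abs_sums(values, groups)
    return (sa / (k - 1)) / (se / (n - k))


def noisy_f1(values, groups, epsilon, sens_sa, sens_se, rng=None):
    """Hypothetical private variant: Laplace noise on each sum.

    sens_sa and sens_se are assumed sensitivity bounds supplied by the
    caller; each sum receives half of the privacy budget epsilon.
    """
    rng = np.random.default_rng() if rng is None else rng
    sa, se, n, k = _abs_sums(values, groups)
    sa_noisy = sa + rng.laplace(scale=sens_sa / (epsilon / 2))
    se_noisy = se + rng.laplace(scale=sens_se / (epsilon / 2))
    return (sa_noisy / (k - 1)) / (se_noisy / (n - k))


if __name__ == "__main__":
    y = [1, 2, 3, 2, 3, 4, 5, 6, 7]
    c = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
    print(f_statistic(y, c))   # 13.0 up to floating-point rounding
    print(f1_statistic(y, c))  # 7.0 up to floating-point rounding
```

Because the noisy statistic no longer follows the classical $F$ distribution, a $p$-value cannot be read from standard tables; the abstract states that the authors compute a reference distribution for $F_1$, a step this sketch does not attempt.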