Detection Thresholds for Distribution-Free Non-Parametric Tests: The Curious Case of Dimension 8
Goodness-of-fit and two-sample testing are two fundamental problems in non-parametric statistical inference. Both have been studied extensively, and several multivariate tests have been proposed over the last thirty years, many of them based on geometric graphs. These include, among several others, the celebrated Friedman-Rafsky two-sample test based on the minimal spanning tree, two-sample tests based on K-nearest neighbor graphs, and the Bickel-Breiman spacings tests for goodness-of-fit. These tests are asymptotically distribution-free, universally consistent, and computationally efficient (both in sample size and in dimension), making them particularly attractive for modern statistical applications.
In this talk, I will derive the detection thresholds and limiting local power of these tests, which provides a way to compare them and to justify their performance in various applications. Several interesting properties emerge, such as a curious phase transition in dimension 8 and a remarkable blessing of dimensionality in detecting scale changes. I will also discuss the emerging theory of multivariate ranks based on optimal transport and how these ranks can be used to construct efficient distribution-free two-sample tests.
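For readers unfamiliar with graph-based tests, the following is a minimal sketch of the Friedman-Rafsky idea mentioned above: build the minimal spanning tree of the pooled sample and count the edges that join points from different samples. Unusually few such edges suggest that the two distributions differ. The function names (`mst_edges`, `friedman_rafsky`) and the use of a permutation p-value in place of the asymptotic normal calibration are assumptions made for illustration, not the speaker's implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def mst_edges(points):
    """Return the endpoint indices of the Euclidean MST edges of `points`.

    Assumes all points are distinct: SciPy treats zero entries of the
    dense distance matrix as missing edges, so exact duplicates would
    be disconnected from each other.
    """
    dist = cdist(points, points)               # pairwise Euclidean distances
    tree = minimum_spanning_tree(dist).tocoo() # sparse matrix with n - 1 edges
    return tree.row, tree.col


def friedman_rafsky(x, y, n_perm=500, seed=0):
    """Cross-sample MST edge count and a permutation p-value (illustrative).

    Small counts are evidence against the null hypothesis of equal
    distributions, so the p-value uses the lower tail.
    """
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x), dtype=int),
                             np.ones(len(y), dtype=int)])
    rows, cols = mst_edges(pooled)
    observed = int(np.sum(labels[rows] != labels[cols]))

    # The MST depends only on the pooled points, so permuting the labels
    # and recounting cross edges gives the null distribution of the count.
    rng = np.random.default_rng(seed)
    perm_counts = np.empty(n_perm, dtype=int)
    for b in range(n_perm):
        shuffled = rng.permutation(labels)
        perm_counts[b] = np.sum(shuffled[rows] != shuffled[cols])

    p_value = (1 + np.sum(perm_counts <= observed)) / (1 + n_perm)
    return observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(40, 3))            # sample from N(0, I)
    y = rng.normal(loc=5.0, size=(40, 3))   # sample shifted in every coordinate
    print(friedman_rafsky(x, y))
```

Because the null distribution of the edge count does not depend on the common underlying distribution, the permutation calibration here reflects the distribution-free property that makes these tests attractive.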