in arXiv on April, 2021
Abstract
The (k)-sample testing problem tests whether or not (k)groups of data points are sampled from the same distribution. Multivariate analysis of variance (MANOVA) is currently the gold standard for (k)-sample testing but makes strong, often inappropriate, parametric assumptions. Moreover, independence testing and (k)-sample testing are tightly related, and there are many nonparametric multivariate independence tests with strong theoretical and empirical properties, including distance correlation (Dcorr) and Hilbert-Schmidt-Independence-Criterion (Hsic). We prove that universally consistent independence tests achieve universally consistent (k)-sample testing and that (k)-sample statistics like Energy and Maximum Mean Discrepancy (MMD) are exactly equivalent to Dcorr. Empirically evaluating these tests for (k)-sample scenarios demonstrates that these nonparametric independence tests typically outperform MANOVA, even for Gaussian distributed settings. Finally, we extend these non-parametric (k)-sample testing procedures to perform multiway and multilevel tests. Thus, we illustrate the existence of many theoretically motivated and empirically performant (k)-sample tests. A Python package with all independence and k-sample tests called hyppo is available from https://hyppo.neurodata.io/.
Citation
@misc{panda2021nonpar, title = {Nonpar {{MANOVA}} via {{Independence Testing}}}, author = {Panda, Sambit and Shen, Cencheng and Perry, Ronan and Zorn, Jelle and Lutz, Antoine and Priebe, Carey E. and Vogelstein, Joshua T.}, year = {2021}, month = apr, number = {arXiv:1910.08883}, eprint = {1910.08883}, primaryclass = {cs, stat}, publisher = {{arXiv}}, doi = {10.48550/arXiv.1910.08883}, archiveprefix = {arxiv}, copyright = {All rights reserved} }