Learning Sources of Variability from High-Dimensional Observational Studies

by Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, and Joshua T. Vogelstein
in arXiv, July 2023


Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the “average treatment effect,” this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, most of these methods are limited to univariate outcomes. Our work generalizes causal estimands to outcomes of any dimension or in any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that the adjusted tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, improves both finite-sample validity and power compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.
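To make the idea of a discrepancy test on a multivariate outcome concrete, here is a minimal sketch in Python. It is not Causal CDcorr and does not use the cdcorr package: it is an unconditional permutation test based on the (biased) sample distance correlation between a one-hot encoding of a nominal group label and a multi-dimensional outcome. It ignores confounders entirely, which is the gap the conditional tests described above are meant to address. The function names (`dcorr`, `discrepancy_test`) and parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def _centered_distances(x):
    """Double-centered pairwise Euclidean distance matrix of the rows of x."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def dcorr(x, y):
    """Biased sample distance correlation between the rows of x and y."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()                       # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def discrepancy_test(outcomes, groups, n_perm=200, seed=0):
    """Permutation test of whether the outcome distribution differs by group.

    outcomes: (n, d) array of multivariate outcomes.
    groups:   length-n array of nominal group labels.
    Returns the observed statistic and a permutation p-value.
    NOTE: unconditional illustration only; no covariate adjustment is done.
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # One-hot encode the nominal labels so distances between them are meaningful.
    onehot = (groups[:, None] == labels[None, :]).astype(float)
    stat = dcorr(outcomes, onehot)
    # Null distribution: shuffle group membership relative to outcomes.
    null = [dcorr(outcomes, onehot[rng.permutation(len(onehot))])
            for _ in range(n_perm)]
    p_value = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = np.repeat([0, 1], 30)
    y = rng.normal(size=(60, 5)) + 2.0 * g[:, None]  # group 1 shifted in all 5 dims
    print(discrepancy_test(y, g))
```

In an observational study, group membership is generally not randomized, so a small p-value from this unadjusted test cannot be read as evidence of a causal effect; it only indicates that the outcome distributions differ between groups.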


  title = {Learning Sources of Variability from High-Dimensional Observational Studies},
  author = {Bridgeford, Eric W. and Chung, Jaewon and Gilbert, Brian and Panda, Sambit and Li, Adam and Shen, Cencheng and Badea, Alexandra and Caffo, Brian and Vogelstein, Joshua T.},
  year = {2023},
  month = jul,
  number = {arXiv:2307.13868},
  eprint = {2307.13868},
  primaryclass = {cs, stat},
  publisher = {{arXiv}},
  doi = {10.48550/arXiv.2307.13868},
  archiveprefix = {arXiv},
  copyright = {All rights reserved}