# Sambit Panda

I’m a PhD candidate at Johns Hopkins, where I am advised by Joshua T. Vogelstein in the NeuroData lab. Most days, I develop and apply high-dimensional and nonlinear machine learning algorithms to answer interesting biomedical questions. Here’s a little bit more about me.

## Research

Here are some of my favorite research articles. If you want to read more, take a look at the full publication list.

### 📝 High-dimensional and universally consistent k-sample tests

Introduces the idea that the k-sample testing problem and independence testing problem are equivalent up to a transformation of the data.
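
A minimal sketch of that transformation, assuming NumPy: the k samples are stacked into one data matrix and paired with one-hot group labels. Asking whether the samples share a distribution then becomes asking whether the data are independent of the labels. The function name `to_independence_form` is hypothetical and is not taken from the paper or from any library.

```python
import numpy as np


def to_independence_form(samples):
    """Turn k samples into a (data, labels) pair for an independence test.

    `samples` is a sequence of arrays, each of shape (n_i, p) or (n_i,).
    Returns `u` with shape (sum(n_i), p) and one-hot labels `v` with
    shape (sum(n_i), k).
    """
    arrays = [np.asarray(s, dtype=float) for s in samples]
    # Treat 1-D inputs as a single feature column.
    arrays = [a.reshape(-1, 1) if a.ndim == 1 else a for a in arrays]

    # Stack every observation into one data matrix.
    u = np.vstack(arrays)

    # Record which sample each row came from, then one-hot encode it.
    sizes = [a.shape[0] for a in arrays]
    group_ids = np.repeat(np.arange(len(arrays)), sizes)
    v = np.eye(len(arrays))[group_ids]
    return u, v
```

Any independence test applied to `(u, v)` then acts as a k-sample test on the original samples.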

### 📝 Learning Interpretable Characteristic Kernels via Decision Forests

Demonstrates that the kernel derived from a random forest is characteristic and develops a hypothesis test based on that fact (KMERF).

### 📄 The Chi-Square Test of Distance Correlation

Derives an approximation to the p-value of distance correlation that bypasses the permutation test with no significant loss of power.
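
As a rough illustration of how such an approximation can replace the permutation loop, the sketch below compares a shifted and scaled statistic against a chi-square distribution with one degree of freedom. It assumes `stat` is the unbiased sample distance correlation, `n` is the sample size, and the form `n * stat + 1` applies; these are stated assumptions, not text quoted from the paper. The helper name `chi2_pvalue` is hypothetical.

```python
from scipy.stats import chi2


def chi2_pvalue(stat, n):
    """Approximate p-value for an unbiased distance correlation statistic.

    Assumes n * stat + 1 is compared against the upper tail of a
    chi-square distribution with one degree of freedom.
    """
    return float(chi2.sf(n * stat + 1.0, df=1))
```

Because this is a single closed-form evaluation, its cost does not grow with a number of permutations.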

## Software

I love solving difficult problems, and I often develop software to help. You can see more in the full software list.

### treeple

Extends scikit-learn decision trees to do oblique splits, manifold learning, hypothesis testing, etc. I am a core contributor and maintainer of this project.

### hyppo

hyppo (HYPothesis Testing in PythOn, pronounced ‘Hippo’) is an open-source software package for multivariate hypothesis testing, closing the gap with R. I also wrote a paper about it. I am the creator and maintainer of this project.

### scipy.stats.multiscale_graphcorr

Multiscale Graph Correlation is a powerful multivariate test (the first multivariate test in SciPy). I ported this code and am a maintainer of this method.
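
A short usage sketch, assuming SciPy 1.4 or later and NumPy are installed. The simulated data, sample size, and seeds are illustrative choices only.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

# Simulate two dependent 2-D variables: y is x plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = x + 0.1 * rng.normal(size=(50, 2))

# The result unpacks into the test statistic, the permutation p-value,
# and a dictionary of extra outputs.
stat, pvalue, mgc_dict = multiscale_graphcorr(x, y, reps=1000, random_state=0)
print(f"MGC statistic: {stat:.3f}, p-value: {pvalue:.4f}")
```

A small p-value here suggests that `x` and `y` are dependent. Lowering `reps` speeds up the permutation test at the cost of a coarser p-value.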