Stability-driven deep model interpretation and provably fast MCMC sampling
Data science is transforming many traditional ways in which we approach scientific problems. While the abundance of data and algorithms generates a lot of excitement in statistical modeling, serious concerns are being raised about how to reliably and efficiently extract scientific knowledge from data and models.
In this talk, I will address particular reliability and efficiency issues that arise from my PhD study on a neuroscience project. Understanding how primates process...
Title: A modern maximum-likelihood approach for high-dimensional logistic regression
Abstract: Logistic regression is arguably the most widely used and studied non-linear model in statistics. Statistical inference based on classical maximum-likelihood theory is ubiquitous in this context. This theory hinges on well-known fundamental results: (1) the maximum-likelihood estimate (MLE) is asymptotically unbiased and normally distributed, (2) its variability can be quantified via the inverse Fisher information, and (3) the...
Spectral Methods and Nonconvex Optimization: A Modern Statistical Perspective
Modern statistical analysis often requires the integration of statistical thinking and algorithmic thinking. In many problems, statistically sound estimation procedures (e.g., the MLE) may be difficult to compute, at least in their naive form. This challenge calls for a new look at simple statistical methods such as spectral methods (including PCA), as well as an examination of optimization algorithms through a statistical lens.
Non-iterative Estimation Update for Parametric and Semiparametric Models with Population-based Auxiliary Information
With advances in disease registries and surveillance data, population-based information on disease incidence, survival probability, or other important biological characteristics is becoming increasingly available. Such information can be leveraged in studies that collect detailed measurements but have smaller sample sizes. In contrast to recent proposals that formulate the...
Mendelian randomization: A comprehensive statistical approach and applications to preventing heart disease
Mendelian randomization (MR) can give an unbiased estimate of a confounded causal effect by using genetic variants as instrumental variables. The summary-data MR design is rapidly gaining popularity in practice due to the increasing availability of large-scale genome-wide association studies. As we enter the "MR of every risk factor on every disease outcome" era, existing statistical methods still have several major limitations and lack theoretical...
Large-scale Optimal Transport: Statistics and Computation
Optimal transport is a concept from probability that has recently seen an explosion of interest in machine learning and statistics as a tool for analyzing high-dimensional data. However, the key obstacle to using optimal transport in practice has been its high statistical and computational cost. In this talk, we show how exploiting different notions of structure can lead to better statistical rates, beating the curse of dimensionality, and to state-of-the-art algorithms.
Network data arises frequently in modern scientific applications. These networks often exhibit specific characteristics such as edge sparsity and heavy-tailed degree distributions. Some broad challenges arising in the analysis of such datasets include (i) developing flexible, interpretable models for networks, (ii) provably recovering latent structure from such data, and (iii) testing for goodness of fit.