Type Ia supernovae (SN Ia) are faraway exploding stars used as "standardizable candles" to determine cosmological distances, measure the accelerating expansion of the Universe, and constrain the properties of dark energy. Inferring peak luminosities of SN Ia from distance-independent observables, such as the shapes and colors of their light curves (time series), underpins the evidence for cosmic...
We have collected a data set for the social networks of statisticians. The data set consists of the metadata (e.g., authors, abstracts, and citation counts) of about 70,000 papers published from 1984 to 2015 in 36 representative journals in statistics and related fields. Our data collection project (which we call Phase II) is a continuation of the recent data collection project by Ji and Jin (which we call Phase I)....
Nonparametric and nonlinear measures of statistical dependence between pairs of random variables have proved themselves important tools in modern data analysis, where the emergence of large data sets can support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. In this talk, I will present two Bayesian nonparametric...
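The Bayesian nonparametric measures presented in the talk are not reproduced here. As a hedged illustration of the linearity point only, the sketch below contrasts Pearson correlation with distance correlation, a different, well-known nonparametric dependence measure swapped in purely for illustration, on the toy relation y = x² with x symmetric about zero. The function names are hypothetical helpers, not part of any library.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation: measures linear association only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _double_centered(v):
    """Pairwise distance matrix |v_j - v_k|, double-centered.
    The matrix is symmetric, so row means equal column means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)] for j in range(n)]


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation; illustrative, O(n^2) memory."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    dcov2 = sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / n**2
    dvarx = sum(a * a for r in A for a in r) / n**2
    dvary = sum(b * b for r in B for b in r) / n**2
    if dvarx == 0 or dvary == 0:
        return 0.0
    # clamp tiny negative rounding error before taking square roots
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvarx * dvary))


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y is a deterministic, nonlinear function of x
print(pearson(x, y), distance_correlation(x, y))
```

On this toy data the Pearson correlation is exactly 0 even though y is fully determined by x, while the distance correlation is clearly positive (roughly 0.5), which is the kind of nonlinear dependence that correlation-based scores miss.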
We developed a computational approach to study tumor-infiltrating immune cells and their interactions with cancer cells. Analysis of over ten thousand RNA-seq samples from The Cancer Genome Atlas (TCGA) identified strong associations between immune infiltrates and patient clinical features, viral infection status, and cancer genetic alterations. We found that melanomas with high...
Recent Advances in Post-Selection Statistical Inference
We describe the problem of “post-selection inference.” This addresses the following challenge: having mined a set of data to find potential associations, how do we properly assess the strength of these associations? Because we have “cherry-picked” (searched for the strongest associations), we must set a higher bar before declaring the associations we see significant. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large...
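A minimal simulation can make the cherry-picking problem concrete. It assumes the simplest possible setting (m independent candidate associations, none of them real, each summarized by a standard normal z-statistic) and uses Bonferroni adjustment as one simple, conservative way to "set a higher bar"; the post-selection methods discussed in the talk are not reproduced here. The function names and parameter values are hypothetical.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(m: int = 100, reps: int = 200, alpha: float = 0.05, seed: int = 0):
    """Return (naive_rate, bonferroni_rate): how often the strongest of m
    null associations is declared significant, without and with adjustment."""
    rng = random.Random(seed)
    naive_hits = 0
    bonf_hits = 0
    for _ in range(reps):
        # m candidate associations, none real: each z-statistic is N(0, 1)
        zs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # "cherry-pick" the strongest association
        best = max(zs, key=abs)
        p = two_sided_p(best)
        if p < alpha:
            naive_hits += 1  # ignores that we searched over m candidates
        if min(1.0, m * p) < alpha:
            bonf_hits += 1  # Bonferroni: a higher bar that accounts for the search
    return naive_hits / reps, bonf_hits / reps


naive_rate, bonf_rate = simulate()
print(f"naive: {naive_rate:.2f}, Bonferroni: {bonf_rate:.2f}")
```

With m = 100 null candidates, the naive procedure should declare the selected association "significant" in about 1 - 0.95^100 ≈ 99% of replications, while the Bonferroni-adjusted rate should stay near or below the nominal 5%. Bonferroni is chosen here only because it is short and valid for the selected maximum; it is typically conservative.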