How can one combine a collection of estimators of a regression function into a good aggregate? In the last 15 years, this age-old question has received increasing attention within the Mathematical Statistics community. The closely related question of regression in misspecified models has been studied within Statistical Learning using the techniques of empirical processes. We outline the...
Factor analysis is a popular tool for identifying and summarizing associations among multiple measures. When measures on organizations, areas, or similar higher-level units are obtained by summarizing data from groups of individuals, associations at the group level are often of primary interest while those at the individual level might not even be meaningfully defined. These data...
Statisticians are increasingly posed seemingly paradoxical questions, challenging our qualifications for entering the statistical paradises created by Big Data. Two such questions represent the use of Big Data for population inferences and individualized predictions: (1) “Which one should I trust: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the...
Many applications, such as photon-limited imaging and genomics, involve large datasets with entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient for non-Gaussian noise. In this talk we present ePCA, a methodology for PCA on exponential family distributions. ePCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a...