CMSA Talk: Noureddine El Karoui

Date: 

Thursday, February 26, 2015, 12:50pm to 1:50pm

Location: 

Science Center Rm. 232
Title: On high-dimensional robust regression and inefficiency of maximum likelihood methods

Abstract: I will discuss the behavior of widely used statistical methods in the high-dimensional setting where the number of observations, n, and the number of predictors, p, are both large. I will present limit theorems about the behavior of the corresponding estimators, their asymptotic risks, and related quantities. The results apply not only to robust regression estimators, but also to Lasso-type estimators and to many considerably more complicated problems. Some of the results answer a question raised by Huber in his seminal 1973 paper on robust regression.

Many surprising statistical phenomena occur: for instance, maximum likelihood methods are shown to be (grossly) inefficient, and the loss functions that should be used in regression are shown to depend on the ratio p/n. This means that dimensionality should be explicitly taken into account when performing simple tasks such as regression. More generally, we'll see that intuition based on results obtained in the small p, large n setting leads to misconceptions and to the use of suboptimal procedures. It also turns out that inference is possible in this setting. We'll also see that the geometry of the design matrix plays a key role in these problems, and we will use this fact to disprove claims of universality of some of the results.

Mathematically, the tools needed mainly come from random matrix theory, measure concentration, and convex analysis.

Based on several papers, including some which are joint work with Derek Bean, Peter Bickel, Chinghway Lim and Bin Yu.

(Special seminar)