As part of World Statistics Day festivities, the Chicago Chapter of the ASA named Xiao-Li Meng Statistician of the Year 2015-2016 and invited him to speak at a special dinner in his honor. His lecture, entitled "Statistical Paradises and Paradoxes of Big Data", is best described by the following abstract.
Statisticians are increasingly posed with thought-provoking and often paradoxical questions, challenging our qualifications for entering the statistical paradises created by Big Data. Questions addressed in this article include:
- Which one should I trust: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population?
- With all these big data, is sampling or randomization still relevant?
- Personalized treatments -- that sounds heavenly, but where on earth did they find the right guinea pig for me?
The proper responses are, respectively:
- "It depends!", because we need data-quality indexes, not merely quantitative sizes, to determine;
- "Absolutely!", and indeed Big Data has inspired methods such as counterbalancing sampling to combat inherent selection bias in big data; and
- "They didn't!", but the question has led to a multi-resolution framework for studying statistical evidence for predicting individual outcomes.
All proposals highlight the need, as we get deeper into this era of Big Data, to reaffirm some time-honored statistical themes (e.g., bias-variance trade-off), and to remodel some others (e.g., approximating individuals from proxy populations versus inferring populations from samples).
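To make the first question concrete, here is a toy simulation of why size alone does not settle it. It is only a sketch under assumptions that are not in the abstract and is not Meng's method: a binary outcome with a true rate near 50%, an administrative dataset in which people with outcome 1 are recorded with probability 0.9 and the rest with probability 0.7 (about 80% coverage overall), and a 1% survey whose 60% response is assumed to be completely random. The function name `simulate` and every numeric setting are hypothetical.

```python
import random


def simulate(pop_size=100_000, seed=0, reps=200):
    """Compare a small random survey with a large self-selected dataset.

    All probabilities below are illustrative assumptions, not values
    taken from the lecture or the abstract.
    """
    rng = random.Random(seed)

    # Hypothetical population: binary outcome with a true rate near 0.5.
    population = [1 if rng.random() < 0.5 else 0 for _ in range(pop_size)]
    true_mean = sum(population) / pop_size

    # Self-reported administrative data: inclusion depends on the outcome
    # (assumed 0.9 if y == 1, 0.7 if y == 0), giving roughly 80% coverage.
    admin = [y for y in population if rng.random() < (0.9 if y == 1 else 0.7)]
    admin_error = abs(sum(admin) / len(admin) - true_mean)

    # 1% survey with a 60% response rate; nonresponse is ASSUMED to be
    # completely random, so respondents form a simple random sample.
    n_respondents = round(pop_size * 0.01 * 0.6)
    squared_errors = 0.0
    for _ in range(reps):
        sample = rng.sample(population, n_respondents)
        squared_errors += (sum(sample) / n_respondents - true_mean) ** 2
    survey_rmse = (squared_errors / reps) ** 0.5

    return {
        "coverage": len(admin) / pop_size,
        "admin_error": admin_error,
        "survey_rmse": survey_rmse,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

Under these assumed settings the selection bias in the large dataset outweighs the sampling noise in the small survey. Making survey nonresponse depend on the outcome, or weakening the self-selection in the administrative data, can reverse the ranking, which is the point of the "It depends!" answer.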