Biostats: Edoardo Airoldi

Date: 

Tuesday, March 10, 2015, 12:30pm to 2:00pm

Location: 

Building 2, Room 426 - Biostatistics Conference Room - HSPH

Valid statistical analyses and reproducible science in the era of high-throughput biology

High-throughput technology (e.g., sequencing, mass spectrometry) allows us to quantify biological mechanisms at a resolution that array technology and small-scale experiments cannot. In the next 5-10 years, a substantial portion of biological research is expected to leverage some of these technologies. This resolution comes at a price, however. Modern high-throughput instrumentation relies on built-in data collection protocols that are often biased. For instance, at an early stage of the measurement process, a mass spectrometer selects the most abundant ions for further analysis. The major unexpected consequence of such protocols is that they carry information about the quantities we are interested in estimating, e.g., absolute protein abundance in the mass spectrometry example. Scientists who do not account for this information in the analysis, whether by counting or by estimation using a statistical model, will likely base their scientific conclusions on misleading numbers, even in simple experimental conditions. This statistical issue is poorly understood by practitioners and amateur statisticians alike. It is arguably the main challenge we need to tackle to produce valid scientific conclusions in the era of high-throughput technology. I'll provide two illustrations, in mass spectrometry and genomics.

Program in Quantitative Genomics (PQG)