Interview with William Nickols, May 2024 Master’s Prize Winner

William Nickols

Congratulations to William Nickols, the May 2024 recipient (along with Kevin Luo) of the Department of Statistics Concurrent Master’s Prize! The prize is awarded annually to up to two graduating students from the Concurrent Master’s program in Statistics who have the best overall performance (as indicated by coursework results), have demonstrated achievements in Statistics outside of coursework, and have contributed significantly to the department.

To learn more about Will's inspiration, thesis experience, and sense of community within the department, we spoke with him. Highlights from our conversation are edited and excerpted below.

What sparked your initial interest in math or statistics?

Nickols: I remember, in my high school junior year AP statistics class, learning about the German Tank Problem. Based on a true WWII story, the set-up was: suppose you’ve captured a random sample of your enemy’s tanks, each of which has a random ID number from 1 to the total number of tanks. You want to estimate the total number of tanks your enemy has. How do you do it? My group thought about it for a while and then decided we should double the average of the observed tank IDs. Then, the class had a competition, drawing numbered slips of paper mimicking tank IDs, and our estimate ended up being just two away from the true number! Two years later in Stat 111 (Introduction to Statistical Inference at Harvard), I learned our approach had a name: the Method of Moments. I found that I enjoyed taking an intuition and turning it into a mathematical framework to address real questions, and that’s what really sparked my interest.
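The "double the average" idea can be tried in a short simulation. The sketch below is only an illustration, not the classroom exercise itself: the tank count, sample size, number of trials, and function names are all made-up assumptions. It draws IDs without replacement and averages the estimate over many repetitions. (Matching the mean of IDs 1 through N, which is (N + 1)/2, exactly would give twice the average minus one; doubling the average is the same idea up to that small constant.)

```python
import random
from statistics import mean


def double_the_mean(ids):
    """Estimate the total number of tanks as twice the average observed ID."""
    return 2 * mean(ids)


def simulate(total_tanks, sample_size, trials, seed=0):
    """Average the estimate over many simulated captures (hypothetical settings)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Capture `sample_size` distinct tanks with IDs 1..total_tanks.
        ids = rng.sample(range(1, total_tanks + 1), sample_size)
        estimates.append(double_the_mean(ids))
    return mean(estimates)


# Assumed values for illustration only.
avg_estimate = simulate(total_tanks=300, sample_size=10, trials=5000)
print(round(avg_estimate, 1))
```

With these assumed settings, the long-run average of the estimate should land near 301, one more than the true count of 300, while any single estimate can miss by much more.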

What are your initial memories of the Stats Department?

Nickols: Starting college during the pandemic, I remember finding community right away in Stat 110 and 111 through the faculty (Joe Blitzstein and Neil Shephard) and other students. In Stat 111, we had a virtual Pset group to get to know each other and work on homework problems together. In the spring, we scheduled a Zoom “classroom to table” conversation with Joe, where we talked about Joe’s educational philosophy, chess, and biographies we had read. We really appreciated that he made time for us, even with all his obligations and hundreds of students to teach! On the other end, right before graduation, Joe hosted a dinner at Henrietta’s Table with former teaching fellows (TFs) from his classes. We talked for about three hours about everything from hidden messages in the Stat 110 Probability book, to Joe’s YouTube cult following, to MLEs (which half the table thought meant maximum likelihood estimators and the other half thought meant machine learning engineers, maybe a reflection on the state of statistics). These moments demonstrated how invested he was in us.

During my sophomore and junior years, there were many opportunities to connect with others in the department in person. When I took Stat 210 (Probability I) in my junior year, I formed a study group with 5-6 close friends, and we regularly got together to work on Psets. The department faculty and admin, along with GUSH (Group for Undergraduates in Statistics at Harvard), really invested time in creating an in-person community after COVID. When I was a teaching fellow for Professors Kevin Rader, James Xenakis, and Neil Shephard, they also often reached out to get coffee or a meal. There was a lot of faculty and peer support that created a real sense of community for me.

What motivated you to pursue the concurrent master’s program?

Nickols: The concurrent master’s program was a natural fit due to my interest in higher-level coursework, particularly in causal inference, an area of research that looks at the effect of treatment on an outcome. A standout course was Stat 286 (Causal Inference), which looked at how to determine the effects of particular interventions using modern statistical methods. For example, we learned methods to deal with panel data—data collected repeatedly over time from individuals. On an exam, we looked at whether per-pupil educational spending changed when teachers’ collective bargaining laws changed, based on data across U.S. states over time.

A similar problem came up in a project for Stat 288 (Deep Statistics: AI and Earth Observations for Sustainable Development), where, inspired by previous work done by student Victoria Li, I looked at abortion rates after the Dobbs decision in June 2022. To answer the question of whether the Dobbs decision affected abortion rates overall in the US, I used statistical methods similar to those in Stat 286, with panel data across states from before and after the court decision. This class exposed me to the cutting edge of applied research, especially causal inference applications.

Did you encounter challenges (personal or academic) during your studies? How did you overcome these challenges?

Nickols: A big jump for me was coming into college and taking Stat 110. I still remember the first question of the first homework assignment. The question asked how many ways a student can choose 7 courses out of 20 for a degree, given that 5 of the 20 are statistics courses and at least one of the chosen courses must be a statistics course. I thought about the problem for a while and came up with the answer 135,660. But the next question on the homework was, “Explain intuitively why the answer is not 135,660”! At that point, I figured it was going to be a rough semester. However, over the next few months, I started recognizing patterns and understanding better how to learn in the course: what to focus on from the textbook and lectures and how to apply that to problems.
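For readers who want to check the arithmetic, here is a small sketch of the counting problem as described above. The interview does not say how the figure 135,660 was reached; one plausible route, assumed here, is to pick one statistics course first and then any 6 of the remaining 19 courses. That route counts the same schedule once for every statistics course it contains. Counting the complement (all schedules minus those with no statistics course) avoids the double counting. Variable names are illustrative.

```python
from math import comb

total_courses = 20
stats_courses = 5
courses_to_pick = 7

# Assumed naive route: choose one statistics course, then any 6 of the other 19.
# A schedule with k statistics courses gets counted k times this way.
naive = stats_courses * comb(total_courses - 1, courses_to_pick - 1)

# Complement count: every 7-course schedule, minus those with no statistics course.
all_schedules = comb(total_courses, courses_to_pick)
no_stats_schedules = comb(total_courses - stats_courses, courses_to_pick)
correct = all_schedules - no_stats_schedules

print(naive, all_schedules, correct)  # 135660 77520 71085
```

The naive count is larger than the total number of 7-course schedules with no restriction at all (77,520), which is one intuitive sign that it cannot be right.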

Being a TF for Stat 139 (Introduction to Linear Models) and Stat 111 during my junior year provided an excellent chance to solidify what I had learned in prior semesters. Each week, I created and solved practice problems in preparation for teaching my section, providing me with a strong understanding of what goes on behind the scenes of teaching: how (and why) you formulate a problem for students and solve it. Teaching Stat 111 was initially more daunting since it focuses on statistical theory, a topic I had initially struggled with as a student, but as a TF, I could see how I had grown as a statistician. After only two years, the topics that had once kept me up late at night were now things I knew like the back of my hand.

You completed your senior thesis with Professor Curtis Huttenhower at the Harvard T.H. Chan School of Public Health, working on statistical methods for the microbiome. How did you start collaborating with Prof. Huttenhower and how did you select your thesis topic?

Nickols: I started working with Prof. Huttenhower during the summer after my freshman year as part of the Program for Research in Science and Engineering (PRISE). I wanted to apply the skills I had been developing in my statistics and computer science courses to biological questions, a particular interest of mine. Microbiome research involves microbial communities made up of bacteria, fungi, and archaea that exist essentially everywhere, including on human surfaces like the skin, mouth, and gut. In fact, the average person has 2-6 pounds of microbes in and on their body! (When I would present my senior thesis to non-microbiome audiences, I would start by saying, “I’m Will... or I should say, I’m mostly Will, because I’m also about 3% microbes.” This usually got groans, but Joe liked the joke so much he said he would’ve studied the microbiome just to use that line.) These microbes have important implications for human health, and diseases like inflammatory bowel disease (IBD), which includes Crohn’s disease and ulcerative colitis, are linked to the gut microbiome.

To study the gut microbiome, researchers extract DNA from stool samples, map it to microbial genome databases to determine the abundance of different species (e.g., 10% species X, 15% species Y), and look at the differences in the gut microbiome between patients with and without IBD. These data have many zeros (species that are either not present or not detected in a sample), and historically researchers have just added a small “pseudo-count” to handle such zeros before running linear regressions.

My thesis improved the traditional approach in a few ways. First, we separated a microbe’s abundance (how much of it there is, if it’s present) from its prevalence (how likely it is to be present at all). This was an important step because we found that, for over 70% of the associations between the microbiome and disease outcomes in our data, the association was with the microbe’s presence vs. absence rather than its abundance if present. Second, my thesis tackled the issue of relative abundances. Because sequencing data tell us a species’ proportion out of the whole community, but not the actual number of cells of that species, if one microbe becomes more abundant, the other species’ relative proportions decrease, even if their absolute counts stay the same. To address this problem, we introduced statistical models to identify associations between health outcomes and absolute counts, either by incorporating additional experimental information or by estimating the total cell count. The tools we developed are already being used to address questions related to ulcerative colitis, bacterial vaginosis, pre-term birth, colorectal cancer, and pet nutrition.
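The relative-abundance issue can be seen with a few made-up numbers. The sketch below is not the thesis's model; the species names and cell counts are hypothetical and chosen only to show that when one species grows, the other species' proportions fall even though their absolute counts do not change.

```python
def relative_abundances(counts):
    """Convert absolute cell counts into proportions of the whole community."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


# Hypothetical absolute cell counts (not real data).
before = {"species_X": 100, "species_Y": 300, "species_Z": 600}
# Only species X grows; Y and Z keep the same absolute counts.
after = dict(before, species_X=1100)

rel_before = relative_abundances(before)
rel_after = relative_abundances(after)

for species in before:
    print(species, before[species], after[species],
          round(rel_before[species], 2), round(rel_after[species], 2))
```

In this toy community, species Y stays at 300 cells but its share drops from 30% to 15%, so an analysis based only on proportions could mistake X's growth for a decline in Y.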

What do you value the most about your experience in the department and at Harvard? If you had to select a word to encapsulate your stats experience, what would it be?

Nickols: I would describe my experience with statistics at Harvard as empowering. By taking courses like Stat 110 and 111, I developed an understanding of the fundamentals that gave me the skills to build new methods and tools. With more sophisticated tools, I felt empowered to answer complex questions and to evaluate the reliability of my answers. For example, after freshman year, I did research involving multiple databases with overlapping identifiers. I wanted to know: if I randomly sample from each database, how many overlaps should I expect purely by chance? Using the foundational tools that I learned in Stat 110 and Stat 111, I was able to work out the theory, write software, and answer my question. When you understand statistical fundamentals like distributions and random variables, you can dig deep into the methods you are using and why they work.
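A question of this kind can be sketched under simple assumptions. The interview does not describe the actual databases or sampling scheme, so everything below is hypothetical: two databases stored as sets of identifiers, with an independent simple random sample (without replacement) drawn from each. By linearity of expectation, each shared identifier lands in both samples with probability (m / |A|) × (n / |B|), so the expected overlap is the number of shared identifiers times that product. A simulation is included as a sanity check.

```python
import random


def expected_overlap(db_a, db_b, m, n):
    """Expected number of identifiers in both samples (linearity of expectation)."""
    shared = len(db_a & db_b)
    return shared * (m / len(db_a)) * (n / len(db_b))


def simulated_overlap(db_a, db_b, m, n, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    list_a, list_b = sorted(db_a), sorted(db_b)  # random.sample needs a sequence
    total = 0
    for _ in range(trials):
        sample_a = set(rng.sample(list_a, m))
        sample_b = set(rng.sample(list_b, n))
        total += len(sample_a & sample_b)
    return total / trials


# Hypothetical databases: 60 and 80 identifiers, 20 of them shared.
db_a = set(range(0, 60))
db_b = set(range(40, 120))

theory = expected_overlap(db_a, db_b, m=15, n=20)
simulation = simulated_overlap(db_a, db_b, m=15, n=20)
print(theory, round(simulation, 2))
```

With these assumed sizes, the formula gives 20 × (15/60) × (20/80) = 1.25 expected overlaps, and the simulated average should come out close to that value.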

What are you excited to pursue this fall? Describe some of your career, academic, and personal aspirations and plans.

Nickols: I am excited to begin my PhD in Biostatistics at Harvard next year, where I will rotate in a variety of biostatistics and computational biology groups. I am also looking forward to continuing work on genetic methods that can improve our understanding of how infectious diseases are acquired, cleared, and transmitted. For example, if you have longitudinal genetic data from people who acquired malaria, you can determine previously unknowable things like how long a person has been infected with a particular parasite and whether that parasite is resistant to treatment. I’m also interested in bigger picture questions like how the development of artificial intelligence will speed up our ability to analyze data and produce useful and correct results.

After my PhD program, I’m open to different options, including positions in academia, government (e.g., the NIH or CDC), nonprofits, or hospitals. This will be my problem to solve over the next 4-5 years! Regardless, I intend to continue working on challenges related to real data, including developing and applying methods to solve meaningful problems.