Interview with 2024 Dempster Prize Recipient Souhardya Sengupta

Souhardya Sengupta

In May 2024, PhD student Souhardya Sengupta received the 2024 Dempster Prize for his exceptional paper titled “Leveraging Sparsity in the Gaussian Linear Model for Improved Inference,” co-authored with Associate Professor Lucas Janson. The Dempster Prize is named in honor of Emeritus Professor Arthur P. Dempster and is given annually to a graduate student in recognition of outstanding research. In the following excerpted and edited interview, Sengupta talks with us about his academic journey thus far: from receiving early inspiration from his father, to learning how to navigate the rigorous environment at the Indian Statistical Institute, to relishing the community and intellectual freedom in his Statistics PhD program at Harvard.

Was there an experience or influence early in your life that sparked your interest in math or stats?

Sengupta: Early on, one influence on my interest in math was my father. When I was in elementary and middle school studying arithmetic and basic algebra, he used to help me with my math homework. I admired how quickly he could solve problems I had struggled with for days, which I thought was really cool! I believe wanting to do the same was one of my earliest inspirations, slightly tilting my preference toward math over other subjects. My interest continued throughout high school and led me to pursue it in college.

What motivated your choice of undergraduate institution and your decision to pursue a PhD program?

Sengupta: In short, ISI [Indian Statistical Institute] was the most prestigious place I got into! Initially, I wanted to study math; I thought that I would pursue ISI’s Bachelor of Mathematics program in another city. However, a few of my teachers convinced me to apply instead for the Bachelor of Statistics program at ISI Kolkata, which was also only about 45 minutes from my home. So my decision to pursue statistics was not particularly well thought out, but by a happy coincidence, I ended up liking the subject.

While I was undecided about graduate school during my first two years at ISI, my decision to apply solidified during the COVID-19 lockdown. I had just taken a course on nonparametric statistics that I really liked. During the lockdown, I did some independent reading on topics that interested me, and I think that period built an appreciation for research that kept growing through my remaining years at ISI. I appreciated the clever and highly creative ways researchers have approached problems in the literature. One example I really liked was the literature on multivariate notions of ranks and ordering, where people have tried to extend the idea of ranking scalar data points to ranking multivariate data, that is, tuples of numbers rather than single values. Later on, I worked on some research projects, which convinced me that I wanted to keep doing this, and so pursuing a PhD was an obvious choice.

What challenges have you experienced during your studies and what did you learn from these difficulties?

Sengupta: In my undergraduate and graduate experience thus far, I have been lucky that I haven’t had a major setback that completely threw me off track, but I have had my share of ups and downs and some tough personal times. Also, like most of us, I struggled with the disruptions, isolation, and the mental toll of the COVID lockdown.

In my academics, especially when I first joined ISI, I found much of my coursework challenging and quite a few times felt overwhelmed by it. Over time, I learned two major lessons that I have found echoed in many aspects of my life, not just academics. First, I learned that most things that seem impossible today might not seem that difficult eventually. Second, I learned not to overemphasize negative events. Early on, I treated any failure as a disaster, but over time, I have learned to treat setbacks as an inevitable part of the process, much like successes.

What memories do you have of getting to know the Stats Department during your first year?

Sengupta: Everything was new to me when I first arrived in the department, so my early experiences stand out. The wide range of social events, especially during my first year, made me feel welcome in the department. Often on Friday nights, we would gather in the lounge to play board games. Later in the year, a group of us started playing badminton at the Harvard MAC gym. Another great memory is our annual departmental PhD retreat, which features an outside speaker, faculty lightning talks, and a social. During my first year, the social included ice skating, which I tried for the first time; it didn’t go well, but it was fun and I’m glad I tried it!

How did your collaboration with Professor Lucas Janson begin? What were some of your key findings in your paper “Leveraging Sparsity in the Gaussian Linear Model for Improved Inference”?

Sengupta: When I started my PhD program at Harvard, I was familiar with Professor Janson’s research area and felt that my research interests aligned, so I just walked into his office and asked if I could work with him (and he agreed!).

Our paper revolves around the linear regression model, one of the most widely used statistical models for answering basic scientific questions. For example, suppose you want to study how age affects someone’s risk of diabetes. You might want to study other factors too, such as blood biomarkers. The goal would be to quantify how each of these factors, also known as covariates, relates to a response variable (in this case, diabetes). Linear regression is a simple model that helps you study these relationships. In this scenario, if you want to see whether a factor like age has an effect on diabetes that cannot be explained by the other covariates, you typically use the standard t-test.
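For readers who want to see the standard t-test in action, here is a minimal Python sketch (our illustration, not code from the paper). It fits an ordinary least-squares regression on simulated data and tests whether the age coefficient is zero. The data, the continuous “risk score” response, and the helper name `coef_t_test` are hypothetical, and the sketch assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import stats


def coef_t_test(X, y, j):
    """Classical t-test of H0: beta_j = 0 in the model y = X @ beta + Gaussian noise.

    X should already contain an intercept column if one is wanted.
    Returns (t_statistic, two_sided_p_value).
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimate
    residuals = y - X @ beta_hat
    df = n - p                                         # residual degrees of freedom
    sigma2_hat = residuals @ residuals / df            # unbiased noise-variance estimate
    xtx_inv = np.linalg.inv(X.T @ X)
    se_j = np.sqrt(sigma2_hat * xtx_inv[j, j])         # standard error of beta_hat[j]
    t_stat = beta_hat[j] / se_j
    p_two_sided = 2 * stats.t.sf(abs(t_stat), df)      # P(|T| >= |t|) under H0
    return t_stat, p_two_sided


# Hypothetical simulated data: a continuous risk score driven by age and one biomarker.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 80, size=n)
biomarker = rng.normal(size=n)
risk = 0.02 * age + 0.5 * biomarker + rng.normal(size=n)

X = np.column_stack([np.ones(n), age, biomarker])      # columns: intercept, age, biomarker
t_age, p_age = coef_t_test(X, risk, j=1)               # test the age coefficient
print(f"t = {t_age:.2f}, two-sided p = {p_age:.3g}")
```

The statistic divides the estimated coefficient by its standard error and compares it with a t distribution on n − p degrees of freedom. With a single covariate plus an intercept, it reproduces the slope p-value reported by `scipy.stats.linregress`.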

Our contribution is an exact alternative to the t-test called the “ℓ-test.” You can use the ℓ-test in any scenario in which you would apply the t-test, and it often has higher power (that is, the probability that the test correctly detects a true relationship) than the t-test. Our most surprising finding was that the ℓ-test achieves power close to that of the one-sided t-test, which knows a priori whether the relationship between the covariate and the response is increasing or decreasing. To illustrate, if we had prior knowledge that any significant relationship between age and diabetes could only be positive (i.e., increased age leads to increased risk), we would use a one-sided t-test to assess whether an increase in age corresponds to an increase in the likelihood of developing diabetes, whereas the usual two-sided t-test assesses whether an increase in age corresponds to any change in that likelihood. It turns out that, without any access to this prior knowledge, the ℓ-test still approximates the performance of the one-sided t-test, often very closely in settings where there are a large number of covariates and most of them have no effect on the response. (This is called sparsity, and such settings occur frequently in scientific studies, for example in genetics.)
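The power gap between the usual two-sided t-test and the one-sided benchmark can be seen with the classical tests alone. The sketch below (again our illustration, not the authors’ code, and not an implementation of the ℓ-test) simulates a sparse design in which only age has a small positive effect, then estimates by Monte Carlo how often each t-test rejects at level 0.05. The effect size, sample size, number of null covariates, and function names are hypothetical choices; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats


def coef_t_stat(X, y, j):
    """t-statistic for H0: beta_j = 0 from an OLS fit of y on X."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)                     # noise-variance estimate
    se_j = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[j, j])
    return beta_hat[j] / se_j


def estimate_power(beta_age=0.15, n=100, n_null=20, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo rejection rates of the two-sided and one-sided (beta_age > 0) t-tests.

    Hypothetical sparse design: column 1 ("age", standardized) has effect beta_age,
    and n_null further covariates have no effect on the response.
    """
    rng = np.random.default_rng(seed)
    p = 2 + n_null                                 # intercept + age + null covariates
    df = n - p
    crit_two = stats.t.ppf(1 - alpha / 2, df)      # reject if |t| > crit_two
    crit_one = stats.t.ppf(1 - alpha, df)          # reject if t > crit_one
    rejections_two = rejections_one = 0
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, 1 + n_null))])
        y = beta_age * X[:, 1] + rng.normal(size=n)
        t_stat = coef_t_stat(X, y, j=1)
        rejections_two += bool(abs(t_stat) > crit_two)
        rejections_one += bool(t_stat > crit_one)
    return rejections_two / reps, rejections_one / reps


power_two, power_one = estimate_power()
print(f"two-sided power: {power_two:.2f}, one-sided power: {power_one:.2f}")
```

With a positive true effect, the one-sided test rejects more often because its critical value is smaller. Setting `beta_age=0.0` should give rejection rates near 0.05 for both tests, as expected of valid level-0.05 tests.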

What aspects of your PhD program have you valued the most? Can you think of a word to encapsulate this experience?

Sengupta: If I had to choose one word to encapsulate my PhD experience, it would be: excitement. There’s a lot of frustration, too, but it’s all built around the excitement that something might finally work out. To be more specific, it’s the multifaceted nature of my program as well as the novelty and intellectual freedom that have made this an exciting experience for me. I enjoy taking on different roles: sometimes I’m a student attending lectures and taking exams, other times I’m a teaching assistant running sections and grading homework, and then I’m a researcher, working through ideas and trying to solve open-ended problems.

I also value the novelty and intellectual freedom that come with PhD research. You never know where an idea will lead, and each day holds a bit of unpredictability. Most attempts at a solution are futile, but when something clicks and finally works, it’s incredibly rewarding. That joy of discovery is unmatched!

What are you excited to pursue this fall (both personally and academically)?

Sengupta: In my academics, I have ongoing research problems that I’d like to make progress on during the fall—I’m curious to see what comes out of it! I also have a personal goal: I want to get back into playing the guitar. I started learning it in 2022, just before I moved to the U.S. Since starting grad school, I’ve just been too busy to pick it up again, but now that I have bought a guitar, I’m going to really try to practice!

Works Cited:

Sengupta, S., & Janson, L. (2024). Leveraging Sparsity in the Gaussian Linear Model for Improved Inference. arXiv preprint arXiv:2406.18390.