Professor Xihong Lin is Interviewed for the International Day of Women in Statistics and Data Science

November 3, 2022

The Statistics Department is excited to share that the inaugural International Day of Women in Statistics and Data Science took place on October 11, 2022. Introduced by the Caucus for Women in Statistics, the Portuguese Statistical Society, and the American Statistical Association, the day celebrates the research contributions of women statisticians and data scientists around the world. In honor of this day, over the next several weeks, we will feature an interview with a faculty member or graduate student each week to showcase their latest research and perspectives on recruiting and retaining more women in the field.

To start off this series, we spoke with Professor Xihong Lin of Statistics (at Harvard FAS) and Biostatistics (at Harvard T.H. Chan School of Public Health), whose research focuses on developing and applying scalable statistical and machine learning methods for big genetics and health data. For example, her research helps identify potential drug targets and precision intervention and treatment strategies. A common thread throughout our conversation with Dr. Lin was her enthusiasm for using statistical and machine learning methods to solve pressing real-world health problems that impact people’s everyday lives. In the following adapted interview excerpts, Dr. Lin provides details about her work, inspiration, and tips for junior researchers and faculty in the field.

1. What inspired you to go into this field and what motivates you in your everyday work? When you look back on your path as a researcher, was there an experience and/or mentor that stood out?

Lin: In undergrad, I was an applied math major and worked with a thesis advisor who focused on forecasting using time series data. For the first time, I saw how my mathematical skills could be applied to real-world problems. I decided to pursue graduate study in statistics. Now, why biostatistics? I was inspired by my grandparents, who were faculty members in preventative medicine in the US. They dedicated their careers to studying schistosomiasis, a parasitic disease that affected many areas of the world, especially before 1960; for example, they developed a vaccine against the disease. Through them, I became fascinated by problems in public health. I then took a biostatistics course and thought that biostatistics would be a good way to use my math and statistical background to help people have healthier, better lives.

While many senior researchers shaped my career, there is one person I’d like to mention: my dissertation advisor, Professor Norman Breslow, the former Chair of the Biostatistics Department at the University of Washington [Professor Breslow passed away in 2015]. Norm was an influential figure in the fields of biostatistics and epidemiology and has been a great role model throughout my career. I learned a lot from Norm, not only through his remarkable statistical sense, high scientific standards, and commitment to solving pressing health problems using statistics, but also through his dedication to building a strong scientific community. In the 1970s, he was the leading statistician of the National Wilms Tumor Study, which helped improve the two-year survival rate of pediatric kidney cancer from 80% to 100%. Inspired by Norm, I enjoy statistical methodological research motivated by real-world problems.

2. Describe a recent project that has been exciting to work on. Why is this an important project?

Lin: There are two examples of rewarding projects that I’d like to share – a research project and an educational project. The research that I do is supported by three main pillars: statistics, computer science and informatics, and genetics and health science (the domain science). These pillars all play a critical role in data science, especially for solving big problems and making the computational and statistical methods useful in practice. My lab consists of students and postdocs in statistics and biostatistics, computational biologists, computer scientists, and software engineers. To effectively collaborate with the interdisciplinary research community I work with, I need to have a good understanding of genetics and health so that the methods we develop address important questions in the field.

Specifically, for this research project, my lab has been developing and applying methods for the analysis of large-scale whole genome sequencing studies in two national NIH consortia: the Trans-Omics for Precision Medicine Program (TOPMed) of the National Heart, Lung and Blood Institute and the Genome Sequencing Program (GSP) of the National Human Genome Research Institute [both NIH programs focus on studying gene variation that leads to diseases], as well as the UK Biobank. Between TOPMed and GSP, over half a million people have had their genomes sequenced and about a billion genetic variants across the genome have been identified. The data are massive (on the order of hundreds of terabytes). For the UK Biobank, about half a million people have genome sequencing data together with information on thousands of diseases and traits from their electronic health records. Our goal is to develop scalable, efficient, and interpretable [the ability to interpret the results] statistical and machine learning methods that can identify the genetic variants that cause human diseases. The findings can help identify potential targets for drug development and develop precision health strategies. The development of such methods is challenging but rewarding.

My second project has been teaching the course STAT 364 Scalable Statistical Inference with Applications to Big Data. In this course, statistics and biostatistics faculty present their current work, and graduate students lead the discussion of the faculty presenters’ papers. I very much enjoy this class because it allows the participants and me to learn about the faculty’s latest research and brainstorm new ideas and opportunities for collaboration. The course has served as the starting point for collaborating with multiple junior faculty in the department on papers or on supervising graduate students. For example, when I gave a talk on causal mediation analysis inference for genome-wide data, Dr. Lucas Janson and his student (I’m also on the student’s dissertation committee) came up with a different and innovative approach to the problem I was presenting. Thanks to this course, there are many opportunities to inspire each other’s research. A main attraction of working in academia is working with students and postdocs, who often inspire me. Seeing them grow and succeed in their careers is very rewarding.

3. What steps do you think are effective for recruiting and/or retaining women in your field?

Lin: Increasing the recruitment and retention of women in statistics and STEM-related fields is critically important to the advancement of research in these fields. Diversity makes better science. While there are multiple strategies to recruit and retain women, it’s crucial for institutions and departments to build a supportive, respectful, and collegial environment. To support the advancement of junior women faculty, it is important for senior faculty, both male and female, to serve as role models, be generous, and create opportunities for the junior faculty to thrive. For example, senior faculty can help junior faculty identify collaborative opportunities, develop confidence and independence, and build professional networks. In addition, senior faculty can help increase junior faculty’s visibility and recognition, as well as give them credit and show appreciation for their contributions. Of course, listening to and addressing junior faculty’s concerns and understanding their perspectives often go a long way. Many of us have greatly benefited from the generous help and support provided by senior statisticians and biostatisticians when we started our careers. It is important to continue this positive culture in our field, which will in turn make the statistics community stronger.

4. What advice would you give to young women (in grade school, college, and beyond) who are interested in pursuing research in your field?

Lin: While there are different strategies for succeeding in graduate school and beyond, passion is important and can carry you a long way. Stay open-minded and be curious. Research is a process, so it’s important to enjoy the process; while research takes time and includes disappointments and hurdles, it is also deeply fulfilling. We all have encountered various rejections, hard times, and failures in our careers, e.g., papers being rejected and grants not being renewed. You are not alone. Be nice to yourself, learn to let it go, and move on. Keep your spirits up, and stay positive and forward-looking.

In addition, it’s important to have confidence in yourself; no one is perfect, so it’s helpful to identify and play to your own strengths when you try to position yourself and pursue a research niche. As a junior researcher, it is important to develop independence and build a reputation in a specific area, i.e., when people talk about an area, people will know your name. Beyond your core quantitative skills, soft skills are crucial for your success as a researcher and educator. Some examples of useful soft skills are communication, presentation, and writing skills, including how to communicate effectively with statisticians and non-statisticians. In addition, develop the ability to take initiative and risks, such as starting a working group on a new research area in your department, or exploring research directions in an emerging, mostly unexplored, area that has great promise. It is easier to build a reputation by jumping into a new, understudied (but significant) area early on. Relatedly, it is useful to develop a good sense of what problems are likely to be important in your field. Lastly, a key skill is to stay focused. While it may be tempting for junior researchers to work on many interesting projects at once, there is a risk of spreading yourself too thin. It is essential to develop priorities and be a finisher. To manage your time well, it helps to set up internal deadlines; for example, by blocking off time on your calendar to finish papers you have set deadlines for.

Stats: Drawing on her own experience, Dr. Lin offers useful tips for women pursuing careers in academia. In addition, we were interested in learning about Professor Lin’s own career path towards solving the most pressing problems related to big health data. Thank you, Dr. Lin, for participating in this story and in the celebration of the inaugural International Day of Women in Statistics and Data Science! Stay tuned for next week’s article on Assistant Professor of Statistics Morgane Austern.