Reflections on Harvard’s Data and Society Summer Book Club

August 9, 2022

A stats faculty member, a biostats PhD student, a research librarian, a data science master’s student, and two undergrads all wound up in the same virtual space this summer. Was this just serendipity? In fact, this weekly gathering was the result of Dr. Kelly McConville’s vision to “create a space for folks from across the campus to get out of their silos and share their thoughts on data and society with each other.” With support from the Harvard Data Science Initiative, Kelly was able to order the books and launch this unique cross-departmental, cross-school, and cross-continental book club.

Each week, participants read a section of the assigned text and discussed it in Zoom breakout rooms. The first book, “The Alignment Problem: Machine Learning and Human Values” by Brian Christian, took the book club on a historical journey through the development of machine learning and some of its most pressing problems related to imperfect data and bias. The second book, “Invisible Women: Data Bias in a World Designed for Men” by Caroline Criado Perez, expanded our view of the ways that women are not factored in, whether in public safety, transportation, retirement planning, or clinical trials. To understand what participants gained from talking about these texts with a diverse group, I interviewed several of them.

1. Tell us who you are (your “boilerplate” introduction) and what brought you to the Data Science and Society Book Club.

Alex Young: I’m an undergraduate advisor and lecturer in the Statistics Department at Harvard. Previously, I completed my PhD in applied mathematics before transitioning to statistics during my postdoc. I work a lot at the intersection of applied probability and applied math. The more I’ve learned about statistics and how to think critically about imperfect data, the more cautious I am with applying machine learning algorithms. At the intersection of computer science and statistics, these issues with machine learning are some of the most critical things that we should be studying right now. Curiosity about these problems and how folks are attacking them brought me to the book club.

Ellen Considine: I’m a rising third-year student in the biostats PhD program and in the National Studies on Air Pollution and Health research group, which focuses on environmental health applications of statistics and data science. Biostatistics is a field that attempts to solve problems for the social good (e.g. in public health and medicine) by using statistics and data science. When one of my advisors, the co-Director of HDSI Francesca Dominici, told me about the book club, I thought this was a fun opportunity to step out of the more technical side to focus on the more human side, which is the motivation behind the work in my lab group.

James Capobianco: I’m a research and data librarian and have worked at Harvard for about 14 years or so. A couple of years ago, I started a degree in data science at the Harvard Extension School, which has fostered my interest in topics related to data bias and problems with AI in society (and in this book club!).

Konstantina Yaneva: I live in Europe and am pursuing a Master’s in Data Science at the Harvard Extension School. Broadly speaking, I work in computational journalism, and I collaborate with a lot of activists and journalists advocating for freedom of expression. The book club was a way to connect with the larger university community and to learn from the knowledge and expertise of others.

Al Xin: I’m a rising senior at Harvard College studying statistics with a secondary concentration in linguistics. I’m planning to apply to med school next cycle. I have been involved with the effective altruism student group on campus, which is part of a broader movement for finding the areas in which people can have the biggest positive impact on society – one of these areas is artificial intelligence safety. When I read in the HDSI newsletter about a summer book club that addressed these topics, I decided to join.

Maxwell VanLandschoot: I’m a visiting undergraduate research fellow at Harvard and a senior at Reed College, where I was a student of Dr. Kelly McConville. Kelly reached out to me about the summer book club and, being familiar with Kelly and her previous book clubs, I knew I wanted to be a part of it.

2. Describe aspects that you’ve enjoyed or found rewarding about the book club so far.

Ellen Considine: I started undergrad in 2016, which, according to “The Alignment Problem,” was around the time when issues of bias and interpretability [the ability to understand how a model arrives at its result] in machine learning started coming to light. Because I initially heard about these topics in undergrad, it’s fun to get both the history of how these methods were developed and the story of how people started asking about their implications.

Konstantina Yaneva: Some other classroom book discussions can be more prescriptive, but this book club is voluntary, which means that people are more engaged. While the books take on complex topics (they don’t just pay lip service to ethics in AI), they are still accessible to a broad audience, and you can learn something valuable regardless of your background. There is this famous phrase, “I’m for everything good and against everything bad”; sometimes discussions can be so general that everyone agrees, which is boring! In contrast, there is disagreement in these book club discussions, and people give concrete examples from their research and work to support their points.

Al Xin: In previous book clubs, I’ve mostly been with other college students, which is good for peer-to-peer interaction, but it's been useful in this book club to have a wide range of academic, life, and career experience to provide more depth to the discussion. One of the big benefits of having people with expertise in different areas is that they can identify aspects of the book that are slightly inaccurate or impractical in the real world based on their experience. For example, some people in my group with experience in industry brought up the critique that the book didn’t address the impact of business competition in pressuring employees to pump out algorithms without properly testing them.

Maxwell VanLandschoot: My favorite thing has been engaging with the variety of people. The breakout room that we had today was a perfect example: there were two PhD students from different programs, someone in the medical field, a preceptor in the Statistics Department, and me, an undergraduate in econ. This relative diversity of opinion has been very beneficial to the group conversations. By discussing the same work/chapter by the same author at the same time, we experience an equal playing field. There aren’t many contexts in which you can have these casual, yet thought-provoking, discussions with professors, PhD students and other people who are a lot more experienced in these fields.

3. What book are you currently reading, and is there an interesting discussion or topic that you recall from reading it?

James Capobianco: Some of the initial questions in “The Alignment Problem…” have been interesting, like the session in which my group talked about intrinsic vs. extrinsic motivation in relation to reinforcement learning. When using reinforcement learning, you want to be careful about rewarding the right kind of behavior; otherwise, you might reward for a behavior that is not productive [Brian Christian gives one example of a group of scientists who rewarded a self-driving bicycle incorrectly, which resulted in the bicycle driving in circles, instead of driving towards a goal]. This section was interesting because it connected back to human psychology and even philosophy, which made our discussion delve more into topics beyond just computer science, machine learning, and AI.

Konstantina Yaneva: I found the chapters on reinforcement learning (in “The Alignment Problem”) and their discussion of the agent to be very interesting. When the agent engages with the world, it learns from the world and affects the environment. Before releasing this agent, it’s necessary to know what question you want to answer and what problem you’re trying to solve. When we arrive at a solution, we need to review the original goal, check the work, and perform due diligence, which is often lacking in society today. The book gives so many examples of models being deployed and going wrong, such as technology that would overestimate the likelihood of a black defendant reoffending and underestimate the likelihood of a white defendant reoffending. When these models are deployed, they make real-world decisions that impact people’s lives. Sometimes people in the field get really excited about technology and it’s easy to overpromise, but there is a difference between the technology of our dreams and where the technology currently stands.

Alex Young: I was very interested in reading the section on women in the workplace (in “Invisible Women”), which focused on data (or a lack of data) related to women’s paid and unpaid labor. At a societal level, it is very important to consider: how do we show that we value unpaid work? My wife and I are new parents, so we’ve been thinking very critically about balancing the housework on a day-to-day basis. While women undeniably experience a disproportionate workload in the home, the author does not account for the functionality of a home in its entirety. For example, this summer I’ve been spending hours on home repair and remodeling, landscape work, painting, plastering, etc. I doubt that it would tip the scales back to some sort of gender parity, but I’d be interested to see how the author would address this kind of unpaid work and if this would modify the discussion in a meaningful way.

Maxwell VanLandschoot: The topic that just kept coming up in “Invisible Women” that's stayed with me is the pervasiveness of the lack of visibility of women in academia, industry, and life in general. There are issues with women's access to just about anything, even public transportation. The author gives an example of how the layout of a transportation system can have gendered implications because men and women have different travel needs. A large city typically has a route that travels from the outer ring to the center, which favors male commuter patterns; in contrast, women tend to have more complicated travel patterns within the city, so a grid system would work better for them. This example was eye-opening because the spatial layout of public transit was not a lens that I would have applied to women’s issues.

4. Is there another kind of book club that you would pursue in the future?

Alex Young: In the future, I would be very interested in a book club that looks at the intersection of philosophy, ethics, data, and the law. Looking at data and the law is intriguing because mathematical theory can only carry you so far. We have many definitions of fairness that are not compatible with one another. A purely mathematical approach won’t suffice because you need to factor in legal and ethical arguments to express what a society chooses to prioritize.

Konstantina Yaneva: For another book club, I would be interested in learning more about the history of thought or what kind of research has already been done related to data science. For example, it would be interesting to know more examples of how women are negatively impacted by technology design and what concrete solutions there are to these problems. Knowing the history of these ideas is important because it prevents you from reinventing the wheel each time.

Maxwell VanLandschoot: In future book clubs, I’d be perfectly happy with a focus on data and policy, including more about the implications of policy and calls to action. This book club has given me the sense that I should step into the world with a broader perspective and bring in voices and opinions that are less acknowledged, but there's not a specific action item. One aspect that I would want to stay the same in another book club is the degree of accessibility of these texts – instead of presenting statistics-heavy theory, they explain some simple results in straightforward language. By stepping back from the theory and looking at the real-world implications of the data, good texts can communicate ideas that people can use to create change.

5. What's your favorite book (or a book you’ve enjoyed recently)?

Alex Young: “East of Eden” by John Steinbeck and “The Name of the Wind” by Patrick Rothfuss

Ellen Considine: “Dawn” by Octavia Butler

James Capobianco: “10 Essays on Fizz Buzz” by Joel Grus

Konstantina Yaneva: “Rabbits and Boa Constrictors” by Fazil Iskander

Al Xin: “The Emperor of All Maladies” by Siddhartha Mukherjee and “Paula” by Isabel Allende

Based on this varied list, it looks like our reading is cut out for us for next summer! We appreciate these participants sharing their experience with us and look forward to the next book club with Dr. McConville.