Rhianna Smith

MIT Department: Electrical Engineering and Computer Science
Faculty Mentor: Prof. Marzyeh Ghassemi
Research Supervisor: Qixuan (Alice) Jin
Undergraduate Institution: Dartmouth College
Biography
Rhianna Smith is a rising senior at Dartmouth College, majoring in Computer Science. Originally from Jamaica, she brings a personal understanding of how inequality can be embedded in everyday systems and is passionate about developing ethical AI that prioritizes equity. At the MIT Summer Research Program, she works with the Healthy ML group, led by Dr. Marzyeh Ghassemi, to mitigate memorization in diffusion models, with a focus on potential bias. At Dartmouth, Rhianna conducts research under Dr. Peter Mucha and Dr. Nikhil Singh, focusing on bias mitigation in AI agents. She previously interned with the Women in Science Project, creating abstract mathematical models that illustrate how interpersonal and within-person factors influence drinking behavior. In Summer 2024, she interned at Morgan Stanley, analyzing market trends to develop bond portfolio strategies that optimize returns under varying risk profiles. A Stamps and Goldwater Scholar, Rhianna balances research with leadership and service, including her role in Delta Sigma Theta Sorority, where she leads social action initiatives. She also serves as Co-President of Women in Computer Science and Secretary of the Association for Women in Mathematics. Rhianna's interdisciplinary perspective and commitment to equity enable her to approach research and collaboration with both technical rigor and empathy.
Abstract
Memorization and Its Bias in Diffusion Models
Rhianna Smith1, Qixuan Jin2, and Marzyeh Ghassemi2
1Department of Computer Science, Dartmouth College
2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
From generating breathtaking artwork to synthesizing medical data, diffusion models have extensive applications and are increasingly woven into everyday use. However, the nature of diffusion models means they can replicate samples from their training data with high fidelity, a process known as memorization. While previous studies have examined factors that may influence memorization, little work exists on the types of data diffusion models tend to memorize. Moreover, existing mitigation techniques have limited applicability to publicly released diffusion models, so more work is needed to prevent memorization before it happens. I therefore aim to identify specific sample subgroups that are more likely to be memorized than others. Furthermore, I will determine whether memorization can be recognized through distinguishable patterns in the inference trajectories of generated samples and, if so, create a method to redirect inference away from the identified memorized regions. In doing this, I will not only identify potential biases within diffusion models that could put the sensitive information of certain subgroups at risk, but also develop a mitigation technique that stops memorization in its early stages, thereby protecting these groups from privacy compromises.