Amiri Hayes

MIT Department: Electrical Engineering and Computer Science
Faculty Mentor: Prof. Jacob Andreas
Research Supervisor: Belinda Li
Undergraduate Institution: New Jersey Institute of Technology

Biography

Amiri Hayes is an incoming senior in the Honors College at the New Jersey Institute of Technology, pursuing a Bachelor’s Degree in Applied Mathematics alongside a dual Master’s in Artificial Intelligence. Before matriculating at NJIT, he was a homeschooled student who earned three associate degrees, in Physics, Mathematics, and Computer Science, at Rowan College of South Jersey – Gloucester alongside his high school diploma. Since then, he has held a software engineering co-op position at UPS as well as a summer position as a Mathematics Student Researcher at the Institute for Pure and Applied Mathematics at UCLA. His other interests include investing and computational social science, which have led him to co-found an Investment Club and to conduct independent research on automated evaluation of transportation infrastructure. Amiri aspires to attend graduate school to improve his ability to use mathematics and statistics to model complex systems.

Abstract

Filtering Attention Heads through Automatable Interpretability Experiments

Amiri Hayes¹, Jacob Andreas², and Belinda Li²

¹Department of Mathematical Sciences, New Jersey Institute of Technology

²Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

Transformer-based large language models (LLMs) like BERT and GPT have transformed natural language processing, yet their internal mechanisms remain opaque. To improve interpretability, we focus on understanding the function of attention heads, learned components that direct focus across input sequences. Prior work shows that some heads consistently track syntactic or semantic relationships, suggesting interpretable structure. We propose an automated, generalizable method for describing attention heads in human-interpretable terms using program synthesis: we associate each head with a symbolic program that specifies how the head might operate. Such programs exist for a variety of phenomena and serve as hypotheses that can then be tested against actual attention activations by computing distance metrics. Additionally, we explore whether LLMs can aid in constructing these programs by predicting and programmatically testing their own hypotheses about head functions. By analyzing attention behavior across layers, models, and datasets, we assess which head functions are stable and generalizable. Our findings suggest that many attention heads exhibit consistent, interpretable behavior, and that program-driven analysis can effectively reveal the roles of specific attention mechanisms. This project contributes a framework for reverse-engineering attention functions, helping to bridge the gap between black-box model architecture and human linguistic understanding.
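To make the testing procedure concrete, the minimal sketch below compares one hypothesized program against one attention head. Every specific choice here is an assumption made for illustration, not the project's actual code: the "previous-token" hypothesis program, the use of GPT-2, the particular layer and head indices, and Jensen-Shannon distance as the metric.

import numpy as np
import torch
from scipy.spatial.distance import jensenshannon
from transformers import GPT2Model, GPT2Tokenizer

def previous_token_program(n_tokens):
    """Hypothesis program: each token attends to its immediate predecessor."""
    pred = np.zeros((n_tokens, n_tokens))
    pred[0, 0] = 1.0  # the first token can only attend to itself
    for i in range(1, n_tokens):
        pred[i, i - 1] = 1.0
    return pred

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

layer, head = 4, 11  # hypothetical head under test
actual = outputs.attentions[layer][0, head].numpy()  # (n_tokens, n_tokens)
predicted = previous_token_program(actual.shape[0])

# Each attention row is a distribution over source positions, so we can
# score the hypothesis as the mean Jensen-Shannon distance across rows.
score = np.mean([jensenshannon(p, a) for p, a in zip(predicted, actual)])
print(f"Layer {layer}, head {head}: mean JS distance = {score:.3f}")

A low mean distance would support the hypothesis that this head implements previous-token attention; sweeping such comparisons over all layers, heads, and a library of candidate programs is one way to realize the filtering the title describes.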
