André Braga

MIT Department: Electrical Engineering and Computer Science
Faculty Mentor: Prof. Jacob Andreas
Research Supervisor: Mehul Damani
Undergraduate Institution: University of California, Santa Barbara
Biography
André Braga is a rising third-year Statistics & Data Science major at the University of California, Santa Barbara, and a 2025 MIT MSRP intern in the Department of Electrical Engineering and Computer Science. Working with Professor Jacob Andreas, he uses reinforcement learning to make large language models more trustworthy, with an emphasis on uncertainty estimation. At UCSB he conducts research with Professors Xifeng Yan and Mingsong Yan, investigating attention-scaling techniques that enhance long-context reasoning and retrieval-augmented generation in transformer models, with the aim of making large language systems faster and more interpretable. He also contributed to an AI-driven financial forecasting project with Professor Yan that explored feedback loops between search and prediction agents. Beyond academia, André co-founded Shofo, a startup where he designs fine-tuning workflows and user-facing machine-learning infrastructure for large-scale social-media analytics. He plans to pursue a Ph.D. in machine learning, developing novel architectures for trustworthy, interpretable continuous learning.
Abstract
Training ChatGPT to Double-Check Itself with a Separate ‘Judge’ Model
André Braga¹, Mehul Damani², and Jacob Andreas²
¹Department of Statistics, University of California, Santa Barbara
²Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
Large language models (LLMs) often sound confident even when they are wrong, limiting their reliability in domains such as medicine or law, where it is vital to know when the model may not be certain of its answer. Existing methods for teaching these models to be more cautious typically judge only whether the final answer is correct or incorrect, and do little to assess the validity of the reasoning that produced it. Our work uses reinforcement learning, a feedback-based training method, with an outcome reward model (ORM) acting as a separate ‘judge’ that scores both the answer and the reasoning process behind it. We then use this reward to train the LLM to adjust its behavior, learning not only to be correct but also to show caution when necessary, and even to say “I don’t know” when it is unsure of its answer. By training language models with feedback from a reasoning-aware reward model, our approach addresses a gap in existing confidence-improvement methods that overlook how answers are derived. The result is models that are not only more accurate but also more cautious, able to express uncertainty appropriately, and ultimately better suited to real-world decision-making.
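
The sketch below illustrates, in Python, how a judge-style reward of this kind might combine answer correctness, a reasoning score, and an abstention option before being passed to a reinforcement-learning update. It is a minimal illustration, not the authors' implementation: the names (JudgeVerdict, reward) and the weights (ABSTAIN_REWARD, WRONG_PENALTY, REASONING_WEIGHT) are hypothetical placeholders.

# Minimal sketch (assumed names and weights, not the authors' method) of a
# judge-style reward that scores both the final answer and the reasoning trace,
# and gives a small positive reward for abstaining instead of guessing.
from dataclasses import dataclass

ABSTAIN_REWARD = 0.2      # assumed: modest reward for answering "I don't know"
WRONG_PENALTY = -1.0      # assumed: confident wrong answers are penalized hardest
REASONING_WEIGHT = 0.5    # assumed: relative weight of the judge's reasoning score

@dataclass
class JudgeVerdict:
    answer_correct: bool   # did the final answer match the reference?
    reasoning_score: float # judge's score for the reasoning trace, in [0, 1]
    abstained: bool        # did the model answer "I don't know"?

def reward(verdict: JudgeVerdict) -> float:
    """Scalar reward for one sampled response, mixing outcome and reasoning quality."""
    if verdict.abstained:
        return ABSTAIN_REWARD
    outcome = 1.0 if verdict.answer_correct else WRONG_PENALTY
    return outcome + REASONING_WEIGHT * verdict.reasoning_score

# A correct answer with sound reasoning earns more than one with a shaky
# derivation, and abstaining beats a confidently wrong answer.
print(reward(JudgeVerdict(True, 0.9, False)))   # 1.45
print(reward(JudgeVerdict(True, 0.2, False)))   # 1.10
print(reward(JudgeVerdict(False, 0.8, False)))  # -0.6
print(reward(JudgeVerdict(False, 0.0, True)))   # 0.2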