Nandan Sarkar

MIT Department: Electrical Engineering and Computer Science
Faculty Mentor: Prof. Yoon Kim
Research Supervisor: Abbas Zeitoun
Undergraduate Institution: Yale University
Biography
Nandan Sarkar is a senior from Nashville, Tennessee, studying Computer Science and Applied Math at Yale. He is very interested in machine learning and is committed to pursuing a Ph.D. in the field. At Yale, he is an undergraduate researcher in the Yale NLP lab under Professor Arman Cohan. For the past year, and this summer at MIT, he has been a member of the Computation and Language Lab under Professor Yoon Kim. Previously, Nandan conducted research at Vanderbilt on human-computer interaction and augmented reality. Since freshman year, Nandan has been deeply involved in Code Haven, a student-run organization dedicated to expanding computer science education for middle school students in New Haven. Outside of school, Nandan enjoys playing squash, watching detective shows, and playing poker with friends. He is also a big soccer and football fan (a die-hard supporter of Barcelona and the Tennessee Titans) and loves country music and hot chicken.
Abstract
Learning to Reason from Inferred Steps: An EM Framework for Enhancing LLM Reasoning
Nandan Sarkar1, Abbas Zeitoun2, and Yoon Kim2
1Department of Computer Science, Yale University
2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
Large language models (LLMs) have shown impressive capabilities in solving complex tasks that require multi-step reasoning, such as those in mathematics, science, and logic. However, training models to produce high-quality reasoning traces often involves supervised fine-tuning on teacher-generated step-by-step explanations or reinforcement learning using verifiable rewards. In this work, we aim to enhance reasoning capabilities, particularly on tasks that lack verifiable rewards, by enabling the model to better interpret and internalize reasoning traces. We introduce an Expectation-Maximization (EM) framework that treats intermediate reasoning steps as latent variables. We begin by fine-tuning OpenThinker-7B, a model exposed to diverse reasoning data, to predict missing steps in reasoning traces. Then, during the E-step, we sample latent intermediate steps for partially masked traces from the model’s predictions. In the M-step, the model is fine-tuned on these completed traces, using its own inferred reasoning to iteratively improve. This self-training approach enables the model to enhance its reasoning abilities without heavily relying on external supervision. We evaluate our method on a range of challenging reasoning benchmarks, including AMC and AIME math competitions, GPQA science questions, and other logic-intensive tasks. Preliminary results suggest that our EM-style framework improves both reasoning coherence and final answer accuracy, particularly under constrained data budgets.
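To make the training loop concrete, here is a minimal, hypothetical Python sketch of the E-step/M-step cycle described in the abstract. It is not the actual implementation: the Model class and its infer_step and finetune methods are assumed placeholder interfaces standing in for sampling from, and supervised fine-tuning of, an LLM such as OpenThinker-7B.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    # A reasoning trace; None entries mark masked (latent) intermediate steps.
    problem: str
    steps: List[Optional[str]]
    answer: str


class Model:
    # Hypothetical stand-in for the LLM; real code would wrap generation
    # and a supervised fine-tuning routine.
    def infer_step(self, trace: Trace, index: int) -> str:
        # E-step sampling: draw a completion for a masked step,
        # conditioned on the surrounding trace.
        return f"<sampled step {index} for {trace.problem!r}>"

    def finetune(self, traces: List[Trace]) -> None:
        # M-step: fine-tune on the completed traces
        # (problem + all steps + answer).
        pass


def e_step(model: Model, traces: List[Trace], num_samples: int = 4) -> List[Trace]:
    # Sample latent intermediate steps for each partially masked trace.
    completed = []
    for trace in traces:
        for _ in range(num_samples):
            steps = [s if s is not None else model.infer_step(trace, i)
                     for i, s in enumerate(trace.steps)]
            completed.append(Trace(trace.problem, steps, trace.answer))
    return completed


def em_round(model: Model, traces: List[Trace]) -> Model:
    # One EM round: sample completions (E), then train on them (M).
    model.finetune(e_step(model, traces))
    return model


if __name__ == "__main__":
    data = [Trace("What is 2 + 2?", ["Add the two numbers.", None], "4")]
    model = Model()
    for _ in range(3):  # iterate: the model trains on its own inferred steps
        model = em_round(model, data)

Only the control flow of the self-training loop is shown here; the masking strategy, sampling procedure, and fine-tuning details follow the setup described above.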