Alexander Zalles

MIT Department: Electrical Engineering and Computer Science
Faculty Mentor: Prof. Justin Solomon
Research Supervisors: Mikhail Yurochkin, Jiacheng Zhu
Undergraduate Institution: Rice University
Hometown: Austin, Texas
Website: LinkedIn

Biography

Alex Zalles is a rising senior at Rice University studying Operations Research and English. A book lover, part-time barista, and avid researcher, Alex enjoys doing a bit of everything, both on the Rice campus and abroad. He participated in the Data Science REU at Rice, where he developed novel methods for network regression, expanding the modalities with which underlying relationships in distributed systems can be analyzed. After polishing and presenting that work at NeurIPS and GCURS the following year, he joined the MIT Summer Research Program the next summer. Focusing on deep learning, Alex worked to develop improved algorithms for neural network alignment and merging, a problem of growing importance in the era of big data. Following these experiences, he hopes to continue his studies by pursuing a PhD and to continue into academia, helping pave the way for the next generation of students and thinkers.

Abstract

Neural Network Alignment Through Knowledge Distillation and Low-Rank Adaptation

Alexander Zalles1, Jiacheng Zhu2, Mikhail Yurochkin3, Justin Solomon2
1Department of Computational Applied Mathematics and Operations Research, Rice University
2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
3MIT-IBM Watson AI Lab

While neural networks excel at machine learning tasks, a full understanding of how they function is still developing. A major open question concerns the existence of equally optimal network parameters at distant points in the highly non-convex loss landscape. Previous works relate these distant points using permutation alignment, arguing that a large number of the minima are equivalent modulo permutations. Other works connect loss basins through quadratic curves or polygonal chains. While both approaches provide insight into the optimization space, they also have drawbacks. For large networks trained with different hyperparameters, permutations alone do not achieve low loss when interpolating between parameters, and nonlinear curves require optimizing a third network with many parameters. To mitigate these issues, we propose a method that views model alignment through knowledge distillation (KD), in which a third network's parameters are optimized to balance functional performance and linear interpolation. The computational cost is reduced through low-rank adaptation (LoRA), which optimizes low-rank updates of the initial parameters instead of relearning full-rank matrices. This yields a fast procedure for aligning two networks so that they are linearly mode connectable while maintaining competitive performance, providing insight into the structure of high-dimensional loss landscapes and supporting model fine-tuning methods.
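The abstract describes the method only at a high level; the sketch below is an illustrative reading, not the authors' implementation. It shows, under assumed details, how a knowledge-distillation alignment objective with LoRA-style low-rank updates could be set up in PyTorch: the LoRALinear module, the kd_alignment_loss function, and the rank, temperature, and weighting parameters are all hypothetical and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update: W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        # Frozen copies of the base layer's parameters (not optimized).
        self.register_buffer("weight", base.weight.detach().clone())
        self.register_buffer(
            "bias", base.bias.detach().clone() if base.bias is not None else None
        )
        out_features, in_features = base.weight.shape
        # Only the low-rank factors A and B are trainable.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        return F.linear(x, self.weight + self.B @ self.A, self.bias)

def kd_alignment_loss(student, teacher_a, teacher_b, x, alpha=0.5, temperature=2.0):
    """Distill from both endpoint networks; keeping the student close to each
    teacher is one way to encourage a low-loss linear path between them
    (a hypothetical objective, not the paper's exact loss)."""
    with torch.no_grad():
        logits_a = teacher_a(x)
        logits_b = teacher_b(x)
    log_p_student = F.log_softmax(student(x) / temperature, dim=-1)
    kd_a = F.kl_div(log_p_student, F.softmax(logits_a / temperature, dim=-1),
                    reduction="batchmean")
    kd_b = F.kl_div(log_p_student, F.softmax(logits_b / temperature, dim=-1),
                    reduction="batchmean")
    return alpha * kd_a + (1.0 - alpha) * kd_b

Because only the low-rank factors A and B receive gradients, the alignment step trains far fewer parameters than relearning full-rank weight matrices, which is the cost reduction the abstract attributes to LoRA.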
