{"id":3480,"date":"2024-07-11T19:15:58","date_gmt":"2024-07-11T19:15:58","guid":{"rendered":"https:\/\/oge.mit.edu\/msrp\/?post_type=profiles&#038;p=3480"},"modified":"2025-12-11T12:46:48","modified_gmt":"2025-12-11T17:46:48","slug":"alexander-zalles","status":"publish","type":"profiles","link":"https:\/\/oge.mit.edu\/msrp\/profiles\/alexander-zalles\/","title":{"rendered":"Alexander Zalles"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"alignleft size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1443\" height=\"1443\" src=\"https:\/\/oge.mit.edu\/msrp\/wp-content\/uploads\/sites\/2\/2024\/08\/ZallesAlex.jpg\" alt=\"Alexander, Headshot\" class=\"wp-image-3837\" style=\"width:200px\" srcset=\"https:\/\/oge.mit.edu\/msrp\/wp-content\/uploads\/sites\/2\/2024\/08\/ZallesAlex.jpg 1443w, https:\/\/oge.mit.edu\/msrp\/wp-content\/uploads\/sites\/2\/2024\/08\/ZallesAlex-300x300.jpg 300w, https:\/\/oge.mit.edu\/msrp\/wp-content\/uploads\/sites\/2\/2024\/08\/ZallesAlex-1024x1024.jpg 1024w, https:\/\/oge.mit.edu\/msrp\/wp-content\/uploads\/sites\/2\/2024\/08\/ZallesAlex-150x150.jpg 150w, https:\/\/oge.mit.edu\/msrp\/wp-content\/uploads\/sites\/2\/2024\/08\/ZallesAlex-768x768.jpg 768w\" sizes=\"auto, (max-width: 1443px) 100vw, 1443px\" \/><\/figure>\n<\/div>\n\n\n<p><strong>MIT Department: <\/strong>Electrical Engineering and Computer Sciences<br><strong>Faculty Mentor: <\/strong>Prof. 
Justin Solomon<br><strong>Research Supervisors: <\/strong>Mikhail Yurochkin, Jiacheng Zhu<br><strong>Undergraduate Institution:<\/strong> Rice University<br><strong>Hometown:<\/strong> Austin, Texas<br><strong>Website: <\/strong><a href=\"http:\/\/www.linkedin.com\/in\/alex-zalles-620763250\" data-type=\"URL\">LinkedIn<\/a><\/p>\n\n\n\n<div style=\"height:0px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Biography<\/strong><\/h4>\n\n\n\n<p>Alex Zalles is a rising senior at Rice University studying Operations Research and English. A book lover, part-time barista, and avid researcher, Alex enjoys doing a bit of everything, both on the Rice campus and abroad. He participated in the Data Science REU at Rice, where he developed novel methods for network regression, expanding the modalities with which we can analyze underlying relationships in distributed systems. After polishing and presenting his work at NeurIPS and GCURS, he participated in the MIT Summer Research Program the following summer. Focusing on deep learning, Alex worked to develop improved algorithms for neural network alignment and merging, a problem increasingly important in the era of big data. 
Following these experiences, he hopes to continue his studies by pursuing a PhD and to remain in academia, helping pave the way for the next generation of students and thinkers.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Abstract<\/strong><\/h4>\n\n\n\n<p class=\"has-text-align-center\"><strong>Neural Network Alignment Through Knowledge Distillation and Low-<br>Rank Adaptation<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center\"><strong>Alexander Zalles<sup>1<\/sup>, Jiacheng Zhu<sup>2<\/sup>, Mikhail Yurochkin<sup>3<\/sup>, Justin Solomon<sup>2<\/sup><\/strong><br><sup>1<\/sup>Department of Computational Applied Mathematics and Operations Research, Rice University<br><sup>2<\/sup>Department of Electrical Engineering and Computer Science, Massachusetts Institute<br>of Technology<br><sup>3<\/sup>MIT-IBM Watson AI Lab<\/p>\n\n\n\n<p>While neural networks excel at machine learning tasks, a full understanding of how they function is still developing. A major open question concerns the existence of equally optimal network parameters at distant points in the highly non-convex loss landscape. Prior work relates these distant points through permutation alignment, arguing that a large number of the minima are equivalent modulo permutation. Other work connects loss basins through quadratic curves or polygonal chains. While both approaches provide insight into the optimization space, they also have drawbacks: for large networks trained with different hyperparameters, permutations alone do not achieve low loss when interpolating between parameters, and nonlinear curves require optimizing a third network with many parameters. To mitigate these issues, we propose a method that views model alignment through knowledge distillation (KD), in which a third network\u2019s parameters are optimized to balance functional performance and linear interpolability. 
We reduce the computational cost through low-rank adaptation (LoRA), optimizing low-rank updates to the initial parameters instead of relearning full-rank matrices. This yields a fast procedure for aligning two networks so that they become linearly mode connectable while maintaining competitive performance, providing insight into the structure of high-dimensional loss landscapes and supporting model fine-tuning methods.<\/p>\n","protected":false},"featured_media":3837,"template":"","profile_category":[22],"class_list":["post-3480","profiles","type-profiles","status-publish","has-post-thumbnail","hentry","profile_category-2024-interns"],"acf":[],"_links":{"self":[{"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/profiles\/3480","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/profiles"}],"about":[{"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/types\/profiles"}],"version-history":[{"count":5,"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/profiles\/3480\/revisions"}],"predecessor-version":[{"id":4967,"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/profiles\/3480\/revisions\/4967"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/media\/3837"}],"wp:attachment":[{"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/media?parent=3480"}],"wp:term":[{"taxonomy":"profile_category","embeddable":true,"href":"https:\/\/oge.mit.edu\/msrp\/wp-json\/wp\/v2\/profile_category?post=3480"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}