{"id":1011,"date":"2025-12-08T14:06:00","date_gmt":"2025-12-08T14:06:00","guid":{"rendered":"https:\/\/web.eecs.umich.edu\/~girasole\/?p=1011"},"modified":"2026-01-02T20:12:08","modified_gmt":"2026-01-02T20:12:08","slug":"spada-lab-at-neurips-2025","status":"publish","type":"post","link":"https:\/\/web.eecs.umich.edu\/~girasole\/?p=1011","title":{"rendered":"SPADA lab at NeurIPS 2025"},"content":{"rendered":"\n<p>SPADA lab had two works to share at NeurIPS this year. The first was <a href=\"https:\/\/openreview.net\/forum?id=XfHfTqeXfZ\">MonarchAttention<\/a>, which received a spotlight; thanks to everyone who stopped by the poster. See our earlier post for an example of how our method offers a zero-shot drop-in replacement for softmax attention with significant savings in memory and computation \u2013 and very little accuracy loss. This technique has a University of Michigan patent pending.<\/p>\n\n\n\n<p>The second work, <a href=\"https:\/\/transformerstheory.github.io\/pdf\/49_kwon_et_al.pdf\">Out-of-Distribution In-Context Learning<\/a>, was presented at the <a href=\"https:\/\/transformerstheory.github.io\/#accepted\">What Can\u2019t Transformers Do? Workshop<\/a>. We analyze the solution learned when linear attention is trained for in-context linear regression with the regression vector drawn from either a single subspace or a union of subspaces, and then evaluate it on out-of-distribution test tasks. In the union-of-subspaces case, the trained model generalizes to the span of the subspaces at test time.<\/p>\n\n\n\n<p>Nice work by all the students: Can and Soo Min (both SPADA lab members), as well as our treasured collaborators Alec, Pierre, and Changwoo!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>SPADA lab had two works to share at NeurIPS this year. The first was MonarchAttention, which received a spotlight; thanks to everyone who stopped by the poster. 
See our earlier post for an example of how our method offers a zero-shot drop-in replacement for softmax attention with significant savings in memory and computation [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,13],"tags":[],"_links":{"self":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts\/1011"}],"collection":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1011"}],"version-history":[{"count":1,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts\/1011\/revisions"}],"predecessor-version":[{"id":1012,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts\/1011\/revisions\/1012"}],"wp:attachment":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1011"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1011"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1011"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}