{"id":986,"date":"2025-05-29T19:34:01","date_gmt":"2025-05-29T19:34:01","guid":{"rendered":"https:\/\/web.eecs.umich.edu\/~girasole\/?p=986"},"modified":"2025-05-29T19:34:01","modified_gmt":"2025-05-29T19:34:01","slug":"analyzing-out-of-distribution-in-context-learning","status":"publish","type":"post","link":"https:\/\/web.eecs.umich.edu\/~girasole\/?p=986","title":{"rendered":"Analyzing Out-of-Distribution In-Context Learning"},"content":{"rendered":"\n<p>We posted a new paper on arXiv presenting an analysis of the capabilities of attention for in-context learning. There are many perspectives on whether it&#8217;s possible to do in-context learning out-of-distribution: some papers show that it is, others show that it isn&#8217;t, mostly with empirical evidence. We provide theoretical results in a specific setting, using linear attention to solve linear regression. We show a negative result: when the model is trained on a single subspace, the risk on out-of-distribution subspaces is bounded below and cannot be driven to zero. We then show that when the model is instead trained on a union of subspaces, the risk can be driven to zero on <em>any test point in the span<\/em> of the training subspaces &#8211; even points that have zero probability under the training distribution. We hope this perspective can help researchers improve the training process to promote out-of-distribution generalization.<\/p>\n\n\n\n<p>Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu. &#8220;Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective.&#8221; <a href=\"https:\/\/arxiv.org\/abs\/2505.14808\">https:\/\/arxiv.org\/abs\/2505.14808<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We posted a new paper on arXiv presenting an analysis of the capabilities of attention for in-context learning. 
There are many perspectives on whether it&#8217;s possible to do in-context learning out-of-distribution: some papers show that it is, others show that it isn&#8217;t, mostly with empirical evidence. We provide theoretical results in a specific setting, using [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,13],"tags":[],"_links":{"self":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts\/986"}],"collection":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=986"}],"version-history":[{"count":1,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts\/986\/revisions"}],"predecessor-version":[{"id":987,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=\/wp\/v2\/posts\/986\/revisions\/987"}],"wp:attachment":[{"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=986"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=986"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/web.eecs.umich.edu\/~girasole\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=986"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}