News

Laura Balzano June 6, 2025

Monarch Attention

The attention module in transformer architectures is often the most computation- and memory-intensive unit. Many researchers have tried different ways to approximate softmax attention in a compute-efficient way. We have a new approach that uses the Monarch matrix structure along with variational softmax to quickly and accurately approximate softmax attention in a zero-shot setting. The results are very exciting: we can significantly decrease the compute and memory requirements while taking at most a small hit to performance. This figure shows the performance versus computation of our “Monarch-Attention” method as compared to Flash Attention 2 (listed as “softmax”) and other fast approximations.

See the paper for additional results, including hardware benchmarking against Flash Attention 2 on several sequence lengths.
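For a rough feel of why Monarch structure is cheap, here is a minimal NumPy sketch of a Monarch-structured matrix-vector product. It is an illustration only, not the MonarchAttention algorithm from the paper. It assumes the common square parameterization M = P L P R, where L and R are block-diagonal (b blocks of size b x b, so n = b^2) and P is the permutation that transposes a b x b grid. The function names are hypothetical.

```python
import numpy as np

def monarch_matvec(L, R, x):
    """Multiply x by M = P @ blockdiag(L) @ P @ blockdiag(R) without forming M.

    L and R have shape (b, b, b): b dense blocks of size b x b each.
    The cost is about 2 * b**3 = 2 * n**1.5 multiply-adds instead of n**2.
    """
    b = L.shape[0]
    X = x.reshape(b, b)                    # chunk k of x is row k
    Y = np.einsum("kij,kj->ki", R, X)      # block-diagonal R: row k -> R[k] @ X[k]
    Z = np.einsum("kij,kj->ki", L, Y.T)    # permute (transpose), then block-diagonal L
    return Z.T.reshape(-1)                 # permute back and flatten

def monarch_dense(L, R):
    """Build the same matrix densely, only to check the fast product."""
    b = L.shape[0]
    n = b * b

    def block_diag(blocks):
        out = np.zeros((n, n))
        for k in range(b):
            out[k * b:(k + 1) * b, k * b:(k + 1) * b] = blocks[k]
        return out

    # P reorders a flattened b x b grid into its transpose; it is its own inverse.
    perm = np.arange(n).reshape(b, b).T.reshape(-1)
    P = np.eye(n)[perm]
    return P @ block_diag(L) @ P @ block_diag(R)

rng = np.random.default_rng(0)
b = 4
L = rng.standard_normal((b, b, b))
R = rng.standard_normal((b, b, b))
x = rng.standard_normal(b * b)

fast = monarch_matvec(L, R, x)
slow = monarch_dense(L, R) @ x
max_abs_diff = np.max(np.abs(fast - slow))
```

The two results agree up to floating-point error, while the structured product never touches an n x n matrix. How the paper combines this structure with variational softmax is described there and is not reproduced here.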

Can Yaras, Alec S. Xu, Pierre Abillama, Changwoo Lee, Laura Balzano. “MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention.”
https://arxiv.org/abs/2505.18698
Code can be found here.

Filed Under: Code, News, SPADA

Laura Balzano May 29, 2025

Analyzing Out-of-Distribution In-Context Learning

We posted a new paper on arXiv analyzing the capabilities of attention for in-context learning. There are many perspectives on whether in-context learning is possible out-of-distribution: some papers show it is, and others show it is not, mostly with empirical evidence. We provide theoretical results in a specific setting, using linear attention to solve linear regression. We show a negative result: when the model is trained on a single subspace, the risk on out-of-distribution subspaces is bounded below and cannot be driven to zero. We then show that when the model is instead trained on a union of subspaces, the risk can be driven to zero on any test point in the span of the trained subspaces, even points that have zero probability under the training distribution. We are hopeful that this perspective can help researchers improve the training process to promote out-of-distribution generalization.
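The following toy NumPy sketch illustrates the flavor of that contrast. It is not the model, training procedure, or analysis from the paper: the attention weights are hand-picked projections rather than trained, the predictor is a simplified one-layer linear-attention readout, and all names are hypothetical.

```python
import numpy as np

def linear_attention_predict(W, X, y, x_query):
    """Simplified linear-attention readout: x_query^T W (1/n) sum_i y_i x_i."""
    n = X.shape[0]
    return x_query @ W @ (X.T @ y) / n

def sample_context(basis, w, n, rng):
    """Draw n Gaussian inputs from span(basis) and label them with y = <w, x>."""
    Z = rng.standard_normal((n, basis.shape[1]))
    X = Z @ basis.T
    return X, X @ w

rng = np.random.default_rng(0)
d, n = 6, 2000
I = np.eye(d)
U, V = I[:, :2], I[:, 2:4]           # two orthogonal 2-dimensional subspaces
w = np.ones(d)                       # ground-truth regression vector

# Assumption: stand-ins for weights adapted to one subspace vs. a union of two.
W_single = U @ U.T
W_union = U @ U.T + V @ V.T

q_in = U @ np.ones(2)                # query inside U, true label 2
q_out = V @ np.ones(2)               # query inside V, true label 2
q_mix = q_in + q_out                 # in span(U, V) but in neither subspace, true label 4

X_u, y_u = sample_context(U, w, n, rng)
X_v, y_v = sample_context(V, w, n, rng)
X_m, y_m = sample_context(np.hstack([U, V]), w, n, rng)

pred_in = linear_attention_predict(W_single, X_u, y_u, q_in)          # close to 2
pred_out_single = linear_attention_predict(W_single, X_v, y_v, q_out) # 0: blind to V
pred_mix_union = linear_attention_predict(W_union, X_m, y_m, q_mix)   # close to 4
```

With weights tied to a single subspace, the query in the other subspace is projected away and the prediction is zero regardless of the context, so the error cannot shrink. With weights covering both subspaces, the mixed query is predicted well even though it lies in neither subspace alone.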

Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu. “Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective.” https://arxiv.org/abs/2505.14808.

Filed Under: News, SPADA

Laura Balzano April 19, 2025

SDP Relaxation paper in SIMAX

I’m excited that our paper “A Semidefinite Relaxation for Sums of Heterogeneous Quadratic Forms on the Stiefel Manifold” has been published in the SIAM Journal on Matrix Analysis and Applications. https://doi.org/10.1137/23M1545136. We were inspired to work on this problem after it popped up inside the heteroscedastic PCA problem. It’s a fascinating, simple, general problem with connections to PCA, joint diagonalization, and low-rank semidefinite programs. Applying the standard Shor relaxation to this problem gives a trivial (and incorrect) solution, but one minor change makes the relaxation powerful and even tight in many instances. You can find the code for experiments here.
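As a small illustration of the PCA connection, the NumPy sketch below evaluates a sum of heterogeneous quadratic forms at a matrix with orthonormal columns. It assumes the objective has the form sum_i u_i^T A_i u_i, with u_i the i-th column of U, which is our reading of the paper title; it does not implement the semidefinite relaxation itself, and the function name is hypothetical.

```python
import numpy as np

def heterogeneous_objective(U, mats):
    """Sum of u_i^T A_i u_i, where u_i is column i of U and A_i = mats[i]."""
    return sum(U[:, i] @ A @ U[:, i] for i, A in enumerate(mats))

rng = np.random.default_rng(0)
d, k = 5, 2
B = rng.standard_normal((d, d))
A = B @ B.T                                # a symmetric positive semidefinite matrix

# Special case: every A_i is the same matrix A. The objective is then
# trace(U^T A U), which is the PCA objective, maximized by the top-k eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A)       # eigenvalues in ascending order
U_pca = eigvecs[:, -k:]
pca_value = heterogeneous_objective(U_pca, [A] * k)
top_k_sum = eigvals[-k:].sum()

# Any other point on the Stiefel manifold (orthonormal columns) does no better.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
random_value = heterogeneous_objective(Q, [A] * k)
```

When the A_i differ, the columns can no longer be chosen from a single eigendecomposition, which is what makes the general problem harder than PCA.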

Filed Under: Code, News

Laura Balzano February 11, 2025

Sarah Goddard Power Award

I am honored to have received the Sarah Goddard Power Award, an award given to those who contribute to the advancement of women in scholarship and academic leadership. Right now this work is critically important in my field, as technology in machine learning, artificial intelligence, and computing changes our world on a daily basis. Technology is often thought of as an objective pursuit, where the goals are clear and well-defined, and only those who are “math geniuses” can make a contribution. This couldn’t be further from the truth – we are constantly defining the goals and values of our technology, and diverse voices are key to creating technology that lifts us up as a whole society.

Filed Under: News

Laura Balzano November 5, 2024

IEEE Signal Processing Magazine – Special Issue on the Mathematics of Deep Learning

I am the lead guest editor on a Signal Processing Magazine special issue on the Mathematics of Deep Learning: https://signalprocessingsociety.org/publications-resources/special-issue-deadlines/ieee-spm-special-issue-mathematics-deep-learning. My excellent co-editors are Joan Bruna, Gitta Kutyniok, Robert Nowak, and Jong Chul Ye. We have extended the White Paper deadline to this Friday, November 8. Please share with anyone who is interested but missed the deadline last Friday. We look forward to your submissions!

Filed Under: News

Laura Balzano June 25, 2024

SPADA lab at ICML in Vienna

I am excited to be a part of three papers at the International Conference on Machine Learning this July in Vienna.

Congratulations to Can Yaras for having his work on compression in deep low-rank learning, with co-authors Peng Wang and Qing Qu, accepted as an oral presentation for Tuesday afternoon! This work proves that when training deep linear networks, the gradient descent dynamics are limited to an invariant subspace. This subspace can be leveraged to make training and overparameterization more efficient, and allows us to reap the benefits of deep overparameterization without the computational burden. The code is available on Can’s GitHub site. I talked about this work for the 1W-Minds seminar in April.

Peng Wang and Huikang Liu led our work on symmetric matrix completion with ReLU sampling, which will be presented as a poster on Wednesday. We showed that it is possible to recover a low-rank matrix even when the sampling is highly dependent on the matrix entries: we focus on ReLU sampling (and variants), where only positive entries are observed.
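To make the sampling model concrete, here is a minimal NumPy sketch of ReLU sampling applied to a symmetric low-rank matrix. It only generates the observations and does not attempt the recovery method from the paper; the sizes and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 2
X = rng.standard_normal((n, r))
M = X @ X.T                          # symmetric matrix of rank at most r

# ReLU sampling: an entry is observed only if it is positive, so the
# observation pattern depends on the unknown values themselves.
mask = M > 0
observed = np.where(mask, M, 0.0)    # entrywise max(M, 0)
fraction_observed = mask.mean()
```

Unlike uniform random sampling, the mask here carries information: every missing entry is known to be nonpositive.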

Finally, Wisconsin-Madison PhD student Yuchen Li will be presenting his work on block Riemannian MM methods, also with a poster on Wednesday. He proved iteration guarantees for convergence to a stationary point for general multi-block MM algorithms where any number of blocks may be constrained to a Riemannian manifold. His complexity results reduce to well-known results in the Euclidean case. This work is broadly applicable to alternating MM algorithms for machine learning problems.

Filed Under: News, SPADA

Laura Balzano May 3, 2024

SPADA Lab Research at AISTATS

Congratulations to Davoud Ataee Tarzanagh and Soo Min Kwon, whose research was presented in poster sessions at AISTATS this morning!

Davoud’s work on Online Bilevel Optimization was entirely conceived and driven by him during his postdoc at UM. The paper has novel definitions of bilevel dynamic regret, and he and Parvin proved many fabulous results on regret bounds for online alternating gradient descent, from the strictly convex setting (with a matching lower bound) all the way to the nonconvex setting. He demonstrated its usefulness on online hyperparameter tuning, online loss tuning for imbalanced data, and then online meta-learning with Bojian’s expertise! Online learning has provided a sea change for so much of ML on massive data, and we believe that OBO is a crucial next step for modern applications that commonly require careful balancing of objectives.

Soo Min and Tsinghua student Zekai Zhang’s work on Efficient Low-Dimensional Compression of Overparameterized Models demonstrates a method for compressing overparameterized deep linear layers in deep networks. Their approach consistently improves generalization error in a fraction of the computation time. The work shows that by leveraging inherent low-dimensional structure within the model parameter updates, we can reap the benefits of overparameterization without the computational burden.

Filed Under: News, SPADA

Laura Balzano January 4, 2024

Research presentations at CPAL 2024

At the inaugural Conference on Parsimony and Learning (CPAL), my group is presenting three works that have come out of a recent exciting collaboration with UM Prof Qing Qu and other colleagues on low-rank learning in deep networks. Prof Qu’s prior work studying neural collapse in deep networks has opened many exciting directions for us to pursue! All three works study deep linear networks (DLNs), i.e. deep matrix factorization. In this setting (which is simplified from deep neural networks that have nonlinear activations), we can prove several interesting fundamental facts about the way DLNs learn from data when trained with gradient descent. Congratulations SPADA members Soo Min Kwon, Can Yaras, and Peng Wang (all co-advised by Prof Qu) for these publications!
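For readers new to the setting, the NumPy sketch below shows what "deep linear network" means here: a stack of weight matrices applied with no nonlinearity, which is the same as multiplying by their product. It is a generic illustration and not code from any of the three papers; the layer sizes and function names are made up.

```python
import numpy as np

def dln_forward(weights, x):
    """Apply layers W_1, ..., W_L in order with no activation function."""
    for W in weights:
        x = W @ x
    return x

def end_to_end(weights):
    """Collapse the network into the single matrix W_L @ ... @ W_1."""
    M = weights[0]
    for W in weights[1:]:
        M = W @ M
    return M

rng = np.random.default_rng(0)
dims = [6, 3, 3, 5]                  # input 6, two hidden layers of width 3, output 5
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
x = rng.standard_normal(dims[0])

M = end_to_end(weights)              # the deep matrix factorization of a 5 x 6 map
layerwise_output = dln_forward(weights, x)
```

The end-to-end map M has rank at most the narrowest layer width (3 in this example), which is one reason low-rank structure appears naturally in this setting.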

Yaras, C., Wang, P., Hu, W., Zhu, Z., Balzano, L., & Qu, Q. (2023, December 1). Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Linear Networks. Conference on Parsimony and Learning (Recent Spotlight Track). https://openreview.net/forum?id=oSzCKf1I5N

Wang, P., Li, X., Yaras, C., Zhu, Z., Balzano, L., Hu, W., & Qu, Q. (2023, December 1). Understanding Hierarchical Representations in Deep Networks via Feature Compression and Discrimination. Conference on Parsimony and Learning (Recent Spotlight Track). https://openreview.net/forum?id=Ovuu8LpGZu

Kwon, S. M., Zhang, Z., Song, D., Balzano, L., & Qu, Q. (2023, December 1). Efficient Low-Dimensional Compression of Overparameterized Networks. Conference on Parsimony and Learning (Recent Spotlight Track). https://openreview.net/forum?id=1AVb9oEdK7

Filed Under: News, SPADA

Laura Balzano September 20, 2023

Congratulations Dr. Gilman and Dr. Du!

Last fall and winter, SPADA PhD students Kyle Gilman and Zhe Du graduated. Kyle’s thesis was titled “Scalable Algorithms Using Optimization on Orthogonal Matrix Manifolds,” and he continues to make fundamental contributions to interesting modern optimization problems. He is currently an Applied AI/ML Senior Associate at JPMorgan Chase. Zhe’s thesis was titled “Learning, Control, and Reduction for Markov Jump Systems,” with lots of interesting work at the intersection of machine learning and control. He is currently a postdoctoral researcher working with Samet Oymak and Fabio Pasqualetti. I am excited to follow their work into the future as they make an impact in optimization, machine learning, and control!

Filed Under: News, SPADA

Laura Balzano January 26, 2023

MLK Spirit Award

I am honored to have received an MLK Spirit Award from the Michigan College of Engineering. These awards are given to university members who exemplify the leadership and vision of Reverend Dr. Martin Luther King, Jr. through their commitment to social justice, diversity, equity, and inclusion. That commitment is a very high priority for me, so I am grateful that others have felt the impact of my actions. https://ece.engin.umich.edu/stories/laura-balzano-receives-2023-mlk-spirit-award

Filed Under: News
