Code for Deep LoRA

Our ICML paper “Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation” shows that it is possible to get the benefits of deep overparameterization without increasing the number of trainable parameters. The code for our experiments can be found at Can’s GitHub site. We welcome any thoughts and questions if you use or adapt our code for your problem!

SPADA lab at ICML in Vienna

I am excited to be a part of three papers at the International Conference on Machine Learning this July in Vienna.

Congratulations to Can Yaras for having his work on compression in deep low-rank learning, with co-authors Peng Wang and Qing Qu, accepted as an oral presentation for Tuesday afternoon! This work proves that when training deep linear networks, the gradient descent dynamics are confined to an invariant subspace. This subspace can be leveraged to make training and overparameterization more efficient, and allows us to reap the benefits of deep overparameterization without the computational burden. The code is available on Can’s GitHub site. I talked about this work for the 1W-MINDS seminar in April.

Peng Wang and Huikang Liu led our work on symmetric matrix completion with ReLU sampling, which will be presented as a poster on Wednesday. We showed that it is possible to recover a low-rank matrix even when the sampling pattern depends strongly on the matrix entries. We focus on ReLU sampling (and variants), where only positive entries are observed.
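
To make the observation model concrete, here is a minimal sketch of ReLU sampling on a symmetric low-rank matrix. It only illustrates which entries are observed and is not the recovery algorithm from the paper. The function name `relu_sample` and the matrix sizes are assumptions chosen for illustration.

```python
import numpy as np

def relu_sample(M):
    """Return (mask, observed) under ReLU sampling.

    mask marks the strictly positive entries of M; observed keeps those
    entries and sets every unobserved entry to zero.
    """
    mask = M > 0
    return mask, np.where(mask, M, 0.0)

rng = np.random.default_rng(0)
n, r = 6, 2                       # hypothetical sizes, for illustration only
U = rng.standard_normal((n, r))
M = U @ U.T                       # symmetric matrix of rank at most r
mask, observed = relu_sample(M)

# The sampling pattern depends on the entries themselves: the diagonal of
# U U^T is a sum of squares, so it is always observed, while off-diagonal
# entries are seen only when they happen to be positive.
print(f"observed {mask.sum()} of {mask.size} entries")
```

Unlike uniform random sampling, the mask here is a deterministic function of the matrix being recovered, which is what makes this setting harder to analyze.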

Finally, Wisconsin-Madison PhD student Yuchen Li will be presenting his work on block Riemannian MM methods, also with a poster on Wednesday. He proved iteration complexity guarantees for convergence to a stationary point for general multi-block MM algorithms where any number of blocks may be constrained to a Riemannian manifold. His complexity results reduce to well-known results in the Euclidean case. This work is broadly applicable to alternating MM algorithms for machine learning problems.

SPADA Lab Research at AISTATS

Congratulations to Davoud Ataee Tarzanagh and Soo Min Kwon, whose research was presented in poster sessions at AISTATS this morning!

Davoud’s work on Online Bilevel Optimization was entirely conceived and driven by him during his postdoc at UM. The paper has novel definitions of bilevel dynamic regret, and he and Parvin proved many fabulous regret bounds for online alternating gradient descent, from the strictly convex setting (with a matching lower bound) all the way to the nonconvex setting. He demonstrated its usefulness on online hyperparameter tuning, online loss tuning for imbalanced data, and then online meta-learning with Bojian’s expertise! Online learning has provided a sea change for so much of ML on massive data, and we believe that OBO is a crucial next step for modern applications that commonly require careful balancing of objectives.

Soo Min and Tsinghua student Zekai Zhang’s work on Efficient Low-Dimensional Compression of Overparameterized Models demonstrates a method for compressing overparameterized deep linear layers in deep networks. Their approach consistently improves generalization error in a fraction of the computation time. The work shows that by leveraging the inherent low-dimensional structure within the model parameter updates, we can reap the benefits of overparameterization without the computational burden.

Research presentations at CPAL 2024

At the inaugural Conference on Parsimony and Learning (CPAL), my group is presenting three works that have come out of a recent exciting collaboration with UM Prof Qing Qu and other colleagues on low-rank learning in deep networks. Prof Qu’s prior work studying neural collapse in deep networks has opened many exciting directions for us to pursue! All three works study deep linear networks (DLNs), i.e., deep matrix factorization. In this setting (which is simplified from deep neural networks that have nonlinear activations), we can prove several interesting fundamental facts about the way DLNs learn from data when trained with gradient descent. Congratulations to SPADA members Soo Min Kwon, Can Yaras, and Peng Wang (all co-advised by Prof Qu) for these publications!

Yaras, C., Wang, P., Hu, W., Zhu, Z., Balzano, L., & Qu, Q. (2023, December 1). Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Linear Networks. Conference on Parsimony and Learning (Recent Spotlight Track).

Wang, P., Li, X., Yaras, C., Zhu, Z., Balzano, L., Hu, W., & Qu, Q. (2023, December 1). Understanding Hierarchical Representations in Deep Networks via Feature Compression and Discrimination. Conference on Parsimony and Learning (Recent Spotlight Track).

Kwon, S. M., Zhang, Z., Song, D., Balzano, L., & Qu, Q. (2023, December 1). Efficient Low-Dimensional Compression of Overparameterized Networks. Conference on Parsimony and Learning (Recent Spotlight Track).

Congratulations Dr. Gilman and Dr. Du!

Last fall and winter, SPADA PhD students Kyle Gilman and Zhe Du graduated. Kyle’s thesis was titled “Scalable Algorithms Using Optimization on Orthogonal Matrix Manifolds,” and he continues to make fundamental contributions to interesting modern optimization problems. He is currently an Applied AI/ML Senior Associate at JPMorgan Chase. Zhe’s thesis was titled “Learning, Control, and Reduction for Markov Jump Systems,” with lots of interesting work at the intersection of machine learning and control. He is currently a postdoctoral researcher working with Samet Oymak and Fabio Pasqualetti. I am excited to follow their work into the future as they make an impact in optimization, machine learning, and control!

MLK Spirit Award

I am honored to have received an MLK Spirit Award from the Michigan College of Engineering. These awards are given to university members who exemplify the leadership and vision of Reverend Dr. Martin Luther King, Jr. through their commitment to social justice, diversity, equity, and inclusion. That commitment is a very high priority for me, so I am grateful that others have felt the impact of my actions.

K-Subspaces Algorithm Results at ICML

I’m excited that our results for the K-Subspaces algorithm were accepted to ICML. My postdoc Peng Wang will be presenting his excellent work; you may read the paper here or attend his session if you are interested. K-Subspaces (KSS) is a natural generalization of K-Means to higher-dimensional centers, originally proposed by Bradley and Mangasarian in 2000. Peng showed not only that KSS converges locally, but also that a simple spectral initialization guarantees a close-enough starting point when the data are drawn randomly from arbitrary subspaces. This is a giant step in a line of questioning that has been open for more than 20 years. Great work, Peng!
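
For readers who have not seen KSS before, the following is a minimal sketch of the alternating procedure: assign each point to its nearest subspace, then refit each subspace by PCA on its assigned points. It assumes linear subspaces through the origin and uses a random initialization, not the spectral initialization analyzed in the paper. The function name `kss`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def kss(X, K, d, n_iters=50, seed=0):
    """Alternating K-Subspaces sketch.

    X: (D, N) data matrix with one point per column.
    K: number of subspaces; d: dimension of each subspace.
    Returns (bases, labels, costs), where bases has shape (K, D, d).
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    # Random orthonormal bases (assumption: NOT the paper's spectral init).
    bases = np.stack([np.linalg.qr(rng.standard_normal((D, d)))[0]
                      for _ in range(K)])
    labels = np.full(N, -1)
    sq_norms = np.sum(X ** 2, axis=0)
    costs = []
    for _ in range(n_iters):
        # Assignment step: squared distance from each point to each subspace.
        proj_energy = np.stack([np.sum((U.T @ X) ** 2, axis=0) for U in bases])
        residuals = sq_norms - proj_energy            # shape (K, N)
        new_labels = np.argmin(residuals, axis=0)
        costs.append(float(residuals[new_labels, np.arange(N)].sum()))
        if np.array_equal(new_labels, labels):
            break                                     # assignments are stable
        labels = new_labels
        # Update step: top-d left singular vectors of each cluster.
        for k in range(K):
            Xk = X[:, labels == k]
            if Xk.shape[1] >= d:                      # keep old basis otherwise
                bases[k] = np.linalg.svd(Xk, full_matrices=False)[0][:, :d]
    return bases, labels, costs

# Synthetic data: 40 points from each of two random 2-dimensional subspaces.
rng = np.random.default_rng(1)
D, d, K, n_per = 10, 2, 2, 40
true_bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]
X = np.hstack([U @ rng.standard_normal((d, n_per)) for U in true_bases])
bases, labels, costs = kss(X, K, d)
```

With a random start, this procedure only guarantees that the cost does not increase from one iteration to the next; it can still settle in a poor local solution, which is why a provably good initialization matters.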

Code for Subspace Tracking with Missing Data

Code for my Proceedings of the IEEE paper with Yuejie Chi and Yue Lu, “Streaming PCA and Subspace Tracking: The Missing Data Case,” can be found here.

Optimally Weighted Heteroscedastic PCA Code

We have updated our paper on optimally weighted heteroscedastic PCA and here is the code to run the experiments. In this work we show how to weight data before solving PCA, under a spiked covariance model with heteroscedastic additive noise. Surprisingly, the weights are not inverse noise variance, and neither are they 0/1 discarding the noisiest points, but instead the optimal weights are in between these two standard heuristics.
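
As a rough illustration of what weighting the data before PCA means, the sketch below computes the leading eigenvectors of a weighted sample covariance and applies the two standard heuristics mentioned above. It does not reproduce the optimal weights derived in the paper. The function name `weighted_pca`, the spike strength, and the noise variances are assumptions made for this example.

```python
import numpy as np

def weighted_pca(Y, weights, k):
    """Top-k eigenvectors of the weighted covariance (1/N) sum_l w_l y_l y_l^T.

    Y: (D, N) data with one sample per column; weights: (N,) nonnegative.
    """
    N = Y.shape[1]
    C = (Y * weights) @ Y.T / N          # scales column l of Y by w_l
    _, eigvecs = np.linalg.eigh(C)       # eigenvalues come back ascending
    return eigvecs[:, ::-1][:, :k]       # reorder so the largest is first

# Synthetic rank-one spiked model with two noise levels (hypothetical values).
rng = np.random.default_rng(0)
D, N, theta = 20, 200, 2.0
u = np.linalg.qr(rng.standard_normal((D, 1)))[0]         # unit-norm spike
sigma2 = np.where(np.arange(N) < N // 2, 0.5, 4.0)        # per-sample variance
Y = theta * u @ rng.standard_normal((1, N)) \
    + np.sqrt(sigma2) * rng.standard_normal((D, N))

u_unweighted = weighted_pca(Y, np.ones(N), 1)
u_inverse_variance = weighted_pca(Y, 1.0 / sigma2, 1)     # heuristic 1
u_discard_noisy = weighted_pca(
    Y, (sigma2 == sigma2.min()).astype(float), 1)         # heuristic 2 (0/1)

for name, est in [("unweighted", u_unweighted),
                  ("inverse variance", u_inverse_variance),
                  ("discard noisy", u_discard_noisy)]:
    print(name, abs((u.T @ est).item()))                  # alignment with u
```

Any other choice of weights, including ones between these two heuristics, can be passed through the same `weights` argument.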

DoE funding for sketching algorithms and theory

Hessam Mahdavifar and I have been awarded funds from the Department of Energy to study sketching in the context of non-real-valued data. Randomized sketching and subsampling algorithms are revolutionizing the data processing pipeline by allowing significant compression of redundant information. However, current research assumes input data are real-valued, whereas many sensing, storage, and computation modalities in scientific and technological applications are best modeled mathematically as other types of data, including discrete-valued, ordinal, and categorical data, among others. You can read about the project here and read a Q&A here that was highlighted on the DoE Office of Science website. We are excited about the opportunity to expand in this new direction!