SPADA Lab Research at AI Stats

Congratulations to Davoud Ataee Tarzanagh and Soo Min Kwon, whose research was presented in poster sessions at AI Stats this morning!

Davoud’s work on Online Bilevel Optimization was conceived and driven entirely by him during his postdoc at UM. The paper introduces novel definitions of bilevel dynamic regret, and he and Parvin proved many fabulous regret bounds for online alternating gradient descent, ranging from the strictly convex setting (with a matching lower bound) all the way to the nonconvex setting. He demonstrated its usefulness for online hyperparameter tuning, online loss tuning for imbalanced data, and, with Bojian’s expertise, online meta-learning! Online learning has brought a sea change to so much of ML on massive data, and we believe that OBO is a crucial next step for modern applications that commonly require careful balancing of objectives.
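
As a rough illustration of online alternating gradient descent on a bilevel problem, here is a minimal NumPy sketch built on a toy quadratic pair we made up for this purpose. The functions, step sizes, and the names `hypergradient` and `online_alternating_gd` are assumptions, not the algorithm or experiments from the paper. At each round a new inner/outer pair is revealed; the learner takes a warm-started gradient step on the inner variable y and then one approximate hypergradient step on the outer variable x.

```python
import numpy as np


def hypergradient(x, y, A, b, lam):
    # Toy bilevel pair (an illustrative assumption, not the paper's experiments):
    #   inner  g(x, y) = 0.5 * ||y - A x||^2   -> grad_yy g = I, grad_xy g = -A^T
    #   outer  f(x, y) = 0.5 * ||y - b||^2 + 0.5 * lam * ||x||^2
    # Implicit-function hypergradient: grad_x f - grad_xy g (grad_yy g)^{-1} grad_y f,
    # evaluated at the current inner iterate y instead of the exact y*(x).
    return lam * x + A.T @ (y - b)


def online_alternating_gd(stream, dim_x, dim_y, lam=0.1, alpha=0.05, beta=0.5, inner_steps=1):
    x = np.zeros(dim_x)
    y = np.zeros(dim_y)
    outer_losses = []
    for A_t, b_t in stream:
        # Inner update: gradient steps on g_t in y, warm-started from the previous round.
        for _ in range(inner_steps):
            y = y - beta * (y - A_t @ x)
        # Loss of the newly revealed outer function at the current iterates.
        outer_losses.append(0.5 * np.sum((y - b_t) ** 2) + 0.5 * lam * np.sum(x ** 2))
        # Outer update: one approximate hypergradient step in x.
        x = x - alpha * hypergradient(x, y, A_t, b_t, lam)
    return x, y, outer_losses


# Slowly perturbed problem stream: A_t = A0 + small noise, fixed target b.
rng = np.random.default_rng(0)
dim_x, dim_y, T = 5, 10, 2000
A0 = rng.standard_normal((dim_y, dim_x)) / np.sqrt(dim_y)
b = rng.standard_normal(dim_y)
stream = ((A0 + 0.01 * rng.standard_normal(A0.shape), b) for _ in range(T))
x, y, losses = online_alternating_gd(stream, dim_x, dim_y)
print(losses[0], np.mean(losses[-20:]))
```

On this slowly drifting stream the outer loss drops and x settles near the ridge solution of the unperturbed problem, which is what the exact bilevel solution reduces to for this toy pair.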

Soo Min and Tsinghua student Zekai Zhang’s work on Efficient Low-Dimensional Compression of Overparameterized Models demonstrates a method for compressing overparameterized deep linear layers in deep networks. Their approach consistently achieves better generalization in a fraction of the computation time. The work shows that by leveraging inherent low-dimensional structure within the model parameter updates, we can reap the benefits of overparameterization without the computational burden.

Research presentations at CPAL 2024

At the inaugural Conference on Parsimony and Learning (CPAL), my group is presenting three works that came out of a recent exciting collaboration with UM Prof Qing Qu and other colleagues on low-rank learning in deep networks. Prof Qu’s prior work on neural collapse in deep networks has opened many exciting directions for us to pursue! All three works study deep linear networks (DLNs), i.e., deep matrix factorization. In this setting (a simplification of deep neural networks, which have nonlinear activations), we can prove several interesting fundamental facts about how DLNs learn from data when trained with gradient descent. Congratulations to SPADA members Soo Min Kwon, Can Yaras, and Peng Wang (all co-advised by Prof Qu) on these publications!

Yaras, C., Wang, P., Hu, W., Zhu, Z., Balzano, L., & Qu, Q. (2023, December 1). Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Linear Networks. Conference on Parsimony and Learning (Recent Spotlight Track).

Wang, P., Li, X., Yaras, C., Zhu, Z., Balzano, L., Hu, W., & Qu, Q. (2023, December 1). Understanding Hierarchical Representations in Deep Networks via Feature Compression and Discrimination. Conference on Parsimony and Learning (Recent Spotlight Track).

Kwon, S. M., Zhang, Z., Song, D., Balzano, L., & Qu, Q. (2023, December 1). Efficient Low-Dimensional Compression of Overparameterized Networks. Conference on Parsimony and Learning (Recent Spotlight Track).
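
To make the deep linear network setting concrete, here is a minimal NumPy sketch of our own (the sizes, step size, and the helper names `random_orthogonal`, `chain`, and `train_dln` are assumptions, not code from these papers). It runs full-batch gradient descent on the deep matrix factorization loss 0.5 ||W_L ... W_1 - Φ||_F^2 with a rank-r target Φ and orthogonal initialization. In this particular setup each trained layer keeps d - 2r tied singular values: gradient descent changes each layer only within a 2r-dimensional subspace and merely rescales the rest, in the spirit of the invariant-subspace result in the Yaras et al. paper above.

```python
import numpy as np


def random_orthogonal(d, rng):
    # Orthogonal factor of a Gaussian matrix (sign-corrected QR).
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))


def chain(ws, d):
    # Returns ws[-1] @ ... @ ws[0] (the identity for an empty list).
    p = np.eye(d)
    for w in ws:
        p = w @ p
    return p


def train_dln(target, depth=3, eps=1.0, lr=0.01, steps=2000, seed=0):
    # Full-batch GD on 0.5 * ||W_L ... W_1 - target||_F^2 (deep matrix factorization).
    rng = np.random.default_rng(seed)
    d = target.shape[0]
    weights = [eps * random_orthogonal(d, rng) for _ in range(depth)]
    losses = []
    for _ in range(steps):
        err = chain(weights, d) - target
        losses.append(0.5 * np.sum(err ** 2))
        # Gradient for layer l: (layers above)^T @ err @ (layers below)^T
        grads = [chain(weights[l + 1:], d).T @ err @ chain(weights[:l], d).T
                 for l in range(depth)]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, losses


# Rank-2 target in 10 dimensions.
rng = np.random.default_rng(1)
d, r = 10, 2
U = random_orthogonal(d, rng)[:, :r]
V = random_orthogonal(d, rng)[:, :r]
target = U @ np.diag([3.0, 2.0]) @ V.T
weights, losses = train_dln(target)

for W in weights:
    print(np.round(np.linalg.svd(W, compute_uv=False), 4))
```

In the printed spectra, six of the ten singular values of each layer (d - 2r = 6) coincide.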

Congratulations Dr. Gilman and Dr. Du!

Last fall and winter, SPADA PhD students Kyle Gilman and Zhe Du graduated. Kyle’s thesis was titled “Scalable Algorithms Using Optimization on Orthogonal Matrix Manifolds,” and he continues to make fundamental contributions to interesting modern optimization problems. He is currently an Applied AI/ML Senior Associate at JPMorgan Chase. Zhe’s thesis was titled “Learning, Control, and Reduction for Markov Jump Systems,” with lots of interesting work at the intersection of machine learning and control. He is currently a postdoctoral researcher working with Samet Oymak and Fabio Pasqualetti. I am excited to follow their work as they make an impact in optimization, machine learning, and control!

K-Subspaces Algorithm Results at ICML

I’m excited that our results for the K-Subspaces algorithm were accepted to ICML. My postdoc Peng Wang will be presenting his excellent work; you may read the paper here or attend his session if you are interested. K-Subspaces (KSS) is a natural generalization of K-Means to higher-dimensional centers, originally proposed by Bradley and Mangasarian in 2000. Peng showed not only that KSS converges locally, but also that a simple spectral initialization is guaranteed to be close enough when the data are drawn randomly from arbitrary subspaces. This is a giant step on a question that has been open for more than 20 years. Great work Peng!
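
For readers new to KSS, here is a minimal NumPy sketch of the basic alternation: assign each point to the subspace with the smallest projection residual, then refit each subspace from its points with a truncated SVD. The function name `k_subspaces` and the reseeding heuristic for starved clusters are our own assumptions. The initialization is taken as an input: the spectral initialization analyzed in Peng’s paper is not reproduced here, and the example uses perturbed true labels as a stand-in for a good starting point.

```python
import numpy as np


def k_subspaces(X, K, d, init_labels, n_iter=50, seed=0):
    """K-Subspaces: X is (D, N) with points as columns; returns labels and d-dim bases."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        bases = []
        for k in range(K):
            Xk = X[:, labels == k]
            if Xk.shape[1] < d:
                # Heuristic (our assumption): reseed a starved cluster with random points.
                Xk = X[:, rng.choice(N, size=d, replace=False)]
            # Subspace update: top-d left singular vectors of the assigned points.
            U, _, _ = np.linalg.svd(Xk, full_matrices=False)
            bases.append(U[:, :d])
        # Assignment update: squared residual of every point to every subspace.
        resid = np.stack([np.sum((X - B @ (B.T @ X)) ** 2, axis=0) for B in bases])
        new_labels = np.argmin(resid, axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, bases


# Two random 2-dimensional subspaces in R^20, 100 noiseless points each.
rng = np.random.default_rng(0)
D, d, n = 20, 2, 100
truth = np.repeat([0, 1], n)
X = np.hstack([np.linalg.qr(rng.standard_normal((D, d)))[0] @ rng.standard_normal((d, n))
               for _ in range(2)])
# Start from the true labels with 10% of them flipped.
init = truth.copy()
flip = rng.choice(2 * n, size=20, replace=False)
init[flip] = 1 - init[flip]
labels, bases = k_subspaces(X, K=2, d=d, init_labels=init)
```

Like K-Means, each step can only decrease the total residual, so the method converges, but where it lands depends on the initialization; that is exactly why an initialization guarantee matters.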


Our work on heteroscedastic PCA continues with our article “HePPCAT: Probabilistic PCA for Data with Heteroscedastic Noise,” published in IEEE Transactions on Signal Processing. In this paper we develop novel ascent algorithms to maximize the heteroscedastic PCA likelihood, simultaneously estimating the principal components and the heteroscedastic noise variances. We show a compelling application to air quality data, where it is common to have data both from high-quality EPA instruments and from consumer-grade sensors. Code for the paper experiments is available online, and the HePPCAT method is available as a registered Julia package. Congratulations to my student Kyle Gilman, former student David Hong, and colleague Jeff Fessler.
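
To show what "the heteroscedastic PCA likelihood" means, here is a small Python sketch that only evaluates the objective, assuming the probabilistic model in which each sample is y = F z + noise with z ~ N(0, I_k) and noise ~ N(0, v_l I_d) for every sample in group l (for example, one group per sensor type). The function name `heppcat_loglik` is hypothetical, and the paper’s ascent algorithms are not reproduced; HePPCAT itself is distributed as a Julia package.

```python
import numpy as np


def heppcat_loglik(F, v, groups):
    """Log-likelihood of grouped data under y = F z + noise, z ~ N(0, I_k),
    noise ~ N(0, v_l I_d) for every sample in group l.

    F: (d, k) factor matrix; v: sequence of noise variances, one per group;
    groups: list of (d, n_l) arrays with samples as columns.
    """
    d = F.shape[0]
    total = 0.0
    for v_l, Y in zip(v, groups):
        C = F @ F.T + v_l * np.eye(d)  # covariance shared by all samples in group l
        _, logdet = np.linalg.slogdet(C)
        quad = np.sum(Y * np.linalg.solve(C, Y))  # sum_i y_i^T C^{-1} y_i
        n_l = Y.shape[1]
        total += -0.5 * (n_l * (d * np.log(2 * np.pi) + logdet) + quad)
    return total


# Synthetic example: a low-noise group and a high-noise group sharing the same factors.
rng = np.random.default_rng(0)
d, k = 10, 2
F = rng.standard_normal((d, k))
v_true = [0.1, 4.0]
groups = [F @ rng.standard_normal((k, 2000)) + np.sqrt(v_l) * rng.standard_normal((d, 2000))
          for v_l in v_true]
pooled = [np.mean(v_true)] * 2  # homoscedastic alternative: one shared noise variance
print(heppcat_loglik(F, v_true, groups) > heppcat_loglik(F, pooled, groups))
```

Treating both groups as equally noisy fits the data worse than allowing each group its own variance, which is the intuition behind modeling the noise heteroscedastically.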

Congratulations Dr. Bower!

Last fall, my PhD student Amanda Bower defended her thesis titled “Dealing with Intransitivity, Non-Convexity, and Algorithmic Bias in Preference Learning.” Amanda was in the Applied Interdisciplinary Math program, co-advised by Martin Strauss. She will now be moving on to work with Twitter’s ML Ethics, Transparency, and Accountability (META) group. We are so proud that she is going to go make her mark on the world. Congratulations Dr. Bower!

Preference Learning with Salient Features

I am excited that Amanda Bower will have the opportunity to discuss our new work in preference learning, “Preference Modeling with Context-Dependent Salient Features”, at ICML next week. In this work, we propose a new model for preference learning that accounts for the fact that, when people make pairwise comparisons, certain features may play an outsized role, making the pairwise comparison results inconsistent with a general preference order. We look forward to hearing people’s questions and feedback! Update post-conference: her presentation can be viewed here.
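
Here is a toy Python sketch of why salient features can break a global preference order. It assumes a logistic comparison model on a weight vector w, with the salient features for a pair chosen as the t coordinates where the two items differ most. That selection rule and the names `salient_features` and `prob_prefers` are illustrative assumptions, not necessarily the exact model in the paper.

```python
import numpy as np


def salient_features(x_i, x_j, t):
    # Hypothetical selection rule: the t coordinates where the two items differ most.
    # |x_i - x_j| is symmetric, so both orderings of the pair use the same features.
    return np.argsort(-np.abs(x_i - x_j), kind="stable")[:t]


def prob_prefers(x_i, x_j, w, t):
    # Logistic (Bradley-Terry-style) comparison restricted to the salient features.
    S = salient_features(x_i, x_j, t)
    score = w[S] @ (x_i[S] - x_j[S])
    return 1.0 / (1.0 + np.exp(-score))


w = np.ones(3)
a = np.array([3.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 1.0])
c = np.array([3.0, -2.0, 4.0])
for name, (u, v) in {"a>b": (a, b), "b>c": (b, c), "c>a": (c, a)}.items():
    print(name, round(prob_prefers(u, v, w, t=1), 3))
```

Each printed probability exceeds 0.5, so the predicted preferences form a cycle (a over b, b over c, c over a), even though the full linear score w @ x ranks c above both a and b.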

Online Tensor Completion and Tracking

Kyle Gilman and I have a preprint out describing a new algorithm for online tensor completion and tracking. We derive and demonstrate an algorithm that operates on streaming tensor data, such as hyperspectral video collected over time, or chemo-sensing experiments in space and time. Kyle presented his work at the first fully virtual ICASSP, which you can view here. Anyone can register for this year’s virtual ICASSP for free to watch the videos, post questions, and join the discussion. Kyle’s code is also available here. We think this algorithm will have a major impact in speeding up low-rank tensor processing, especially with time-varying data, and we welcome questions and feedback.
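
To give a flavor of what handling one incoming slice of a streaming tensor can look like, here is a generic CP-style building block in Python; it is not Kyle’s algorithm. Assuming current factor matrices A and B, a partially observed slice is modeled as A diag(c) B^T, its weights c are fit by least squares on the observed entries only, and the missing entries are filled in from the fit. The function name `complete_slice` is hypothetical; a full tracking method would also update A and B as slices arrive, which is not shown.

```python
import numpy as np


def complete_slice(Y, mask, A, B):
    """Fit weights c so that Y ≈ A @ diag(c) @ B.T on the observed entries, then fill in the rest.

    Y: (I, J) incoming slice (values outside `mask` are ignored);
    mask: boolean (I, J) array of observed entries; A: (I, R), B: (J, R) current factors.
    """
    R = A.shape[1]
    # Column r holds the observed entries of the rank-one slice a_r b_r^T.
    design = np.stack([np.outer(A[:, r], B[:, r])[mask] for r in range(R)], axis=1)
    c, *_ = np.linalg.lstsq(design, Y[mask], rcond=None)
    return c, A @ np.diag(c) @ B.T


rng = np.random.default_rng(0)
I, J, R = 15, 12, 3
A, B = rng.standard_normal((I, R)), rng.standard_normal((J, R))
c_true = rng.standard_normal(R)
Y_full = A @ np.diag(c_true) @ B.T
mask = rng.random((I, J)) < 0.4      # roughly 40% of entries observed
Y = np.where(mask, Y_full, np.nan)   # missing entries marked as NaN
c_hat, Y_hat = complete_slice(Y, mask, A, B)
```

Because each slice only requires a small least-squares solve with R unknowns, the per-slice cost stays low no matter how many slices have already streamed by.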

Congratulations Dejiao and David!

SPADA lab is so proud of Dr. Dejiao Zhang and Dr. David Hong. They both successfully defended their PhD dissertations this spring. Dejiao is going to Amazon Web Services next, and David is going to a postdoc at the University of Pennsylvania. We expect you both to go off and do great things! Congratulations!


I am honored to have received the NSF CAREER award for a proposal on optimization methods and theory for the joint formulation of dimension reduction and clustering. You can read about the award here in the UM press release and also here on the NSF website. Dimension reduction and clustering are arguably the two most critical problems in unsupervised machine learning; they are used universally for data exploration and understanding. Often dimension reduction is used before clustering (or vice versa) to lend tractability to the modeling algorithm. It’s more typical in real data to see clusters each with their own low-dimensional structure, and so a joint formulation is of great interest. I look forward to working toward this end in the next stage of my career.
