My student Dejiao Zhang’s code for our paper Learning to Share: Simultaneous Parameter Tying and Sparsification in Deep Nets can be found at this link. We demonstrated that regularizing the weights in a deep network using the Group OWL norm allows for simultaneous enforcement of sparsity (meaning unimportant weights are eliminated) and parameter tying (meaning co-adapted or highly correlated weights are tied together). This is an exciting technique for learning compressed deep net architectures from data.
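To make the idea concrete, here is a minimal sketch of a Group OWL penalty on the rows of a weight matrix: each row's L2 norm is sorted in decreasing order and weighted by a non-increasing coefficient sequence, so large groups pay the largest coefficients while small groups are pushed to zero and correlated groups are drawn toward equal norms. The linear (OSCAR-style) weight sequence and the function name `growl_penalty` are my illustrative choices, not necessarily the exact configuration used in the paper.

```python
import numpy as np

def growl_penalty(W, lam1=1.0, lam2=0.1):
    """Group OWL penalty: sum_i lambda_i * ||w_[i]||_2, where the row
    norms ||w_[i]||_2 are sorted in decreasing order and the weights
    lambda_i are non-increasing (here a linear, OSCAR-style sequence;
    an illustrative assumption, not the only choice)."""
    norms = np.linalg.norm(W, axis=1)        # L2 norm of each row (group)
    sorted_norms = np.sort(norms)[::-1]      # decreasing order
    n = len(sorted_norms)
    lambdas = lam1 + lam2 * np.arange(n - 1, -1, -1)  # non-increasing weights
    return float(np.dot(lambdas, sorted_norms))

# Two identical (highly correlated) rows and one all-zero row: the zero row
# contributes nothing (sparsity), while the two equal rows share the same
# norm, which the sorted weighting encourages (tying).
W = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])
print(growl_penalty(W))
```

In practice this penalty is added to the training loss and handled with a proximal step, but the snippet shows the core objective being regularized.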