Promotion to Associate Professor

I am thrilled that yesterday the Regents of the University of Michigan promoted me to Associate Professor with tenure, effective September 1. It feels like yesterday that I arrived in my faculty office for the first time, and yet it also feels long ago. I appreciate so much the support of my colleagues and mentors. I also would not be here without my outstanding, curious, enthusiastic, and hard-working graduate students — they are incredible. Here is the letter that my dean sent to the provost to recommend me for promotion. I am looking forward to the next phase!

Congratulations Dejiao and David!

SPADA lab is so proud of Dr. Dejiao Zhang and Dr. David Hong. They both successfully defended their PhD dissertations this spring. Dejiao is going to Amazon Web Services next, and David is going to a postdoc at the University of Pennsylvania. We expect you both to go off and do great things! Congratulations!

NSF CAREER Award

I am honored to have received the NSF CAREER award for a proposal on optimization methods and theory for the joint formulation of dimension reduction and clustering. You can read about the award here in the UM press release and also here on the NSF website. Dimension reduction and clustering are arguably the two most critical problems in unsupervised machine learning; they are used universally for data exploration and understanding. Often dimension reduction is applied before clustering (or vice versa) to make the modeling algorithm tractable. In real data, however, it is more typical to see clusters, each with its own low-dimensional structure, and so a joint formulation is of great interest. I look forward to working toward this end in the next stage of my career.
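
For contrast, the sequential pipeline described above (reduce dimension first, then cluster) can be sketched in a few lines of numpy. This is a generic PCA-then-k-means illustration on toy data of my own construction, not the proposal's joint formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two clusters in R^20, each spread along its own line, well separated.
n, d = 200, 20
basis1 = rng.standard_normal((d, 1))
basis2 = rng.standard_normal((d, 1))
X = np.vstack([
    (basis1 @ rng.standard_normal((1, n // 2))).T + 5,   # cluster 1, offset +5
    (basis2 @ rng.standard_normal((1, n // 2))).T - 5,   # cluster 2, offset -5
]) + 0.1 * rng.standard_normal((n, d))

# Step 1: dimension reduction with PCA (keep the top 2 principal components).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                                        # n x 2 embedding

# Step 2: k-means clustering (Lloyd's algorithm) in the reduced space.
def kmeans(Z, k, iters=50):
    # deterministic init: seed centers spread across the (cluster-ordered) rows
    centers = Z[np.linspace(0, len(Z) - 1, k).astype(int)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Z[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(Z, k=2)
```

The point of the proposal is precisely that this two-stage recipe can discard cluster-specific low-dimensional structure; a joint formulation would fit the reduction and the clustering together.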

Army Young Investigator

I am very excited that my project “Mathematics for Learning Nonlinear Generalizations of Subspace Models in High Dimensions” has won the Army Young Investigator award! Subspace models are widely used due to their simplicity and ease of analysis. However, while these linear models are very powerful in many high-dimensional data contexts, they often miss important nonlinearities in real data. This project aims to extend recent advances in signal processing to the single-index model and the nonlinear variety model. Read the department’s announcement here.

Postdoc Opportunity at the University of Michigan

We have an opening for a postdoctoral researcher, to begin in spring 2019.


Please email Laura Balzano <girasole@umich.edu> with the subject “Joining the Balzano lab — postdoc 2019” if you are interested.

We are seeking a postdoc who is interested in applying machine learning techniques to real-time dynamic data analysis. While machine learning has advanced significantly over the last decade, its application to dynamic time-varying data is still in its infancy. This project will focus on three ML areas: online learning, stochastic gradient methods, and streaming PCA. We will work on theory to understand how the standard approaches behave when the data are time-varying, develop appropriate models for time-varying data, and develop novel approaches along with convergence theory. Our main application areas will be power systems engineering and computer vision. In power systems, we will develop methodologies to infer the real-time behavior of aggregations of distributed energy resources from hierarchical, heterogeneous, and incomplete measurements of power system quantities. In computer vision, we will develop real-time algorithms for object tracking and activity recognition in video.
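
For readers unfamiliar with the streaming setting, here is a minimal streaming PCA sketch using Oja's rule, a classical online method. It is only a generic baseline for the problem area, not one of the algorithms this project will develop, and the step size and toy data are my own choices:

```python
import numpy as np

def oja_streaming_pca(stream, dim, k, step=0.01):
    """Track a k-dimensional principal subspace from a stream of vectors with
    Oja's rule: a rank-k stochastic gradient step on each arriving sample,
    followed by re-orthonormalization via QR."""
    U = np.linalg.qr(np.random.default_rng(0).standard_normal((dim, k)))[0]
    for x in stream:
        U += step * np.outer(x, x @ U)      # stochastic gradient step
        U, _ = np.linalg.qr(U)              # restore orthonormal columns
    return U

# Example: vectors drawn one at a time from a 2-D subspace of R^10 plus noise.
rng = np.random.default_rng(1)
Utrue = np.linalg.qr(rng.standard_normal((10, 2)))[0]
stream = (Utrue @ rng.standard_normal(2) + 0.01 * rng.standard_normal(10)
          for _ in range(2000))
U = oja_streaming_pca(stream, dim=10, k=2)

# Error: the part of the estimate lying outside the true subspace.
err = np.linalg.norm(U - Utrue @ (Utrue.T @ U))
```

The research questions above start where this sketch stops: what happens when `Utrue` itself drifts over time, and how should the step size and model adapt?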

Optimally Weighted PCA for High-dimensional Heteroscedastic Data

Today I had the opportunity to speak about very recent results by my student David Hong (joint work also with Jeff Fessler) analyzing asymptotic recovery guarantees for weighted PCA on high-dimensional heteroscedastic data. In the paper we recently posted online, we give an asymptotic analysis (as both the number of samples and the dimension of the problem grow to infinity, with their ratio converging to a fixed constant) of the recovery of weighted PCA components, amplitudes, and scores. These recovery expressions allow us to find weights that give optimal recovery, and the weights turn out to be a very simple expression involving only the noise variances and the PCA amplitudes. To learn more, watch my talk here, and let us know if you have any questions!

AFOSR Young Investigator

I have great news: my AFOSR Young Investigator proposal was accepted for funding. The proposal focused on time-varying low-rank factorization models and ways of solving the related non-convex problem formulations. Read more about it here. I look forward to the contributions we will be able to make with the support of AFOSR.


Ensemble K-Subspaces

Yesterday I gave a talk on Subspace Clustering using Ensemble methods at the Simons Institute. See the video here!

This is work with John Lipor, David Hong, and Yan Shuo Tan. Our related paper has just been updated on the arXiv. Our key observation was that, while K-Subspaces (KSS) can perform poorly and depends heavily on initialization, it still seems to give partially good clustering information. We therefore use it as a “weak clusterer” and combine ensembles of KSS (EKSS) by averaging the co-association/affinity matrices. This works extremely well in simulation, on real data, and in theory. We were able to show that EKSS gives correct clustering in a variety of common cases: e.g., for subspaces with bounded affinity, and with noisy or missing data. Our theory generalizes that of the Thresholded Subspace Clustering algorithm to show that any algorithm producing an affinity matrix that approximates a monotonic function of the absolute inner products will give correct clustering. This general theory should be broadly applicable to many geometric approaches to subspace clustering.
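
The ensemble recipe (run KSS from many random initializations, average the co-association matrices, then cluster the average) can be sketched roughly as follows. This is a simplified illustration of the idea, not the paper's exact EKSS implementation, and all parameter choices are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def kss(X, K, r, iters=10):
    """One run of K-Subspaces from random orthonormal bases: alternate
    between assigning each column of X to the nearest r-dimensional subspace
    and refitting each subspace by PCA on its assigned points."""
    d, n = X.shape
    bases = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(K)]
    for _ in range(iters):
        # residual of projecting each point onto each subspace
        res = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0) for U in bases])
        labels = np.argmin(res, axis=0)
        new_bases = []
        for j in range(K):
            pts = X[:, labels == j]
            if pts.shape[1] < r:                 # near-empty cluster: reseed
                pts = rng.standard_normal((d, r))
            new_bases.append(np.linalg.svd(pts, full_matrices=False)[0][:, :r])
        bases = new_bases
    return labels

def ekss(X, K, r, B=20):
    """Ensemble K-Subspaces: average co-association matrices over B runs of
    KSS, then cluster the averaged affinity with a simple spectral step."""
    n = X.shape[1]
    A = np.zeros((n, n))
    for _ in range(B):
        labels = kss(X, K, r)
        A += (labels[:, None] == labels[None, :])    # co-association matrix
    A /= B
    # spectral step: embed with the top K eigenvectors, then basic k-means
    _, vecs = np.linalg.eigh(A)
    Z = vecs[:, -K:]
    centers = Z[np.linspace(0, n - 1, K).astype(int)]
    for _ in range(50):
        lab = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Z[lab == j].mean(axis=0) for j in range(K)])
    return lab

# Example: 50 points on each of two random 2-D subspaces of R^20.
U1 = np.linalg.qr(rng.standard_normal((20, 2)))[0]
U2 = np.linalg.qr(rng.standard_normal((20, 2)))[0]
X = np.hstack([U1 @ rng.standard_normal((2, 50)), U2 @ rng.standard_normal((2, 50))])
lab = ekss(X, K=2, r=2)
```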

Improving K-Subspaces via Coherence Pursuit

John Lipor, Andrew Gitlin, Biaoshuai Tao, and I have a new paper, “Improving K-Subspaces via Coherence Pursuit,” to be published in the IEEE Journal of Selected Topics in Signal Processing special issue “Robust Subspace Learning and Tracking: Theory, Algorithms, and Applications.” In it we present a new subspace clustering algorithm, Coherence Pursuit – K-Subspaces (CoP-KSS). Here is the code for CoP-KSS and for our figures. Our paper considers specifically the PCA step in K-Subspaces, where a best-fit subspace estimate is determined from a (possibly incorrect) clustering. When a given cluster contains points from multiple low-rank subspaces, PCA is not a robust approach. We replace that step with Coherence Pursuit, a new algorithm for Robust PCA. We prove that Coherence Pursuit can indeed recover the “majority” subspace when data from other low-rank subspaces contaminate the cluster. In this paper we also prove (to the best of our knowledge, for the first time) that the K-Subspaces problem is NP-hard, and indeed even NP-hard to approximate within any finite factor for large enough subspace rank.
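
A rough sketch of the Coherence Pursuit idea on noiseless toy data: rank columns by how strongly they correlate with the rest of the data, then fit the subspace to the most coherent columns. This is an illustration of the idea only, not our CoP-KSS code, and the parameter choices are mine:

```python
import numpy as np

def coherence_pursuit(X, r, n_keep=None):
    """Coherence Pursuit sketch: inlier columns lying on a shared
    low-dimensional subspace are mutually coherent, so rank each column by
    the norm of its correlations with all other columns and fit the
    subspace to the most coherent columns."""
    d, n = X.shape
    if n_keep is None:
        n_keep = max(3 * r, n // 8)
    Xn = X / np.linalg.norm(X, axis=0)      # normalize each column
    G = Xn.T @ Xn                           # all pairwise correlations
    np.fill_diagonal(G, 0)                  # ignore self-correlation
    scores = np.linalg.norm(G, axis=0)      # coherence score per column
    keep = np.argsort(scores)[-n_keep:]     # most coherent columns
    return np.linalg.svd(Xn[:, keep], full_matrices=False)[0][:, :r]

# Toy check: 60 points on a 2-D "majority" subspace of R^20, contaminated by
# 20 points from a different 2-D subspace, as in a bad K-Subspaces cluster.
rng = np.random.default_rng(0)
U1 = np.linalg.qr(rng.standard_normal((20, 2)))[0]
U2 = np.linalg.qr(rng.standard_normal((20, 2)))[0]
X = np.hstack([U1 @ rng.standard_normal((2, 60)), U2 @ rng.standard_normal((2, 20))])
U = coherence_pursuit(X, r=2)
err = np.linalg.norm(U - U1 @ (U1.T @ U))   # distance to the majority subspace
```

Plain PCA on `X` would mix the two subspaces; ranking by coherence first lets the fit ignore the contaminating minority.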


Streaming PCA Review Article

The Proceedings of the IEEE posted our review article today on Streaming PCA and Subspace Tracking with Missing Data. It was a great experience to work with Yuejie Chi and Yue Lu on this survey. You can also find a less pretty version on the arXiv.