SPADA lab is so proud of Dr. Dejiao Zhang and Dr. David Hong. They both successfully defended their PhD dissertations this spring. Dejiao is going to Amazon Web Services next, and David is going to a postdoc at the University of Pennsylvania. We expect you both to go off and do great things! Congratulations!
NSF CAREER Award
I am honored to have received the NSF CAREER award for a proposal on optimization methods and theory for the joint formulation of dimension reduction and clustering. You can read about the award here in the UM press release and also here on the NSF website. Dimension reduction and clustering are arguably the two most critical problems in unsupervised machine learning; they are used universally for data exploration and understanding. Often dimension reduction is applied before clustering (or vice versa) to make the subsequent modeling tractable. In real data, however, it is more typical to see clusters that each have their own low-dimensional structure, so a joint formulation is of great interest. I look forward to working toward this end in the next stage of my career.
Army Young Investigator
I am very excited that my project “Mathematics for Learning Nonlinear Generalizations of Subspace Models in High Dimensions” has won the Army Young Investigator award! Subspace models are widely used due to simplicity and ease of analysis. However, while these linear models are very powerful in many high-dimensional data contexts, they also often miss out on important nonlinearities in real data. This project aims to extend recent advances in signal processing to the single-index model and the nonlinear variety model. Read the department’s announcement here.
Postdoc Opportunity at the University of Michigan
Optimally Weighted PCA for High-dimensional Heteroscedastic Data
Today I had the opportunity to speak about very recent results by my student David Hong (joint work also with Jeff Fessler) analyzing asymptotic recovery guarantees for weighted PCA on high-dimensional heteroscedastic data. In the paper we recently posted online, we give an asymptotic analysis (as both the number of samples and the dimension of the problem grow to infinity, with their ratio converging to a fixed constant) of the recovery of weighted PCA components, amplitudes, and scores. These recovery expressions allow us to find weights that give optimal recovery, and the optimal weights turn out to be a very simple expression involving only the noise variances and the PCA amplitudes. To learn more, watch my talk here, and let us know if you have any questions!
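To make the setting concrete, here is a minimal weighted-PCA sketch in numpy: each data column is scaled by a weight before taking the SVD. The inverse-noise-variance weights below are just one simple, illustrative choice, not the optimal weights from our paper (those also involve the PCA amplitudes), and all names and problem sizes here are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 400                        # ambient dimension, number of samples
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                # planted principal component
theta = 3.0                           # PCA amplitude of the planted spike
scores = rng.standard_normal(n)

# Heteroscedastic noise: half the samples are much noisier than the rest.
sigma = np.where(np.arange(n) < n // 2, 0.5, 2.0)
Y = theta * np.outer(u, scores) + sigma * rng.standard_normal((d, n))

def weighted_pca(Y, w, k=1):
    """Top-k left singular vectors of the column-weighted matrix Y @ diag(w)."""
    U, _, _ = np.linalg.svd(Y * w, full_matrices=False)
    return U[:, :k]

u_unif = weighted_pca(Y, np.ones(n))          # vanilla (unweighted) PCA
u_inv = weighted_pca(Y, 1.0 / sigma**2)       # inverse-noise-variance weights

# Squared-error of each recovered component against the planted one.
err_unif = 1.0 - (u_unif[:, 0] @ u) ** 2
err_inv = 1.0 - (u_inv[:, 0] @ u) ** 2
print(err_unif, err_inv)
```

Downweighting the noisy samples typically improves recovery of the planted component in this regime; the paper characterizes exactly how much, and which weights are best.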
AFOSR Young Investigator
I have great news: my AFOSR Young Investigator proposal was accepted for funding. The proposal focused on time-varying low-rank factorization models and ways of solving the related non-convex problem formulations. Read more about it here. I look forward to the contributions we will be able to make with the support of AFOSR.
Ensemble K-Subspaces
Yesterday I gave a talk on Subspace Clustering using Ensemble methods at the Simons Institute. See the video here!
This is work with John Lipor, David Hong, and Yan Shuo Tan. Our related paper has just been updated on the arXiv. Our key observation was that, while K-Subspaces (KSS) depends heavily on initialization and often performs poorly on its own, it still seems to produce partially correct clustering information. We therefore use it as a "weak clusterer" and combine ensembles of KSS runs (EKSS) by averaging their co-association/affinity matrices. This works extremely well, both in simulation and on real data, and we can also support it with theory. We show that EKSS gives correct clustering in a variety of common cases, e.g., for subspaces with bounded affinity, with noisy data, and with missing data. Our theory generalizes that of the Thresholded Subspace Clustering algorithm to show that any algorithm producing an affinity matrix that approximates a monotonic function of the absolute inner products will give correct clustering. This general theory should be broadly applicable to many geometric approaches to subspace clustering.
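A rough sketch of the idea in Python (an illustrative re-implementation with hypothetical names like `kss` and `ekss_affinity`, not our released code): run KSS from many random initializations and average the resulting co-association matrices.

```python
import numpy as np

def kss(X, K, r, n_iters=10, seed=None):
    """One run of K-Subspaces from a random initialization (illustrative)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Random initial r-dimensional orthonormal bases for the K subspaces.
    bases = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(K)]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iters):
        # Assign each point to the subspace capturing most of its energy.
        proj = np.stack([np.linalg.norm(U.T @ X, axis=0) for U in bases])
        labels = proj.argmax(axis=0)
        # Re-fit each subspace by PCA on its assigned points.
        for k in range(K):
            Xk = X[:, labels == k]
            if Xk.shape[1] >= r:
                bases[k] = np.linalg.svd(Xk, full_matrices=False)[0][:, :r]
    return labels

def ekss_affinity(X, K, r, B=50, seed=0):
    """Average the co-association matrices of B random KSS runs."""
    n = X.shape[1]
    A = np.zeros((n, n))
    for b in range(B):
        labels = kss(X, K, r, seed=seed + b)
        A += labels[:, None] == labels[None, :]   # 1 if co-clustered
    return A / B

# Toy example: 40 points on each of two random 2-dim subspaces of R^10.
rng = np.random.default_rng(1)
U1 = np.linalg.qr(rng.standard_normal((10, 2)))[0]
U2 = np.linalg.qr(rng.standard_normal((10, 2)))[0]
X = np.hstack([U1 @ rng.standard_normal((2, 40)),
               U2 @ rng.standard_normal((2, 40))])
A = ekss_affinity(X, K=2, r=2, B=20)
within = (A[:40, :40].mean() + A[40:, 40:].mean()) / 2
between = A[:40, 40:].mean()
print(within, between)
```

In the full algorithm the averaged affinity matrix is then fed to spectral clustering to produce the final labels; the sketch stops at the affinity, where the within-cluster entries should already dominate the between-cluster ones.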
Improving K-Subspaces via Coherence Pursuit
John Lipor, Andrew Gitlin, Biaoshuai Tao, and I have a new paper, "Improving K-Subspaces via Coherence Pursuit," to be published in the IEEE Journal of Selected Topics in Signal Processing special issue "Robust Subspace Learning and Tracking: Theory, Algorithms, and Applications." In it we present a new subspace clustering algorithm, Coherence Pursuit – K-Subspaces (CoP-KSS). Here is the code for CoP-KSS and for our figures. Our paper focuses on the PCA step in K-Subspaces, where a best-fit subspace estimate is computed from a (possibly incorrect) clustering. When a given cluster contains points from multiple low-rank subspaces, PCA is not a robust approach, so we replace that step with Coherence Pursuit, a recent algorithm for robust PCA. We prove that Coherence Pursuit can indeed recover the "majority" subspace when data from other low-rank subspaces contaminate the cluster. In this paper we also prove, to the best of our knowledge for the first time, that the K-Subspaces problem is NP-hard, and indeed even NP-hard to approximate to within any finite factor for large enough subspace rank.
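To sketch why Coherence Pursuit helps in this step (my illustrative reading of the method, not the paper's code; `cop_subspace` and the toy sizes are made up): points drawn from a shared subspace have large inner products with one another, so the most coherent points in a contaminated cluster tend to come from the majority subspace, and PCA on just those points recovers it.

```python
import numpy as np

def cop_subspace(X, r, n_keep):
    """Sketch of Coherence Pursuit: estimate the majority r-dim subspace
    by keeping the points with the largest total coherence, then PCA."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)   # unit-norm columns
    G = Xn.T @ Xn                                       # mutual coherences
    np.fill_diagonal(G, 0.0)                            # ignore self-coherence
    p = np.linalg.norm(G, axis=0)                       # coherence score per point
    keep = np.argsort(p)[-n_keep:]                      # most coherent points
    return np.linalg.svd(Xn[:, keep], full_matrices=False)[0][:, :r]

# Toy cluster: 60 "majority" points on one 2-dim subspace of R^20,
# contaminated by 20 points from a different 2-dim subspace.
rng = np.random.default_rng(0)
U_true = np.linalg.qr(rng.standard_normal((20, 2)))[0]
U_other = np.linalg.qr(rng.standard_normal((20, 2)))[0]
X = np.hstack([U_true @ rng.standard_normal((2, 60)),
               U_other @ rng.standard_normal((2, 20))])
U_hat = cop_subspace(X, r=2, n_keep=40)
# Residual of the true basis outside the estimated subspace (0 = exact).
err = np.linalg.norm(U_true - U_hat @ (U_hat.T @ U_true))
print(err)
```

On this noiseless toy problem the kept points all come from the majority subspace, so the estimate is essentially exact; the paper gives the conditions under which this holds.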
Streaming PCA Review Article
The Proceedings of the IEEE posted our review article today on Streaming PCA and Subspace Tracking with Missing Data. It was a great experience to work with Yuejie Chi and Yue Lu on this survey. You can also find a less pretty version on the arXiv.
New paper in Journal of Multivariate Analysis
Congratulations to my student David Hong (and his co-advisor Jeff Fessler) on our published article in the Journal of Multivariate Analysis, titled "Asymptotic performance of PCA for high-dimensional heteroscedastic data." Heteroscedastic data, where different data points are of differing quality (precisely, have different noise variances), are common in many interesting big data problems. Sensor network data, medical imaging using historical data, and astronomical imaging are just a few examples. PCA is known to give the maximum likelihood estimate for data with additive Gaussian noise of a single variance shared across all data points. This work investigates the performance of PCA when that homoscedastic noise assumption is violated. We give precise asymptotic predictions for the recovery of subspaces and singular values in a spiked/planted model, and show that vanilla PCA (perhaps unsurprisingly) has suboptimal subspace recovery when the data are heteroscedastic.
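A toy simulation along these lines (my own illustration under made-up sizes, not the paper's asymptotic formulas): compare vanilla PCA recovery on homoscedastic noise against heteroscedastic noise with the same average variance. In line with the theory, the heteroscedastic recovery typically comes out lower.

```python
import numpy as np

def pca_recovery(sigma2, theta=2.0, d=100, seed=0):
    """Squared inner product between a planted component and the top
    principal component of a rank-one spike plus per-sample Gaussian noise."""
    n = sigma2.size
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                       # planted component
    Y = (theta * np.outer(u, rng.standard_normal(n))
         + np.sqrt(sigma2) * rng.standard_normal((d, n)))
    u_hat = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
    return (u_hat @ u) ** 2

n = 2000
homo = pca_recovery(np.full(n, 2.0))                              # sigma^2 = 2 everywhere
hetero = pca_recovery(np.where(np.arange(n) < n // 2, 0.1, 3.9))  # same average variance
print(homo, hetero)
```

Values near 1 mean the planted component is recovered well; the paper's formulas predict these quantities exactly in the asymptotic regime.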