New paper in Journal of Multivariate Analysis

Congratulations to my student David Hong (and his co-advisor Jeff Fessler) on our newly published article in the Journal of Multivariate Analysis, titled “Asymptotic performance of PCA for high-dimensional heteroscedastic data.” Heteroscedastic data, where different data points are of differing quality (precisely, have different noise variances), are common in many big data problems. Sensor network data, medical imaging using historical data, and astronomical imaging are just a few examples. PCA is known to be the maximum likelihood estimate for data with additive Gaussian noise of a single variance across all the data points. This work investigates the performance of PCA when that homoscedastic noise assumption is violated. We give precise asymptotic predictions for the recovery of subspaces and singular values in a spiked/planted model, and show that vanilla PCA (perhaps unsurprisingly) has suboptimal subspace recovery when the data are heteroscedastic.
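For readers who want to experiment with the setting, here is a minimal simulation sketch in NumPy. It is not the code or the exact model from the paper: the function name `pca_recovery`, the rank-one planted direction, the signal strength `theta`, and the particular noise variances are all assumptions chosen for illustration. It plants one direction, adds Gaussian noise whose variance differs from sample to sample, runs vanilla PCA via the SVD, and reports the squared inner product between the estimated and planted directions (1 means perfect recovery).

```python
import numpy as np


def pca_recovery(noise_vars, d=100, theta=1.0, seed=0):
    """Squared inner product between the leading PCA direction and a planted one.

    noise_vars: length-n sequence giving the noise variance of each sample
                (hypothetical parameterization, chosen for this sketch).
    d:          ambient dimension.
    theta:      strength of the planted rank-one signal (assumed value).
    """
    rng = np.random.default_rng(seed)
    noise_vars = np.asarray(noise_vars, dtype=float)
    n = noise_vars.shape[0]

    # Planted unit-norm direction and random per-sample coefficients.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    z = rng.standard_normal(n)

    # Data matrix (d x n): column i is signal plus noise of variance noise_vars[i].
    noise = rng.standard_normal((d, n)) * np.sqrt(noise_vars)
    Y = theta * np.outer(u, z) + noise

    # Vanilla PCA: the leading left singular vector of the data matrix.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    u_hat = U[:, 0]
    return float(np.dot(u_hat, u) ** 2)


if __name__ == "__main__":
    n = 500
    # Two groups of samples with different noise variances (illustrative values).
    hetero = np.repeat([0.1, 4.0], [n // 2, n - n // 2])
    # Homoscedastic data with the same average noise variance, for comparison.
    homo = np.full(n, hetero.mean())
    print("heteroscedastic recovery:", pca_recovery(hetero))
    print("homoscedastic recovery:  ", pca_recovery(homo))
```

Varying `theta`, the dimensions, and the mix of noise variances gives a rough empirical feel for how subspace recovery changes; the paper itself supplies the precise asymptotic predictions.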