My research is primarily in the field of machine learning and pattern recognition, and more broadly in statistical signal processing. I like to study complex patterns in various kinds of data, and make quantitative predictions and inferences about those patterns. Machine learning is a cross-disciplinary field that intersects electrical engineering (signal processing), computer science (artificial intelligence), and statistics (multivariate data analysis), and has been applied to a vast array of domains including biomedicine, economics, homeland security, astronomy, and many, many more.
Machine learning (ML) is about learning complex phenomena from experience or examples. Fundamental problems in ML include classification (given examples of two or more kinds of patterns, learn to correctly label new patterns); regression (given inputs and outputs of a function, learn the function); and clustering (given unlabeled examples of patterns, automatically group them into homogeneous clusters). Challenges in ML include heavy noise, high complexity, high dimensionality, extreme (small or large) sample size, partially observed data, and drifting data characteristics. You can check out my course on machine learning to get a sense of some of the different problems and methods in ML.
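To make the three problem types concrete, here is a minimal sketch on one-dimensional toy data. The function names and the specific methods (nearest class mean, least-squares line, two-cluster k-means) are illustrative choices for this example only; they are not the algorithms studied in my research.

```python
from statistics import mean

def fit_nearest_mean(xs, labels):
    """Classification: learn one mean per label from labeled examples."""
    return {lab: mean(x for x, l in zip(xs, labels) if l == lab)
            for lab in set(labels)}

def classify(means, x):
    """Label a new point with the class whose mean is closest."""
    return min(means, key=lambda lab: abs(x - means[lab]))

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope*x + b."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def two_means(xs, iters=10):
    """Clustering: 1-D k-means with k=2, initialised at the min and max."""
    centers = [min(xs), max(xs)]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups

# Toy usage (all data below are made up for illustration).
means = fit_nearest_mean([1.0, 1.2, 3.0, 3.4], ["a", "a", "b", "b"])
label = classify(means, 1.5)
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
clusters = two_means([1.0, 1.1, 0.9, 5.0, 5.2])
```

Classification and regression both use labeled pairs, while clustering sees only the unlabeled points. Real problems add the challenges listed above, such as noise and high dimensionality, which this sketch ignores.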
I am primarily interested in developing new algorithms and proving performance guarantees for new and existing algorithms. From a mathematical perspective, my work relies heavily on probability, linear algebra, and analysis, but I also encourage my students to develop mathematical maturity beyond those areas. You can look at my publications to see some of the specific problems I have worked on, including classification, novelty detection, active learning, transfer learning, and density estimation. My more recent publications give the best sense of my current interests.
Applications are great for inspiring new problems and validating new algorithms. Many applications also have the potential to impact society in a positive way. Here are some of the applications that are driving my research.
Goal: Classify gamma ray and neutron pulses arising from an organic scintillation detector. Application: Correctly labelled neutrons can be used to identify illicit nuclear material through analysis of their energy distribution. Challenge: Training data are contaminated because of background radiation when training measurements are gathered.
Collaborators: Sara Pozzi and Marek Flaska, UM Dept. of Nuclear
Engineering and Radiological Sciences, and David Wentzloff,
EECS.
Graduate students: Tyler Sanderson, Gregory
Handy
Goal: Detect anomalies in spatio-temporal environmental data, such as carbon-dioxide flux maps. Challenge: No labeled examples of anomalies are available, and their spatial and temporal extent is unknown. More to come.
Collaborators: Long Nguyen, UM Dept. of Statistics, and Anna Michilak
and Vineet Yadav, Stanford.
Graduate student: Benjamin Schwartz
Support: NSF
Flow cytometry is a kind of high-throughput biological assay that is
capable of quantifying physical and chemical properties of individual
cells, such as their size, granularity, and binding to certain
antibodies. It is used by pathologists to diagnose and classify a
variety of blood-related disorders, including leukemia and lymphoma. We
are developing methods to analyze flow cytometry data as multi-dimensional
datasets, in contrast to current clinical practice, which is limited to
two- or three-dimensional interpretation. A current problem of interest
is automatically grouping cells according to cell type (lymphocyte,
granulocyte, etc.). We have developed a framework based on transfer
learning to account for the biological and technical variation in flow
data. Past problems include merging flow data sets with partially
overlapping observed variables, and clustering while accounting for
censoring and truncation of measurements. The figures
below show a schematic of a flow cytometry system (left) and data for two
channels known as forward scatter and side scatter (right), which quantify
a cell's size and roughness, respectively.
Image registration is the task of finding a spatial transformation that brings two images into alignment. It is a critical step in medical image analysis, for example to align images of a patient taken under different modalities such as MRI and CT. Our objective is to apply machine learning to quantify the uncertainty in image registration algorithms. The figure below shows a reference image (left) and a homologous image (right) that has been registered to the reference image. The selected point in the reference image corresponds, with high confidence, to a point in the region shown in the homologous image. In this example, the orientation of the confidence region reflects uncertainty due to the sliding motion of the diaphragm.
Collaborators: Charles Meyer, UM Dept. of Radiology, and Alfred Hero,
EECS.
Graduate student: Takanori Watanabe
Support: NIH
When first responders arrive on the scene of an accident or attack involving toxic chemicals, they need to rapidly identify the chemical. One way they do this is by observing which symptoms are expressed in victims. Our work centers on designing decision trees that guide first responders to test those symptoms that will most rapidly lead to chemical identification. We have developed algorithms and user interfaces that are designed to be both comprehensible to first responders, and robust to noise and uncertainty in the data and environment.
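As a rough illustration of how a symptom-query strategy narrows down the candidate chemicals, the sketch below greedily picks, at each step, the symptom whose yes/no answer splits the remaining candidates most evenly. The chemical names, symptom table, and function names are hypothetical, and the greedy rule is a generic textbook heuristic assumed for this example, not necessarily the algorithm we developed. It also assumes noise-free answers, whereas robustness to noise is a central concern in our actual work.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy table: chemical -> set of symptoms it produces.
CHEMICALS: Dict[str, Set[str]] = {
    "chem_A": {"cough", "eye_irritation"},
    "chem_B": {"cough", "nausea"},
    "chem_C": {"nausea", "skin_burns"},
    "chem_D": {"eye_irritation", "skin_burns"},
}

def best_symptom(candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Return the symptom minimising the worst-case number of candidates
    left after a yes/no answer, or None if no symptom separates them."""
    symptoms = sorted(set().union(*candidates.values()))
    best, best_worst = None, len(candidates)
    for s in symptoms:
        yes = sum(1 for sym in candidates.values() if s in sym)
        worst = max(yes, len(candidates) - yes)
        if worst < best_worst:
            best, best_worst = s, worst
    return best

def identify(candidates: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Query symptoms greedily until the candidates cannot be narrowed further."""
    remaining = dict(candidates)
    while len(remaining) > 1:
        s = best_symptom(remaining)
        if s is None:  # remaining chemicals are indistinguishable
            break
        present = s in observed
        remaining = {c: sym for c, sym in remaining.items()
                     if (s in sym) == present}
    return sorted(remaining)

result = identify(CHEMICALS, {"nausea", "skin_burns"})
```

With four equally likely candidates, two well-chosen questions suffice here; a fixed checklist of all four symptoms would take twice as long.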
We have developed new methods to predict the onset of sepsis in patients who have had major cardiac or vascular surgery. Sepsis is a whole-body infection that often affects post-surgical patients. It has a high mortality rate in this population, but chances of survival increase significantly with advance warning. We are developing novel techniques in signal processing and pattern recognition that aim to detect subtle patterns in patients' vital signs that are predictive of sepsis. One particular challenge is that vital signs are sampled in an irregular fashion. The figure below shows an example of four vital signs for a patient. The unit of time is hours. The horizontal lines are standard clinical thresholds, which tend to produce a high number of false alarms.
Collaborators: James Blum, UM Dept. of
Anesthesiology
Graduate student: JooSeuk Kim
Support: NSF, MICHR
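To illustrate the irregular sampling issue in the sepsis project, the sketch below resamples one irregularly timed vital sign onto a regular hourly grid by linear interpolation. This is a simple baseline assumed purely for illustration, not the technique developed in the project; the function name and the readings are made up.

```python
from bisect import bisect_right
from typing import List, Sequence

def resample_linear(times: Sequence[float], values: Sequence[float],
                    grid: Sequence[float]) -> List[float]:
    """Linearly interpolate (times, values) at each grid point.

    Assumes `times` is strictly increasing. Grid points outside the
    observed range are filled with the nearest observed value.
    """
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_right(times, t)  # times[j-1] <= t < times[j]
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out

# Hypothetical heart-rate readings taken at 0 h, 1.5 h, and 4 h.
hourly = resample_linear([0.0, 1.5, 4.0], [80.0, 86.0, 96.0],
                         [0.0, 1.0, 2.0, 3.0, 4.0])
```

Interpolation of this kind hides how stale each measurement is, which is one reason irregular sampling calls for more careful treatment than a fixed grid.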
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.