Research

OVERVIEW - Hospitals today collect an immense amount of patient data (e.g., images, lab tests, vital sign measurements). Although health data are messy and often incomplete, these data are useful and can help improve patient care. To this end, we have pioneered work in developing machine learning (ML) and artificial intelligence (AI) techniques inspired by problems in healthcare (e.g., predicting patient outcomes like infections). In collaboration with 30+ clinicians and domain experts, we have identified areas with potential for both high clinical and methodological impact. Through extensive preliminary work dedicated to understanding the clinical problem and the dataset at hand, we formalize problems and develop new approaches. Our methodological contributions cluster into the following technical thrusts, driven primarily by four questions related to "who," "why," "what," and "how".


METHODOLOGICAL CONTRIBUTIONS

Thrust 1: Time-series Analysis - We develop approaches for predicting outcomes from time-series data and for time-series forecasting. Modeling health data using the trajectory of a patient, rather than snapshots, led to new challenges and pitfalls highlighted by my group [AMIA ’17 KDDM Best Paper], which continues to be used as a guide in setting up new prediction tasks. While much emphasis in ML is placed on the learning architecture/objective, in practice >90% of the work lies in preprocessing. To this end, we developed an open-source software package, “FIDDLE,” that incorporates best practices from the literature with the goal of streamlining the preprocessing of time-series clinical data [JAMIA ’20]. This tool has been used over 100 times since its initial publication. In time-series analysis, the output can also vary over time; here, we have made significant contributions to survival analysis [AAAI ’21a, AAAI ’21b, ICDM ’24] and time-series forecasting [KDD ’18, AAAI ’23], winning challenges along the way [KDH @ ECAI ’20].
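
As a concrete illustration of the kind of preprocessing involved, the sketch below (written for this page; it is not FIDDLE's actual interface, and the data and names are hypothetical) discretizes irregularly sampled clinical measurements into fixed hourly bins with a presence mask, one of the standard steps such pipelines perform.

```python
# Illustrative sketch (NOT FIDDLE's API): discretizing irregularly sampled
# clinical time series into hourly bins plus a mask of observed entries,
# a common step before fitting a time-series model.
import numpy as np
import pandas as pd

# Hypothetical long-format records: one row per (patient, time, variable, value).
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "t_hours":    [0.2, 1.5, 1.8, 0.7, 2.9],
    "variable":   ["HR", "HR", "SBP", "HR", "SBP"],
    "value":      [88.0, 92.0, 115.0, 101.0, 98.0],
})

def discretize(records: pd.DataFrame, T: int = 4, dt: float = 1.0):
    """Map records into a (patients x T bins x variables) tensor and a
    boolean mask marking which bins were actually observed."""
    patients = sorted(records["patient_id"].unique())
    variables = sorted(records["variable"].unique())
    X = np.full((len(patients), T, len(variables)), np.nan)
    mask = np.zeros_like(X, dtype=bool)
    for _, row in records.iterrows():
        i = patients.index(row["patient_id"])
        j = min(int(row["t_hours"] // dt), T - 1)  # clip to the last bin
        k = variables.index(row["variable"])
        X[i, j, k] = row["value"]  # last observation within a bin wins
        mask[i, j, k] = True
    return X, mask, patients, variables

X, mask, patients, variables = discretize(records)
print(X.shape)     # (2, 4, 2): 2 patients, 4 hourly bins, 2 variables
print(mask.sum())  # 5 observed entries; everything else is missing
```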

Thrust 2: Robust ML - Deep learning approaches, while powerful, are prone to latching onto spurious correlations, or ‘shortcuts,’ that hold in the training dataset but fail to generalize. For example, if the sickest patients in a hospital are all located in a unit where a specific chest x-ray machine is used, a model could pick up on artifacts associated with the machine used to capture the chest x-ray, rather than on clinically relevant radiological findings. While such a model has learned an accurate association in the training data, it is unlikely to generalize across institutions, or even over time within the same hospital. We have highlighted the dangers of shortcuts in health datasets [MLHC ’20] and have worked on mitigating shortcuts during model training to increase model robustness [KDD ’18, NeurIPS ’22a, KDD ’25]. Systematic under-testing of subpopulations can lead models to wrongly associate those groups with lower risk; we introduced the notion of disparate censorship to study differences in testing rates across patient groups as a source of bias [MLHC ’22, PLOS ’24], and proposed a solution [ICML ’24].
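
To make the failure mode concrete, here is a toy sketch (synthetic data; the "scanner" feature and numbers are hypothetical) in which a model latches onto a spurious feature that tracks the label during training but not at deployment.

```python
# Toy illustration of shortcut learning with synthetic data and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
y = rng.integers(0, 2, n)
signal = y + rng.normal(0, 1.0, n)                   # weak clinical signal
scanner = np.where(rng.random(n) < 0.95, y, 1 - y)   # shortcut: 95% aligned with y
X_train = np.column_stack([signal, scanner])

clf = LogisticRegression().fit(X_train, y)

# At deployment, scanner assignment no longer tracks severity.
y_test = rng.integers(0, 2, n)
signal_t = y_test + rng.normal(0, 1.0, n)
scanner_t = rng.integers(0, 2, n)
X_test = np.column_stack([signal_t, scanner_t])

print("train acc:", clf.score(X_train, y))        # inflated by the shortcut
print("test acc:",  clf.score(X_test, y_test))    # drops once the shortcut breaks

# One crude mitigation: exclude the suspect feature at training time.
clf_robust = LogisticRegression().fit(X_train[:, :1], y)
print("robust test acc:", clf_robust.score(X_test[:, :1], y_test))
```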

Thrust 3: Decision Making & Control - With the goal of improving the actionability of AI models in health, I have focused on tackling challenges in learning treatment policies using causal inference and offline reinforcement learning (RL). My work provides theoretical, as well as practical, foundations for human-in-the-loop decision making, in which humans (e.g., clinicians, patients) can incorporate additional knowledge (e.g., side effects, patient preferences) when selecting among near-equivalent actions or treatments. These foundational advances include a novel model-free algorithm for learning “set-valued policies” (SVPs), which return a set of near-equivalent actions rather than a single optimal action [ICML ’20]. Recognizing that many problems in healthcare result in exponential action spaces when treatments are considered in combination, we also developed an approach for sample-efficient offline RL with factored action spaces [NeurIPS ’22b] and a new semi-offline paradigm for policy evaluation [NeurIPS ’23]. We have worked to increase the rigor of offline RL in healthcare with our work on model selection for offline RL [MLHC ’21a], which has been cited >100 times. This line of work has also led to contributions in “decision-focused” learning [AISTATS ’24, JAMIA ’25].
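
The core idea behind an SVP can be sketched in a few lines: given learned state-action values, return every action within a tolerance of the best one. (A minimal illustration with made-up Q-values; the published algorithm learns such sets during training rather than thresholding post hoc.)

```python
# Minimal sketch of a set-valued policy: return all actions whose value is
# within a tolerance zeta of the best action, leaving the final choice to a
# human decision maker. Q-values below are hypothetical.
import numpy as np

def set_valued_policy(q_values: np.ndarray, zeta: float = 0.05) -> list[int]:
    """Return indices of all near-equivalent actions for one state."""
    best = q_values.max()
    return [a for a, q in enumerate(q_values) if q >= best - zeta]

# Hypothetical values for four candidate treatments in one patient state.
q = np.array([0.71, 0.93, 0.91, 0.40])
print(set_valued_policy(q))  # [1, 2]: two near-equivalent options to present
```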

Thrust 4: Human-AI Collaboration - As we move toward integrating these tools into clinical workflows, we have begun to examine how clinicians interact with the output of an AI model and, specifically, how it influences their decisions. We showed that an incorrect model can harm clinician accuracy and lead to poor treatment decisions, and that commonly proposed image-based explanations fail to mitigate that harm [JAMA ’23]. This finding is particularly important given the emphasis placed on explanations in the White House Blueprint for an AI Bill of Rights. We continue to study the impact of selective prediction and how withholding potentially unreliable model predictions may reduce automation bias.
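
A minimal sketch of selective prediction, assuming a probabilistic classifier (the probabilities and threshold below are illustrative): the model abstains whenever its top-class confidence falls below a threshold, so only predictions deemed reliable are shown.

```python
# Sketch of selective prediction: withhold the model's output when its
# confidence falls below a threshold, rather than always showing a label.
import numpy as np

def selective_predict(probs: np.ndarray, threshold: float = 0.8):
    """probs: (n, classes) predicted probabilities. Returns the predicted
    class per example, or None where the model abstains."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return [int(p) if c >= threshold else None for p, c in zip(preds, conf)]

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.20, 0.80]])
print(selective_predict(probs))  # [0, None, 1]: the uncertain case is hidden
```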

Preprints/Publications/Presentations

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016 - 2011

Multimedia

Women in Tech Show | Scientific American | ACP Hospitalist Article | MIT Tech Review Article | SSAC16 | Invited Talk at Wellesley College: Big Data's Impact in Medicine, Finance, and Sports | SSAC13: To Crash or not to Crash | ESPN TrueHoop TV: Interview with Henry Abbott | ESPN TrueHoop: Commentary | Grantland Interview | NeurIPS 2012 Spotlight | NeurIPS Workshops 2011 Spotlight