I am an Associate Professor in the Computer Science & Engineering
department at the University of Michigan, where I lead the Order Lab.
Prior to joining U-M, I was an Assistant Professor in the Johns Hopkins CS department from 2017 to 2022.
I have broad research interests in computer systems including OS and
distributed systems. I am particularly interested in designing principled
techniques to enable reliable, efficient, and defensible systems from large
data centers to small mobile devices.
My lab has openings for postdocs, graduate and undergraduate student
interns. I’m looking for students who are self-motivated and have strong
interests in systems building and research. Prospective students, please read this page.
News
Dec. 2024
ADR is accepted to NSDI '25. Congrats Ruiming, Yunchi, Yuxuan!
Aug. 2024
Anduril is accepted to SOSP '24. Congrats Tony, Haoze!
Jun. 2024
Yigong will join Boston University as an Assistant Professor.
Dec. 2023
Legolas is accepted to NSDI '24. Congrats Haoze, Tony!
Oct. 2023
Chang received the honorable mention for the Dennis Ritchie doctoral dissertation award!
Jul. 2023
pBox is accepted to SOSP '23. Congrats Yigong, Gongqi!
Jun. 2023
Yigong passed his PhD defense and will join University of Washington CSE as a postdoc!
May 2023
Chang passed his PhD defense and will join University of Virginia CS as an Assistant Professor!
Jan. 2023
vProf is accepted to EuroSys '23. Congrats Lingmei!
Sep. 2022
Gave a talk at Strange Loop on distributed systems runtime checking
Mar. 2022
Orbit is accepted to OSDI '22. Congrats Yuzhuo!
Mar. 2022
Oathkeeper is accepted to OSDI '22. Congrats Chang, Yuzhuo!
Mar. 2022
RESIN is accepted to OSDI '22. Congrats Chang!
Dec. 2021
Awarded an NSF SMALL grant on distributed system fault injection
Dec. 2021
Gave a keynote talk at HotDC 2021
Aug. 2021
Received a Facebook Research Award on performance diagnosis.
Jul. 2021
Argus received the best paper award at ATC '21!
Apr. 2021
Argus is accepted to appear at USENIX ATC '21. Congrats Lingmei!
Mar. 2021
Arthas (paper) is accepted to appear at EuroSys '21. Congrats Brian!
Research
A major focus of my recent research is to push for higher availability and
observability of next-generation cloud systems. This includes a series of
projects in multiple thrusts:
- Understanding of failures beyond the fail-stop model
  - Gray failure: We advocate the importance of the gray failure problem
    in cloud systems and discuss its differential observability traits.
  - Partial failure: We study and analyze real-world
    partial failures in popular distributed systems.
- Principled detection and localization of complex failures
  - Panorama: We design a solution to capture and enhance
    inherent observability in cloud systems for the detection of gray failures.
  - Watchdog: We propose the intrinsic watchdog abstraction
    for comprehensive runtime checking in system software.
  - OmegaGen: We design a program analysis and
    instrumentation tool to generate custom watchdogs to localize partial failures. (Best Paper Award)
- Data-driven approach to transform traditional reliability activities
  - Narya: a holistic system to predict failures and adaptively mitigate them through online experimentation.
  - Gandalf: an analytics service for safe deployments in the cloud.
  - AIOps: a short position paper on the real-world challenges and research
    opportunities in AIOps.
I also work on energy-efficient mobile systems (e.g., LeaseOS, DefDroid,
eDoctor) and preventing system misconfigurations (e.g., Violet,
ConfValley).
Recent Select Publications
(Full publication list)
-
One-Size-Fits-None: Understanding and Enhancing Slow Fault Tolerance in Modern Distributed Systems
Ruiming Lu, Yunchi Lu, Yuxuan Jiang, Guangtao Xue, Peng Huang
NSDI 2025
-
Efficient Reproduction of Fault-Induced Failures in Distributed Systems with Feedback-Driven Fault Injection
Jia Pan*, Haoze Wu*, Tanakorn Leesatapornwongsa, Suman Nath, Peng Huang
SOSP 2024
[BibTeX]
[Software]
*: equal contribution
-
Efficient Exposure of Partial Failure Bugs in Distributed Systems with Inferred Abstract States
Haoze Wu, Jia Pan, Peng Huang
NSDI 2024
[BibTeX]
[Software]
-
Pushing Performance Isolation Boundaries into Application with pBox
Yigong Hu, Gongqi Huang, Peng Huang
SOSP 2023
[BibTeX]
[Slides]
[Software]
-
Effective Performance Issue Diagnosis with Value-Assisted Cost Profiling
Lingmei Weng, Yigong Hu, Peng Huang, Jason Nieh, Junfeng Yang
EuroSys 2023
[BibTeX]
[Slides]
[Software]
-
Operating System Support for Safe and Efficient Auxiliary Execution
Yuzhuo Jing, Peng Huang
OSDI 2022
[BibTeX]
[Slides]
[Software]
-
Demystifying and Checking Silent Semantic Violations in Large Distributed Systems
Chang Lou, Yuzhuo Jing, Peng Huang
OSDI 2022
[BibTeX]
[Slides]
[Software]
-
RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure
Chang Lou, Cong Chen, Peng Huang, Yingnong Dang, Si Qin, Xinsheng Yang, Xukun Li, Qingwei Lin, Murali Chintalapati
OSDI 2022
[BibTeX]
[Slides]
-
Argus: Debugging Performance Issues in Modern Desktop Applications with Annotated Causal Tracing [Best Paper Award]
Lingmei Weng, Peng Huang, Jason Nieh, Junfeng Yang
ATC 2021
[BibTeX]
[Slides]
[Software]
-
Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution
Yigong Hu, Gongqi Huang, Peng Huang
OSDI 2020
[BibTeX]
[Slides]
[Software]
[TechReport]
-
Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions
Sebastien Levy, Randolph Yao, Youjiang Wu, Yingnong Dang, Peng Huang, Zheng Mu, Pu Zhao, Tarun Ramani, Naga Govindaraju, Xukun Li, Qingwei Lin, Gil Lapid Shafriri, Murali Chintalapati
OSDI 2020
[BibTeX]
[TechReport]
-
Understanding, Detecting and Localizing Partial Failures in Large System Software [Best Paper Award]
Chang Lou, Peng Huang, Scott Smith
NSDI 2020
[BibTeX]
[Slides]
-
A Case for Lease-Based, Utilitarian Resource Management on Mobile Devices [Best Paper Award]
Yigong Hu, Suyi Liu, Peng Huang
ASPLOS 2019
[BibTeX]
[Slides]
[Software]
-
Capturing and Enhancing In Situ System Observability for Failure Detection
Peng Huang, Chuanxiong Guo, Jacob R. Lorch, Lidong Zhou, Yingnong Dang
OSDI 2018
[BibTeX]
[Slides]
[Software]
-
End-to-End Automated Exploit Generation for Validating the Security of Processor Designs [Best Paper Candidate]
Rui Zhang, Calvin Deutschbein, Peng Huang, Cynthia Sturton
MICRO 2018
[BibTeX]
-
Gray Failure: The Achilles’ Heel of Cloud-Scale Systems
Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R. Lorch, Yingnong Dang, Murali Chintalapati, Randolph Yao
HotOS 2017
[BibTeX]
[Slides]
Students
I am very fortunate to work with a wonderful group of students.
- Current PhD students
- Graduated PhD students
  - Chang Lou → Assistant Professor at University of Virginia
  - Yigong Hu → Postdoc at University of Washington → Assistant Professor at Boston University
  - Brian Choi → Researcher at Applied Physics Lab
Professional Service
- Program Committee:
  - 2024: SOSP '24, NSDI '25, ATC '24
  - 2023: SOSP '23, OSDI '24, ASPLOS '24
  - 2022: OSDI '23
  - 2021: ASPLOS '22, HAOC '21 (co-chair), APSys '21
  - 2020: OSDI '20, OSDI '21, NSDI '21, APSys '20, ICDCS '20
  - 2019: SOSP '19, HotOS '19, APSys '19, ASPLOS '19 SRC, RTAS '20
  - 2018: USENIX ATC '18
  - 2017: USENIX ATC '17, SOSP '17 SRC, HotConNet '17
  - 2016: MobiSys PhD forum
- Journal Reviewer: TPDS 2016, SCICO 2019, TOS 2020
- Shadow PC: EuroSys 2017
- Assistant for PC chair: ASPLOS 2016
Bio
I received my Ph.D. from UCSD, advised by
Prof. Yuanyuan Zhou. Before joining Hopkins,
I took one year off at MSR Redmond Systems Group
to gain exposure to real-world system challenges in a state-of-the-art cloud service, Microsoft
Azure. I received a B.S. (Computer Science) and a B.A. (Economics) from Peking University.
Note: Ryan is my English name. For legal documents and publications, Peng Huang is used.
Check out Phair, Patternful AI, and CircleCoder