Ang Chen
Associate Professor
Computer Science and Engineering
University of Michigan
Office: 4753 Beyster building
E-mail: chenang (at) umich (dot) edu
My work focuses on systems, networking, and security. I'm particularly drawn to problems that require a cross-disciplinary approach and produce a practical impact. Our current projects include the following:
Digital transformation:
A new project where we are building a computing stack to manage large infrastructures (e.g., datacenters, power grids, and water systems) and their "nexuses" for resilience.
See our vision statement, to appear in CACM as an Op-Ed article.
Cloud management:
We believe cloud management, especially in a declarative (i.e., Infrastructure-as-Code/IaC) manner, is an increasingly important research challenge; our vision paper argues for an AIOps-based approach to management. We've curated an IaC dataset and benchmark, and developed systems for generating IaC checks, cloud program lifting, and debugging IaC updates.
Runtime programmable networks:
The goal is to make the end-to-end network infrastructure, vertically from the host kernels to the NICs and horizontally across switches to the other end of the network,
reprogrammable at runtime without packet loss and with strong consistency guarantees. See our vision paper, joint talk,
the runtime programmable switch project, which implements runtime reconfiguration for silicon switch ASICs, performance optimization of SmartNICs, a program synthesis tool that generates runtime update plans, and our $3M NSF Large project. Also see NVIDIA's highlight on these projects in a HotChips'23 talk.
Programmable in-network security:
Our vision is to transform a programmable network into a "programmable defense infrastructure," which supports security as naturally as it does routing. In this design, a switch not only forwards traffic,
but also applies a wide range of defenses to it. The network not only routes traffic end-to-end, but also swaps defenses in and out along the paths as needed to mitigate attacks.
Recent projects include:
Poise,
NetWarden,
Ripple,
P4wn,
Bedrock,
RDMI,
NetShuffle,
SpotProxy.
Older projects:
ML for systems software:
Systems software (e.g., OS kernels) needs to support different applications and multiplex different types of hardware platforms; no one-size-fits-all optimizations exist.
Neural networks are effective at generalizing to unseen scenarios, but their blackbox nature is a poor fit for low-level systems software, which must make safety-critical decisions.
In this project, we are pursuing two approaches: a) creating systems-level mechanisms that are constrained by symbolic logic while making them reconfigurable with learning-derived policies,
and b) applying learning to analyze systems-level code to identify optimization opportunities.
See examples in this paper
and the Clara project.
Causality in distributed systems:
Diagnosing problems in large systems has always been challenging due to their complexity. Our project uses data provenance to track causal relationships
between system states and their changes. It further uses them to enable automated reasoning for fault diagnosis, repair, and prevention, e.g., using a Datalog-like logical model.
See the individual projects:
Spidermon,
CloudCanary,
Zeno,
DiffProv,
SPP,
MetaProv.
Infrastructure optimizations for data-intensive systems:
This project aims at a tighter vertical integration between data-intensive systems and the cloud infrastructure to improve their performance,
through whole-stack optimizations spanning the network layer, the OS, distributed frameworks, and the applications themselves.
One overarching theme for these optimizations is to reconfigure or rearchitect various parts of the infrastructure for data-intensive workloads, as further
informed by modern hardware technologies available in the cloud.
Projects include:
DDCs,
GraphRex,
Contra,
Lightning,
TELEPORT,
RDC.