Ali's Webpage: Research

My research interests are in operating system/hardware interaction, specifically in high-performance networking and in large-scale bottleneck analysis to guide architectural improvements. I believe that there are many opportunities at the interfaces between hardware and software. My background in computer architecture has familiarized me with which hardware changes are feasible and compatible with legacy code. Similarly, my operating system experience guides me in providing interfaces that software can use efficiently, with low overhead.

Below I describe two research projects I have worked on. The Simple Integrated Network Interface Controller project sought to improve network performance by replacing the traditional hardware and software interfaces between the NIC and CPU with a more flexible interface that is particularly suited to network processing. The Full-System Critical Path Analysis work seeks to provide tools and methodologies to analyze complex systems composed of multiple layers of hardware, software, and even multiple machines. The need for such tools became clear while working on the Simple Integrated NIC.

Finally, I have been an active developer of the M5 simulator, a modular, open-source architectural simulator that models systems in enough detail to boot operating systems such as Linux and Solaris on a variety of architectures.

Full-System Critical Path Analysis

Many critical workloads today, such as web-hosted services, are limited not by raw CPU processing power but by interactions between the CPU cores, the memory system, I/O devices such as disks and network interfaces, and the complex software (applications, middleware, operating systems, virtual machines) that ties all these components together. To improve the efficiency of these workloads and systems, designers and developers need tools to identify the bottlenecks so that they can address them. However, existing performance analysis tools such as software profilers cannot account for hardware bottlenecks or for situations where software overheads are hidden due to overlap with other operations.

I address this problem by developing an analysis methodology and tool set that identifies true bottlenecks in complex systems spanning multiple software and hardware layers executing concurrently across multiple CPU cores and dedicated hardware devices. My proposed approach uses critical-path analysis, which not only identifies bottlenecks but also quantifies their contribution and estimates the speedup obtainable if a particular set of bottlenecks is removed or reduced.
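As a rough illustration of the idea (not the tool set described here), the sketch below computes a critical path over a small dependence graph and then asks a what-if question about one bottleneck. The graph, the event names, the edge labels, and the costs are all invented for the example; `critical_path` and `what_if` are hypothetical helper names.

```python
from collections import defaultdict

def critical_path(edges):
    """Longest path through a DAG given as (src, dst, label, cost) edges.

    Returns (length, edges_on_path). Raises ValueError on a cycle.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst, label, cost in edges:
        succ[src].append((dst, label, cost))
        indeg[dst] += 1
        nodes.update((src, dst))

    # Topological order (Kahn's algorithm).
    order = []
    ready = sorted(n for n in nodes if indeg[n] == 0)
    while ready:
        n = ready.pop()
        order.append(n)
        for dst, _, _ in succ[n]:
            indeg[dst] -= 1
            if indeg[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("dependence graph contains a cycle")

    # Relax edges in topological order, remembering the last-arriving edge.
    dist = {n: 0 for n in nodes}
    best_in = {}
    for n in order:
        for dst, label, cost in succ[n]:
            if dist[n] + cost > dist[dst]:
                dist[dst] = dist[n] + cost
                best_in[dst] = (n, dst, label, cost)

    # Walk backwards from the latest-finishing node.
    end = max(order, key=lambda n: dist[n])
    path, node = [], end
    while node in best_in:
        path.append(best_in[node])
        node = best_in[node][0]
    path.reverse()
    return dist[end], path

def what_if(edges, scale):
    """Critical-path length after scaling the cost of the labeled edges."""
    scaled = [(s, d, l, c * scale.get(l, 1.0)) for s, d, l, c in edges]
    return critical_path(scaled)[0]

# Invented example: a packet receive overlapping with application setup.
EDGES = [
    ("rx_start", "dma_done", "nic_dma", 4),
    ("dma_done", "irq_handled", "interrupt", 3),
    ("irq_handled", "copy_done", "payload_copy", 6),
    ("rx_start", "app_ready", "app_setup", 5),
    ("app_ready", "copy_done", "syscall_wait", 3),
]

length, path = critical_path(EDGES)
without_copy = what_if(EDGES, {"payload_copy": 0.0})
print(length, [e[2] for e in path], without_copy)
```

In this made-up graph the payload copy costs 6 units, but removing it shortens the run by only 5, because the application-side path then becomes critical. A profiler that reports only per-component time would not show this.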

In my work to date, I have developed a technique to automatically extract dependence graphs suitable for critical-path analysis from systems composed of interacting state machines. Because hardware designs are often based on state machines, this model works well for hardware devices such as network interface controllers. Extracting the state machines embodied in software components is more difficult. To address this problem, I have also developed techniques to partially automate the state-machine analysis of software execution, along with a methodology for incrementally adding annotations to software only where needed to refine the state-machine breakdown. Using these techniques, I have successfully analyzed single streams of UDP and TCP communication between a pair of systems. In the future I hope to refine my techniques and make them applicable to systems on the order of web servers and database servers.
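To make the state-machine model more concrete, here is a minimal sketch of how a trace of state transitions might be turned into a dependence graph. It is an assumption-laden toy, not the extraction technique described above: the `Transition` record, the `WAIT_STATES` set, the rule that time spent in a waiting state is charged to the triggering event, and the sample trace are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transition:
    time: int                    # when the machine entered `state`
    machine: str                 # which state machine made the transition
    state: str                   # the state that was entered
    cause: Optional[int] = None  # index of a triggering transition, if any

# Hypothetical: states in which a machine does no work of its own.
WAIT_STATES = {"idle"}

def build_dependence_graph(trace):
    """Turn a time-ordered trace into (src, dst, label, cost) edges.

    Nodes are indices into `trace`. Each machine contributes one edge per
    state it occupied; each cross-machine trigger contributes one edge.
    """
    edges = []
    last = {}  # machine name -> index of its most recent transition
    for i, tr in enumerate(trace):
        prev = last.get(tr.machine)
        if prev is not None:
            p = trace[prev]
            # Waiting is not work: charge it to the trigger edge instead.
            cost = 0 if p.state in WAIT_STATES else tr.time - p.time
            edges.append((prev, i, f"{p.machine}:{p.state}", cost))
        if tr.cause is not None:
            c = trace[tr.cause]
            edges.append((tr.cause, i, f"{c.machine}->{tr.machine}", tr.time - c.time))
        last[tr.machine] = i
    return edges

# Invented trace: a NIC receives a packet, DMAs it, then interrupts the CPU.
TRACE = [
    Transition(0, "nic", "rx"),
    Transition(0, "cpu", "idle"),
    Transition(4, "nic", "dma"),
    Transition(9, "nic", "idle"),
    Transition(10, "cpu", "irq_handler", cause=3),
    Transition(16, "cpu", "idle"),
]

graph = build_dependence_graph(TRACE)
for edge in graph:
    print(edge)
```

The resulting edge list is in the form a longest-path computation expects. A hardware state machine yields such a trace almost directly, whereas software would need annotations marking where each logical state begins.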

Simple Integrated Network Interface Controller

The Simple Integrated Network Interface Controller work was done with Nathan Binkert and sought to improve network performance by replacing the traditional hardware and software interfaces between a NIC and a CPU with a more flexible interface that is particularly suited to network processing.

We have shown that simple integration of a traditional NIC on a CPU die can improve the performance of the system by minimizing communication overheads and reducing misses on incoming packets. This integration also allows the NIC to be redesigned to take advantage of the significantly reduced latency between the CPU and the NIC. Because of the reduced latency, much of the complexity of a current high-performance NIC can be removed, and much of the intelligence required for such a NIC can be provided by the CPU, giving OS programmers significantly more flexibility in how the NIC is used. This simple NIC, which we termed SINIC, enables software optimizations that were not possible with traditional NICs, including deferring the payload copy on receive, which can be exploited to implement a zero-copy receive optimization in the Linux kernel.

The M5 Simulator

The M5 simulator is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. The simulator is most useful to researchers in academia or industry looking for a free, open-source, full-system simulation environment for processor, system, or platform architecture studies.

M5 was developed to study network-oriented server workloads, including both projects described above, so it has features not commonly found in other simulators: full-system simulation, detailed timing of I/O devices and DMA operations, and deterministic simulation of multiple machines.

M5 is heavily object oriented: all the major simulation structures (CPUs, buses, caches, etc.) are represented as objects with well-defined interfaces, which allows different modules to be interchanged and unique object connections to be made. M5's configuration language allows flexible composition of objects, simple description of complex memory hierarchies, and replication to create multiple systems. The memory system in M5 is event driven and includes non-blocking caches and split-transaction buses.
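For readers unfamiliar with event-driven memory models, the following toy sketch shows the general pattern. It is not M5 code and does not use M5's API: `EventQueue`, `Memory`, `NonBlockingCache`, and the latencies are hypothetical names and values chosen for illustration. The cache keeps servicing hits while a miss is outstanding, and the memory request and its response are separate events, which is the essence of a split transaction.

```python
import heapq
import itertools

class EventQueue:
    """Minimal discrete-event scheduler ordered by (time, insertion order)."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps ordering deterministic
        self.now = 0

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback, args))

    def run(self):
        while self._heap:
            self.now, _, callback, args = heapq.heappop(self._heap)
            callback(*args)

class Memory:
    """Responds to every request after a fixed latency."""
    def __init__(self, eq, latency):
        self.eq, self.latency = eq, latency

    def request(self, addr, on_response):
        # Split transaction: the request returns now, the response is a later event.
        self.eq.schedule(self.latency, on_response, addr)

class NonBlockingCache:
    """Cache that tracks outstanding misses instead of stalling on them."""
    def __init__(self, eq, next_level, hit_latency):
        self.eq, self.next_level, self.hit_latency = eq, next_level, hit_latency
        self.lines = set()   # addresses currently cached
        self.pending = {}    # addr -> callbacks waiting on an outstanding miss

    def request(self, addr, on_response):
        if addr in self.lines:
            self.eq.schedule(self.hit_latency, on_response, addr)
        elif addr in self.pending:
            self.pending[addr].append(on_response)  # merge with existing miss
        else:
            self.pending[addr] = [on_response]
            self.eq.schedule(self.hit_latency, self.next_level.request, addr, self._fill)

    def _fill(self, addr):
        self.lines.add(addr)
        for callback in self.pending.pop(addr):
            callback(addr)

eq = EventQueue()
cache = NonBlockingCache(eq, Memory(eq, latency=100), hit_latency=2)
cache.lines.add(0x40)  # pre-warm one line so the demo has a hit

log = []
def done(tag):
    return lambda addr: log.append((eq.now, tag, addr))

cache.request(0x80, done("miss"))    # goes to memory
cache.request(0x40, done("hit"))     # serviced while the miss is outstanding
cache.request(0x80, done("merged"))  # joins the outstanding miss
eq.run()
print(log)
```

Because both objects expose the same `request(addr, on_response)` interface, a second cache level could be placed between them without changing either class, which loosely mirrors the interchangeable-object design described above.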