Talks

Keynote Addresses
  • Preparing for a Post-Moore's Law World, given at the 2015 IEEE/ACM International Symposium on Microarchitecture (MICRO-2015), December 2015.
    Abstract: For decades, Moore's Law scaling has been the fuel that propelled the computing industry forward, by delivering performance, power and cost advantages with each new generation of silicon. Today, these scaling benefits are slowing to a crawl. If the computing industry wants to continue to make scalability the primary source of value in tomorrow's computing systems, we will have to quickly find new and productive ways to scale future systems. In this talk, I will highlight my work and the work of others that is rejuvenating scaling through the application of heterogeneous parallel designs. Leveraging these technologies to solve the scaling problem will be a significant challenge, as future scalability success will ultimately be less about "how" to do it and more about "how much" it will cost.
  • Bridging the Moore's Law Performance Gap with Innovation Scaling, given at the 2015 International Conference on Performance Engineering (ICPE-2015), February 2015.
    Abstract: The end of Dennard scaling and the tyranny of Amdahl's law have created significant barriers to system scaling, leading to a gap between today's system performance and where Moore's law predicted it should be. I believe the solution to this problem is to scale innovation. Finding better solutions to improve system performance and efficiency, and doing so more quickly than previously possible, could address the growing performance gap. In this talk, I will highlight a number of simple (and not so simple) ideas to address this challenge.
  • The Upside of the Reliability Downtrend, given at the 2010 Workshop on Resilient Architectures (WRA-2010), Atlanta, GA, December 2010.
    Abstract: As silicon process technology scales deeper into the nanometer regime, the increased occurrence of hardware faults (both transient and permanent) is threatening the reliability of future designs. While many see this as a worrying trend, forcing the addition of expensive fault tolerance mechanisms, it can also be an opportunity to rethink design in the presence of reliability mechanisms. Designs built on a highly resilient substrate can achieve significant benefits that reduce power, improve performance, and increase yield. The key to attaining these benefits, however, is ultra-low-cost resiliency mechanisms. I will present two of these mechanisms (Razor and BulletProof) and a variety of value-added design optimizations that they enable.
  • On the Rules of Low Power Design (and How to Break Them), given at the International Symposium on Low Power Electronics and Design (ISLPED-2008), August 2008.
    Abstract: Energy and power constraints have emerged as some of the greatest lingering challenges to progress in the computing industry. In this talk, I will highlight some of the "rules" of low-power design and show how they bind the creativity and productivity of architects and designers. I believe the best way to deal with these rules is to disregard them, through innovative design solutions that abandon traditional design methodologies. Releasing oneself from these ties is not as hard as one might think. To support my case, I will highlight two rule-breaking design trends from my work and the work of others. The first trend combines low-power designs with resiliency mechanisms to craft highly introspective and efficient systems. The second trend embraces subthreshold voltage design, which holds great promise for highly energy-efficient systems.
     
  • Why Tools Matter, given at the International Symposium on Performance Analysis of Systems and Software (ISPASS-2008), April 2008.
    Abstract: Capable and accessible infrastructure is an accelerant for good research, as it enables creative people to quickly and effectively explore new ideas. In this talk I will reflect on my experiences with the SimpleScalar tool set, and make a case for why more researchers should share their tools. Finally, I will speculate on the future of modeling infrastructure and suggest where budding infrastructure hackers might want to spend their efforts. (slides)
     
  • Building Buggy Chips -- That Work!, a Distinguished Lecture as part of the Top Gun lecture series 2000-2001, Charlottesville, VA, March 2001.
    Abstract: Building a high-performance microprocessor presents many reliability challenges. Designers must verify the correctness of large complex systems and construct implementations that work reliably in varied (and occasionally adverse) operating conditions. Failure to meet these challenges can result in serious repercussions, ranging from disgruntled users to financial damage to loss of life. In this talk, I will describe a novel design strategy, called "dynamic verification", that works to reduce the burden of correctness on complex systems. The approach creates minimally correct systems that are capable of tolerating most permanent (e.g., design errors) and transient (e.g., particle strikes) faults. I will also detail ongoing work that suggests dynamic verification could render other valuable benefits such as reduced time-to-market, decreased power consumption, and improved performance. (slides)

Selected Panel Talks

Selected Invited Talks

  • Chip, Heal Thyself, given at Chalmers University, Gothenburg, Sweden, October 2007.
    Abstract: As silicon technologies move into the nanometer regime, transistor reliability is expected to wane as devices become subject to extreme process variation, particle-induced transient errors, and transistor wear-out. Unless these challenges are addressed, computer vendors can expect low yields and short mean times to failure. In this talk, I will detail the challenges of designing complex computing systems in the presence of transient and permanent faults. In addition, I will detail the "BulletProof" pipeline, the first ultra low-cost mechanism to protect a microprocessor pipeline and on-chip memory system from silicon defects. To achieve this goal we combine area-frugal on-line testing techniques and system-level checkpointing to provide the same guarantees of reliability found in traditional solutions, but at much lower cost. (slides)
     
  • Sidestepping Performance Bottlenecks with Better Than Worst-Case Design, given at Microsoft Corporation, November 2006.
    Abstract: This talk introduces the audience to a novel design methodology that addresses the correctness and reliability challenges of deep-submicron silicon. The focus is on a new design strategy, called Better Than Worst-Case design, which couples complex design components with simple, robust checker mechanisms. By delegating the responsibility of correctness and reliability to the checker, it becomes possible to quickly build designs that are provably correct and that effectively address performance and reliability concerns. Two exemplary Better Than Worst-Case designs will be presented: DIVA and Razor. DIVA is a functional checker for a complex microprocessor, capable of correcting faults caused by transients, silicon defects and design errors. Razor is a low-power pipeline that utilizes circuit-level timing error correction to eliminate voltage margins and minimize energy demands. In addition, a complementary design technique, called typical-case optimization (TCO), is introduced as a way to take advantage of the relaxed design constraints on fully checked designs. (slides) A video of this presentation is also available (here). A small software sketch of the core-plus-checker pattern appears after this list.
     
  • Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation, given at Intel Corporation, Santa Clara, CA, September 2003.
    Abstract: With increasing clock frequencies and silicon integration, power-aware computing has become a critical concern in the design of embedded processors and systems-on-chip. One of the more effective and widely used methods for power-aware computing is dynamic voltage scaling (DVS). In order to obtain the maximum power savings from DVS, it is essential to scale the supply voltage as low as possible while ensuring correct operation of the processor. The critical voltage is chosen such that under a worst-case scenario of process and environmental variations, the processor always operates correctly. However, this approach leads to a very conservative supply voltage, since such a worst-case combination of different variabilities will be very rare. In this talk, I detail a new approach to DVS, called Razor, based on dynamic detection and correction of circuit timing errors. The key idea of Razor is to tune the supply voltage by monitoring the error rate during circuit operation, thereby eliminating the need for voltage margins and exploiting the data dependence of circuit delay. A Razor flip-flop is introduced that double-samples pipeline stage values, once with a fast clock and again with a time-borrowing delayed clock. A metastability-tolerant comparator then validates latch values sampled with the fast clock. In the event of a timing error, a modified pipeline mispeculation recovery mechanism restores correct program state. A prototype Razor processor (taped out in August 2003) will be described, along with early simulation-based results. (slides) A toy model of this error-rate-driven voltage tuning loop is sketched after this list.
     
  • CryptoManiac: Application Specific Architectures for Cryptography, given at the University of Washington, Seattle, WA, March 2001.
    Abstract: The growth of the internet as a primary vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high-throughput system design. This trend will be further underscored with the widespread adoption of secure protocols such as secure IP (IPSEC) and virtual private networks (VPN). In this talk, I will introduce the CryptoManiac processor, a fast, flexible and scalable co-processor for cryptographic processing workloads. Our design is extremely efficient: I will present analyses of a 0.25um physical design that runs the standard Rijndael cipher algorithm 3.8 times faster than a 600 MHz Alpha 21264 processor, using an implementation that is 1/100th the size in the same technology. I will also demonstrate that its performance rivals that of state-of-the-art dedicated hardware implementations of the 3DES (triple DES) algorithm, while retaining the flexibility to support multiple cipher algorithms, even at the same time. Finally, I will define a scalable system architecture that combines CryptoManiac processing elements to exploit inter-session and inter-packet parallelism available in many communication applications. Using I/O traces and detailed timing simulation, we show that these scalable configurations can effectively service very high-throughput applications, including secure web and disk I/O processing. (slides)
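
The following is a minimal software sketch of the core-plus-checker pattern behind Better Than Worst-Case designs such as DIVA, referenced from the Microsoft talk above. It is my own illustration, not material from the talk: the function names and the injected fault rate are invented. A complex, fast unit produces speculative results, and a simple, trusted checker re-derives each result and overrides the core on a mismatch, so end-to-end correctness rests only on the checker.

    # Minimal sketch of the Better Than Worst-Case core-plus-checker pattern.
    # The "complex core" stands in for a large, aggressively optimized unit that
    # may occasionally misbehave; the "checker" stands in for a small unit that
    # is simple enough to verify exhaustively (DIVA plays this role in hardware).
    import random

    def complex_core_multiply(a, b):
        result = a * b
        if random.random() < 0.01:        # illustrative injected fault
            result ^= 1 << random.randrange(8)
        return result

    def simple_checker_multiply(a, b):
        return a * b                      # trusted reference computation

    def checked_multiply(a, b):
        speculative = complex_core_multiply(a, b)
        reference = simple_checker_multiply(a, b)
        # On a mismatch the checker wins; a real pipeline would also flush
        # and restart from the faulting instruction.
        return reference if speculative != reference else speculative

    if __name__ == "__main__":
        trials = [(random.randrange(100), random.randrange(100)) for _ in range(10000)]
        bad = sum(checked_multiply(a, b) != a * b for a, b in trials)
        print("uncorrected errors after checking:", bad)   # expect 0

In hardware the checker is much simpler and slower than the core, which is exactly what makes it cheap to verify; here both paths compute the same product only to keep the sketch self-contained and runnable.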
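
Below is a toy model of the Razor feedback loop described in the Intel talk above, again my own sketch with made-up constants, delay model, and function names: combinational delay grows as the supply voltage drops, the delayed shadow latch exposes any cycle in which the main flip-flop sampled too early, and the controller keeps lowering the voltage until the observed replay rate approaches a small target.

    # Toy model of Razor-style timing speculation and error-rate-driven
    # voltage tuning. The delay model and all constants are invented for
    # illustration only.
    import random

    def stage_delay(vdd):
        """Combinational delay that grows as the supply voltage drops,
        with data-dependent jitter."""
        return 1.0 / max(vdd - 0.3, 0.05) * random.uniform(0.9, 1.1)

    def error_rate(vdd, clock_period, cycles=20000):
        """Fraction of cycles in which the main flip-flop samples a late value.
        The delayed shadow latch still captures the correct value, so each
        such cycle is detected and replayed rather than silently corrupted."""
        errors = sum(stage_delay(vdd) > clock_period for _ in range(cycles))
        return errors / cycles

    def tune_voltage(target=0.01, clock_period=2.0, vdd=1.2):
        """Lower Vdd until the measured replay rate nears the target,
        trading rare recoveries for energy savings."""
        while vdd > 0.6:
            if error_rate(vdd, clock_period) < target:
                vdd -= 0.02     # still safe: keep scaling the supply down
            else:
                vdd += 0.02     # too many replays: back off and stop
                break
        return vdd

    if __name__ == "__main__":
        print("settled supply voltage: %.2f V" % tune_voltage())

The point of the sketch is only the control structure: timing errors are detected rather than prevented, and the error rate itself becomes the signal that sets the operating voltage.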

Tutorials Given