EECS 570 - Parallel Computer Architecture

Paper Selection Algorithm:

  1. Browse through all the papers.
  2. Pick your favorite paper from each of four subtopics (e.g., Case Studies, Shared Memory Optimizations).
  3. E-mail these four picks to Prof. Austin; he will select one for you to present later in the semester.

Reading List

Case Studies

Alv90 The Tera Computer System, Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan Porterfield and Burton Smith. ICS 1990
Sco96 Synchronization and Communication in the T3E Multiprocessor, Steven L. Scott. ASPLOS VII, 1996
Gha00 Architecture and Design of AlphaServer GS320, Kourosh Gharachorloo, Madhu Sharma, Simon Steely, and Stephen Von Doren. ASPLOS IX, 2000
Bar00 Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing, Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese. ISCA 27, 2000
Kal04 IBM Power5 Chip: A Dual-Core Multithreaded Processor. Ronald N. Kalla, Balaram Sinharoy, and Joel M. Tendler. IEEE Micro. 24(2), 2004
Marr02 Hyper-Threading Technology Architecture and Microarchitecture, Marr, D.; Binns, F.; Hill, D.; Hinton, G.; Koufaty, D.; Miller, J.; Upton, M. Intel Technology Journal, Vol. 6, No. 1, Feb. 2002
Kon04 Niagara: A 32-Way Multithreaded SPARC Processor. Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. IEEE Micro. August 2004
Lei96 The Network Architecture of the Connection Machine CM-5. Leiserson et al. The Journal of Parallel and Distributed Computing, Volume 33, Number 2, March 15, 1996
And05 The XBOX 360 System Architecture, HotChips 2005, August 2005
 

Shared Memory Optimizations

Fal97 Reactive NUMA: a design for unifying S-COMA and CC-NUMA, Babak Falsafi and David A. Wood. ISCA 24, 1997
Mart02 Bandwidth Adaptive Snooping, Milo M.K. Martin, Daniel J. Sorin, Mark D. Hill, and David A. Wood. HPCA 8, 2002
Hag91 DDM - A Cache-Only Memory Architecture. Erik Hagersten, Anders Landin, Seif Haridi. IEEE Computer, 1991
Mart00 Timestamp Snooping: An Approach for Extending SMPs, Milo M. K. Martin, Daniel J. Sorin, Anastassia Ailamaki, Alaa R. Alameldeen, Ross M. Dickson, Carl J. Mauer, Kevin E. Moore, Manoj Plakal, Mark D. Hill, and David A. Wood. ASPLOS IX, 2000
Hei94 Integration of Message Passing and Shared Memory in the Stanford FLASH Multiprocessor, John Heinlein, Kourosh Gharachorloo, Scott Dresser, and Anoop Gupta. ASPLOS VI, 1994
Amz96 Treadmarks: Shared Memory Computing on Networks of Workstations. Amza et al. IEEE Computer, Vol. 29, No. 2, pp. 18-28, February 1996
Mart03 Token Coherence: A New Framework for Shared-Memory Multiprocessors. Martin et al. IEEE Micro, November-December 2003

 

Memory Consistency Models

Hil98 Multiprocessors Should Support Simple Memory Consistency Models, Mark D. Hill. IEEE Computer, August 1998
Adv96 Shared Memory Consistency Models: A Tutorial. Sarita Adve and Kourosh Gharachorloo. IEEE Computer, Dec. 1996
Cez07 BulkSC: Bulk Enforcement of Sequential Consistency, ISCA 2007
Mon08 DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently, ISCA 2008
Mei07 Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures

 

Synchronization Optimizations

Raj02 Transactional Lock-Free Execution of Lock-Based Programs, Ravi Rajwar and James R. Goodman. ASPLOS X, 2002
Raj01 Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution. Ravi Rajwar and James R. Goodman. MICRO, December 2001
Luc08 Atom-Aid: Detecting and Surviving Atomicity Violations, ISCA 2008

 

Novel Programming Models

Ham04 Programming with Transactional Coherence and Consistency (TCC), Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, and Kunle Olukotun. ASPLOS XI, 2004
Cin00 Architectural support for scalable speculative parallelization in shared-memory multiprocessors, Marcelo Cintra, José F. Martínez and Josep Torrellas. ISCA 27, 2000
Dea05 MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. Google TR, 2005
Thi01 StreamIt: A Compiler for Streaming Applications. Thies et al. MIT/LCS Technical Memo LCS-TM-622, December, 2001
Xu03 A "flight data recorder" for enabling full-system multiprocessor deterministic replay. Xu et al. ISCA 2003
Zho07 Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-Thread Applications, HPCA 2007
Shr07 An Integrated Hardware-Software Approach to Flexible Transactional Memory, ISCA 2007

 

Alternative Architectures

Swa03 WaveScalar. Steve Swanson, Ken Michelson, Andrew Schwerin and Mark Oskin. MICRO-36, December 2003
Osk98 Active Pages: A Computation Model for Intelligent Memory. Mark Oskin, Frederic T. Chong, Timothy Sherwood. ISCA-98
Sor03 Dynamic Verification of End-to-End Multiprocessor Invariants. Sorin et al. DSN'03
Sha07 Anton, a Special-Purpose Machine for Molecular Dynamics Simulation, ISCA 2007
Yeh07 ParallAX: An Architecture for Real-Time Physics, ISCA 2007
Loh08 3D-Stacked Memory Architectures for Multi-Core Processors, ISCA 2008

 

Performance Analysis

Cha94 Where is Time Spent in Message-Passing and Shared-Memory Programs?, Satish Chandra, James R. Larus, and Anne Rogers. ASPLOS VI, 1994
Bar98 Memory System Characterization of Commercial Workloads, Luiz Andre Barroso, Kourosh Gharachorloo, and Edward Bugnion. ISCA 25, 1998
Aga88 An evaluation of directory schemes for cache coherence. Agarwal et al. ISCA 1988