Ph.D. student | Current CV
Office: CSE 4856
My research interests lie primarily in parallel computer architecture and parallel programming models, with a focus on performance scaling. As the industry increases core counts every generation, we have to look at new programming interfaces and microarchitectures that increase performance by leveraging program-wide parallelism, instead of relying on instruction-level parallelism and frequency scaling as we did in the past. This has led me to study Transactional Memory and how it may help extract performance from future applications. I have also studied how well current desktop applications exploit thread-level parallelism on modern multi-core desktop machines.
Proactive Transaction Scheduling
Transactional Memory is a promising alternative to traditional synchronization schemes for task-based parallel programs (i.e., programs that use multiple threads of execution working on shared data structures to accomplish a task, as opposed to data-parallel programs, in which multiple threads each work on a chunk of data independently). Traditional synchronization is, in many cases, done using locks or semaphores. These are opaque data types that must be assigned meaning by the programmer through locking conventions and rules. These conventions can become complicated, and extracting performance from them can take years of effort.
Transactional memory offers a solution by allowing programmers to mark sections of code as critical sections; the underlying transaction system then protects all the data implicitly by making each critical section appear to remote threads to execute either fully or not at all. But transactional memory has its own problems. It cannot handle non-transactional actions such as I/O, and it can suffer serious performance degradation from contention when multiple critical sections try to update the same data. For transactional memory to be seen as a viable, "easier" alternative, both problems have to be solved. My work currently focuses on contention. I am developing hardware and software techniques that track contention history and use that history to predict, dynamically at run time, which critical sections should run concurrently and which should run serially, eliminating the serious performance degradation caused by contention and ultimately increasing performance.
For more information on the basics of transactional memory, I suggest reading some of the papers listed here.
Thread Level Parallelism on the Desktop
In 2000, single-core architectures that pushed the boundaries of frequency scaling and instruction-level parallelism were the norm in the desktop space. In 2005 this scaling came to an abrupt end, and multi-core processors were pushed as the future for desktop and laptop machines. As technology has continued to scale, allowing more transistors per square millimeter, core counts have continued to increase, to the point that almost all desktop/laptop machines sold today have multiple processors; a multiprocessor machine in 2000 was a niche product. This prompted us to revisit a study from 2000 that investigated whether multiprocessor machines improved performance for desktop workloads. That study found only a minor benefit from two processors, and concluded that, at the time, a faster single-core processor would likely soon be released for less money that could match a dual-processor system. Redoing many of the experiments from the 2000 study on a modern 8-processor desktop, my co-authors and I found that most software still behaves primarily as single-threaded, even though every program we tested was in fact heavily threaded. This led us to conclude that continuing to push larger multi-cores in this space may not be the best solution beyond a handful (2-4) of cores. To facilitate duplication of our tests and findings in this investigative study, we provide descriptions of the tests performed and the relevant input sets here.
Refereed Conference, Workshop and Journal Publications:
Geoffrey Blake, Ronald G. Dreslinski, and Trevor Mudge, "Bloom Filter Guided Transaction Scheduling". The 17th IEEE International Symposium on High Performance Computer Architecture. February 2011. [pdf]
Geoffrey Blake, Ronald G. Dreslinski, Trevor Mudge, and Krisztian Flautner, "Evolution of Thread-Level Parallelism in Desktop Applications". The 37th Annual International Symposium on Computer Architecture. June 2010. [pdf]
Geoffrey Blake, Ronald G. Dreslinski, and Trevor Mudge, "Proactive Transaction Scheduling for Contention Management". The 42nd Annual IEEE/ACM International Symposium on Microarchitecture. December 2009. [pdf]
Geoffrey Blake and Trevor Mudge, "Duplicating and Verifying LogTM with OS Support in the M5 Simulator". Workshop on Duplicating, Deconstructing, and Debunking. June 2007. [pdf]
Xinju Li, Jacob Barhak, Igor Guskov, and Geoffrey Blake, "Automatic Registration for Inspection of Complex Shapes". Virtual and Physical Prototyping. June 2007.
Geoffrey Blake, Ronald G. Dreslinski, and Trevor Mudge, "A Survey of Multicore Architectures". IEEE Signal Processing Magazine: Special Issue on Signal Processing on Platforms with Multiple Cores: Part 1 - Overview and Methodology. November 2009. [pdf]
Patents:
Geoffrey Blake, Trevor Mudge, Stuart Biles, Nathan Chong, Emre Ozer, and Ronald G. Dreslinski, "Contention Management for a Hardware Transactional Memory". US Patent Pending #20090138890. Filed November 2008.
Dissertation:
Geoffrey Blake, "A Hardware/Software Approach for Alleviating Scalability Bottlenecks in Transactional Memory Applications". The University of Michigan. March 2011. [pdf]
Simulator packages and files:
For those interested in doing experiments on M5 with TM support, I have made my version of the simulator available here as a tarball of a Mercurial repository. It is based on M5-2.0b4 and is not kept up to date with the main M5 repository due to the speed at which M5 changes; the code is therefore provided as is. Also provided are a small software library implementing a linear randomized backoff contention manager, a version of the STAMP benchmarks modified to work with M5, and patches to apply on top of the Linux 2.6.18 kernel patched for use with M5 (refer to the M5 documentation on how to compile a kernel for M5).
Update: I have recently completed my Ph.D. and am now releasing the full version of my M5 repository, including versions of the transaction schedulers used in my publications. This code is also provided as is.