International Symposium on Microarchitecture (MICRO), Dec 2010.
Parallel programming is hard because it is impractical to test all possible thread interleavings. One promising approach to improving a multi-threaded program's reliability is to constrain a production run's thread interleavings so that untested interleavings are avoided as much as possible. Such an approach would keep production runs away from hard-to-test, rare thread interleavings, and thereby improve correctness. However, a key challenge in realizing this goal is determining thread interleaving constraints from the tested correct interleavings and enforcing them efficiently in production runs.
In this paper, we propose a new method that determines thread interleaving constraints from the tested interleavings in the form of lifeguard transactions (LifeTxes). An untested code region is initially contained in a single LifeTx. As the code region is tested under more thread interleavings, its original LifeTx is automatically split into multiple smaller LifeTxes so that the tested interleavings are permitted in a production run. The LifeTx interleaving constraints can be enforced efficiently using hardware transactional memory support.
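The abstract does not spell out the exact splitting rule, so the following is only a minimal sketch under an assumed model, not the paper's algorithm. It treats one thread's code region as a sequence of n accesses and assumes that each tested correct run which had a remote access between local accesses k-1 and k adds a LifeTx boundary at k. A production interleaving is then permitted only at such boundaries; everything between two boundaries would run atomically (the hardware transactional memory enforcement is not modeled). The names `CodeRegion`, `record_tested_interleaving`, `lifetxes`, and `permits` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CodeRegion:
    """Hypothetical model of one thread's code region of n accesses.

    A boundary k (0 < k < n) means a LifeTx ends just before access k,
    so a remote thread's access may interleave at that point.
    """
    n: int
    boundaries: set[int] = field(default_factory=set)

    def record_tested_interleaving(self, k: int) -> None:
        """Assumed rule: a tested correct run interleaved a remote access
        between local accesses k-1 and k, so split the LifeTx there."""
        if not 0 < k < self.n:
            raise ValueError(f"split point {k} outside 1..{self.n - 1}")
        self.boundaries.add(k)  # idempotent: re-testing the same point is a no-op

    def lifetxes(self) -> list[tuple[int, int]]:
        """Return the current LifeTxes as half-open ranges [start, end)."""
        cuts = [0, *sorted(self.boundaries), self.n]
        return list(zip(cuts, cuts[1:]))

    def permits(self, k: int) -> bool:
        """Would a remote access between accesses k-1 and k be allowed
        in production? Only if that point was seen in testing."""
        return k in self.boundaries
```

For example, an untested region with `n=6` starts as a single LifeTx `[(0, 6)]`; after tested interleavings at points 2 and 4 it becomes `[(0, 2), (2, 4), (4, 6)]`, and an untested interleaving at point 3 is still disallowed. Keeping boundaries as a set means the LifeTxes only ever get smaller as testing adds more observed interleavings.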
We show that 12 of 15 real concurrency bugs in programs such as Apache, MySQL, and Mozilla could be avoided using the proposed approach. LifeTx can also help improve the testing process: instead of blindly stress testing as many rare interleavings as possible, testers and tools can focus their efforts on exposing more frequent interleavings for code regions contained in the largest LifeTx.