14th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2010.
Deterministic replay systems record and reproduce the execution of a
hardware or software system. While it is well known how to replay
uniprocessor systems, replaying shared memory multiprocessor systems
at low overhead on commodity hardware is still an open problem. This
paper presents Respec, a new way to support deterministic replay of
shared memory multithreaded programs on commodity multiprocessor
hardware. Respec targets online replay in which the recorded and
replayed processes execute concurrently.
Respec uses two strategies to reduce overhead while still ensuring correctness: speculative logging and externally deterministic replay. Speculative logging optimistically logs less information about shared memory dependencies than is needed to guarantee deterministic replay, then recovers and retries if the replayed process diverges from the recorded process. Externally deterministic replay relaxes the degree to which the two executions must match by requiring only that their system output and final program states match. We show that combining these two techniques yields low recording and replay overhead for the common case of data-race-free execution intervals while still ensuring correct replay for execution intervals that contain data races.
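The interaction of the two strategies can be illustrated with a toy model. The sketch below is not Respec's kernel implementation. Every name in it (`run_epoch`, `replay_epoch`, the operation functions) is hypothetical, threads are modeled as a list of operations applied in some order, and the fallback to a fully logged order is an assumption made for illustration; the text above says only that the system recovers and retries. The sketch shows the externally deterministic check (compare output and final state only, not intermediate states) and the rollback to a checkpoint on divergence.

```python
import copy

def run_epoch(state, ops, order):
    """Apply ops in the given order to a copy of state; return (outputs, final_state)."""
    state = copy.deepcopy(state)  # the caller's checkpoint is never mutated
    outputs = []
    for i in order:
        ops[i](state, outputs)
    return outputs, state

def replay_epoch(checkpoint, ops, recorded, optimistic_order, full_order):
    """Replay one interval speculatively; return (final_state, rolled_back)."""
    # Optimistic attempt: replay with less ordering information than recorded.
    outputs, final = run_epoch(checkpoint, ops, optimistic_order)
    # External determinism: only output and final state have to match.
    if (outputs, final) == recorded:
        return final, False
    # Divergence: discard the speculative result and retry from the checkpoint
    # (here, with the complete order; an assumption of this sketch).
    outputs, final = run_epoch(checkpoint, ops, full_order)
    assert (outputs, final) == recorded
    return final, True

# Race-free interval: the two operations touch different variables.
def inc_a(state, out): state["a"] += 1
def inc_b(state, out): state["b"] += 1

# Racy interval: both operations update x, so the order changes the result.
def add_one(state, out): state["x"] += 1
def double(state, out): state["x"] *= 2

free_start = {"a": 0, "b": 0}
free_recorded = run_epoch(free_start, [inc_a, inc_b], [0, 1])
free_final, free_rolled_back = replay_epoch(
    free_start, [inc_a, inc_b], free_recorded, [1, 0], [0, 1])

racy_start = {"x": 1}
racy_recorded = run_epoch(racy_start, [add_one, double], [0, 1])
racy_final, racy_rolled_back = replay_epoch(
    racy_start, [add_one, double], racy_recorded, [1, 0], [0, 1])
```

In the race-free interval the replay uses a different interleaving and still commits, because the output and final state agree. In the racy interval the optimistic replay diverges, so the sketch rolls back to the checkpoint and retries.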
We modified the Linux kernel to implement our techniques. On average, our software system adds about 18% to execution time when recording and replaying programs with two threads, and about 55% for programs with four threads.