Ken: A Platform for Fault-Tolerant Distributed Computing

Description
Ken is a lightweight C implementation of a rollback-recovery protocol that provides crash-restart resilience to distributed applications. Ken unifies and automates reliability for both application data "at rest" (local process state) and data "in motion" (messages in a distributed system). Ken ensures that crash-restart failures (power failures, kernel panics, process crashes) can't corrupt or destroy data, and Ken guarantees that messages are reliably delivered and processed by recipients in process-pairwise-FIFO order. Ken furthermore provides strong global correctness guarantees and prevents crash-restart failures and packet losses from causing a distributed system to emit incorrect outputs. Finally, Ken's strong guarantees compose effortlessly when independently developed Ken-based distributed systems are integrated.
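To make the phrase "process-pairwise-FIFO order" concrete, the following is a minimal sketch of receiver-side bookkeeping that yields that ordering when senders retransmit lost messages. It is not taken from the Ken source and does not use Ken's API. The names fifo_receiver, fifo_receiver_accept, msg_verdict, and MAX_PEERS are hypothetical, and the sketch assumes each sender numbers its messages to a given receiver 0, 1, 2, and so on.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PEERS 8  /* hypothetical bound on the number of senders in this sketch */

/* Per-receiver table: the next sequence number expected from each sender. */
typedef struct {
    uint64_t next_expected[MAX_PEERS];
} fifo_receiver;

typedef enum {
    MSG_DELIVER,    /* in order: hand to the application */
    MSG_DUPLICATE,  /* already processed: a retransmission, safe to ignore */
    MSG_TOO_EARLY   /* an earlier message is missing: wait for its retransmission */
} msg_verdict;

/* Start every sender's expected sequence number at 0. */
void fifo_receiver_init(fifo_receiver *r)
{
    for (int i = 0; i < MAX_PEERS; i++)
        r->next_expected[i] = 0;
}

/* Classify message number `seq` from `sender`. Ordering is tracked per
 * sender only, so messages from different senders never block each other. */
msg_verdict fifo_receiver_accept(fifo_receiver *r, int sender, uint64_t seq)
{
    assert(sender >= 0 && sender < MAX_PEERS);

    if (seq < r->next_expected[sender])
        return MSG_DUPLICATE;
    if (seq > r->next_expected[sender])
        return MSG_TOO_EARLY;

    r->next_expected[sender]++;  /* exactly the expected message: advance */
    return MSG_DELIVER;
}
```

In this sketch the table lives only in memory. For crash-restart resilience of the kind described above, such a table would presumably have to be made durable together with the application state that the delivered message modified, so that a restart neither reprocesses nor skips a message.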
Source distribution
Contributed extensions & enhancements
Follow-on Projects
Publications
The C implementation of Ken and the integration of Ken into the Mace distributed systems toolkit are described in Yoo et al., "Composable Reliability for Asynchronous Systems," [local copy] in the Proceedings of the 2012 USENIX Annual Technical Conference. An early abstract description of the Ken rollback-recovery protocol and a characterization of its properties are available in HP Labs Tech Report 2010-155; see the USENIX ATC paper (or the source code) for a more up-to-date description of the implementation. Ken is abstracted from its implementation in Waterken, a Java platform by Tyler Close; the tech report contains additional detail on Ken's genealogy. Kernel support for Ken-style persistent heaps is described in Park et al., "Failure-atomic msync()," EuroSys 2013.
Frequently Asked Questions (FAQs)
Registration & Support
Users are not required to register in any way, but may "opt in" to receive notifications of changes to Ken. If you have questions about how to use Ken or about its implementation, please first seek answers in the Ken-related publications, the FAQs, and the source code. If that fails, you may write to the Ken team via Terence Kelly. The Ken team wants to encourage the development of an ecosystem of mutually supportive fault-tolerant distributed applications and services, and to the extent possible we will try to support efforts toward that goal.