Frequently Asked Questions about Ken
- What is Ken?
- "Ken" refers to both a simple rollback-recovery
protocol and its implementation as a C library. Both are
described in Yoo et al., "Composable Reliability for
Asynchronous Systems," 2012 USENIX Annual Technical
Conference. Ken (in both senses of the term) facilitates
reliable distributed application development. The name
and protocol both come from Waterken, the Java platform
that first implemented the Ken protocol. Waterken provides
different programming abstractions than the C
implementation of Ken.
- How does Ken work?
- One way to think about Ken is that it starts with the
"actor" or "communicating event loop"
distributed programming paradigm and adds a twist: Each
iteration of each event loop is an ACID transaction. It
turns out that remarkably strong global correctness
guarantees follow from transactional event loops.
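The idea of a transactional event-loop iteration can be sketched in a few lines of C. This is a toy illustration, not Ken's implementation or API: the names app_state, outbox, run_turn, and deposit are hypothetical, and an in-memory struct copy stands in for a durable checkpoint. The point it shows is that a turn's state changes and its outgoing messages either all take effect or none do.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical application state; a real system would keep this durable. */
typedef struct { long balance; long handled; } app_state;

/* Hypothetical buffer for messages produced during one turn. */
#define OUTBOX_MAX 8
typedef struct { int count; long msgs[OUTBOX_MAX]; } outbox;

/* A handler returns 0 to finish normally, nonzero to simulate a crash. */
typedef int (*handler_fn)(app_state *s, long input, outbox *out);

/* Run one event-loop iteration as an all-or-nothing unit.
   Returns the number of messages released to the outside world. */
int run_turn(app_state *committed, handler_fn h, long input,
             long *released, int released_cap)
{
    app_state working = *committed;   /* the handler mutates a private copy */
    outbox out = { 0, { 0 } };        /* sends are buffered, not transmitted */

    if (h(&working, input, &out) != 0)
        return 0;                     /* "crash": committed state untouched, nothing sent */

    assert(out.count <= released_cap);
    *committed = working;             /* commit point */
    memcpy(released, out.msgs, (size_t)out.count * sizeof out.msgs[0]);
    return out.count;                 /* messages leave only after the commit */
}

/* Example handler: add the input to the balance and report the new balance.
   A negative input simulates a crash after the state was already modified. */
int deposit(app_state *s, long input, outbox *out)
{
    s->balance += input;
    s->handled += 1;
    out->msgs[out->count++] = s->balance;
    return input < 0 ? -1 : 0;
}
```

Calling run_turn with deposit and a negative input leaves the committed state exactly as it was and releases no messages, which is the behavior an observer would expect if the turn had never started.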
- What kinds of guarantees does Ken provide?
- Tolerated failures (crash-restart failures such as power
outages and OS kernel panics) cannot corrupt or destroy
local process state or messages between Ken processes
in a distributed system. Messages are delivered exactly
once in FIFO order between each sender-receiver pair.
Ken furthermore guarantees "distributed
consistency" in the sense that a distributed system
can't end up in a causality-violating state wherein one
process remembers having received a message but, due to
crash-induced amnesia, no other process remembers having
sent it. More importantly, Ken masks failures
in the sense that an external observer can't infer the
occurrence of tolerated failures within a Ken-based
distributed system by observing the system's collective
outputs. Finally, Ken's global correctness guarantees
compose when independently developed Ken-based
distributed systems are integrated, even when the
integration was neither foreseen nor planned for.
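One common way to obtain exactly-once FIFO delivery between a sender-receiver pair is for the receiver to track the next expected sequence number per sender. The sketch below illustrates that general technique only; it is not taken from the Ken sources, and recv_channel, recv_verdict, and on_arrival are hypothetical names. For the guarantee to survive crashes, the counter would have to be part of the receiver's durable state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-sender receive state. */
typedef struct { uint64_t next_expected; } recv_channel;

typedef enum { DELIVER, DUPLICATE, OUT_OF_ORDER } recv_verdict;

/* Decide what to do with an arriving message that carries sequence number seq. */
recv_verdict on_arrival(recv_channel *ch, uint64_t seq)
{
    if (seq < ch->next_expected)
        return DUPLICATE;      /* already delivered: acknowledge again, do not redeliver */
    if (seq > ch->next_expected)
        return OUT_OF_ORDER;   /* gap: drop it and rely on the sender retransmitting */
    ch->next_expected++;       /* exactly the next message in FIFO order */
    return DELIVER;
}
```

A retransmitted copy of an already delivered message is classified as DUPLICATE, so the application handler sees each message once even though the sender may transmit it many times.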
- Such strong guarantees must require substantial programmer effort, no?
- On the contrary, Ken completely automates reliability.
The Ken application programmer takes no explicit steps to
ensure reliability other than writing atop the Ken
platform. For example, the programmer does not
specify the beginning and end of transactions. Apart from
programming in Ken fashion, achieving Ken's guarantees is
effortless and foolproof.
- What is it like to program a Ken application?
- You write a handler function that processes incoming
messages and inputs. The handler function can allocate
and de-allocate data on a persistent heap, and the handler
can send messages to other Ken processes and emit outputs.
There are a few other facilities provided but Ken is quite
simple. If you're familiar with existing event-driven
programming paradigms, e.g., GUI programming or AJAX, the
learning curve will be gentle.
- What kinds of applications is Ken for?
- Distributed applications that must prevent crash-restart
failures from causing incorrect behavior or
corrupting/destroying application state. Specific
programs that we have built atop Ken and MaceKen
include distributed hash tables, a distributed graph
analysis program, and a distributed e-commerce
scenario. In our experience, programming in Ken is
convenient, and Ken is a versatile platform.
- But is there a specific "sweet spot" use case?
- Anecdotally, it appears that programmers confronted with
the requirement to protect both local process state and
inter-process messages from crashes frequently employ
message queuing middleware for the latter and a
relational database for the former, even when
relational DB features aren't fundamentally required.
The RDBMS is used simply to keep application state safe,
because writing homebrew checkpointing and recovery code
atop an ordinary file system is too tedious and
error-prone. In the RDBMS-plus-MQ pattern, it is the
programmer's responsibility to orchestrate the delicate
interplay between two sets of operations: transactions
that evolve the database from one consistent state to the
next, and operations that ensure reliable messages. The
slightest mistake (e.g., failing to record an outbound
message in the database) can leave the application
vulnerable to crash-induced distributed inconsistency.
Furthermore the individual reliability guarantees of
independently developed applications written in the
RDBMS-plus-MQ pattern are unlikely to compose without
additional programmer effort. Ken automates both process
state reliability and message reliability, masks failures
globally, and preserves global correctness under
composition.
- What about offbeat/unforeseen uses?
- Perhaps the most remarkable unforeseen use of Ken occurred
in July 2012, when a small group of developers integrated
Ken into a mature, full-featured Scheme interpreter. The
group reports that this took them only one day, and
they received zero assistance from the Ken team.
The result, "SchemeKen," is to the best of their
knowledge the first crash-resilient Scheme interpreter.
In August 2012 the Vrije Universiteit Brussel released
SchemeKen as Open Source software.
- Is Ken an alternative to conventional databases?
- Sometimes. If your goal is to protect the integrity of
application state from crash-restart failures, and if you
want to update application state via ACID transactions,
then Ken might be a reasonable way to achieve these goals.
Ken is especially suitable when you want to store and
manipulate your data in arbitrary C/C++ data structures
rather than in relational format. A conventional database
is a better choice if you require features such as
relational algebra, schemas, and SQL.
- Is Ken an alternative to reliable-messaging middleware?
- Sometimes. Ken provides only a small subset of the
functionality of a full-featured message queuing
middleware package. Specifically, Ken provides only
reliable exactly-once message delivery in FIFO order
between each sender and receiver pair. If that's your
only message reliability requirement, Ken might be a
reasonable way to fulfill it, particularly if Ken's other
features address your other requirements.
- Are Ken transactions fast?
- That depends on the storage medium that provides data
durability. On an enterprise-class RAID system backed by
15K RPM spinning disks, Ken transactions take a few
milliseconds; ACID transactions in conventional databases
take roughly as long atop the same storage medium.
Flash-based SSDs would likely be faster, and emerging
non-volatile memory (NVRAM) would likely be faster still.
Regardless of the storage medium, making data durable in
order to preserve its integrity entails a performance overhead.
Ken currently strives for reasonable performance in two
ways: Ken overlaps execution of the next iteration of
the event loop with committing the previous iteration's
checkpoint to durable storage; and Ken's checkpoints
are incremental.
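To make "incremental checkpoint" concrete, here is a small simulation that writes only the pages modified since the previous checkpoint. It is a sketch under simplifying assumptions, not Ken's code: heap_sim, heap_write, and checkpoint are hypothetical names, the page size is arbitrary, a second in-memory array stands in for durable storage, and dirty pages are recorded by an explicit write function.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 64   /* arbitrary size chosen for the illustration */
#define NUM_PAGES 8

typedef struct {
    unsigned char live[NUM_PAGES][PAGE_SIZE];     /* working copy of the state */
    unsigned char durable[NUM_PAGES][PAGE_SIZE];  /* stand-in for durable storage */
    unsigned char dirty[NUM_PAGES];               /* 1 if modified since last checkpoint */
} heap_sim;

/* All writes go through this function so that dirtied pages are recorded. */
void heap_write(heap_sim *h, size_t offset, const void *src, size_t len)
{
    assert(offset + len <= (size_t)NUM_PAGES * PAGE_SIZE);
    if (len == 0)
        return;
    memcpy((unsigned char *)h->live + offset, src, len);
    for (size_t p = offset / PAGE_SIZE; p <= (offset + len - 1) / PAGE_SIZE; p++)
        h->dirty[p] = 1;
}

/* Copy only the dirty pages to "durable storage"; returns how many were written. */
int checkpoint(heap_sim *h)
{
    int written = 0;
    for (size_t p = 0; p < NUM_PAGES; p++) {
        if (!h->dirty[p])
            continue;
        memcpy(h->durable[p], h->live[p], PAGE_SIZE);
        h->dirty[p] = 0;
        written++;
    }
    return written;
}
```

The cost of a checkpoint in this scheme is proportional to the number of pages touched during the iteration rather than to the total size of the state.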
- Tell me about the alternate "Go-Back-N" transport.
- Ken's default UDP-based transport retransmits messages
with a simple exponential backoff until an ACK is
received. This is fine for client-server interactions
and it's also fine if programmers use the ken_ackd()
interface to avoid overloading a
recipient with too many messages. The MaceKen team at
Purdue University has contributed an alternative UDP-based
transport that implements the "Go-Back-N"
re-transmission policy. The alternative transport is
available as a replacement for the default kenext.c
file. The Go-Back-N implementation
has been tested and used at Purdue but has not been
extensively tested in Palo Alto. It may offer superior
performance and/or convenience in some situations.
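The two retransmission policies mentioned above can be contrasted with a short sketch. Neither part is taken from the default transport or from the Purdue code: next_timeout_ms, gbn_sender, and the gbn_* functions are hypothetical names, the timeout cap is an assumed parameter, and the Go-Back-N logic follows the textbook description (a window of at most N unacknowledged messages, cumulative ACKs, and retransmission of the whole window on timeout).

```c
#include <assert.h>
#include <stdint.h>

/* Exponential backoff: double the retransmission timeout after each
   unacknowledged attempt, never exceeding cap_ms. */
uint32_t next_timeout_ms(uint32_t current_ms, uint32_t cap_ms)
{
    assert(current_ms > 0 && cap_ms > 0);
    return current_ms > cap_ms / 2 ? cap_ms : current_ms * 2;
}

/* Hypothetical Go-Back-N sender state. */
typedef struct {
    uint32_t base;      /* oldest unacknowledged sequence number */
    uint32_t next_seq;  /* next sequence number to assign */
    uint32_t window;    /* N: maximum number of messages in flight */
} gbn_sender;

int gbn_can_send(const gbn_sender *s)
{
    return s->next_seq - s->base < s->window;
}

/* Assign a sequence number to a new message; check gbn_can_send() first. */
uint32_t gbn_send(gbn_sender *s)
{
    assert(gbn_can_send(s));
    return s->next_seq++;
}

/* Cumulative ACK: everything up to and including acked has arrived. */
void gbn_on_ack(gbn_sender *s, uint32_t acked)
{
    if (acked - s->base < s->next_seq - s->base)  /* ignore stale ACKs */
        s->base = acked + 1;
}

/* Timeout: resend every in-flight message starting at base.
   Returns how many to resend and stores the first sequence number in *first. */
uint32_t gbn_on_timeout(const gbn_sender *s, uint32_t *first)
{
    *first = s->base;
    return s->next_seq - s->base;
}
```

With backoff alone, nothing limits how many messages a sender has outstanding, which is why flow control is left to the application. The Go-Back-N window bounds the number of in-flight messages in the transport itself.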
- Can you make Ken faster by relaxing its guarantees?
- Yes, but our intention for the foreseeable future is to
maintain Ken's correctness guarantees rather than weaken
them for the sake of performance.
- What about trading simplicity for speed?
- Maybe some day; probably not soon. Complicating the
programming model or the implementation wouldn't help
reliability, which is a higher priority.
- Can I use Ken with C++, particularly STL?
- Yes, if you're careful. Ken is implemented in C89, but we
have successfully integrated Ken into the Mace distributed
systems toolkit from Purdue University, which is written
in C++ and uses STL extensively. It can be done.
As of August 2012 the distribution includes a sample Ken
application that uses C++ STL.
- What kinds of OSes can Ken run on?
- Ken is intended to be portable across POSIX-compliant
systems. It has been tested on HP-UX and on several
distributions of Linux.
- What about Mac OS X?
- Members of the Ken community have reported that Ken can be
made to work successfully on Mac OS X, with a bit of
effort. Mac OS X appears to have a few POSIX
non-compliance issues that complicate compilation, and
special compilation may be required to ensure successful
recovery. Furthermore the default file system reportedly
doesn't support sparse files, which Ken requires. The
workaround suggestions we've heard include: change the
code to catch SIGBUS rather than SIGSEGV when memory pages
are dirtied; use fcntl(F_FULLFSYNC) instead of fsync()
because the latter doesn't provide durability on Mac OS X;
and "run Ken on a separate sparse disk image"
(not yet tested) or reduce the size of Ken's state blob to
circumvent the lack of sparse file support (tested
successfully). Compiling with "gcc -Xlinker
-no_pie" reportedly fixes an issue with mmap() that
might be due to address space layout randomization. These
suggestions come from the Ken community and have not been
evaluated by the core Ken team because we lack Mac OS X
machines. We have received a patch that embodies several
of the above workarounds; write to us if you'd like to try
it. (A much better solution would be for Mac OS X to
support POSIX compliance, at least as an option, if it
doesn't already.)
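The fsync() workaround could be isolated in a small portability wrapper along the following lines. This is a sketch, not the community patch mentioned above: durable_sync is a hypothetical name, and falling back to fsync() when F_FULLFSYNC is rejected is a design assumption that trades durability for availability on file systems that do not support the request.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush a file descriptor to stable storage.
   Returns 0 on success, -1 on error (errno is set by the failing call). */
int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    /* Mac OS X: also ask the drive to flush its own write cache. */
    if (fcntl(fd, F_FULLFSYNC) != -1)
        return 0;
    /* Assumption: fall back to fsync() if F_FULLFSYNC is not supported here. */
#endif
    return fsync(fd);
}
```

On platforms that do not define F_FULLFSYNC, the wrapper compiles down to a plain fsync() call, so callers need no platform-specific code.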
- Which Open Source license covers Ken?
- BSD.
- Can you help me get started with Ken?
- The Ken team will try to help those who have tried to
help themselves, particularly if you're committed to
building something useful atop Ken. We will also try
to leverage support from the Ken user and developer
community. If you're interested in improving the Ken
infrastructure rather than developing applications
atop Ken, we can try to work together.