"pma", a persistent memory allocator
version "2022.07Jul.19.1658299753 (Avon 6)"
Copyright (C) 2022  Terence Kelly
Contact:  tpkelly @ { acm.org, cs.princeton.edu, eecs.umich.edu }

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.



As of July 2022 the authoritative "One True Version" of pma is at
http://web.eecs.umich.edu/~tpkelly/pma/     [optionally "https"]
Copies may appear elsewhere, e.g., GitHub (below), but the above Web
site is the main home and original source.  Check it for updates and
write to me if you'd like to be notified about updates via e-mail.
I welcome feedback of any kind.

See the "NEWS" file for a summary of changes since the previous release.

The design of pma is described in an article in ACM _Queue_ magazine,
March/April 2022.  An early application of pma is in "persistent
memory gawk" (pm-gawk), described in a paper in NVMW 2022 and
coming soon in GNU AWK.  See References below.

Applications #include "pma.h" and compile & link with pma.c using a
recent version of a modern compiler.  See pma.h for interface notes
and pma.c for compilation notes.  On older systems you might need to
link with "-lm".  Requirements/assumptions include 64-bit machine
words (longs and pointers) and reasonable page sizes.  Applications
that use pma should be tested with assertions enabled and with
verbose pma diagnostics enabled.

The most novel aspect for newcomers to persistent memory programming
is the role of the "root pointer."  See my articles in ACM _Queue_
magazine v17n4 and v20n2 for an explanation.

Most of the test programs bundled with pma are primarily intended to
exercise pma functionality for my benefit; they're not optimized for
tutorial value.  However, they illustrate basic usage and you might
learn a trick or two by studying them.  For example, test5 shows how
to use the root pointer; test6 shows how to make pma fall back on the
conventional ephemeral memory allocator (standard malloc); and test7
shows an easy way to create a persistent C++ STL container by sliding
a pma persistent heap beneath an STL <map>.  The tests also show how
to create an uninitialized backing file for a persistent heap using
the "truncate" command-line utility.  Some tests run under Valgrind.
Tests come in .c/.csh file pairs; the scripts run under the C Shell
(csh).  My test scripts might not be perfectly compatible with every
OS and compiler.  Edit as necessary.
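
The same kind of backing file can be created from C.  The sketch
below is not part of pma; the function name create_backing_file is
made up for illustration.  It assumes a POSIX system and relies on
the fact that ftruncate extends a file with bytes that read back as
zeros, typically without allocating storage for them.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of pma: create `path` if necessary
   and set its length to `size` bytes, much like "truncate -s".  A
   newly created file reads back as all zeros.  Returns 0 on success,
   -1 on error. */
int create_backing_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Whether a given size is acceptable to pma_init is a separate
question; see pma.h and the bundled test scripts for the sizes that
the tests actually use.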

Fall-back-to-standard-malloc mode is an unusual feature absent from
most other persistent memory allocators.  If you use it, please write
to me describing why and how.

Because pma coalesces (merges) free'd blocks, if all persistent
memory ever allocated is freed, and then pma_set_avail_mem clears
(i.e., zero-izes) de-allocated memory, the persistent heap is
restored to almost exactly its initial state.  This "reversibility"
property --- which most memory allocators do not have --- facilitates
debugging; see test4.

Another use for clearing free'd blocks is reclaiming unused storage
beneath the heap file.  After the process using the heap terminates,
the command line "fallocate --dig-holes" releases underlying storage
beneath freed-and-zero'd blocks, thereby re-sparsifying the heap
file.  The filefrag utility can show a file's underlying storage
resource footprint before & after re-sparsification.
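
A coarser check than filefrag is to compare a file's apparent size
with the storage allocated to it.  The sketch below is an
illustration, not part of pma; file_footprint is a made-up name.  It
assumes that st_blocks counts 512-byte units, which holds on Linux
but is not guaranteed by POSIX.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper, not part of pma: report the apparent size of
   `path` and the number of bytes of storage allocated to it.
   Assumes st_blocks is in 512-byte units (true on Linux).  Returns 0
   on success, -1 if the file cannot be examined. */
int file_footprint(const char *path, long long *apparent,
                   long long *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *apparent  = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

Calling this on a heap file before and after "fallocate --dig-holes"
gives a rough idea of how much storage was released; a sparse file
shows an allocated figure well below its apparent size.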

Backing files containing pma persistent heaps are binary files
containing C structs.  These heaps are not portable across different
machine architectures; e-mailing a heap to a friend is a bad idea.
Portability across different compiler versions is similarly not
guaranteed.

It is an error to use pma_free to de-allocate memory allocated via
conventional malloc (unless pma is operating in fall-back-to-malloc
mode; see pma.h).  Beware that many standard functions return memory
allocated on the conventional heap; examples include strdup and
realpath.  Tools such as nm and ltrace can help to identify uses of
standard malloc, free, etc.

pma currently offers no analogue of posix_memalign or of the
obsolete functions that it replaces.

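pma's persistent memory behaves differently from ordinary ephemeral
memory when a process calls fork().  pma's heap lives in a MAP_SHARED
file-backed memory mapping, so after fork() parent and child share
the same heap: stores by either process are visible to the other and
to the backing file, whereas each process gets its own copy-on-write
view of ordinary memory.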
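
The difference is easy to observe without pma.  The sketch below
uses plain mmap on a temporary file, not pma itself, and the function
name child_write_visible is made up.  It maps one page, forks, lets
the child store into the page, and reports what the parent sees.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustration only, not part of pma.  Map one page of a temporary
   file with `map_flag` (MAP_SHARED or MAP_PRIVATE); the parent
   stores 0 in the page, forks, and the child stores 1.  Returns the
   value the parent reads after the child exits, or -1 on error. */
int child_write_visible(int map_flag)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    if (ftruncate(fileno(tmp), (off_t)page) != 0) {
        fclose(tmp);
        return -1;
    }
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     map_flag, fileno(tmp), 0);
    fclose(tmp);                /* the mapping outlives the stream */
    if (mem == MAP_FAILED)
        return -1;

    volatile int *p = mem;
    *p = 0;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, (size_t)page);
        return -1;
    }
    if (pid == 0) {             /* child: store, then exit at once */
        *p = 1;
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    int seen = *p;
    munmap(mem, (size_t)page);
    return seen;
}
```

With MAP_SHARED the parent sees the child's store; with MAP_PRIVATE
it does not.  A pma heap behaves like the MAP_SHARED case, so a
child that modifies the persistent heap modifies the parent's heap
too.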

A pma persistent heap is backed by the heap file in roughly the same
sense as conventional malloc'd memory is backed by swap.  Some OSes
may have unhelpful default habits regarding modified ("dirty")
memory.  For example, the OS may write dirty memory pages to the
backing file on durable media periodically and/or when the OS
believes that "too much" memory is dirty.  Eager writeback by the OS
can degrade performance noticeably, particularly for large persistent
heaps, and is never beneficial except by accident.  Fortunately some
OSes allow defaults to be over-ridden so that writeback is lazy
rather than eager.  On Linux see the discussion of dirty_* parameters
at https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html

Beware new types of lifetime bugs that persistent memory enables.
Pointers from persistent memory to any other kind of memory are
usually bugs; at the very least, they require thought and care.  For
example, a pointer from a struct allocated on the persistent heap to
statically allocated memory, to the call stack, or to conventionally
allocated memory is usually a bug, because the memory pointed to will
vanish after the process terminates but the pointer to it persists.
C++ may silently stash function pointers in dynamically allocated
objects; C code may do the same explicitly in C structs.  Such
pointers point from persistent memory to the text segment of an
executable, so the text segment ought not be mapped inconsistently.
Position-independent executables invite trouble; gcc's "-no-pie" flag
prevents such executables.  Where possible, applications should
follow the prudent commonsense rule that pointers in the persistent
heap should point only to NULL or to locations in the same
persistent heap.
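
The rule can be checked mechanically in debug builds.  The sketch
below is not part of pma, and the names are made up; heap_base and
heap_len stand for the address and length of the persistent heap
mapping, which the application would have to obtain or record by its
own means.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging aid, not part of pma.  Returns nonzero if
   `p` is safe to store in the persistent heap under the rule above:
   either NULL or an address within [heap_base, heap_base + heap_len).
   heap_base and heap_len are supplied by the caller. */
int ok_in_persistent_heap(const void *heap_base, size_t heap_len,
                          const void *p)
{
    if (p == NULL)
        return 1;
    uintptr_t lo = (uintptr_t)heap_base;
    uintptr_t a  = (uintptr_t)p;
    return a >= lo && a - lo < heap_len;
}
```

An assertion such as assert(ok_in_persistent_heap(base, len, ptr))
just before each store of a pointer into a persistent struct catches
stray pointers to the stack, to static data, or to the conventional
heap while the offending process is still running.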

pma_init maps a specified persistent heap into process memory at the
same address every time, which is necessary for compatibility with
standard conventional malloc.  Will address-space layout
randomization (ASLR) somehow spoil the fun?  In practice ASLR never
makes trouble, and won't unless ASLR implementations change rather
dramatically.  Linux allows disabling ASLR for a single process:

    $ setarch `uname -m` -R ./a.out

where a.out is the executable.  ASLR should not be disabled without
thoroughly understanding and weighing the implications for a
particular application, including security implications.

Tolerating failures such as crashes requires ensuring that a
consistent state of pma's persistent heap can be recovered.  Prudent
practice is to make safe backup copies of the heap file before and
after an application modifies the heap.  If the persistent heap must
be checkpointed *during* execution of an application that modifies
it, the simple mechanism used to crashproof the gdbm database
(_Queue_ v19n4) might be appropriate.  Crash-tolerance mechanisms
should be subjected to realistic tests.  For example, mechanisms
intended to tolerate power failures should survive repeated sudden
whole-system power interruption tests; see _Queue_ v18n2 for an
inexpensive apparatus that can automate such tests.

For thread safety, calls to pma_* interfaces can be protected with a
global mutex.  Beware, however, subtle traps at the intersection of
parallelism and persistence; see _Queue_ v17n4 and v20n2 for details.
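
A global mutex around allocator calls might look like the sketch
below.  It is not part of pma; the wrapper names are made up, and
malloc and free serve as stand-ins so that the sketch compiles by
itself.  Substituting pma_malloc and pma_free assumes that they have
the same signatures as their conventional counterparts; see pma.h.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-ins so this sketch is self-contained.  With pma these would
   presumably be pma_malloc and pma_free (an assumption; see pma.h). */
#define HEAP_MALLOC malloc
#define HEAP_FREE   free

/* One lock serializes every call into the allocator. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: allocate while holding the global lock. */
void *locked_malloc(size_t n)
{
    pthread_mutex_lock(&heap_lock);
    void *p = HEAP_MALLOC(n);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

/* Hypothetical wrapper: de-allocate while holding the global lock. */
void locked_free(void *p)
{
    pthread_mutex_lock(&heap_lock);
    HEAP_FREE(p);
    pthread_mutex_unlock(&heap_lock);
}
```

The lock protects only the allocator's internal state.  It does
nothing to keep the application's own persistent data structures
consistent when several threads update them.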

If you redistribute pma with minor modifications, I request that you
change the version string in both pma.c and pma.h.  For example, if
you make changes for Project FOO, append " + FOO" to the version
string.  If you want to redistribute an extensively modified pma, it
might be best to change the name from "pma" to something else; let's
talk.



References:

Terence Kelly, "Persistent Memory Programming on Conventional
Hardware," ACM _Queue_ magazine Vol. 17 No. 4 (July/Aug 2019),
PDF:   https://dl.acm.org/doi/pdf/10.1145/3358955.3358957
HTML:  https://queue.acm.org/detail.cfm?id=3358957

Terence Kelly, "Is Persistent Memory Persistent?," ACM _Queue_
magazine Vol. 18 No. 2 (March/April 2020),
PDF:   https://dl.acm.org/doi/pdf/10.1145/3400899.3400902
HTML:  https://queue.acm.org/detail.cfm?id=3400902

Terence Kelly, "Crashproofing the Original NoSQL Key/Value Store,"
ACM _Queue_ magazine Vol. 19 No. 4 (July/Aug 2021),
PDF:   https://dl.acm.org/doi/pdf/10.1145/3487019.3487353
HTML:  https://queue.acm.org/detail.cfm?id=3487353

Terence Kelly, Zi Fan Tan, Jianan Li, and Haris Volos, "Persistent
Memory Allocation," ACM _Queue_ magazine Vol. 20 No. 2 (March/April
2022),
PDF:   https://dl.acm.org/doi/pdf/10.1145/3534855
HTML:  https://queue.acm.org/detail.cfm?id=3534855

Zi Fan Tan, Jianan Li, Haris Volos, and Terence Kelly, "Persistent
Scripting," Non-Volatile Memory Workshop (NVMW) 2022.
http://nvmw.ucsd.edu/program/    [NVMW URLs are not stable, so
this one might change after the 2022 event is over]

Persistent gawk (pm-gawk) prototype (a fork, NOT official gawk):
https://github.com/ucy-coast/pmgawk
https://coast.cs.ucy.ac.cy/projects/pmgawk/

Official GNU AWK (gawk) distribution (as of mid-July 2022 a
persistence feature based on pma has been incorporated into the
master git branch and it is anticipated that this persistence feature
will be present in the next official gawk release):
http://ftp.gnu.org/gnu/gawk/
http://savannah.gnu.org/git/?group=gawk
http://savannah.gnu.org/projects/gawk/
https://www.gnu.org/software/gawk/
https://directory.fsf.org/wiki/Gawk
http://git.savannah.gnu.org/cgit/gawk.git

