Prefix Encodings of the Natural Numbers

Improved Prefix Encodings of the Natural Numbers

Quentin F. Stout
Computer Science and Engineering, University of Michigan

Abstract: Suppose there is an infinite sequence of messages to be transmitted, but there are no a priori bounds on the lengths of the messages. Therefore a procedure must be devised to separate one from the next. Typically one adds a new symbol, such as comma, to separate the messages. This requires that the alphabet of the messages be expanded to include the comma, which expands each message by an amount proportional to its length.

One can do better, expanding the messages by a sublinear amount, by prefixing each message with a natural number giving the length of the message (in bits, bytes, or whatever unit is relevant). This then raises the question of efficient encodings of arbitrarily large natural numbers. Two classes of order-preserving prefix encodings of the natural numbers are introduced, where every member of each class is universal asymptotically optimal (see the paper for a definition of this). The asymptotically best encodings in these classes are determined and are found to improve on previous encodings. The results are related to channel capacity and unbounded searching.

Keywords: unbounded search, binary encoding, order-preserving prefix codes, septenary code, integers, natural numbers, long strings, information theory, coding theory

Complete paper. This was scanned in by IEEE. It appears in IEEE Transactions on Information Theory IT-26 (1980), pp. 607-609.

Related work: Here are my papers in discrete mathematics. One that addresses similar concerns is Searching and encoding for infinite ordered sets.