Balancing a Binary Search Tree

Tree Rebalancing in Optimal Time and Space

Quentin F. Stout
Computer Science and Engineering, University of Michigan

Bette L. Warren
Mathematics Department, Eastern Michigan University

Abstract: We give a simple algorithm which takes an arbitrary binary search tree and rebalances it to form an optimal binary search tree, i.e., one which has minimal height and a minimal number of nodes at the maximum depth. Any such tree has minimal internal path length, or, equivalently, it has minimal expected depth of a node, assuming that all nodes are equally likely. The algorithm actually produces a perfect tree that has an even stronger property, namely that at each node, the number of nodes in its left and right subtrees differ by at most one.

The algorithm uses time linear in the number of nodes and only a constant amount of space beyond that used to store the initial tree. It is thus optimal in its use of both time and space. Previous algorithms were suboptimal in at least one of these measures, or were not applicable to all binary search trees. When the nodes are stored in an array, a simple addition to the algorithm results in the nodes being stored in sorted order in the initial portion of the array, again using only linear time and constant space.

The algorithm consists of two operations:

tree_to_vine, which converts an arbitrary binary search tree into a vine, where the smallest item is the root, next smallest item is to its right, etc. Note that vines could be called ordered lists.
vine_to_tree, which converts a vine into a perfectly balanced tree.

The operations are based on repeated use of simple rotations. Each operation is quite efficient and of interest in itself.

It is easy to develop an implementation of the tree to vine conversion, and vice versa, using a stack or recursion (which is a veiled use of a stack). However, this requires additional space proportional to the height of the tree. Similar uses of stacks appear in inorder tree traversals, and the above operations can be used to eliminate such additional space, while still maintaining linear time, for some applications. For example, if there are trees representing subsets of some ordered set, one can find their union (merger) or intersection by first using tree_to_vine. An in-order traversal of the vine is a fast, simple, linear scan, which can easily be used to perform the appropriate operation. vine_to_tree then restores everything to a balanced tree.

Not only does this reduce the space requirements for the operation, but it may also be faster than most implementations of more standard approaches. Due to this speed, one might also decide to use vine_to_tree as part of a selection operation, where one is given a tree with some of the nodes having a field indicating that there are to be selected. The goal is to produce a tree of copies of the selected nodes. One can use a standard in-order traversal of the original tree (or use tree_to_vine first), creating a vine of the copied elements as the tree is traversed. At the end, vine_to_tree converts the selected nodes into a balanced tree.

While not included in this paper, it is also straightforward to modify the algorithm so that the result is a red-black, AVL, or weight-balanced tree (and many other variations as well). I.e., the node information needed for these trees is inserted during the vine_to_tree phase, still requiring only linear time. The tree can also be threaded, retaining an additional link that points to the node containing the next key in order.

Keywords: global binary search tree rebalance, optimal balanced binary search tree, perfect balance, minimal height, tree traversal, rotation

Complete paper. This paper appears in Communications of the ACM 29 (1986), pp. 902-908. It was later scanned by the Association for Computing Machinery (ACM).

Note: why call them vines? List is a more common term, but lacks the botanical parallels. If one visually adds the external nodes to a tree, then they are similar to the leaves on a real tree (except for the fact that computer science trees are typically drawn upside down). For a standard list there is only 1 external node, at the end, but in our situation each node in the vine has an unused pointer. If one adds an external node at each pointer then there are leaves along the entire length, which looks much like a vine.