Red/Black Trees
From CS315
Written by Mark Sarli
Red-black trees are a type of balanced binary search tree that were invented/discovered by Rudolf Bayer in 1972. Binary search trees, a good alternative to a linked list, support most operations in O(log N) average time. In the worst case, however, an unbalanced tree can create lookup times of O(N) , where the tree has essentially deteriorated into one long linked list. This worst case may come about from simple inputs, such as inputting values in sorted order into the tree. The red-black tree is a self-balancing binary search tree designed to eliminate that worst case (other balanced binary tree structures include the AVL tree and the AA tree).
Contents |
General Trees
All binary search trees have search-order property allowing lookup of a value by the item's key. Binary nodes are normally a nested class with references to its stored value and each of its two "child" nodes. Keys are ordered in the tree such that the key in the left child node is greater than the key of the parent, and the key in right child node is less than that of the parent (thus an in-order traversal will yield the keys in ascending order). Because of this ordering, the find operation on a value involves comparing the key's value with that of the current node (starting at the root). If the value is smaller, the left child is then checked against the value; if larger, the right. This continues until the value is found, providing a logarithmic average case search.
Removal, while slightly more interesting, also involves recursion down the tree. First the proper node is found with the same strategy as above. If the node has no children or one child, the node is simply removed (or replaced by the child if present). In the case that the node to be removed has two children, the algorithm involves replacing the item in that node with the smallest item in its right sub-tree, and then deleting that item's original node (easily done, as above). Note this type of removal does not make the tree deeper, which is a good thing since because deeper trees mean longer traversal times.
Insert is performed by first locating the proper empty place in which to do the insertion and then adding a node with the proper value in that place. Insertions have the potential to easily unbalance the tree, particularly is such insertions are performed "blindly". Seemingly trivial cases such as inserting items into the tree in order can create a terribly inefficient tree (as nodes are consecutively added to the end of one side of the tree, creating two sides of the tree that are wildly discrepant in size). In fact, any input that contains long sequences in sorted order will cause the tree to deteriorate. It also turns out that random removes do not remarkably improve tree balance.
Balanced Trees
Minimizing lookup costs
The cost to access any node is 1 + its depth, and that depth is O(log N) for a balanced tree and up to N for an unbalanced one. (It turns out that in the average case using random insertions, the cost is 1.38 * log N, which is still logarithmic. This occurs since balanced trees are more likely to occur than unbalanced trees). Average node depth is internal path length (the sum of the depths of all nodes in the tree) divided by the number of nodes, and average node depth +1 is the average cost of a successful lookup. Therefore, the key to shorter lookups is smaller internal path lengths. Red-black trees provide one mechanism of maintaining the tree's balance to improve lookup times, as no node is allowed to get too deep.
Balancing the Tree
Since the insert and remove methods change the tree and may unbalanced it, additional operations should be performed within those methods before returning.
The AVL tree (the first binary search tree, named for its discovers Adelson-Velskii and Landis), allows the height of left and right sub-trees to differ at most by 1 (the height of an empty sub-tree is taken to be -1). Since only nodes that are on the path from the root to the insertion point may have their balance upset, rebalancing the deepest affected node would have the consequence of rebalancing all other affected nodes. Though the average depth of an AVL tree is close to log N, its efficiency could be improved: since it requires both a pass down and a pass back up the tree, a tree which requires one downward pass (such as a red-black tree) improves performance.
Red-Black Trees
Characteristics
The red-black tree is one implementation of a balanced binary search tree that guarantees log N depth (and therefore log N lookup times) in the worst case. Insertion and deletion take slightly longer, but provide internal path lengths close to N log N (rather than 1.38N log N, the average case for random insertions). A red black tree has all of the properties of more general binary search tree, and adds the following:
-every node is red or black
-every null leaf is black
-if node is red, its children are black
-every simple path from node to descendant leaf contains same number of black nodes.[1]
"Null leafs" are child nodes that are null (sometimes called "external tree nodes"). External path length, then, is considered to be the sum of the depths of the null nodes. Since every tree must have n+1 total null nodes, total external path length divided by n+1 is the average depth of the null nodes. The average cost of an unsuccessful search or insertion, then, is this average depth +1.
Since all paths have the same number of black nodes (called the "black height" of the tree[2]), half or less of nodes on each path are red, and at least half are black. Since the root must be black and there cannot be two red nodes in a row on any path, the height of the tree is guaranteed to be 2 log(N+1) at most.
Method of Insertion
The insertion algorithm involves the following: Traverse the tree to find the null node where the new value will go. New nodes are added as leaves, and the new leaf must be colored red (to make it black would mean there are more black nodes on this path than on others, which violates rule#4).
If the new node's parent is red also, a violation of the structure of red-black trees has occurred and rebalancing is required. This is accomplished by color changes and tree rotations. Several situations are possible, which we consider here:
Case 1: If the sibling of the parent of the node to add is black (note all null nodes are considered to be black in this implementation), do the following based on whether the new node is "outside" or "inside" of its grandparent:
-If the new node is "outside", the situation is fixed with one rotation that switches the roles of the parent and the added node. -If the new node is "inside", two rotations are needed. First the node is pushed up past its parent and re-colored to black. The node that was pushed down becomes red.
Case 2: If the sibling of the parent of the node to add is red, this can be fixed by iterating over the tree: first search top-down to find the proper insertion place (this insures that the sibling of the parent is not red). During the traversal down the tree, if a node is found to have two red children, the node is re-colored to red and its children are re-colored to black. If the root is made red, it is re-colored to black. If the sibling of the parent is red, the situation can be fixed with a double or single rotation (as outlined above). The node is inserted when proper place is found. If the parent is red, additional rotations are completed. Note it is not possible that both parent and parent's sibling are red. This is continued down the path (all nodes with two red children are re-colored). Insert the node when the proper place is found (if the parent is red, additional rotations are needed).
Though the balancing property of the red-black tree is weaker than that of the AVL tree, experimentally the average number of nodes visited during a search is similar. The red-black tree also has a lower overhead, and rotations are needed infrequently in practice.
