CSCI 260 Fall 2010: Balanced binary search trees (AVL trees)

While well-structured binary search trees provide excellent efficiency for search, insert, and remove operations (O(lg N) for each), the tree insertion routines we have used so far do not guarantee the tree will be well structured.

In fact, as we have seen, a tree can wind up structured more like a linked list than a tree, in which case the efficiency of our common operations deteriorates to O(N) for a tree of N nodes.
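
As a rough illustration of that deterioration, the sketch below builds a plain (unbalanced) binary search tree and reports its height. The names PlainNode, plainInsert, plainHeight, plainFree, and heightAfterInserting are made up for this example and are not part of the course code; heights are counted so that a single node has height 0.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical minimal node type, used only for this illustration.
struct PlainNode {
    std::string key;
    PlainNode *left;
    PlainNode *right;
};

// Ordinary binary search tree insert with no rebalancing.
void plainInsert(PlainNode *&n, const std::string &k) {
    if (!n) {
        n = new PlainNode{k, nullptr, nullptr};
        return;
    }
    if (k < n->key) plainInsert(n->left, k);
    else plainInsert(n->right, k);
}

// Height counted in links: an empty tree is -1, a single node is 0.
int plainHeight(const PlainNode *n) {
    if (!n) return -1;
    return 1 + std::max(plainHeight(n->left), plainHeight(n->right));
}

// Release every node in the subtree.
void plainFree(PlainNode *n) {
    if (!n) return;
    plainFree(n->left);
    plainFree(n->right);
    delete n;
}

// Insert the keys in the given order and report the resulting height.
int heightAfterInserting(const std::vector<std::string> &keys) {
    PlainNode *root = nullptr;
    for (const std::string &k : keys) plainInsert(root, k);
    int h = plainHeight(root);
    plainFree(root);
    return h;
}
```

Inserting the seven keys a through g in sorted order gives a height of 6 (a chain of right children), while inserting them in the order d, b, f, a, c, e, g gives a height of 2.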

The solution is to keep our trees well structured as we build them. The typical approach is to check (and fix) the structure of the tree after each insert operation and each remove operation.

As we will see, it is not necessary to keep the tree perfectly structured - for our purposes it will suffice to keep a tree "nearly" balanced.

We will define a tree to be sufficiently balanced if, for every node in the tree, its left subtree and right subtree differ in height by at most 1.

We will define an AVL tree to be a binary search tree which conforms to that balance rule, i.e. for every node in an AVL tree:

  1. every node in its left subtree has a smaller key value,

  2. no node in its right subtree has a smaller key value, and

  3. the heights of its left and right subtrees differ by at most 1.
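
The three rules above can be checked mechanically. The following is only a sketch: CheckNode, balancedHeight, ordered, and isAvl are hypothetical names introduced here, and the heights are recomputed from scratch rather than read from a stored field.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical node type for this illustration only.
struct CheckNode {
    std::string key;
    CheckNode *left;
    CheckNode *right;
};

// Returns the height of n's subtree (-1 for an empty subtree),
// or -2 if some node in it breaks the balance rule (rule 3).
int balancedHeight(const CheckNode *n) {
    if (!n) return -1;
    int lh = balancedHeight(n->left);
    int rh = balancedHeight(n->right);
    if (lh == -2 || rh == -2) return -2;
    if (std::abs(lh - rh) > 1) return -2;
    return 1 + (lh > rh ? lh : rh);
}

// Checks rules 1 and 2: keys in a left subtree are strictly smaller,
// keys in a right subtree are not smaller.
// lo (if set) is an inclusive lower bound, hi (if set) an exclusive upper bound.
bool ordered(const CheckNode *n, const std::string *lo, const std::string *hi) {
    if (!n) return true;
    if (lo && n->key < *lo) return false;
    if (hi && !(n->key < *hi)) return false;
    return ordered(n->left, lo, &n->key) && ordered(n->right, &n->key, hi);
}

// True if the tree satisfies all three rules at every node.
bool isAvl(const CheckNode *root) {
    return ordered(root, nullptr, nullptr) && balancedHeight(root) != -2;
}
```

A checker like this is mainly useful for testing: calling it after every insert and remove in a test program will catch a rotation that has gone wrong.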

Whenever we change the structure of the tree through an insert or remove, we will update the heights of any affected nodes, and call routines to check the tree's balance and rearrange nodes as necessary to restore its balance.

Similarities to simple binary search trees

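Many of the methods used for our simple binary search trees can be used for AVL trees without requiring any changes. This includes the routines that never add or remove an individual node, such as search, print, findsmallest, and deallocate in the sample code below.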

Differences from simple binary search trees

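Of course, some aspects of our tree implementation must change in going from a binary search tree to an AVL tree: each node now stores its height, and the insert and remove routines must update heights and check the balance of the tree after every change.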

Checking and adjusting tree balance

We chose to regard a subtree as sufficiently balanced if its left and right subtrees are balanced and differ in height by at most 1.

If we know that the heights of a node's subtrees are correct, we can very easily update the height of the node itself as follows:

    if n's left subtree is taller than n's right subtree
       then n's height is one plus the height of its left subtree
    otherwise
       n's height is one plus the height of its right subtree
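
The rule above can be written compactly if an empty subtree is given a height of -1, so that a leaf ends up with height 0. That convention is an assumption of this sketch, as are the names HNode, heightOf, and refreshHeight.

```cpp
#include <algorithm>

// Hypothetical node type carrying only what the height rule needs.
struct HNode {
    int height;
    HNode *left;
    HNode *right;
};

// Height of a possibly empty subtree: -1 when there is no node.
int heightOf(const HNode *n) {
    return n ? n->height : -1;
}

// Recompute n's height from its children, whose heights are assumed correct.
void refreshHeight(HNode *n) {
    if (!n) return;
    n->height = 1 + std::max(heightOf(n->left), heightOf(n->right));
}
```

With this helper the "one child missing" cases need no special handling, since the missing child simply contributes -1.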

If the tree is unbalanced, then we know that one of the subtrees is at least two levels taller than the other.

If that's the case, then we could reduce the problem by cutting down the height of the taller side by one and increasing the height of the shorter side by one.

This is achieved by rotating elements of the taller subtree into the shorter subtree. Thus we may need to rotate to the left, or rotate to the right - depending on which of the two subtrees is the taller one.

Consider the two examples below, illustrating simple rotations to the left and right:

Rotation to the left:
  move the right child (Y) "up" and to the left,
  while moving the root of the subtree (N) "down" and to the left
note that if Y had a left child (C) it
  must be transferred over to become N's right child, but
  this still maintains the valid binary search tree properties

   BEFORE               AFTER
      N                   Y
     / \                 / \
    /   \               /   \
   X     Y             N     D
  / \   / \           / \
 A   B C   D         X   C
                    / \
                   A   B


Rotation to the right:
  move the left child (X) "up" and to the right,
  while moving the root of the subtree (N) "down" and to the right
note that if X had a right child (B) it
  must be transferred over to become N's left child, but
  this still maintains the valid binary search tree properties

   BEFORE               AFTER
      N                   X
     / \                 / \
    /   \               /   \
   X     Y             A     N
  / \   / \                 / \     
 A   B C   D               B   Y
                              / \                
                             C   D
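
The two diagrams can be turned into code fairly directly. The sketch below uses hypothetical names (RNode, rotateLeft, rotateRight, inorder) and omits height bookkeeping; its only purpose is to show the pointer moves and that the left-to-right order of the nodes is unchanged by a rotation.

```cpp
#include <string>

// Hypothetical node type for this illustration only.
struct RNode {
    std::string key;
    RNode *left;
    RNode *right;
};

// Rotate to the left through n: the right child (Y) moves up,
// n moves down to the left, and Y's old left subtree (C)
// becomes n's right subtree.
void rotateLeft(RNode *&n) {
    if (!n || !n->right) return;   // nothing to rotate up
    RNode *y = n->right;
    n->right = y->left;
    y->left = n;
    n = y;
}

// Rotate to the right through n: the left child (X) moves up,
// n moves down to the right, and X's old right subtree (B)
// becomes n's left subtree.
void rotateRight(RNode *&n) {
    if (!n || !n->left) return;    // nothing to rotate up
    RNode *x = n->left;
    n->left = x->right;
    x->right = n;
    n = x;
}

// Concatenate the keys in left-to-right (in-order) order.
std::string inorder(const RNode *n) {
    if (!n) return "";
    return inorder(n->left) + n->key + inorder(n->right);
}
```

Labelling the nodes as in the diagrams, the in-order sequence is A X B N C Y D both before and after either rotation, which is why the binary search tree property survives. A rotation to the right also undoes a rotation to the left through the same position.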

Our policy after an insertion or removal will be to work from the bottom of the tree (the point of change) upwards - solving our balance problems as close to the source as possible.

Prior to an insert/remove, we know the heights of subtree pairs differ by at most one throughout the tree, so after a single insertion or removal we know the heights differ by at most two.

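If the tree has become unbalanced, it is possible that a single rotation will completely solve our balance problem, but there is a case when a second rotation is necessary: when the taller subtree leans the opposite way. For example, suppose N's right subtree (rooted at Y) is the taller one, but within it Y's left subtree (C) is taller than Y's right subtree (D). A single rotation to the left through N would simply hand the tall subtree C over to N, leaving the new root Y unbalanced in the other direction.

In such a situation we can solve the problem by performing two rotations instead of one: first rotate to the right through Y, which moves the extra height to the outside of N's right subtree, then rotate to the left through N. The mirror-image case (N's left subtree is taller, and within it the right subtree is the taller one) is handled by a rotation to the left through X followed by a rotation to the right through N.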
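
A small experiment makes the two-rotation case concrete. The sketch below is an illustration under stated assumptions, not the course implementation: DNode, height, rotLeft, rotRight, and rebalance are hypothetical names, and height is recomputed recursively on every call (simple but slow) instead of being stored in each node.

```cpp
#include <algorithm>

// Hypothetical node type for this illustration only.
struct DNode {
    int key;
    DNode *left;
    DNode *right;
};

// Height by full traversal: -1 for an empty subtree, 0 for a leaf.
int height(const DNode *n) {
    return n ? 1 + std::max(height(n->left), height(n->right)) : -1;
}

// Rotate to the left through n (right child moves up).
void rotLeft(DNode *&n) {
    DNode *y = n->right;
    n->right = y->left;
    y->left = n;
    n = y;
}

// Rotate to the right through n (left child moves up).
void rotRight(DNode *&n) {
    DNode *x = n->left;
    n->left = x->right;
    x->right = n;
    n = x;
}

// Restore balance at n, assuming both of n's subtrees are themselves balanced.
void rebalance(DNode *&n) {
    if (!n) return;
    int diff = height(n->right) - height(n->left);
    if (diff > 1) {
        // right side too tall: if it leans left, straighten it first
        if (height(n->right->left) > height(n->right->right)) rotRight(n->right);
        rotLeft(n);
    } else if (diff < -1) {
        // left side too tall: if it leans right, straighten it first
        if (height(n->left->right) > height(n->left->left)) rotLeft(n->left);
        rotRight(n);
    }
}
```

Take the tree with root 1, right child 3, and 2 as the left child of 3. A single rotation to the left through 1 produces root 3 with left child 1 and 2 below that, which is still of height 2. Rotating to the right through 3 first and then to the left through 1 produces root 2 with children 1 and 3, of height 1.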

Efficiency analysis

The worst case arrangement (for search, insert, remove efficiency) is when we have a small number of nodes in a tall tree.

Here we'll try to examine just how good (or bad) we can expect our AVL tree structures to be.

Let $T_h$ be the smallest number of nodes we could have in a valid AVL tree of height $h$ (counting a single node as height 0, so $T_0 = 1$ and $T_1 = 2$).

Note that this happens when one of the two subtrees is one level shorter than the other, giving $T_h = 1 + T_{h-1} + T_{h-2}$.

Let's define an extra variable, $F_h = T_h + 1$. Then:

$$F_h = T_h + 1 = (T_{h-1} + 1) + (T_{h-2} + 1) = F_{h-1} + F_{h-2}$$

Thus we can note that $F_{h+2} = F_{h+1} + F_h$, and if we observe that $F_2 = 5$ and $F_3 = 8$ then this precisely defines the Fibonacci sequence! Numbering the sequence so that $\mathrm{Fib}(1) = \mathrm{Fib}(2) = 1$, we have $\mathrm{Fib}(5) = 5$ and $\mathrm{Fib}(6) = 8$, so $F_h = \mathrm{Fib}(h+3)$.

Thus the smallest number of nodes in a valid AVL tree of height $h$ is given by the $(h+3)$rd Fibonacci number minus 1.

Fortunately, Binet, Euler et al have worked out a formula for the $h$th Fibonacci number:

$$\mathrm{Fib}(h) = \frac{(1+\sqrt{5})^h - (1-\sqrt{5})^h}{2^h \sqrt{5}}$$

The optimal height for a tree of $T_h$ nodes is approximately $\log_2(T_h)$, and our actual height is $h$, so if we take $h$ and divide it by $\log_2(\mathrm{Fib}(h+3) - 1)$ then we have the ratio of the height of our worst case AVL tree to that of the optimal binary search tree!

In fact, this boils down to a ratio of approximately 1.44, i.e. our AVL trees are (at worst) of height approximately 1.44 lg(N).
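
The recurrence is easy to evaluate numerically, which gives a way to watch the ratio approach its limit. This is a sketch only: minNodes and heightRatio are hypothetical helper names, heights are counted so that a single node has height 0, and the 64-bit arithmetic is assumed to be used only for heights below about 90.

```cpp
#include <cmath>

// Smallest number of nodes in an AVL tree of height h, using
// T_0 = 1, T_1 = 2, and T_h = 1 + T_(h-1) + T_(h-2).
// Assumes 0 <= h < 90 so the result fits in a long long.
long long minNodes(int h) {
    long long older = 1;   // T_0
    long long newer = 2;   // T_1
    if (h == 0) return older;
    for (int i = 2; i <= h; i++) {
        long long current = 1 + newer + older;
        older = newer;
        newer = current;
    }
    return newer;
}

// Ratio of the actual height h to log2 of the fewest nodes
// such a tree can contain (requires h >= 1).
double heightRatio(int h) {
    return h / std::log2(static_cast<double>(minNodes(h)));
}
```

The first few values of minNodes are 1, 2, 4, 7, 12, each one less than a Fibonacci number. The ratio climbs slowly from 1 towards roughly 1.44 as h grows, staying below that limit for the heights this sketch can handle.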


SAMPLE CODE


#include <string>
#include <iostream>
using namespace std;

class avltree {
   private:
      // each node keeps track of its left and right child,
      //      its key value, and its height
      //      (the number of nodes beneath it on the longest
      //       path to a leaf)
      struct node {
          node *right, *left;
          string      key;
          int         height;
      };
      // maintain a pointer to the root of the tree
      node *root;

      // private, recursive routines (used by the public methods)

      // search n's subtree for the top node with key matching k,
      node *search(string k, node *n);

      // delete all nodes in n's subtree
      void deallocate(node* &n);

      // create and insert a node with the specified key/data
      //    values within the subtree rooted at n
      bool insert(string k, string d, node* &n);

      // in n's subtree, remove the topmost node whose
      //    key matches k
      bool remove(string k, node* &n);

      // print, in sorted order, all the key values
      //    in n's subtree
      void print(node *n);

      // print the pointer structure for the subtree of n
      // (each node's key & the keys of the nodes it points at)
      void debugprint(node *n);

      // rotate to the left through node n 
      void rotate2left(node* &n);

      // rotate to the right through node n
      void rotate2right(node* &n);

      // check if n's subtree is unbalanced, and
      //    perform any necessary rotations to fix it
      void nodecheck(node* &n);

      // find the node with the smallest key in n's subtree
      node *findsmallest(node *n);

      // update the height field of node n, assuming
      //    its children's fields are up to date
      void updateheight(node *n);

   public:
      // create an empty tree
      avltree() { root = NULL; }

      // deallocate the tree
      ~avltree() { deallocate(root); }

      // display the keys in the tree (sorted)
      void display() { print(root); }

      // display the tree pointer structure
      void debug() { debugprint(root); }

      // create and insert a new node in the tree
      bool insert(string k, string d) {
           if (insert(k, d, root)) nodecheck(root);
           else return false;
           return true;
      }

      // remove the topmost node with the specified key
      bool remove(string k) {
           if (remove(k, root)) nodecheck(root);
           else return false;
           return true;
      }

      // determine if the tree contains any nodes with
      //    the specified key
      bool search(string k) {
          node *n = search(k, root);
          if (!n) return false;
          return true;
      }
};

void avltree::updateheight(node *n)
// compute the height of node n, assuming the heights
//    of n's left and right children are correct
// n's height is one greater than the height of the
//    taller of its two children
{
   // make sure n isn't null
   if (!n) return;

   // remember one or both of n's children might be null
   if ((!n->left) && (!n->right)) {
      n->height = 0;
   } else if (!n->left) {
      n->height = n->right->height + 1;
   } else if (!n->right) {
      n->height = n->left->height + 1;
   } 
   // general case: both children exist,
   else {
      n->height = n->left->height + 1;
      if (n->height <= n->right->height)
         n->height = n->right->height + 1;
   }
}

avltree::node *avltree::search(string k, node *n)
// search the subtree rooted at n,
//    looking for the topmost node whose key matches k
// if a match is found return a pointer to the node,
// otherwise return null
{
   if (!n) return NULL;
   if (n->key == k) return n;
   else if (n->key > k) return search(k, n->left);
   else return search(k, n->right);
}

void avltree::deallocate(node* &n)
// delete all nodes in the subtree rooted at n,
//    and set n to null
{
   if (!n) return;
   deallocate(n->left);
   deallocate(n->right);
   delete n;
   n = NULL;
}

avltree::node *avltree::findsmallest(node *n)
// in the subtree rooted at n,
//    find the node with the smallest key value
// (i.e. go as far left as possible)
{
   if (!n) return NULL;
   while (n->left) n = n->left;
   return n;
}

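bool avltree::insert(string k, string d, node* &n)
// insert a new node in the binary search tree rooted at n,
// returning true if successful, false otherwise
// (the parameter list matches the declaration in the class;
//  note that the node struct has no data field, so d is
//  passed along but not stored)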
//
// after a successful insertion below n,
//    nodecheck is called to determine if the
//    subtree rooted at n is unbalanced,
//    and to perform any reconstruction necessary
{
   // if we've found the end of a chain,
   //    insert the node here
   if (!n) {
      n = new node;
      if (!n) return false;
      n->key = k;
      n->left = NULL;
      n->right = NULL;
      n->height = 0;
      return true;
   }

   // call the insert routine recursively on either
   //      the left or right subtree,
   // checking for and performing rotations if it
   //      was successful
   if (n->key > k) {
      if (insert(k, d, n->left)) {
         nodecheck(n->left);
         return true;
      }
   } else {
      if (insert(k, d, n->right)) {
         nodecheck(n->right);
         return true;
      }
   }

   // if we get here then the recursive insert
   //    was unsuccessful
   return false;
}

bool avltree::remove(string k, node* &n)
// if the subtree rooted at n contains a node whose key
//    matches k then remove it from the subtree,
//    then check for any necessary reconstruction of the tree
// return true if an element is successfully removed,
//     or false otherwise
{
   // if n is an empty tree then give up
   if (!n) return false;

   // if the matching node must be somewhere in the left subtree
   //    then make a recursive call and check for any needed rotation
   if (n->key > k) {
      if (remove(k, n->left)) {
         nodecheck(n->left);
         return true;
      } 
   } 

   // if the matching node must be somewhere in the right subtree
   //    then make a recursive call and check for any needed rotation
   else if (n->key < k) {
      if (remove(k, n->right)) {
         nodecheck(n->right);
         return true;
      }
   } 

   // if the current node is the one that must be removed,
   //    base the handling on how many children the node has
   else {
      // remember which node we'll actually delete
      node *victim = n;

      // if the node has no children we can simply delete it
      if ((!n->left) && (!n->right)) {
         delete n;
         n = NULL;
         return true;
      } 

      // if the node has just a right child then we can
      //    bypass n (i.e. make the pointer to n point
      //    to its right child instead)
      else if (!n->left) {
         n = n->right;
         delete victim;
         return true;
      }

      // if the node has just a left child then we can
      //    bypass n (i.e. make the pointer to n point
      //    to its left child instead)
      else if (!n->right) {
         n = n->left;
         delete victim;
         return true;
      } 

      // if the node has two children then we'll replace the
      //    node with the smallest node from the right subtree
      //    (basically copying the other node's key value
      //     over top of n's)
      // then make a recursive call to remove the duplicate
      //    element from the right subtree,
      //    remembering to check for necessary rotation
      //    once we've altered n's subtrees
      else {
         victim = findsmallest(n->right);
         if (!victim) return false;
         string vkey = victim->key;
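         if (!remove(vkey, n->right)) return false;
         n->key = vkey;
         // the recursive call rebalanced everything below n->right,
         //    but not n->right itself, so check it before checking n
         nodecheck(n->right);
         nodecheck(n);
         return true;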
      }
   }
   return false;
}

void avltree::print(node *n)
// display the key contents of the subtree rooted at n,
// sorted (ascending) by key value
{
   if (!n) return;
   print(n->left);
   cout << n->key << endl;
   print(n->right);
}

void avltree::debugprint(node *n)
// display the contents and structure of the subtree rooted at n,
// performed via preorder traversal
{
   if (!n) return;
   cout << n->key << " (";
   if (n->left) cout << n->left->key;
   else cout << "NULL";
   cout << "<-left,right->";
   if (n->right) cout << n->right->key;
   else cout << "NULL";
   cout << ")(height:" << n->height << ")" << endl;
   debugprint(n->left);
   debugprint(n->right);
}

void avltree::rotate2left(node* &n)
// rotates n's right child up, and n down to the left
//   BEFORE               AFTER
//      N                   Y
//     / \                 / \
//    /   \               /   \
//   X     Y             N     D
//  / \   / \           / \
// A   B C   D         X   C
//                    / \
//                   A   B
{
   node *tmp = n;         // remember N
   n = n->right;          // make Y the root of the subtree
   tmp->right = n->left;  // make C (Y's old left child) into N's right child
   n->left = tmp;         // make N into Y's left child
   updateheight(tmp);     // N's height has probably changed
   updateheight(n);       // Y's height has probably changed
}

void avltree::rotate2right(node* &n)
// rotates n's left child up, and n down to the right
//   BEFORE               AFTER
//      N                   X
//     / \                 / \
//    /   \               /   \
//   X     Y             A     N
//  / \   / \                 / \     
// A   B C   D               B   Y
//                              / \                
//                             C   D
{
   node *tmp = n;         // remember N
   n = n->left;           // make X the root of the subtree
   tmp->left = n->right;  // make B (X's old right child) into N's left child
   n->right = tmp;        // make N into X's right child
   updateheight(tmp);     // N's height has probably changed
   updateheight(n);       // X's height has probably changed
}

void avltree::nodecheck(node* &n)
// determine if the subtree rooted at n has become unbalanced
// (i.e. the height difference between the left and right 
//   subtrees of n is more than 1)
// and perform any rotations necessary to reconstruct the tree
//   in a balanced form.
{
   // quit if n is an empty tree
   if (!n) return;

   // update the height field for n
   updateheight(n);

   // store the height of the left and right subtrees
   // (an empty subtree counts as height -1, since
   //  updateheight gives a leaf a height of 0)
   int leftheight = -1;
   int rightheight = -1;
   if (n->left) leftheight = n->left->height;
   if (n->right) rightheight = n->right->height;

   // quit if n is balanced (i.e. if the left and right
   //      subtree heights are within 1 of each other)
   if ((leftheight <= (rightheight+1)) &&
       (leftheight >= (rightheight-1))) return;

   // handle the cases where the right subtree is taller
   if (rightheight > leftheight) {
      int Rright = -1;
      int Rleft = -1;
      if (n->right->left) Rleft = n->right->left->height;
      if (n->right->right) Rright = n->right->right->height;
      // if Rleft is taller than Rright then we need an
      //    extra rotation
      if (Rleft > Rright) rotate2right(n->right);
      // either way we need to rotate to the left through n
      rotate2left(n);
   }

   // handle the cases where the left subtree is taller
   else {
      int Lright = -1;
      int Lleft = -1;
      if (n->left->left) Lleft = n->left->left->height;
      if (n->left->right) Lright = n->left->right->height;
      // if Lright is taller than Lleft then we need an
      //    extra rotation
      if (Lleft < Lright) rotate2left(n->left);
      // either way we need to rotate to the right through n
      rotate2right(n);
   }

}