CSCI 260 Fall 2010: Heaps

A heap is another binary-tree structure that holds data in an organized fashion, but the organization is quite different from that of a binary search tree or AVL tree.

In a heap, the largest value is always in the root of the tree, and every node contains the largest value in its own subtree.

Furthermore, the nodes in the tree are always filled from the top down, and within a level they are filled from left to right. Thus the tree is always as full as possible.

For example, the following are all valid heaps:

  9       17           36            40
 /       /  \         /  \          /  \
8       12   4       9    20      40    40
                                 /  \
                                10  20

Note that smaller values can go into either subtree, and that duplicates are allowed.

The following are some examples of invalid heaps:

     8          32         4         6             
    / \        /  \       /         / \
   6   9      6   10     3         4   5
             / \        /         /   / \
            1   7      2         1   2   3

- in the first example, 9 > 8 so the tree is invalid
- in the second example, 7 > 6 so the tree is invalid
- in the third example, the second level isn't full but we've got nodes on the third level (we have to completely fill one level before moving on to the next)
- in the fourth example, node 5 has children even though node 4 has an empty child (we have to fill the bottom level from left to right)

Heaps are not terribly effective for searching data, since we don't know which subtree to search for a specific value (just that the values in both subtrees will be less than or equal to the value in the current node).

However, heaps are very effective for sorting a specific collection of data, or for accessing data in sorted order.

The typical (public) heap operations are insert (which puts a new value into the heap, while maintaining a valid heap structure) and remove (which removes the root element from the heap - i.e. the largest value in the heap - and then rearranges the heap to maintain its structure).

For instance:
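
As a rough illustration of how insert and remove behave from the caller's point of view, the sketch below uses the standard library's std::priority_queue as a stand-in for the heap class developed in these notes (the function name drainLargestFirst is made up for this example). Every value is inserted, then values are removed one at a time; each remove hands back the largest value still in the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Insert every value into a max-heap, then remove them one at a time.
// Each removal yields the largest value still in the heap, so the
// results come out in descending order.
std::vector<int> drainLargestFirst(const std::vector<int> &values)
{
   std::priority_queue<int> pq;        // a max-heap by default
   for (std::size_t i = 0; i < values.size(); i++) {
      pq.push(values[i]);              // "insert"
   }

   std::vector<int> out;
   while (!pq.empty()) {
      out.push_back(pq.top());         // look at the root (the largest value)
      pq.pop();                        // "remove" the root
   }

   // sanity check: everything that went in came back out
   assert(out.size() == values.size());
   return out;
}
```

Inserting 12, 40, 9, 40, 17 in that order and then draining the heap gives 40, 40, 17, 12, 9: the removal order depends only on the values, not on the order in which they were inserted.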

Of course, the trick is to efficiently maintain a valid heap structure when performing inserts and removes.

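When inserting:

- put the new value in the next free spot, i.e. the leftmost empty position on the bottom level (starting a new level if the bottom one is full)
- while the new value is larger than its parent, swap the two and continue from the parent's position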

For example, suppose we insert 27 into the following tree:

      21            21          21          27
     /  \          /  \        /  \        /  \
    18   20  =>  18    20 => 18    27 => 18    21
   /  \         / \   /      / \   /     / \   /
  9   10       9  10  27    9  10 20    9  10 20
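
The insertion example above can be traced in code. This is only a sketch: it assumes the tree is stored level by level in a std::vector<int> (root at index 0, parent of index i at (i - 1) / 2), and the function name insertAndMoveUp is invented for the illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Insert value into a max-heap stored level by level in a vector.
// Returns the number of swaps performed on the way up.
int insertAndMoveUp(std::vector<int> &h, int value)
{
   h.push_back(value);                    // next free spot on the bottom level
   int pos = static_cast<int>(h.size()) - 1;
   int swaps = 0;

   while (pos > 0) {
      int parent = (pos - 1) / 2;
      if (h[pos] <= h[parent]) break;     // the parent is large enough: done
      std::swap(h[pos], h[parent]);       // otherwise trade places and keep climbing
      pos = parent;
      swaps++;
   }

   // sanity check: the value is at the root or under a parent at least as large
   assert(pos == 0 || h[pos] <= h[(pos - 1) / 2]);
   return swaps;
}
```

Starting from the vector 21, 18, 20, 9, 10 and inserting 27, the vector becomes 21, 18, 20, 9, 10, 27, then 21, 18, 27, 9, 10, 20, and finally 27, 18, 21, 9, 10, 20, which are the three steps drawn in the diagram (two swaps in total).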

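When removing:

- copy the value out of the root (this is the value being removed)
- move the last value in the heap (the rightmost node on the bottom level) into the root, and delete that last node
- while the moved value is smaller than one of its children, swap it with the larger of its children and continue from that child's position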

For example, suppose we do a remove from the following tree:

     32           9          16           16
    /  \         / \        /  \         /  \
   16   10 =>  16   10 =>  9    10 =>  11    10
  /  \        /           /            /
11    9      11          11           9

Since each of these operations just follows a single path between the root and the bottom of the tree (upward for an insert, downward for a remove), and since the tree is as compact as possible, we can be sure that our insert and remove operations are always O(lg(N)) for a heap of N nodes.
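
To put numbers on that bound, the sketch below counts how many parent links separate the last node of an N-node heap from the root, which is the longest path a single insert or remove can follow. It assumes nodes are numbered level by level starting from 0 at the root, so that the parent of node i is (i - 1) / 2; the function name maxSwaps is invented for this example.

```cpp
#include <cassert>

// Depth of the last node in an n-node heap, i.e. the largest number
// of swaps a single insert or remove could need.
int maxSwaps(int n)
{
   assert(n >= 1);
   int pos = n - 1;           // level-order position of the last node
   int steps = 0;
   while (pos > 0) {
      pos = (pos - 1) / 2;    // step up to the parent
      steps++;
   }
   return steps;
}
```

For the six-node heap in the insertion example this gives 2, for a heap of 1024 nodes it gives 10, and for a heap of a million nodes it gives only 19.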

Array-based implementations

Because the tree is maintained as compactly as possible, and because we always fill the tree levels left-to-right, we can make effective use of an array-based implementation for heaps (as long as we know of an acceptable upper bound on the maximum size of the heap).

This implementation assumes a size for the heap can be determined when we create the heap (e.g. we specify that the heap can hold up to N elements).

The data is then stored in a dynamically-allocated array, and we keep track of both the amount of space allocated for the array and the number of elements currently stored in the heap.

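Since the heap is a form of binary tree stored level by level (the root in position 0, then the second level left to right, and so on), we can use the following rules for accessing parents/children:

- the left child of the node in position i is in position 2*i + 1
- the right child of the node in position i is in position 2*i + 2
- the parent of the node in position i (for i > 0) is in position (i - 1) / 2, using integer division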
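
The index arithmetic for a heap stored level by level (root in position 0) can be sketched as three small helpers, plus a check of the ordering rule that uses them. The names parentOf, leftOf, rightOf and isMaxHeap are made up for this illustration and use int data for brevity. Because an array filled from position 0 is automatically "as full as possible", only the ordering rule needs checking.

```cpp
#include <cassert>
#include <vector>

// position of the parent of pos (the root, position 0, has no parent)
int parentOf(int pos)
{
   assert(pos > 0);
   return (pos - 1) / 2;
}

// positions of the children of pos (they may lie past the end of the heap)
int leftOf(int pos)  { return 2 * pos + 1; }
int rightOf(int pos) { return 2 * pos + 2; }

// check the ordering rule: no element is larger than its parent
bool isMaxHeap(const std::vector<int> &h)
{
   for (int pos = 1; pos < static_cast<int>(h.size()); pos++) {
      if (h[pos] > h[parentOf(pos)]) return false;
   }
   return true;
}
```

Written level by level, the valid example with root 17 is 17, 12, 4 and passes the check, while the first invalid example is 8, 6, 9 and fails because 9 is larger than its parent 8.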

Pointer-based implementations

In the labs you'll be implementing a pointer-based heap, rather than the array-based implementation discussed here. Aside from the usual pointer-based tree issues (setting up the node structs, using private recursive methods, etc.) there is one important issue to consider: how to efficiently find the next/last free spot in the tree.

In the array version, if the heap currently has N items in it then we know the last one is in array position N-1 and the next available space is in array position N.

In a pointer-based version there is no index to compute directly: we have to follow left/right pointers down from the root to reach the correct node.

Fortunately, if we know the current size of the heap we can compute the chain of nodes we need to follow to find the correct insertion/removal point.

Let's assume that in our heap class we add a currentsize data field: initializing it to 0 in the constructor and incrementing or decrementing it as necessary when doing inserts and removes.

If we want to insert then we need to compute the path to position currentsize, and if we want to remove then we need to compute the path to position currentsize-1.

In essence, our algorithm will work backwards from the target node, repeatedly computing the parent's position until it reaches the root (position 0):

void showpath(int N)
{
   while (N > 0) {
      // if N is even then it's a right child,
      //    otherwise it's a left child
      if ((N % 2) == 1) 
         cout << N << " is the left child of node "; 
      else
         cout << N << " is the right child of node "; 

      // compute which node is N's parent,
      //    and move up to that level
      N = (N - 1) / 2;
      cout << N << endl;
   }
}
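
Since showpath discovers the route from the target up to the root, while a pointer-based heap has to walk from the root down, one possible approach is to record each step and reverse the order. The sketch below is one way to do that, not necessarily the approach expected in the labs: it builds the route as a string of 'L' and 'R' steps, and the name pathFromRoot is invented for this example.

```cpp
#include <cassert>
#include <string>

// Build the root-to-target route for level-order position pos
// as a string of 'L' (go left) and 'R' (go right) steps.
std::string pathFromRoot(int pos)
{
   assert(pos >= 0);
   std::string path;
   while (pos > 0) {
      // odd positions are left children, even positions are right children;
      // each step is put at the front because we discover the route bottom-up
      path.insert(path.begin(), (pos % 2 == 1) ? 'L' : 'R');
      pos = (pos - 1) / 2;     // move up to the parent
   }
   return path;
}
```

For position 5 the result is "RL": from the root go right (to position 2), then left (to position 5). In a pointer-based insert, all but the last step would be followed through existing nodes, and the last step says which child pointer of that final node should receive the new node.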

A full example of an array-based heap implementation is given below.


#include <iostream>
#include <string>
using namespace std;

// if the user doesn't specify how large a heap they want
//    we'll use this value
const int DEFAULT_HEAPSIZE = 1024;


class heap {
   private:
      // store the data, 
      //       the number of items the heap can hold,
      //   and the number of items it currently holds
      string *hp;
      int     maxsize;
      int     cursize;

      // helper function to maintain heap properties after
      //   an insert operation
      bool moveup(int pos);

      // helper function to maintain heap properties after
      //   a remove operation
      bool movedown(int pos);

   public:
      // heap constructor and destructor
      heap(int sz = DEFAULT_HEAPSIZE);
      // (hp was allocated with new[], so it must be released with delete[])
      ~heap() { delete[] hp; }

      // insert one element or many
      bool insert(string s);
      int insert(string src[], int sz);

      // remove one element (the largest) or many
      bool remove(string &s);
      int remove(string dest[], int num);

      // print the elements, top-down/left-to-right
      void  print();

      // look up the number of items currently in the heap
      int  getsize() { return cursize; }
};

// attempt to insert sz elements into the heap
// returns the number of elements successfully inserted
int heap::insert(string src[], int sz)
{
   int count = 0;
   for (int i = 0; i < sz; i++) {
       if (insert(src[i])) count++;
   }
   return count;
}

// attempt to remove num elements from the heap
//    (the next largest element each time)
// each successful remove stores the element in
//    the passed dest array in the next available position
// returns the number of elements successfully removed
int heap::remove(string dest[], int num)
{
   int count = 0;
   for (int i = 0; i < num; i++) {
       string s;
       if (remove(s)) dest[count++] = s;
   }
   return count;
}

// print the heap contents,
//    top-down, left-to-right
void heap::print()
{
   // error checking
   if ((!hp) || (maxsize < 1)) return;

   // go through the heap top-down, left-to-right,
   // i.e. just step through the array!
   for (int i = 0; i < cursize; i++) {
       cout << "(" << i << "): ";
       cout << hp[i] << endl;
   }
}

// allocate a heap of the specified size
// (if the size is invalid, or there is insufficient memory
//  then set the maximum heap size to 0)
heap::heap(int sz)
{
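   // initially there are no elements stored in the heap,
   //    and no space has been allocated
   cursize = 0;
   maxsize = 0;
   hp = NULL;

   // try to allocate a heap of the specified size
   // (a plain new throws an exception rather than returning NULL
   //  when memory runs out, so catch it and leave hp as NULL)
   if (sz > 0) {
      try { hp = new string[sz]; }
      catch (...) { hp = NULL; }
   }

   // if the allocation doesn't work then treat it
   //    as a heap that can't hold anything!
   if (hp) maxsize = sz;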
}

// if there is sufficient space,
//    insert string s into the next heap position
//    then call moveup to restore the heap properties
// return true if successful,
//        false otherwise
bool heap::insert(string s)
{
   // error checking 
   if ((cursize >= maxsize) || (!hp)) return false;

   // put the new element in the next available heap position
   //     and increase the count of heap elements by one
   hp[cursize++] = s;

   // call moveup to push the new value 
   //    as far up the heap as it needs to go
   //    to restore the heap properties
   if (!moveup(cursize - 1)) {
      cout << "ERROR: moveup failed!" << endl;
      cursize--;
      return false;
   } else return true;
}

// if the heap isn't empty,
//    copy the root value into parameter s
//    move the "last" heap element into the root
//    and call movedown to restore the heap properties
// return true if successful,
//        false otherwise
bool heap::remove(string &s)
{
   // error checking 
   if ((cursize < 1) || (!hp)) return false;

   // copy the (largest) value out of the root
   s = hp[0];

   // move the "last" value in the heap into the root
   hp[0] = hp[cursize-1];

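   // remove the last value from the heap
   cursize--;

   // if that was the only value then the heap is now empty
   //    and there is nothing left to rearrange
   //    (movedown would reject position 0 in an empty heap)
   if (cursize == 0) return true;

   // call movedown to push the moved value 
   //    as far down the heap as it needs to go
   //    to restore the heap properties
   if (!movedown(0)) {
      cout << "ERROR: movedown failed!" << endl;
      cursize++;
      return false;
   } else return true;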
}

// while the value in the heap at the specified position
//    is greater than the value of its parent,
//       swap the two of them
// return true if successful,
//        false otherwise
bool heap::moveup(int pos)
{
   // error checking 
   if ((pos < 0) || (pos >= cursize) || (!hp)) return false;

   // keep pushing a value up the heap until
   //    either we reach the root position (0)
   //    or we hit larger values
   while (pos > 0) {

      // compute the position of the parent of pos
      int parent = (pos - 1) / 2; 

      // if the current value is no larger than
      //    its parent's value then we're done
      if (hp[pos] <= hp[parent]) return true;

      // otherwise we need to swap the two values
      //    and continue up from the parent
      else {
         string tmp = hp[pos];
         hp[pos] = hp[parent];
         hp[parent] = tmp;
         pos = parent;
      }
   }

   // we hit position 0 
   return true;
}

// while the value in the heap at the specified position
//    is less than the value of one of its children,
//       swap it with the larger of its two children
// return true if successful,
//        false otherwise
bool heap::movedown(int pos)
{
   // error checking 
   if ((pos < 0) || (pos >= cursize) || (!hp)) return false;

   // keep going down the heap until we've hit a leaf
   //    or the value has reached a valid position
   while (pos < cursize) {

      // compute the position of the left and right children
      int left = 2 * pos + 1;
      int right = 2 * pos + 2;

      // if we find a child with a larger value than
      //    in the current heap position,
      // we'll store its string in target
      //         and its position in replacement
      string target = hp[pos];
      int replacement = -1;

      // check to see if the left child (if there is one)
      //    contains a larger value than the one in the 
      //    current position in the heap
      if ((left < cursize) && (hp[left] > hp[pos])) {
         target = hp[left];
         replacement = left;
      }

      // check to see if the right child (if there is one)
      //    has an even larger value
      if ((right < cursize) && (hp[right] > target)) {
         target = hp[right];
         replacement = right;
      }
   
      // if we found a replacement for the current node
      //    then swap them and continue moving down
      // otherwise we're done
      if (replacement >= 0) {
         string tmp = hp[pos];
         hp[pos] = hp[replacement];
         hp[replacement] = tmp;
         pos = replacement;
      } else return true; 
   }

   // this should never be reached
   return true;
}