CSCI 260 Fall 2010: Self organizing lists

Self organizing lists are simply linked lists of data which we reorganize as they are used, so that items that have been accessed recently, or more frequently, are moved closer to the front of the list.

Sorting by access frequency

The motivation for this form of list organization is that most searchable data sets contain some items that are accessed frequently, and many items that are accessed rarely.

If we move the frequently-accessed items to the front of the list then the most common searches will occur quickly, and only the "rare" searches will take a long time.

For this method, we include a counter with each data item to track how frequently it has been accessed. Every time the item is accessed we increment the counter and move it forward in the list, passing any items which have been accessed less frequently.

In this method, new items (which have never been accessed) are inserted at the back of the list.

Note that the search operation is still worst case O(N).
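The count-based method described above can be sketched with the standard library's std::list. This is only a minimal illustration, not the solist class given later in these notes: the names CountedItem, insertByCount, accessByCount and frontKey are hypothetical, and the sketch assumes an accessed item passes only items with a strictly smaller count.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical item type for this sketch: a key plus its access counter.
struct CountedItem {
    std::string key;
    int numrefs;   // number of successful searches for this key
};

// New, never-accessed items are inserted at the back of the list.
void insertByCount(std::list<CountedItem> &items, const std::string &key) {
    items.push_back(CountedItem{key, 0});
}

// Linear search (worst case O(N)). On a hit, increment the counter and
// move the item forward past every item with a strictly smaller count.
bool accessByCount(std::list<CountedItem> &items, const std::string &key) {
    for (auto it = items.begin(); it != items.end(); ++it) {
        if (it->key != key) continue;
        ++it->numrefs;
        auto dest = it;
        while (dest != items.begin() && std::prev(dest)->numrefs < it->numrefs)
            --dest;
        // splice relinks the node in front of dest without copying it
        items.splice(dest, items, it);
        return true;
    }
    return false;
}

// Key currently at the front of the list (the list must not be empty).
const std::string &frontKey(const std::list<CountedItem> &items) {
    assert(!items.empty());
    return items.front().key;
}
```

Starting from the list a, b, c with all counts at zero, one access to c moves it to the front, and two accesses to b then move b ahead of c.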

Sorting by most-recently-accessed

The motivation for this form of list organization is that the items that have been most recently accessed are the ones that are currently being used, and are most likely to be accessed again in the near future.

For this method, each time an item is accessed it is immediately moved to the front of the list.

New items are treated as being "of current interest", and thus items are inserted at the front of the list.

This technique performs worst when elements are requested in round-robin fashion, since each item is then requested just as it has reached the back of the list, and every search must traverse the whole list.

Note that the search for the node is worst case O(N), while moving the node to the front of the list is O(1), since in a doubly-linked list it only requires relinking a fixed number of pointers.
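To make the costs of the move-to-front method concrete, here is a small sketch that counts key comparisons. It uses std::list rather than the solist class given later, and the helper names accessToFront and totalCost are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Search for key, adding the number of key comparisons to `comparisons`.
// On a hit the matching node is moved to the front of the list.
bool accessToFront(std::list<std::string> &items, const std::string &key,
                   std::size_t &comparisons) {
    for (auto it = items.begin(); it != items.end(); ++it) {
        ++comparisons;
        if (*it == key) {
            // relink the node at the front; no elements are copied
            items.splice(items.begin(), items, it);
            return true;
        }
    }
    return false;
}

// Total comparisons needed to serve a sequence of requests.
std::size_t totalCost(std::list<std::string> &items,
                      const std::vector<std::string> &requests) {
    std::size_t comparisons = 0;
    for (const std::string &key : requests) {
        bool found = accessToFront(items, key, comparisons);
        assert(found);   // this sketch assumes every requested key is present
        (void)found;     // silence the unused warning when asserts are disabled
    }
    return comparisons;
}
```

With the four items a, b, c, d, once one round-robin pass has been made, every further pass over a, b, c, d costs 16 comparisons (each request finds its item at the back), whereas four consecutive requests for the item now at the front cost only 4.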

Caches

Another widely-used technique to speed up searches is to maintain two separate lists:

one small list, the cache, containing items we expect to be used again shortly, and
one complete list of all the items

When a request comes in we check the small list first. If we don't find the item there then we do a search of the regular list, putting the found item into the cache (possibly displacing something that used to be in the cache).

This works well if we have an effective way of picking what belongs in the cache.

The most common caching techniques are:

most frequently used: the k items that have been accessed the most are the ones we'll keep in the cache
most recently used: the k items that have been used most recently are the ones we'll keep in the cache

Which technique we choose depends on what we know (or expect) about the way users will access the data in our lists. Is there a group of elements that is accessed far more often than the rest (in which case choose most frequently used), or do accesses to elements tend to occur in clusters (in which case choose most recently used)?
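The two-list arrangement can be sketched as follows, here keeping the k most recently used items in the cache. This is a minimal illustration under stated assumptions: the class name CachedList and its members are hypothetical, the cache stores copies of the entries, and when the cache is full the least recently used entry is the one displaced.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> Entry;   // (key, data)

class CachedList {
public:
    explicit CachedList(std::size_t capacity) : capacity_(capacity), hits_(0) {
        assert(capacity > 0);   // a cache with no room would never be useful
    }

    // New items go only into the complete list.
    void insert(const std::string &key, const std::string &data) {
        all_.push_back(Entry(key, data));
    }

    bool search(const std::string &key, std::string &data) {
        // check the small list first
        for (auto it = cache_.begin(); it != cache_.end(); ++it) {
            if (it->first == key) {
                data = it->second;
                cache_.splice(cache_.begin(), cache_, it);   // mark as most recent
                ++hits_;
                return true;
            }
        }
        // otherwise search the complete list and cache what we find
        for (auto it = all_.begin(); it != all_.end(); ++it) {
            if (it->first == key) {
                data = it->second;
                cache_.push_front(*it);
                // displace the least recently used entry if over capacity
                if (cache_.size() > capacity_) cache_.pop_back();
                return true;
            }
        }
        return false;
    }

    std::size_t hits() const { return hits_; }
    std::size_t cachedCount() const { return cache_.size(); }

private:
    std::size_t capacity_;     // k, the maximum number of cached entries
    std::size_t hits_;         // searches answered from the cache
    std::list<Entry> cache_;   // small list, most recently used first
    std::list<Entry> all_;     // complete list of all the items
};
```

A most-frequently-used cache would differ only in the replacement rule: each entry would carry an access counter, and the entry with the smallest count would be displaced instead.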

#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

// the solist (self organizing list) class allows us to organize lists
//     using either of two methods
// (1) bycount: list elements are sorted based on the number of times
//              the element has been the target of a successful search
// (2) tofront: each time a list element is found by a search
//              that element is advanced to the front of the list
class solist {
   private:
     // list nodes track the nodes before and after them,
     //            the number of times they have been accessed,
     //            and their key/data fields
     struct sonode {
        sonode *next;
        sonode *prev;
        string key;
        string data;
        int numrefs;
     } *front, *back;  // pointers to the ends of the list

   public:
      // users can specify which of the two organization methods the lists follow
      enum updatetype { bycount, tofront };

      // base constructor and destructor
      solist();
      ~solist();

      // inserts are done at the front of the list (for tofront method)
      //                  or the back of the list (for bycount method)
      bool insert(string k, string d, updatetype t = bycount);

      // updatenode rearranges the list after a node has been accessed
      void updatenode(sonode *n, updatetype t);

      // search finds the first matching string, copies the data field,
      //    and (if successful) calls the update routine
      bool search(string k, string &d, updatetype t = tofront);

      // remove finds, removes, and deletes the first matching string
      //    (after copying the data field)
      bool remove(string k, string &d);

      // display shows the entire list contents, in order
      bool display();
};

solist::solist()
{
   // initialize an empty list
   front = back = NULL;
}

solist::~solist()
{
   // deallocate each list node in turn
   sonode *current = front;
   while (current) {
      sonode *victim = current;
      current = current->next;
      delete victim;
   }
}

bool solist::insert(string k, string d, updatetype t)
{
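   // reject duplicate entries (scan directly rather than calling search,
   //    which would count the check as an access and reorder the list)
   for (sonode *n = front; n; n = n->next)
      if (n->key == k) return false;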

   // create the new node; a failed allocation throws std::bad_alloc
   //    rather than returning NULL, so no null check is needed
   sonode *current = new sonode;

   // initialize the new node (it has not been accessed yet)
   current->next = current->prev = NULL;
   current->key = k;
   current->data = d;
   current->numrefs = 0;

   // if the list is empty then make this the sole element
   if (!front) 
      front = back = current;

   // if using the organize-by-count method
   //    insert the new node at the back of the list
   else if (t == bycount) {
      current->prev = back;
      back->next = current;
      back = current;
   }

   // otherwise insert the new node at the front of the list
   else {
      current->next = front;
      front->prev = current;
      front = current;
   }

   // insertion successful
   return true;
}

bool solist::search(string k, string &d, updatetype t)
{
   // search for the first matching node
   sonode *current = front;
   while (current) {
      if (current->key == k) {
         d = current->data;
         // reorganize the list if a match was found
         updatenode(current, t);
         return true;
      }
      current = current->next;
   }
   return false;
}

bool solist::remove(string k, string &d)
{
   // search for the first matching node
   sonode *current = front;
   while (current) {
      if (current->key == k) {
         // find the nodes around the victim
         sonode *prev = current->prev;
         sonode *next = current->next;

         // reset the neighbours' pointers to bypass the victim,
         //    updating front/back if the victim was at an end of the list
         //    (this also empties the list if it was the only element)
         if (next) next->prev = prev;
         else back = prev;
         if (prev) prev->next = next;
         else front = next;

         // extract the data and deallocate the victim
         d = current->data;
         delete current;
         return true;
      }
      current = current->next;
   }
   return false;
}

bool solist::display()
{
   // display each node's key and data in turn
   sonode *current = front;
   while (current) {
      cout << current->key << ":" << current->data << " (numrefs: ";
      cout << current->numrefs << ")" << endl;
      current = current->next;
   }
   return true;
}

void solist::updatenode(sonode *n, updatetype t)
{
   // if there is no node to update then bail out
   if (!n) return;

   // update the node's reference count
   n->numrefs++;

   // if the node is already at the front
   //    then don't bother trying to move it
   if (front == n) return;

   // keep track of the node's neighbours
   //    in case we need to move it
   sonode *prev = n->prev;
   sonode *next = n->next;

   // if the list is organized by reference count then advance
   //    the node to bypass everything with a smaller or equal count
   //    (so ties are broken in favour of the most recently accessed node)
   if (t == bycount) {
      sonode *target = n;
      // find out how far n can advance
      while ((target->prev) && (target->prev->numrefs <= n->numrefs)) 
            target = target->prev;
      // if n is already as far forward as it needs to be then quit
      if (target == n) return;
      // chop n out of its current position
      if (prev) prev->next = next;
      else front = next;
      if (next) next->prev = prev;
      else back = prev;
      // put n in front of target
      prev = target->prev;
      if (prev) prev->next = n;
      else front = n;
      n->next = target;
      n->prev = prev;
      target->prev = n;
   } 

   // otherwise move the accessed node to the front of the list
   //    (prev cannot be NULL here, since n is not the front node)
   else {
      prev->next = next;
      if (next) next->prev = prev;
      else back = prev;
      n->prev = NULL;
      n->next = front;
      front->prev = n;
      front = n;
   }
}