CSCI 330 Lab 5: symbol tables and context-sensitive checking

For lab5, we'll expand our lab4 solutions to include elements of context-sensitive checking using a symbol table of our own design.

The lab can be obtained and submitted through our usual git approach (repo name 'lab5' this time), and the starting repo contains the files for a sample solution to lab4. You're welcome to replace these with your own lab4 solution code or to modify them as you see fit.

Known issues in lab4 sample solution
This is here as a just-in-case: if anyone spots a bug in the lab4 sample solution, let me know and I'll share it here along with a suggested fix.

Overview

In this lab we're expanding our parser for yarrgh to do some basic context-sensitive checking:
  1. making sure variables are declared before use and are accessible in the current scope,
  2. producing warnings if the code tries to assign a real number to an integer variable.

These two rules are the context-sensitive checks your solution must implement.


Recommended Solution Approach

We have several key sets of information to keep track of as the program is parsed:
(i) which scopes are currently active?
(ii) what data type (integer or real) does each expression and subexpression evaluate to?
(iii) what variable names/types are declared/accessible in each scope?
We'll consider each of these individually and provide a workable approach for each (other approaches are certainly viable).


(i) which scopes are currently active?

We'd like to keep track of which scopes are active, adding new scopes as we enter them and removing them as we leave.

We can give each scope a unique integer identifier (e.g. start with 1 for the global scope, and increment each time a new scope is first entered). Perhaps have a global int variable to keep track of either the most recent or the next id value to be assigned.

A stack of ints is then the ideal mechanism for keeping track of the scopes that are currently open: each time you open a new scope you push its id onto the stack, and when the scope is finally closed you pop it off the top of the stack.

Thus, in the C code declarations at the top of either your .lex or .yacc file, you can declare a simple stack of ints and functions for push/pop/top/isempty (and maybe one to initialize the stack), along with your global variable for the next scope id value to use, e.g.
int nextID;        /* id to be assigned to next scope */
int scopes[128];   /* array of scopes */
int numScopes;     /* how many scopes are currently on the stack */
void initScopes(); /* init nextID, numScopes, and scopes array */
int pushScope();   /* push and increment nextID, return 0 on fail or numScopes otherwise */
int popScope();    /* pop and return top scope id, 0 if fails */
int topScope();    /* return top scope id, 0 if fails */
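Those declarations might be implemented as follows (a sketch only; the array size and the "return 0 on failure" convention follow the comments above, but the details are up to you):

```c
#define MAXSCOPES 128

int nextID;             /* id to be assigned to the next scope */
int scopes[MAXSCOPES];  /* stack of currently-open scope ids */
int numScopes;          /* how many scopes are currently on the stack */

/* initialize: no scopes open yet, first id to hand out is 1 (global scope) */
void initScopes() { nextID = 1; numScopes = 0; }

/* push a new scope id; return 0 if the stack is full, new depth otherwise */
int pushScope() {
    if (numScopes >= MAXSCOPES) return 0;
    scopes[numScopes++] = nextID++;
    return numScopes;
}

/* pop and return the top scope id, 0 if the stack is empty */
int popScope() {
    if (numScopes == 0) return 0;
    return scopes[--numScopes];
}

/* return the top scope id without popping, 0 if the stack is empty */
int topScope() {
    if (numScopes == 0) return 0;
    return scopes[numScopes - 1];
}
```

Note that because valid scope ids start at 1, returning 0 is an unambiguous failure signal for all three int-returning functions.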

We can call initScopes from the main routine. In the yacc rule for blocks, where block -> open statements close, we can insert a block of C code to do the push operation after the open, and a block of C code to do the pop operation after the close (along with error checking):
block: OPENKEY  { if (pushScope() == 0) { /* deal with error: push failed, e.g. maybe stack is full */ } }
       statements
       CLOSEKEY { if (popScope() == 0) { /* deal with error: pop failed somehow */ } }
       SEMI ;


(ii) what data type does each expression and subexpression evaluate to?

Here things get a little trickier: we need to keep track of extra information associated with each token and nonterminal as parsing proceeds.
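Whatever mechanism you use to attach a type to each token and nonterminal, the type-combination logic itself is small and can live in plain C helper functions. A minimal sketch (the enum and function names here are my own, not part of the lab starter code, and they assume yarrgh's usual rule that mixing a real into integer arithmetic yields a real):

```c
/* the two yarrgh data types */
enum ytype { TYPE_INT, TYPE_REAL };

/* type of a binary arithmetic expression:
   real if either operand is real, integer otherwise */
enum ytype combineTypes(enum ytype left, enum ytype right) {
    if (left == TYPE_REAL || right == TYPE_REAL) return TYPE_REAL;
    return TYPE_INT;
}

/* returns 1 if assigning a value of type rhs to a variable of type lhs
   should produce a warning (a real assigned to an integer variable) */
int assignWarns(enum ytype lhs, enum ytype rhs) {
    return (lhs == TYPE_INT && rhs == TYPE_REAL);
}
```

Helpers like these can then be called from your grammar actions as subexpression types are computed during the parse.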


(iii) what variable names/types are declared/accessible in each scope?
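One workable design (a sketch only; the struct layout, sizes, and function names are assumptions, and other designs are certainly viable) is a single global table where each entry records a variable's name, its type, and the id of the scope it was declared in. Declaration checks then search only the current scope, while use checks search every currently-active scope from innermost outward, using the ids on the scope stack from part (i):

```c
#include <string.h>

#define MAXSYMS 512
#define MAXNAME 32

/* one table entry: a variable's name, declaring scope, and type */
struct symbol {
    char name[MAXNAME];
    int  scopeID;  /* id of the scope the variable was declared in */
    int  type;     /* e.g. 0 for integer, 1 for real */
};

struct symbol symtab[MAXSYMS];
int numSyms = 0;

/* record a declaration; return 0 if the table is full or the name is
   already declared in this same scope, 1 on success */
int addSymbol(const char *name, int scopeID, int type) {
    int i;
    for (i = 0; i < numSyms; i++)
        if (symtab[i].scopeID == scopeID && strcmp(symtab[i].name, name) == 0)
            return 0;  /* redeclaration in the same scope */
    if (numSyms >= MAXSYMS) return 0;
    strncpy(symtab[numSyms].name, name, MAXNAME - 1);
    symtab[numSyms].name[MAXNAME - 1] = '\0';
    symtab[numSyms].scopeID = scopeID;
    symtab[numSyms].type = type;
    return ++numSyms, 1;
}

/* look a name up against the currently-active scope ids (outermost first,
   innermost last, i.e. the scope stack's contents); return the index of
   the entry found in the innermost declaring scope, or -1 if none */
int lookupSymbol(const char *name, const int *active, int numActive) {
    int s, i;
    for (s = numActive - 1; s >= 0; s--)      /* innermost scope first */
        for (i = 0; i < numSyms; i++)
            if (symtab[i].scopeID == active[s] &&
                strcmp(symtab[i].name, name) == 0)
                return i;
    return -1;
}
```

Searching innermost-first gives the usual shadowing behavior: a variable declared in an inner scope hides one with the same name from an enclosing scope. The returned entry's type field is then what feeds the real-to-integer assignment check from part (ii).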