CSCI 330 Lab 5: symbol tables and context-sensitive checking
For lab5, we'll expand our lab4 solutions
to include elements of context sensitive checking using
a symbol table of our own design.
The lab can be obtained and submitted through our usual git approach
(repo name 'lab5' this time), and the starting repo contains a the files for a sample
solution to lab4. You're welcome to replace these with your own lab4 solution code
or to modify them as you see fit..
- lab5.lex: the sample yarrgh tokenizer
- lab5.yacc: the sample yarrgh parser
- makefile: a makefile to build lab5x from lab5.lex and lab5.yacc
- valid: a subdirectory for valid yarrgh programs
- broken: a subdirectory for syntactically incorrect yarrgh programs
- runtests.sh: a simple bash script to run the lab5x executable
on each of the yarrgh programs in the valid and broken subdirectories
Known issues in lab4 sample solution
this is here as a just-in-case: if anyone spots a bug in the
lab4 sample solution let me know and I'll share it here along with
a suggested fix
|
Overview
In this lab we're expanding our parser for yarrgh to do some basic
context senstive checking:
- making sure variables are declared before use and are accessible in
the current scope,
- producing warnings if the code tries to assign a real number to
an integer variable.
The context-sensitive rules we'll check for are as follows:
- each block is considered its own scope (i.e. from the open
to the close)
- a variable is accessible from it's point of declaration to
the end of the scope in which it is declared, and is accessible
in any nested scopes, e.g.
program
open
def x integer ;
# x is accessible here
repeat open
# x is accessible here
def y real ;
# x and y are accessible here
close until (x < 10) ;
# x is accessible here
close
- you can declare multiple variables with the same name as long as they
are declared in different scopes: you cannot have two declarations for
the same name in the same scope, e.g.
# valid
program
open
def x integer ;
repeat open
def x integer ;
close until (x < 10) ;
close
program
open
def x integer ;
def x integer ; # invalid, already declared an x once in this scope
close
(Note: the important part here is that we're trying to redeclare the same name
in the same scope, it would still be an error if the second def x was as a real.)
- All the math/comparison operators will accept any pair of types, i.e.
integer op integer, integer op real, real op integer, real op real.
- The set operation will display a warning (not an error) if the
code is attempting to assign a real value to an integer variable, but other
pairings proceed normally.
- For the math operators:
- int op int results in an int
- real op real results in a real
- int op real results in a real
- real op int results in a real
Recommended Solution Approach
We have several key sets of information to keep track of as the program is parsed:
(i) which scopes are currently active?
(ii) what data type (integer or real) does each expression and subexpression evaluate to?
(iii) what variable names/types are declared/accessible in each scope?
We'll consider each of these individually and provide a workable approach for each
(other approaches are certainly viable).
(i) which scopes are currently active?
We'd like to keep track of which scopes are active, adding new scopes as we enter them
and removing them as we leave.
We can give each scope a unique integer identifier (e.g. start with 1 as the global scope,
and incrementing each time a new scope is first entered). Perhaps have a global int variable
to keep track of either the most recent or the next id value to be assigned.
A stack of ints is then the ideal mechanism for keeping track of the stacks that are currently open:
each time you open a new scope you push its id onto the stack, and when the scope is finally
closed you pop the top scope off the stack.
Thus in the C code declarations at the top of either your .lex or .yacc you can
declare a simple stack of ints and functions for pop/push/top/isempty and maybe one to
initialize the stack, along with your
global variable for the next stack id value to use, e.g.
int nextID; /* id to be assigned to next scope */
int scopes[128]; /* array of scopes */
int numScopes; /* how many scopes are currently on the stack */
void initScopes(); /* init nextID, numScopes, and scopes array */
int pushScope(); /* push annd increment nextID, return 0 on fail or numScopes otherwise */
int popScope(); /* pop and return top scope id, 0 if fails */
int topScope(); /* return top scope id, 0 if fails */
e can call the initScopes from the main routine
In the yacc rule for blocks, where block -> open statements close, we can insert a block
of C to do the push operation after the open, and a block of C to do the
pop operation after the close (along with error checking).
block: OPENKEY { if (pushScope() == 0) { deal with error, push failed, e.g. maybe stack is full }
statements
CLOSEKEY { if (popScope() == 0) { deal with error, pop failed somehow }
SEMI ;
(ii) what data type does each expression and subexpression evalulate to?
Here things get a little trickier: we need to keep track of extra information associated with
each token and nonterminal as parsing proceeds.
(iii) what variable names/types are declared/accessible in each scope?