Stage 4: User Defined Variables and arrays¶
Time estimate
1 week, 5-10 hours/week
Learning Objectives
You will extend the language of Stage 3 to permit users to declare and use variables of integer and string types. You will learn symbol table management enroute.
In this stage, we allow the program to contain variable declarations of the following syntax:
Declarations ::= DECL DeclList ENDDECL | DECL ENDDECL
DeclList ::= DeclList Decl | Decl
Decl ::= Type VarList ;
Type ::= INT | STR
VarList ::= Varlist , ID | ID
We will assume hereafter that all variables used in a program must be declared in the declaration section of the program (between the decl and enddecl keywords). Since string type variables are allowed, we will allow string constants as well. (See ExpL specification for details).
A simple program in this language to find the sum of numbers entered from the console (until a zero is entered) would look like the following:
decl
int num, sum;
str mesg;
enddecl
read(num);
sum = 0;
while (num != 0) do
sum = sum + num;
read(num);
endwhile;
write("sum is:");
write(sum);
mesg = "good bye";
write(mesg);
It is the responsibility of the compiler to track for various semantic errors as:
- Flag error if any variable not declared is used.
- Flag error if a type mismatch involving any variable is found.
To this end, while parsing declarations, the compiler transfers the information about variables in a compile time data structure called the symbol table. The symbol table stores the following information about each variable:
- Name of the variable (known at the time of declaration).
- Type (For the present stage, only integer/string).
- Size (For the time being, we will assume that all variables have size one).
- The memory binding of each variable – that is, static memory address determined by the compiler for the variable.
The first three entries are determined by the declaration of the variable. For the fourth, a simple strategy would be to allocate the first address (4096) for the variable declared first, 4097
for the next variable and so on. Note that here too we are fixing the address of each variable at compile time (static allocation).
The following structure may be used for a symbol table entry:
struct Gsymbol {
char* name; // name of the variable
int type; // type of the variable
int size; // size of the type of the variable
int binding; // stores the static memory address allocated to the variable
struct Gsymbol *next;
}
The symbol table entries for the program above would look as below:
To implement the symbol table, you must write two functions. For a simple implementation, a linear linked list suffices. In modern compilers, hash tables are maintained to make search efficient.
struct Gsymbol *Lookup(char * name); // Returns a pointer to the symbol table entry for the variable, returns NULL otherwise.
void Install(char *name, int type, int size); // Creates a symbol table entry.
Note
You must check before installing a variable whether the variable is already present. If a variable is declared multiple times, the compiler must stop the compilation and flag error.
Task 1
Complete the program to parse declarations and set up the symbol table entries and print out the contents of the symbol table.
The next task is to make necessary modifications to the AST construction and code generation. These are straightforward. Add a an additional field to the tree node structure
typedef struct tnode{
int val; // value of the constant
char* varname; // name of the variable
int type; // type of the variable
int nodetype; // node type information
struct Gsymbol *Gentry; // pointer to GST entry for global variables and functions
struct tnode *left,*right; // left and right branches
}tnode;
While constructing the tree, if a variable is encountered, keep a pointer to the corresponding symbol table entry. Set the type field as well to the correct type. The rest of the type checking steps are exactly as in the previous stage. The AST of while loop present in the above code is as follows (the relevant part of the code is shown below for easy reference):
.
.
.
while (num != 0) do
sum = sum + num;
read(num);
endwhile;
.
.
.
There is no serious change to the code generation process, except that for variables, the binding address is obtained from the symbol table.
Important Note
The XSM architecture is unrealistic in that allows a memory word to hold a string. Normally, in a real system, a string would require storing the characters one after another in consecutive memory locations as in an array. You will anyway learn array allocation immediately.
Task 2
Complete the AST construction and code generation steps.
Adding arrays¶
The next step is to allow declaration of arrays like:
decl
---
int a[100];
str names[20];
---
enddecl
The declaration syntax must permit:
Varlist ::= Varlist , ID[NUM] | ID[NUM]
To implement this, for each variable, you must reserve as much static space as specified by the declaration and set the size field in the symbol table to indicate the number of words allocated. The next variable must be allocated space only below this space allocated.
For instance, for the declaration,
decl
int a[10], b;
enddecl
The binding field in the symbol table for the variable a may be set to address 4096. The size entry set to 10. This means that we are allocating locations 4096-4105 for the array. The next variable, b can be bound to the address 4106.
Task 2
Complete the implementation of single dimensional arrays.
Exercise 1
Permit two dimensional arrays like:
int a[10][10];
n x n
matrices. Exercise 2
Permit pointer type
variables as in the following declaration as in the C programming language.
decl
int x, *p;
str p, *q;
enddecl
p=&x;
and q=&p;
, pointer variables may also be permitted in expressions like *p=*q+1;
for referring to the data pointed to, as permitted in the C programming language. Semantic rules as in the C programming language may be assumed.
Note
Right now, you are not equipped to do dynamic memory allocation for pointer variables (as done by the malloc()
function of C). Hence, a pointer type variable can be used as a pointer to another statically declared variable of the corresponding type. Dynamic memory allocation will be discussed in later stages.
Test Programs¶
Check your implementation with the following test cases :
-
Bubblesort (iterative)
This test program reads elements into an array and sorts them using the classic bubblesort algorithm. (iterative version)
Input : 1. Number of elements to be sorted from standard input. 2. Elements to be sorted
Output : A sorted array of elements.
To get the code for this test program click here.
-
Nth Fibonacci Number(iterative)
This test program prints the nth fibonacci number
Input : 1.An integer n Output : nth fibonacci number
To get the code for this test program click here.
-
Is Prime or Not
This program tests if a given integer is prime or not.
Input : 1.An integer n Output : Prime if n is prime else not a prime.
To get the code for this test program click here.
-
Sum of n factorials (iterative)
This program prints the sum to n factorial for a given n.
Input : 1.An integer n Output : sum of factorial of all integers 1 to n.
To get the code for this test program click here.