Abstract Syntax Tree¶
Introduction¶
The machine independent front-end phase of a compiler constructs an intermediate representation of the source program called the Abstract Syntax Tree (AST).This is followed by a machine dependent back-end that generates a target assembly language program.
Structure¶
The node structure of Abstract Syntax Tree is as follows:
struct ASTNode{
struct Typetable *type; //pointer to the type table entry
int nodetype; //node type information,eg:NODETYPE_WHILE,NODETYPE_PLUS,NODETYPE_STMT etc
char *name; //stores the variable/function name in case of variable/function nodes
union Constant value; //stores the value of the constant if the node corresponds to a constant
struct ASTNode *arglist; //pointer to the expression list given as arguments to a function call
struct ASTNode *ptr1,*ptr2,*ptr3; //Subtrees of the node. (Maximum Subtrees for IF THEN ELSE)
struct Gsymbol *Gentry; //pointer to GST entry for global variables and functions
struct Lsymbol *Lentry; //pointer to the function's LST for local variables and arguements
};
The union Constant is used to store the value of an integer or sting constant.
union Constant{
int intval;
char* strval;
};
Associated Methods¶
TreeCreate¶
struct ASTNode* TreeCreate(
struct Typetable *type,
int nodetype,
char *name,
union Constant value,
struct ASTNode *arglist,
struct ASTNode *ptr1,
struct ASTNode *ptr2,
struct ASTNode *ptr3
)
The type field of an AST node is designed to contain a pointer to the typetable entry corresponding to the type of the expression represented by the AST node (The type is set to void for statements). The type and nodetype fields of an AST node may be assigned values according to the following convension. If an expression evaluates to NULL, its type must be set to void.
Nodetype | Type | Description |
---|---|---|
CONST | int/str | For interger and string constants. As a special case, if the constant is NULL, the type is set to void. |
ID | int/str/ user-defined type | For all variable literals. |
PLUS/MINUS/MUL/DIV | int | For arithmetic operators '+', '-', '*', '/', '%'. 'ptr1' and 'ptr2' must be of type int and set to the AST's of the left and the right operands respectively. |
GT/LT/GE/LE | boolean | For relational operators '>', '<', '>=', '<='. 'ptr1' and 'ptr2' must be of type int and set to the AST's of the left and the right operands respectively. |
EQ/NE | boolean | For relational operator '==' or '!='. 'ptr1' and 'ptr2' must be set to AST of the left and the right operands respectively and both must be of the same type. |
IF | void | For the conditional construct 'if'. 'ptr1' must be set to the AST of the logical expression and must be of type 'boolean', 'ptr2' must be set to the AST of list of statements corresponding to the 'then part' and 'ptr3' must be set to the AST of list of statements corresponding to the 'else part'. |
WHILE | void | For conditional construct 'while'. 'ptr1' is set to the conditional logical expression and 'ptr2' is set to AST of list of statements under the body of while construct. |
READ | void | For input statement 'read', 'ptr1' must have nodetype ID or FIELD and type of 'ptr1' must be either 'int' or 'str'. |
WRITE | void | For output statement 'write', 'ptr1' must be of type 'int' and 'str' and must be set to AST of the expression whose value is to be written to the standard output. |
ASGN | void | For assignment statement (<var> = <expr>). 'ptr1' must be set to an AST of nodetype ID or FIELD and 'ptr2' must be set to an AST of expression whose value will be assigned to lvalue given by 'ptr1'. The types of the variable and the expression must match. |
SLIST | void | To form a tree with multiple statements. The execution semantics is that the sequence of statements represented by the left subtree 'ptr1' must be evaluated before evaluating 'ptr2'. |
BODY | int/str/ user-defined type | For body of a function, type indicates the return type of the function. This is created when the definition of a function is processed. |
RET | int/str/ user-defined type | For return statement of a function. |
FUNCTION | int/str/ user-defined type | For function calls. The type must be same as the return type of the function. The field 'arglist' must be set to list of arguments for the function. |
MAIN | int | For main function. |
FIELD | int/str/ user-defined type | For user-defined types,( to handle expressions of the form ID . ID, ID . ID . ID , etc). |
Illustration¶
Consider the following program
decl
int result,factorial(int n);
enddecl
int factorial(int n){
decl
int f;
enddecl
begin
if( n==1 || n==0 ) then
f = 1;
else
f = n * factorial(n-1);
endif;
return f;
end
}
int main(){
decl
int a;
enddecl
begin
read(a);
result = factorial(a);
write(result);
return 1;
end
}
-
Lets construct the abstract syntax tree step by step. The AST for conditional expression
n==1
(in line 9) will be as follows: -
Similarly we have AST fot
n==0
(in line 9) as follows. -
Next consider the complete conditional expression
n==1 || n==0
. -
Next we will form the AST for assignment statement
f = 1
(in line 10). -
Next, lets consider the statement
f = n * factorial(n-1)
which consists of arthimetic expressions with operands '-','*' and an assignment statement. AST forn-1
is as follows.AST for
n * factorial(n-1)
is as follows.AST for
f = n * factorial(n-1)
is as below. -
Following is the AST for the if condition.
-
The AST for return statement is as folows
-
Finally the AST for the factorial function will be as follows.