ATTRIBUTE SYNTHESIS¶
Introduction to attributes¶
In the last section of the YACC documentation we noted that it is possible to pass values associated with tokens from yylex() to yyparse(). We had described the term 'attribute' as a value associated with a token. YACC uses yylval to facilitate passing values from the lexical analyzer to the parser. In this document we will explore attribute values for non-terminals and how these attributes are handled by YACC internally. We will also be exploring the usage of YYSTYPE to define custom attribute types.
yylval is a global variable of the default return type YYSTYPE declared by YACC in y.tab.c. In the YACC documentation, we had seen an example which illustrates the passing of attributes from yylex() to yyparse(). If the programmer were to use LEX to generate yylex(), then the attributes will have to be passed to yyparse() in a similar fashion i.e, using yylval (as shown below). In the LEX program, the yylex() simply returns the token by its name while the associated attribute is assigned to yylval which can be accessed by yyparse(). Note that, for LEX to return a token, the token must be declared in the declarations section of the YACC program. The following example is a LEX program which returns a token DIGIT when it finds a string matching the "number" pattern.
%{
#include "y.tab.h"
#include<stdlib.h\>
#include<stdio.h\>
%}
number \[0-9\]+
%%
{number} {
yylval = atoi(yytext);
return DIGIT;
}
. return \*yytext;
%%
In this example, we want to return the token DIGIT when an integer is found in the input stream. In addition to the token, we need to pass the value found in the input stream to yyparse(). The lexeme found in the input stream is a string which contains the integer found. atoi() is a built-in function of the type int defined in the stdlib.h header file. We use atoi() to obtain the integer equivalent of the lexeme found. The obtained integer value is then assigned to yylval.
The following code segment is a YACC program which declares the token DIGIT. Recall that the following program must be run with a -d flag to the yacc command. This flag is used to create the y.tab.h file which facilitates a one way communication from YACC to LEX i.e., it is used to communicate the tokens declared in the YACC program to the LEX program. Let us refer to the task of LEX obtaining the token declarations from YACC as LEX importing the declarations. The y.tab.h file contains declarations of all the tokens in the YACC program. Note that the LEX program above includes the y.tab.h file in the auxiliary declarations section to import the declarations.
%{
#include <stdio.h\>
%}
%token DIGIT
%%
start : expr '\\n' {printf("\\nComplete");exit(1);}
;
expr: expr '+' expr {printf("+ ");}
| expr '\*' expr {printf("\* ");}
| '(' expr ')'
| DIGIT {printf("%d ",$1);}
;
%%
yyerror()
{
printf("Error");
}
main()
{
yyparse();
return 1;
}
Non-terminal attributes¶
In addition to values associated with terminal symbols, YACC also allows values to be associated with non-terminals. We extend our notion of an attribute to: "An attribute is a value associated with a terminal or non-terminal grammar symbol". The attribute of a grammar symbol in a production can be referred to in the actions section of a rule using a special YACC syntax: $1 for the first symbol in the body of a production, $2 for the second symbol, $3 for the third and so on. For example consider the following example of a YACC rule.
X: B C D
Consider the problem of displaying two numbers in an input stream if they occur as a pair separated by a comma. Also suppose that the numbers must be displayed ONLY after a pair is found. Let us look at a YACC program that solves the problem.
pair: number ',' number { printf("pair(%d,%d),$1,$3"); }
;
number: DIGIT { $$=$1; }
;
In the program segment, the first rule displays the value of the 'number' symbols when found as a pair in the input stream. The action of the rule refers to the value of the first number symbol as $1 and and the second number as $3. (Note that $2 refers to the literal token ',' which does not have any value associated with it). Since 'number' is a non-terminal, its attribute cannot be set by yylex(). Recall that every non-terminal symbol in the CFG must have a corresponding production with the non-terminal as the head. The attribute value of a non-terminal is set by the action of the rule which contains the corresponding production. In the example, the value of the non-terminal 'number' is set by the rule:
number: DIGIT { $$=$1; }
The action of the rule sets the value of 'number' (reffered to using $$) to the value of DIGIT (referred to using $1).
Sample I/O:
I: 2,5
O: pair(2,5)
I: 3,5,7
O: syntax error
Attribute Stack¶
YACC maintains a parse stack to achieve shift-reduce parsing. The parse stack contains grammar symbols (both terminal and non-terminal symbols) representing the current configuration of the parser. Similar to the parse stack, YACC also maintains an attribute stack to maintain the values of the grammar symbols in the parse stack. The attribute stack is synchronous with the parse stack. Synchronous because the i'th value on the attribute stack will be the attribute of the i'th symbol on the parse stack.
Recall that when a token has been identified by LEX in the input stream, it stores the token's value in yylval. When a token is shifted to the parse stack, the value of yylval is pushed on to the attribute stack. Similarly, during a reduction when a non-terminal replaces a production handle on top of the stack, the attributes of the grammar symbols of the handle are popped from the stack and the attribute value of the non-terminal (head of the production) is pushed on to the attribute stack. The following example is a complete YACC program that evaluates an expression.
%{
#include <stdio.h\>
int result;
%}
%token DIGIT
%left '+'
%left '\*'
%%
start : expr '\\n' {result = $1; exit(1);}
;
expr: expr '+' expr {$$ = $1 + $2;}
| expr '\*' expr {$$ = $1 \* $3;}
| '(' expr ')' {$$ = $2;}
| DIGIT {$$ = $1;}
;
%%
yyerror()
{
printf("Error");
}
main()
{
yyparse();
printf("Expression value = %d",result);
return 1;
}
Sample I/O:
I: 2+3\*(4+5)
O: 29
Consider the two productions from the example.
expr: expr '+' expr {$$ = $1 + $2;}
expr: DIGIT {$$ = $1;}
When the input 2+3*(4+5) is fed to the parser, yylex() reads the first character and finds a matching token DIGIT. yylex() returns DIGIT and asigns the value 2 to yylval. When yylex() has returned, yyparse() pushes DIGIT to the parse stack and the value of yylval to the attribute stack. $1, $2, $3 etc., refers to the attribute values on top of the stack before a reduction takes place. $$ refers to the attribute value of the non-terminal which is the head of the production which contains the handle. When the non-terminal is pushed on to the parse stack, the value of $$ is pushed on to the attribute stack. $$ refers to the symbol on top of the stack after a reduction has taken place. The following table shows the configuration of the parse stack and the attribute stack at every step of the parsing process. Assume that whenever yylex() returns a token with no attribute, yyparse() pushes a '.' to the attribute stack.
PARSE STACK | ATTRIBUTE STACK | I/P BUFFER | PARSER-ACTION EXECUTED |
---|---|---|---|
2 + 3 *(4 + 5) $ | _ | ||
DIGIT | 2 | + 3* ( 4 + 5 ) $ | SHIFT |
expr | 2 | + 3 *( 4 + 5 ) $ | REDUCE |
expr + | 2 . | 3* ( 4 + 5 ) $ | SHIFT |
expr + DIGIT | 2 . 3 | *( 4 + 5 ) $ | SHIFT |
expr + expr | 2 . 3 | * ( 4 + 5) $ | REDUCE |
expr + expr * | 2 . 3 . | ( 4 + 5 ) $ | SHIFT |
expr + expr* ( | 2 . 3 . . | 4 + 5 ) $ | SHIFT |
expr + expr *( DIGIT | 2 . 3 . . 4 | + 5 ) $ | SHIFT |
expr + expr* ( expr | 2 . 3 . . 4 | + 5 ) $ | REDUCE |
expr + expr *( expr + | 2 . 3 . . 4 . | 5 ) $ | SHIFT |
expr + expr* ( expr + DIGIT | 2 . 3 . . 4 . 5 | ) $ | SHIFT |
expr + expr *( expr + expr | 2 . 3 . . 4 . 5 | ) $ | REDUCE |
expr + expr* ( expr | 2 . 3 . . 9 | ) $ | REDUCE |
expr + expr *( expr ) | 2 . 3 . . 9 . | $ | SHIFT |
expr + expr* expr | 2 . 3 . 9 | $ | REDUCE |
expr + expr | 2 . 27 | $ | REDUCE |
expr | 29 | $ | REDUCE |
$expr | 29 | $ | ACCEPT |
YYSTYPE¶
Attributes of terminals can be passed from yylex() to yyparse() and attributes of a non-terminal can be synthesized. An attribute of a non-terminal grammar symbol is said to synthesized if it has been calculated from the attribute values of it's children in the parse tree. A synthesized attribute is an attribute of a non-terminal than has been calculated using the attribute values of the symbols in the handle that it replaces. An example of synthesized attribute:
Z: X { printf("Result=%d",$1);}
X: A '+' B { $$ = $1 + $3; }
The attribute value of X is a synthesized attribute as it has been calculated using the attribute values of the symbols in the handle that it replaces. The attribute stack consists of attributes of token and synthesized attributes. The attribute stack is of the type YYSTYPE i.e, all the stack variables are of the type YYSTYPE. For example, in the above production, $$,$1 and $3 are all of the type YYSTYPE. YYSTYPE is a YACC-defined data type. yylval is declared by YACC to be of the type YYSTYPE. By default YACC defines YYSTYPE to be the type int. It means that, by default only integer valued attributes can be passed from yylex() to yyparse() and only integer attributes can be synthesized. If we were to attempt to assign any other value to yylval or any of the attribute stack variables, a type error would be flagged on compiling y.tab.c using gcc.
This default type definition of YYSTYPE can overriden with any built-in or userdefined data type. For example if we wanted to print the prefix form of an expression:
expr: expr OP expr { printf("%c %c %c",$2,$1,$3);}
The type of YYSTYPE can be overriden manually as shown below. This line has to be added to the declarations section of the YACC program. This may be used (not recommended) to change the type of all the attributes from int to some other type.
typedef char YYSTYPE;
But inorder to have multiple custom attribute values, YACC offers a useful feature called %union to customize the type of YYSTYPE. %union is useful when we require to have different tokens of different types. For example if we wanted some tokens to be of the type int and some tokens to be of the type char. The following code segment can be added to declarations section of the YACC program to achieve that.
/* YACC Auxiliary declarations*/
/* YACC Declarations*/
%union
{
char character;
int integer;
};
%token <character> OP
%token <integer> NUMBER
%%
expr: expr OP expr { printf("%c %d %d",$2,$1,$3);}
| DIGIT {$$=$1;}
;
%%
/* Auxiliary functions */
Note that the type of the attribute of each token must be mentioned when the token is being declared using the following syntax.
%token <type> tokenname
type
must be declared under %union prior to use in the declaration of a token. If the type of a token is not explicitly mentioned, no attribute value can be assigned to the token i.e, it is assumed to be of type void.
References¶
For further details on the topics covered in this document, the reader may refer to the following :
- http://epaperpress.com/lexandyacc/pry1.html
- http://www.sci.brooklyn.cuny.edu/~zhou/teaching/cis707/attr/tsld003.htm
- https://cs.nyu.edu/~gottlieb/courses/2000s/2008-09-fall/compilers/lectures/lecture-08.html
- http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
- Compilers : Principles,Techniques and Tools by Alfred V.Aho, Monica S. Lam, Ravi Sethi and Jeffrey D.Ulman .
- Modern Compiler Implementation in C by Andrew W.Appel
- Flex & Bison by John Levine
- http://dinosaur.compilertools.net/