Label Translation

Introduction

Labels are generated when the compiler translates the high level program constructs such as if-then-else, while-do and function calls into target machine code. Labels are needed because the target address of a JUMP instruction may not be known at the time of generating code (Why?). However these labels must be replaced with target addresses in the final target code. Thus the compiler designer needs to write a program to replace all the labels in the program with the correct memory addresses eventually. This conversion is called Label translation.

The input to the label translator program is the machine code with labels (generated by codeGen()). The output is the target machine code with the labels replaced with corresponding addresses.

The main idea behind label translation is to execute the following two steps.

1. The input ( machine code with labels ) is parsed and a table in which the labels are mapped with their corresponding addresses is created.
2. Parse the input again and replace all the labels with their corresponding addresses by looking into the table created above.

*Note: In real systems, the compiler generates an object file which contains labels for functions as well as variables. The object file is processed by a separate software module called the linker which converts the object file into an excutable file. The advantage of this scheme is that a huge program could be divided into different stand alone source modules, each compiled seperately and finally linked together. A change in one module does not require recompilation of other modules and only the linking phase needs to be run again. In this project, we keep the linking task as a part of the compilation itself as ExpL does not allow a program to be compiled into several object modules.

In the ExpL project, variable addresses are resolved before reaching the Label translation phase by the compiler itself. Hence the labels remaining will correspond to
a) The label of the first instruction to be executed (MAIN:),
b) Labels placed at the beginning of assembly instructions to which there is some JMP/JZ/JNZ from somewhere.
c) Labels placed at the beginning of functions. (The compiler translates a call to the function with a CALL instruction with this label).

Illustration

The above procedure is demonstrated with the help of a program snippet.

The code generated after the codeGen() phase is shown in the figure 1 ( It consists of labels ).

Figure 1

Figure 2

In the machine code generated after codegen section, labels occur in two different ways.

1. Label declarations in which labels are followed by semicolon (:).
Examples : F0: is a label for the instruction at address 2066, L0: for 2102, L1: for 2162 and MAIN: for 2186 as in figure1
2. Instructions which contain labels in them namely, JMP, JZ, JNZ and CALL.
Examples : JMP L1 at address 2100, JZ R0,L3 at address 2248, CALL MAIN at address 2062 etc., as in figure1.

Now, the label translation procedure can be explained with the help of an example label F0 from the figure1.

First "F0:" needs to be recognized as a label and the memory address of the label occurence need to be stored in a table called the Label-Address Table. In the above example, the address corresponding to F0: is 2066. We need to parse the entire machine code to identify all the labels in the program and store all the (label, address) pairs in the Label-Address table. We can remove the label F0: (and other labels) from the program once the corresponding (label,address) pair is entered into the label-address table.

The next task is to replace labels occuring in instructions with the addresses of the labels from the label-address table. Refering to the figure1, there are two instructions that use the label F0: CALL F0 occuring at address 2136 and CALL F0 occuring at address 2302. We will replace the labels in these instructions with the address of F0 (2066) by looking up the label-address table. Thus, after translation, both these instructions will translate to CALL 2066 as shown in figure2.

The label translator program needs to parse the entire machine code two times. In the first parse we identify all the label declarations and the memory addresses in the label-address table. The table constructed after the first parse of the above code is shown below.

In the second parse, we remove the label declarations and replace the labels in instructions like JUMP, CALL etc. with the corresponding memory address from the label-address table.

The target code after the label translation is shown in the figure 2.

Implementing Label Translation

As noted previously, the input to the label translator program is the machine code with labels (generated by codeGen()). The output is the target machine code with the labels replaced with corresponding addresses. It is easy to design a simple lex program which does two passes on the input file and perform label translation.

The first pass must collect all the label declarations and simultaneously keep track of the addresses of the instructions corresponding to the labels. Since the pattern "letter.(letter|digit)*:" identifies a label, it is easy to design a LEX rule to recognize all label declarations in the program. The compiler must ensure that label names are different from instructions, to avoid confusion.

Since each XSM instruction takes two words of storage, and the first instruction is at address 2056, the first pass can track the address corresponding to each label declaration as 2056 + 2 *(InstructionNumber-1).

The program can maintain a simple linked list into which each label address pair found is entered. The label declarations can be removed from the input program, once it's table entry is made.

In the second pass, the program must "search and replace" each label occuring in the control transfer instructions (JMP, JZ, JNZ and CALL) with the corresponding address in the label table. Since the instruction set is fixed and labels are lexically different from instructions, a simple set of LEX rules can identify all labels and perform the replacement.

We leave the implementation details to you.

Question :
What is the difficulty in implementing the label translation using single pass algorithm?