Label Translation (Linking*)¶

Introduction¶

Labels are generated when the compiler translates the high level program constructs such as if-then-else, while-do and function calls into target machine code. Labels are needed because the target address of a JUMP instruction may not be known at the time of generating code (Why?). However these labels must be replaced with target addresses in the final target code. Thus the compiler designer needs to write a program to replace all the labels in the program with the correct memory addresses eventually. This conversion is called Label translation.

The input to the label translator program is the machine code with labels (generated by codeGen()). The output is the target machine code with the labels replaced with corresponding addresses.

The main idea behind label translation is to execute the following two steps.

The input ( machine code with labels ) is parsed and a table in which the labels are mapped with their corresponding addresses is created.
Parse the input again and replace all the labels with their corresponding addresses by looking into the table created above.

Note

In real systems, the compiler generates an object file which contains labels for functions as well as variables. The object file is processed by a separate software module called the linker which converts the object file into an excutable file. The advantage of this scheme is that a huge program could be divided into different stand alone source modules, each compiled seperately and finally linked together. A change in one module does not require recompilation of other modules and only the linking phase needs to be run again. In this project, we keep the linking task as a part of the compilation itself as ExpL does not allow a program to be compiled into several object modules.

In the ExpL project, variable addresses are resolved before reaching the Label translation phase by the compiler itself. Hence the labels remaining will correspond to

The label of the first instruction to be executed (MAIN:),
Labels placed at the beginning of assembly instructions to which there is some JMP/JZ/JNZ from somewhere.
Labels placed at the beginning of functions. (The compiler translates a call to the function with a CALL instruction with this label).

Illustration¶

The above procedure is demonstrated with the help of a program snippet.

decl
 int fact(int n);
enddecl

int fact(int n)
{
 decl
  int f;
 enddecl
 begin
  if(n<=1) then
   f=1;
  else
   f=n*fact(n-1);
  endif;
  return f;
 end
}

int main()
{
 decl
  int n,m,res;
 enddecl
 begin
  read(n);
  while( n >= 1 ) do
   read(m);
   res = fact(m);
   write(res);
   n = n-1;
  endwhile;
  return 0;
 end
}

The code generated after the codeGen() phase is shown in the figure 1 ( It consists of labels ).

Figure 1: Assembly Code Before Label Translation

        
        
2048 : 0
2049 : 2056
2050 : 0
2051 : 0
2052 : 0
2053 : 0
2054 : 0
2055 : 0
2056 : MOV SP,4095
2058 : MOV BP,4096
2060 : PUSH R0
2062 : CALL MAIN
2064 : INT 10
       F0:
2066 : PUSH BP
2068 : MOV BP,SP
2070 : PUSH R0
2072 : MOV R1,BP
2074 : MOV R2,2
2076 : SUB R1,R2
2078 : MOV R2,1
2080 : SUB R1,R2
2082 : MOV R0,[R1]
2084 : MOV R1,1
2086 : LE R0,R1
2088 : JZ R0,L0
2090 : MOV R0,1
2092 : MOV R2,BP
2094 : MOV R1,1
2096 : ADD R2,R1
2098 : MOV [R2],R0
2100 : JMP L1
       L0:
2102 : MOV R1,BP
2104 : MOV R2,2
2106 : SUB R1,R2
2108 : MOV R2,1
2110 : SUB R1,R2
2112 : MOV R0,[R1]
2114 : PUSH R0
2116 : MOV R1,BP
2118 : MOV R2,2
2120 : SUB R1,R2
2122 : MOV R2,1
2124 : SUB R1,R2
2126 : MOV R0,[R1]
2128 : MOV R1,1
2130 : SUB R0,R1
2132 : PUSH R0
2134 : PUSH R0
2136 : CALL F0
2138 : POP R0
2140 : POP R0
2142 : POP R0
2144 : MOV R1,3
2146 : MOV R2,SP
2148 : ADD R2,R1
2150 : MOV R1,[R2]
2152 : MUL R0,R1
2154 : MOV R2,BP
2156 : MOV R1,1
2158 : ADD R2,R1
2160 : MOV [R2],R0
       L1:
2162 : MOV R1,BP
2164 : MOV R0,1
2166 : ADD R1,R0
2168 : MOV R0,[R1]
2170 : MOV R1,BP
2172 : MOV R2,2
2174 : SUB R1,R2
2176 : MOV [R1],R0
2178 : POP R0
2180 : MOV BP,[SP]
2182 : POP R0
2184 : RET
       MAIN:
2186 : PUSH BP
2188 : MOV BP,SP
2190 : PUSH R0
2192 : PUSH R0
2194 : PUSH R0
2196 : MOV R1,BP
2198 : MOV R0,1
2200 : ADD R1,R0
2202 : PUSH R0
2204 : PUSH R1
2206 : MOV R0,"Read"
2208 : PUSH R0
2210 : MOV R0,-1
2212 : PUSH R0
2214 : PUSH R1
2216 : PUSH R0
2218 : PUSH R0
2220 : CALL 0
2222 : POP R0
2224 : POP R0
2226 : POP R0
2228 : POP R0
2230 : POP R0
2232 : POP R0
2234 : POP R0
       L2:
2236 : MOV R1,BP
2238 : MOV R0,1
2240 : ADD R1,R0
2242 : MOV R0,[R1]
2244 : MOV R1,1
2246 : GE R0,R1
2248 : JZ R0,L3
2250 : MOV R1,BP
2252 : MOV R0,2
2254 : ADD R1,R0
2256 : PUSH R0
2258 : PUSH R1
2260 : MOV R0,"Read"
2262 : PUSH R0
2264 : MOV R0,-1
2266 : PUSH R0
2268 : PUSH R1
2270 : PUSH R0
2272 : PUSH R0
2274 : CALL 0
2276 : POP R0
2278 : POP R0
2280 : POP R0
2282 : POP R0
2284 : POP R0
2286 : POP R0
2288 : POP R0
2290 : MOV R1,BP
2292 : MOV R0,2
2294 : ADD R1,R0
2296 : MOV R0,[R1]
2298 : PUSH R0
2300 : PUSH R0
2302 : CALL F0
2304 : POP R0
2306 : POP R0
2308 : MOV R0,2
2310 : MOV R1,SP
2312 : ADD R1,R0
2314 : MOV R0,[R1]
2316 : MOV R2,BP
2318 : MOV R1,3
2320 : ADD R2,R1
2322 : MOV [R2],R0
2324 : MOV R1,BP
2326 : MOV R0,3
2328 : ADD R1,R0
2330 : MOV R0,[R1]
2332 : MOV [2042],R0
2334 : PUSH R0
2336 : MOV R0,"Write"
2338 : PUSH R0
2340 : MOV R0,-2
2342 : PUSH R0
2344 : MOV R0,2042
2346 : PUSH R0
2348 : PUSH R0
2350 : PUSH R0
2352 : CALL 0
2354 : POP R0
2356 : POP R0
2358 : POP R0
2360 : POP R0
2362 : POP R0
2364 : POP R0
2366 : MOV R1,BP
2368 : MOV R0,1
2370 : ADD R1,R0
2372 : MOV R0,[R1]
2374 : MOV R1,1
2376 : SUB R0,R1
2378 : MOV R2,BP
2380 : MOV R1,1
2382 : ADD R2,R1
2384 : MOV [R2],R0
2386 : JMP L2
       L3:
2388 : MOV R0,0
2390 : MOV R1,BP
2392 : MOV R2,2
2394 : SUB R1,R2
2396 : MOV [R1],R0
2398 : POP R0
2400 : POP R0
2402 : POP R0
2404 : MOV BP,[SP]
2406 : POP R0
2408 : RET

Figure 2: Assembly Code after Label Translation

        
        
2048 : 0
2049 : 2056
2050 : 0
2051 : 0
2052 : 0
2053 : 0
2054 : 0
2055 : 0
2056 : MOV SP,4095
2058 : MOV BP,4096
2060 : PUSH R0
2062 : CALL 2186
2064 : INT 10
2066 : PUSH BP
2068 : MOV BP,SP
2070 : PUSH R0
2072 : MOV R1,BP
2074 : MOV R2,2
2076 : SUB R1,R2
2078 : MOV R2,1
2080 : SUB R1,R2
2082 : MOV R0,[R1]
2084 : MOV R1,1
2086 : LE R0,R1
2088 : JZ R0,2102
2090 : MOV R0,1
2092 : MOV R2,BP
2094 : MOV R1,1
2096 : ADD R2,R1
2098 : MOV [R2],R0
2100 : JMP 2162
2102 : MOV R1,BP
2104 : MOV R2,2
2106 : SUB R1,R2
2108 : MOV R2,1
2110 : SUB R1,R2
2112 : MOV R0,[R1]
2114 : PUSH R0
2116 : MOV R1,BP
2118 : MOV R2,2
2120 : SUB R1,R2
2122 : MOV R2,1
2124 : SUB R1,R2
2126 : MOV R0,[R1]
2128 : MOV R1,1
2130 : SUB R0,R1
2132 : PUSH R0
2134 : PUSH R0
2136 : CALL 2066
2138 : POP R0
2140 : POP R0
2142 : POP R0
2144 : MOV R1,3
2146 : MOV R2,SP
2148 : ADD R2,R1
2150 : MOV R1,[R2]
2152 : MUL R0,R1
2154 : MOV R2,BP
2156 : MOV R1,1
2158 : ADD R2,R1
2160 : MOV [R2],R0
2162 : MOV R1,BP
2164 : MOV R0,1
2166 : ADD R1,R0
2168 : MOV R0,[R1]
2170 : MOV R1,BP
2172 : MOV R2,2
2174 : SUB R1,R2
2176 : MOV [R1],R0
2178 : POP R0
2180 : MOV BP,[SP]
2182 : POP R0
2184 : RET
2186 : PUSH BP
2188 : MOV BP,SP
2190 : PUSH R0
2192 : PUSH R0
2194 : PUSH R0
2196 : MOV R1,BP
2198 : MOV R0,1
2200 : ADD R1,R0
2202 : PUSH R0
2204 : PUSH R1
2206 : MOV R0,"Read"
2208 : PUSH R0
2210 : MOV R0,-1
2212 : PUSH R0
2214 : PUSH R1
2216 : PUSH R0
2218 : PUSH R0
2220 : CALL 0
2222 : POP R0
2224 : POP R0
2226 : POP R0
2228 : POP R0
2230 : POP R0
2232 : POP R0
2234 : POP R0
2236 : MOV R1,BP
2238 : MOV R0,1
2240 : ADD R1,R0
2242 : MOV R0,[R1]
2244 : MOV R1,1
2246 : GE R0,R1
2248 : JZ R0,2388
2250 : MOV R1,BP
2252 : MOV R0,2
2254 : ADD R1,R0
2256 : PUSH R0
2258 : PUSH R1
2260 : MOV R0,"Read"
2262 : PUSH R0
2264 : MOV R0,-1
2266 : PUSH R0
2268 : PUSH R1
2270 : PUSH R0
2272 : PUSH R0
2274 : CALL 0
2276 : POP R0
2278 : POP R0
2280 : POP R0
2282 : POP R0
2284 : POP R0
2286 : POP R0
2288 : POP R0
2290 : MOV R1,BP
2292 : MOV R0,2
2294 : ADD R1,R0
2296 : MOV R0,[R1]
2298 : PUSH R0
2300 : PUSH R0
2302 : CALL 2066
2304 : POP R0
2306 : POP R0
2308 : MOV R0,2
2310 : MOV R1,SP
2312 : ADD R1,R0
2314 : MOV R0,[R1]
2316 : MOV R2,BP
2318 : MOV R1,3
2320 : ADD R2,R1
2322 : MOV [R2],R0
2324 : MOV R1,BP
2326 : MOV R0,3
2328 : ADD R1,R0
2330 : MOV R0,[R1]
2332 : MOV [2042],R0
2334 : PUSH R0
2336 : MOV R0,"Write"
2338 : PUSH R0
2340 : MOV R0,-2
2342 : PUSH R0
2344 : MOV R0,2042
2346 : PUSH R0
2348 : PUSH R0
2350 : PUSH R0
2352 : CALL 0
2354 : POP R0
2356 : POP R0
2358 : POP R0
2360 : POP R0
2362 : POP R0
2364 : POP R0
2366 : MOV R1,BP
2368 : MOV R0,1
2370 : ADD R1,R0
2372 : MOV R0,[R1]
2374 : MOV R1,1
2376 : SUB R0,R1
2378 : MOV R2,BP
2380 : MOV R1,1
2382 : ADD R2,R1
2384 : MOV [R2],R0
2386 : JMP 2236
2388 : MOV R0,0
2390 : MOV R1,BP
2392 : MOV R2,2
2394 : SUB R1,R2
2396 : MOV [R1],R0
2398 : POP R0
2400 : POP R0
2402 : POP R0
2404 : MOV BP,[SP]
2406 : POP R0
2408 : RET

In the machine code generated after codegen section, labels occur in two different ways.

Label declarations in which labels are followed by semicolon (:).

Examples : F0: is a label for the instruction at address 2066, L0: for 2102, L1: for 2162 and MAIN: for 2186 as in figure1
Instructions which contain labels in them namely, JMP, JZ, JNZ and CALL.

Examples : JMP L1 at address 2100, JZ R0,L3 at address 2248, CALL MAIN at address 2062 etc., as in figure1.

Now, the label translation procedure can be explained with the help of an example label F0 from the figure1.

First "F0:" needs to be recognized as a label and the memory address of the label occurence need to be stored in a table called the Label-Address Table. In the above example, the address corresponding to F0: is 2066. We need to parse the entire machine code to identify all the labels in the program and store all the (label, address) pairs in the Label-Address table. We can remove the label F0: (and other labels) from the program once the corresponding (label,address) pair is entered into the label-address table.

The next task is to replace labels occuring in instructions with the addresses of the labels from the label-address table. Refering to the figure1, there are two instructions that use the label F0: CALL F0 occuring at address 2136 and CALL F0 occuring at address 2302. We will replace the labels in these instructions with the address of F0 (2066) by looking up the label-address table. Thus, after translation, both these instructions will translate to CALL 2066 as shown in figure2.

The label translator program needs to parse the entire machine code two times. In the first parse we identify all the label declarations and the memory addresses in the label-address table. The table constructed after the first parse of the above code is shown below.

In the second parse, we remove the label declarations and replace the labels in instructions like JUMP, CALL etc. with the corresponding memory address from the label-address table.

The target code after the label translation is shown in the figure 2.

Implementing Label Translation¶

As noted previously, the input to the label translator program is the machine code with labels (generated by codeGen()). The output is the target machine code with the labels replaced with corresponding addresses. It is easy to design a simple lex program which does two passes on the input file and perform label translation.

The first pass must collect all the label declarations and simultaneously keep track of the addresses of the instructions corresponding to the labels. Since the pattern letter(letter|digit)*: identifies a label, it is easy to design a LEX rule to recognize all label declarations in the program. The compiler must ensure that label names are different from instructions, to avoid confusion.

Since each XSM instruction takes two words of storage, and the first instruction is at address 2056, the first pass can track the address corresponding to each label declaration as 2056 + 2 *(InstructionNumber-1).

The program can maintain a simple linked list into which each label address pair found is entered. The label declarations can be removed from the input program, once it's table entry is made.

In the second pass, the program must "search and replace" each label occuring in the control transfer instructions (JMP, JZ, JNZ and CALL) with the corresponding address in the label table. Since the instruction set is fixed and labels are lexically different from instructions, a simple set of LEX rules can identify all labels and perform the replacement.

We leave the implementation details to you.

Question

What is the difficulty in implementing the label translation using single pass algorithm?