ir.md - scc - simple c99 compiler
(HTM) git clone git://git.simple-cc.org/scc
1 # scc intermediate representation #
2
3 The scc IR tries to be a simple and easily parseable intermediate
4 representation, which makes it a bit terse and cryptic. The main
5 characteristic of the IR is that all the types and operations are
6 represented with only one letter, so parsing tables can be used
7 to parse it.
8
9 The language is composed of lines, representing statements.
10 Each statement is composed of tab-separated fields.
11 Declaration statements begin in column 0; expressions and
12 control flow statements begin with a tab character.
13 When the frontend detects an error, it closes the output stream.
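
The line layout described above can be sketched in a few lines of Python. This is only an illustration, not code from scc: the function name `classify` is hypothetical, and it assumes that fields are separated by single tab characters, as the text states. Every line that starts in column 0 is reported as declaration-level.

```python
def classify(line):
    """Return (kind, fields) for one IR line.

    kind is "statement" when the line starts with a tab (expressions and
    control flow) and "declaration" when it starts in column 0.
    """
    kind = "statement" if line.startswith("\t") else "declaration"
    # Drop the trailing newline and the leading tab, then split on tabs.
    fields = line.rstrip("\n").lstrip("\t").split("\t")
    return kind, fields


print(classify("A2\tI\ti\n"))
print(classify("\tj\tL3\n"))
```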
14
15 ## Types ##
16
17 Types are represented with uppercase letters:
18
19 * C -- signed 8-Bit integer
20 * I -- signed 16-Bit integer
21 * W -- signed 32-Bit integer
22 * Q -- signed 64-Bit integer
23 * K -- unsigned 8-Bit integer
24 * N -- unsigned 16-Bit integer
25 * Z -- unsigned 32-Bit integer
26 * O -- unsigned 64-Bit integer
27 * 0 -- void
28 * P -- pointer
29 * F -- function
30 * V -- vector
31 * U -- union
32 * S -- struct
33 * B -- bool
34 * J -- float
35 * D -- double
36 * H -- long double
37
38 This list was built for the original Z80 backend, where 'int'
39 has the same size as 'short'. Several types (S, F, V, U and others) need
40 an identifier after the type letter to differentiate
41 between the multiple structs, functions, vectors and unions (S1, V12 ...)
42 that naturally occur in a C program.
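
Because every type is a single letter, a lookup table is enough to decode it. The sketch below is a hypothetical illustration of that idea: the names `INTEGER_TYPES`, `NUMBERED` and `split_type` are not part of scc, and only the four letters named in the text (S, F, V, U) are treated as carrying an identifier.

```python
# Integer type letters from the list above: letter -> (signed, bits).
INTEGER_TYPES = {
    "C": (True, 8), "I": (True, 16), "W": (True, 32), "Q": (True, 64),
    "K": (False, 8), "N": (False, 16), "Z": (False, 32), "O": (False, 64),
}

# Type letters followed by a numeric identifier (S1, V12, ...).
# Assumption: the text says "S, F, V, U and others"; only these are named.
NUMBERED = set("SFVU")


def split_type(token):
    """Split a type token into (letter, identifier or None)."""
    letter, rest = token[0], token[1:]
    if letter in NUMBERED and rest.isdigit():
        return letter, int(rest)
    if rest:
        raise ValueError(f"unexpected suffix in type token {token!r}")
    return letter, None


print(split_type("V12"), split_type("I"), INTEGER_TYPES["I"])
```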
43
44 ## Storage classes ##
45
46 The storage classes are represented using uppercase letters:
47
48 * A -- automatic
49 * R -- register
50 * G -- public (global variable declared in the module)
51 * X -- extern (global variable declared in another module)
52 * Y -- private (variable in file-scope)
53 * T -- local (static variable in function-scope)
54 * M -- member (struct/union member)
55 * L -- label
56
57 ## Declarations/definitions ##
58
59 Variable names are composed of a storage class and an identifier
60 (e.g. A1, R2, T3).
61 Declarations and definitions are composed of this variable name,
62 a type and the name of the variable in the C source:
63
64 A1 I maxweight
65 R2 C flag
66 A3 S4 statstruct
67
68 ### Type declarations ###
69
70 Some declarations (e.g. structs) involve the declaration of member
71 variables.
72 Struct members are declared normally after the type declaration in
73 parentheses.
74
75 For example the struct declaration
76
77 struct foo {
78 int i;
79 long c;
80 } var1;
81
82 generates
83
84 S2 foo (
85 M3 I i
86 M4 W c
87 )
88 G5 S2 var1
89
90 ## Functions ##
91
92 A function prototype
93
94 int printf(char *cmd, int flag, void *data);
95
96 will generate a type declaration and a variable declaration
97
98 F5 P I P
99 X1 F5 printf
100
101 The first line gives the function-type specification 'F' with
102 an identifier '5' and subsequently lists the types of the
103 function parameters.
104 The second line declares the 'printf' function as an extern
105 variable ('X'), since no definition appears in the module.
106
107 Analogously, a statically declared function in file scope
108
109 static int printf(char *cmd, int flag, void *data);
110
111 generates
112
113 F5 P I P
114 Y1 F5 printf
115
116 Thus, the 'printf' variable went into private (file) scope ('Y').
117
118 A '{' in the first column starts the body of the previously
119 declared function:
120
121 int printf(char *cmd, int flag, void *data) {}
122
123 generates
124
125 F5 P I P
126 G1 F5 printf
127 {
128 A2 P cmd
129 A3 I flag
130 A4 P data
131 -
132 }
133
134 The frontend must ensure that '{' appears only after the
135 declaration of a function. The character '-' marks the separation
136 between parameters and local variables:
137
138 int printf(register char *cmd, int flag, void *data) {int i;};
139
140 generates
141
142 F5 P I P
143 G1 F5 printf
144 {
145 R2 P cmd
146 A3 I flag
147 A4 P data
148 -
149 A6 I i
150 }
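
A consumer of the IR could separate parameters from locals along these lines. This is a minimal sketch: `split_body` is a hypothetical helper, and the separator is passed as an argument that defaults to the '-' used in the examples of this section.

```python
def split_body(lines, sep="-"):
    """Split the lines between '{' and '}' at the first separator line.

    Returns (parameters, rest); 'rest' holds local variables and statements.
    """
    idx = lines.index(sep)
    return lines[:idx], lines[idx + 1:]


body = ["R2\tP\tcmd", "A3\tI\tflag", "A4\tP\tdata", "-", "A6\tI\ti"]
params, rest = split_body(body)
print(len(params), "parameters,", len(rest), "local declaration")
```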
151
152 ### Expressions ###
153
154 Expressions are emitted in reverse Polish notation, which simplifies
155 parsing them and converting them into a tree representation.
156
157 #### Operators ####
158
159 Operators allowed in expressions are:
160
161 * \+ -- addition
162 * \- -- subtraction
163 * \* -- multiplication
164 * % -- modulo
165 * / -- division
166 * l -- left shift
167 * r -- right shift
168 * < -- less than
169 * > -- greater than
170 * ] -- greater than or equal to
171 * [ -- less than or equal to
172 * = -- equal to
173 * ! -- not equal to
174 * & -- bitwise and
175 * | -- bitwise or
176 * ^ -- bitwise xor
177 * ~ -- bitwise complement
178 * : -- assignment
179 * _ -- unary negation
180 * c -- function call
181 * p -- parameter
182 * . -- field
183 * , -- comma operator
184 * ? -- ternary operator
185 * ' -- take address
186 * a -- short-circuit logical and
187 * o -- short-circuit logical or
188 * @ -- content of pointer
189
190 Assignment has some suboperators:
191
192 * :/ -- divide and assign
193 * :% -- modulo and assign
194 * :+ -- addition and assign
195 * :- -- subtraction and assign
196 * :l -- left shift and assign
197 * :r -- right shift and assign
198 * :& -- bitwise and and assign
199 * :^ -- bitwise xor and assign
200 * :| -- bitwise or and assign
201 * :i -- post increment
202 * :d -- post decrement
203
204 Every operator in an expression has a type descriptor.
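
The usual stack algorithm turns a reverse Polish token sequence into a tree. The sketch below illustrates it for a subset of the operators listed above; it is not the scc parser. The names `split_op` and `rpn_to_tree` are hypothetical, the arities are assumed from C semantics, and function calls, parameters, fields, the ternary operator and post increment/decrement are left out.

```python
BINARY = set("+-*%/lr<>][=!&|^,")  # binary operators from the list above
UNARY = set("~_'@")                # unary operators from the list above
ASSIGN_SUB = set("/%+-lr&^|")      # second character of ':x' suboperators


def split_op(tok):
    """Return (operator, type descriptor, arity), or None for an operand."""
    if tok[0] == ":":
        # Type letters are uppercase or '0', so they never clash with
        # the suboperator characters.
        n = 2 if len(tok) > 1 and tok[1] in ASSIGN_SUB else 1
        return tok[:n], tok[n:], 2
    if tok[0] in BINARY:
        return tok[0], tok[1:], 2
    if tok[0] in UNARY:
        return tok[0], tok[1:], 1
    return None


def rpn_to_tree(tokens):
    """Build nested tuples (operator, type, operand...) from RPN tokens."""
    stack = []
    for tok in tokens:
        op = split_op(tok)
        if op is None:
            stack.append(tok)  # variable or constant
            continue
        name, type_, arity = op
        if len(stack) < arity:
            raise ValueError(f"not enough operands for {tok!r}")
        args = stack[-arity:]
        del stack[-arity:]
        stack.append((name, type_, *args))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(rpn_to_tree(["A2", "A3", "#I6", "+I", ":I"]))
```

Operands are kept as plain strings, so the result for the tokens above is an assignment node whose second operand is the addition node.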
205
206 #### Constants ####
207
208 Constants are introduced with the character '#'. For instance, 10 is
209 translated to #IA (all constants are emitted in hexadecimal),
210 where I is the type letter of the constant (a signed 16-bit integer).
211 Strings are a special case because they are introduced with
212 the " character instead.
213 The constant "hello" is emitted as "68656C6C6F. For example:
214
215 int
216 main(void)
217 {
218 int i, j;
219
220 i = j+2*3;
221 }
222
223 generates
224
225 F1
226 G1 F1 main
227 {
228 -
229 A2 I i
230 A3 I j
231 A2 A3 #I6 +I :I
232 }
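
The constant encoding described above can be decoded with a short helper. This is a sketch under assumptions: `decode_constant` is a hypothetical name, only integer constants of the form '#', type letter, hexadecimal digits are handled, and string bytes are decoded as Latin-1 for display.

```python
def decode_constant(tok):
    """Decode an integer or string constant token."""
    if tok.startswith("#"):
        # '#', one type letter, then the value in hexadecimal.
        return tok[1], int(tok[2:], 16)
    if tok.startswith('"'):
        # '"' followed by the bytes of the string in hexadecimal.
        return "string", bytes.fromhex(tok[1:]).decode("latin-1")
    raise ValueError(f"not a constant: {tok!r}")


print(decode_constant("#IA"))
print(decode_constant('"68656C6C6F'))
```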
233
234 Type casts are expressed with a tuple denoting the
235 type conversion
236
237 int
238 main(void)
239 {
240 int i;
241 long j;
242
243 j = (long)i;
244 }
245
246 generates
247
248 F1
249 G1 F1 main
250 {
251 -
252 A2 I i
253 A3 W j
254 A3 A2 WI :W
255 }
256
257 ### Statements ###
258 #### Jumps ####
259
260 Jumps have the following form:
261
262 j L# [expression]
263
264 The optional expression field indicates a condition that
265 must be satisfied for the jump to be taken. Example:
266
267 int
268 main(void)
269 {
270 int i;
271
272 goto label;
273 label:
274 i -= i;
275 }
276
277 generates
278
279 F1
280 G1 F1 main
281 {
282 -
283 A2 I i
284 j L3
285 L3
286 A2 A2 :-I
287 }
288
289 Another form of jump is the return statement, which uses the
290 letter 'y' followed by a type identifier.
291 Depending on the type, an optional expression follows.
292
293 int
294 main(void)
295 {
296 return 16;
297 }
298
299 generates
300
301 F1
302 G1 F1 main
303 {
304 -
305 yI #I10
306 }
307
308
309 #### Loops ####
310
311 There are two special characters that are used to indicate
312 to the backend that the following statements are part of
313 a loop body.
314
315 * b -- beginning of loop
316 * e -- end of loop
317
318 #### Switch statement ####
319
320 Switches are represented using a table that lists the label
321 to jump to for each case. Ordinary cases are
322 represented with 'v' and the default case with 'f'.
323 The switch statement itself is represented with 's' followed
324 by the label where the jump table is located, and the
325 expression of the switch:
326
327 int
328 func(int n)
329 {
330 switch (n+1) {
331 case 1:
332 case 2:
333 case 3:
334 default:
335 ++n;
336 }
337 }
338
339 generates
340
341 F2 I
342 G1 F2 func
343 {
344 A1 I n
345 -
346 s L4 A1 #I1 +I
347 L5
348 L6
349 L7
350 L8
351 A1 #I1 :+I
352 j L3
353 L4
354 t #4
355 v L7 #I3
356 v L6 #I2
357 v L5 #I1
358 f L8
359 L3
360 }
361
362 The beginning of the jump table is indicated by the letter 't',
363 followed by the number of cases (including the default case) of the
364 switch.
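
To make the table layout concrete, the following sketch reads the 't', 'v' and 'f' lines into a dictionary. It is an illustration only: `parse_switch_table` is a hypothetical name, the fields are assumed to be tab-separated, and the entry count after 't' is assumed to be hexadecimal like the other constants.

```python
def parse_switch_table(lines):
    """Parse 't #n' followed by n 'v'/'f' entries.

    Returns (cases, default), where cases maps case values to labels.
    """
    fields = [ln.strip().split("\t") for ln in lines]
    head, entries = fields[0], fields[1:]
    if head[0] != "t":
        raise ValueError("a switch table must start with 't'")
    count = int(head[1].lstrip("#"), 16)
    if len(entries) != count:
        raise ValueError("entry count does not match the 't' line")
    cases, default = {}, None
    for entry in entries:
        if entry[0] == "v":
            # v <label> #<type letter><hex value>
            cases[int(entry[2][2:], 16)] = entry[1]
        elif entry[0] == "f":
            default = entry[1]
        else:
            raise ValueError(f"unexpected table entry {entry!r}")
    return cases, default


table = ["\tt\t#4", "\tv\tL7\t#I3", "\tv\tL6\t#I2", "\tv\tL5\t#I1", "\tf\tL8"]
print(parse_switch_table(table))
```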
365
366 ## Summary ##
367
368 * C -- signed 8-Bit integer
369 * I -- signed 16-Bit integer
370 * W -- signed 32-Bit integer
371 * Q -- signed 64-Bit integer
372 * K -- unsigned 8-Bit integer
373 * N -- unsigned 16-Bit integer
374 * Z -- unsigned 32-Bit integer
375 * O -- unsigned 64-Bit integer
376 * 0 -- void
377 * P -- pointer
378 * F -- function
379 * V -- vector
380 * U -- union
381 * S -- struct
382 * B -- bool
383 * J -- float
384 * D -- double
385 * H -- long double
386 * A -- automatic
387 * R -- register
388 * G -- public (global variable declared in the module)
389 * X -- extern (global variable declared in another module)
390 * Y -- private (variable in file-scope)
391 * T -- local (static variable in function-scope)
392 * M -- member (struct/union member)
393 * L -- label
394 * { -- beginning of function body
395 * } -- end of function body
396 * \\ -- end of function parameters
397 * \+ -- addition
398 * \- -- subtraction
399 * \* -- multiplication
400 * % -- modulo
401 * / -- division
402 * l -- left shift
403 * r -- right shift
404 * < -- less than
405 * > -- greater than
406 * ] -- greater than or equal to
407 * [ -- less than or equal to
408 * = -- equal to
409 * ! -- not equal to
410 * & -- bitwise and
411 * | -- bitwise or
412 * ^ -- bitwise xor
413 * ~ -- bitwise complement
414 * : -- assignment
415 * _ -- unary negation
416 * c -- function call
417 * p -- parameter
418 * . -- field
419 * , -- comma operator
420 * ? -- ternary operator
421 * ' -- take address
422 * a -- short-circuit logical and
423 * o -- short-circuit logical or
424 * @ -- content of pointer
425 * :/ -- divide and assign
426 * :% -- modulo and assign
427 * :+ -- addition and assign
428 * :- -- subtraction and assign
429 * :l -- left shift and assign
430 * :r -- right shift and assign
431 * :& -- bitwise and and assign
432 * :^ -- bitwise xor and assign
433 * :| -- bitwise or and assign
434 * :i -- post increment
435 * :d -- post decrement
436 * j -- jump
437 * y -- return
438 * b -- beginning of loop
439 * e -- end of loop
440 * s -- switch statement
441 * t -- switch table
442 * v -- case entry in switch table
443 * f -- default entry in switch table