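If every source language is compiled straight to assembly, you need a separate compiler for every pairing of language and target machine. So why not put an intermediate language (IL) in the middle? Then each language only needs a front end into the IL and each machine only needs a back end out of it, you can reuse the optimisation phases, and mainly it saves a huge amount of work.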
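To put numbers on the saving, here is a small counting sketch. The function name and the 5 languages / 4 targets figures are made up for illustration, not taken from the lecture.

```python
def compilers_needed(num_sources, num_targets, use_il):
    """Count the translators needed to cover every source/target pairing."""
    if use_il:
        # One front end per source language plus one back end per target.
        return num_sources + num_targets
    # One complete compiler per (source, target) pair.
    return num_sources * num_targets


# Hypothetical figures: 5 source languages, 4 target machines.
direct = compilers_needed(5, 4, use_il=False)
via_il = compilers_needed(5, 4, use_il=True)
print(direct, via_il)
```

Without the IL the count grows as a product, with it as a sum, which is where the "huge amount of work" goes.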
Requirements of IL
Independent of source and target language.
Each construct has a clear and simple meaning.
Easy to translate from source and into targets.
- few control structures (e.g. conditional jump and procedure call)
- no structured data types (only basic and addresses)
So basically, make your own assembly language?
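As a rough idea of what such a "home-made assembly" might look like, here is a minimal sketch of an IL with only integer variables, arithmetic and jumps, plus an interpreter for it. The instruction names (`const`, `add`, `sub`, `jump_if_zero`, `jump`, `label`) are invented for this example and are not from any real compiler; procedure calls are left out to keep it short.

```python
def run(program, env):
    """Interpret a tiny, hypothetical IL given as a list of tuples."""
    # Map each label name to the index of its (no-op) label instruction.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        op = ins[0]
        if op == "const":            # ("const", dst, value)
            env[ins[1]] = ins[2]
        elif op == "add":            # ("add", dst, a, b)
            env[ins[1]] = env[ins[2]] + env[ins[3]]
        elif op == "sub":            # ("sub", dst, a, b)
            env[ins[1]] = env[ins[2]] - env[ins[3]]
        elif op == "jump_if_zero":   # ("jump_if_zero", var, label)
            if env[ins[1]] == 0:
                pc = labels[ins[2]]
                continue
        elif op == "jump":           # ("jump", label)
            pc = labels[ins[1]]
            continue
        elif op != "label":
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return env


# Source-level idea: while n != 0: total = total + n; n = n - 1
sum_to_n = [
    ("const", "total", 0),
    ("const", "one", 1),
    ("label", "loop"),
    ("jump_if_zero", "n", "done"),
    ("add", "total", "total", "n"),
    ("sub", "n", "n", "one"),
    ("jump", "loop"),
    ("label", "done"),
]
result = run(sum_to_n, {"n": 4})
print(result["total"])
```

The structured `while` loop has disappeared: all that is left is a flat instruction list with labels and jumps, which is easy to produce from a source language and easy to map onto a target.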
You might have several ILs. In GCC you have GENERIC to GIMPLE (SSA) to RTL to target asm.
Classes of intermediate language
- Simple functional languages (e.g. Core of GHC, first step from Haskell). Has 6 different constructs.
- Tree based – part of Ada compiler.
- Stack based – Java bytecode. Jorvik from CAR.
- Three address instruction based – GNU RTL. Apparently we’ll be doing stuff with one of these. Typical considering I wasn’t paying attention to the explanation of one.
Comparison of IL
- Stack based and three-address both result in linear programs.
- Stack based instructions carry few explicit addresses (the operands are implicit on the stack), so the code is compact. JVM bytecode for example.
- Stack based code does not need later register allocation.
- Three address code maps naturally onto fast registers, whereas stack based code keeps its operands on the stack.
- Three address instructions are easier to rearrange and thus optimise.
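The comparison above is easier to see on a concrete expression. The sketch below translates `a*b + c*d` into both forms; the tuple encodings and the temporary names `t1`, `t2`, ... are assumptions made for this example, not the format of any particular compiler.

```python
import itertools


def to_stack_code(expr):
    """Post-order walk: push operands, then emit the operator."""
    if isinstance(expr, str):
        return [("push", expr)]
    op, left, right = expr
    # The operator pops two values and pushes one, so it names no operands.
    return to_stack_code(left) + to_stack_code(right) + [(op,)]


def to_three_address(expr):
    """Emit (dst, op, a, b) tuples, one fresh temporary per operator."""
    code = []
    temps = itertools.count(1)

    def walk(e):
        if isinstance(e, str):
            return e  # A variable is already a valid operand.
        op, left, right = e
        a = walk(left)
        b = walk(right)
        dst = f"t{next(temps)}"
        code.append((dst, op, a, b))
        return dst

    walk(expr)
    return code


# a*b + c*d as a nested tuple (operator, left, right).
expr = ("add", ("mul", "a", "b"), ("mul", "c", "d"))
stack_code = to_stack_code(expr)
tac = to_three_address(expr)
for ins in stack_code:
    print(ins)
for ins in tac:
    print(ins)
```

Both outputs are linear. The stack version needs more instructions, but each names at most one operand. The three-address version names every operand explicitly, so it is visible that the two `mul` instructions do not depend on each other and could be swapped, and the temporaries are the things a later register allocator would assign to registers.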
Note that it’s important to differentiate between the language and the code. The intermediate language is target language independent, but the generated intermediate code is target machine specific because it contains addresses and offsets.
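A small sketch of that distinction: the IL instruction for reading a record field has the same shape everywhere, but the offset baked into the generated code depends on the target's sizes. The `load` instruction, the record layout and the size tables are all hypothetical, and the layout ignores alignment padding, which a real target would usually require.

```python
def field_offsets(fields, sizes):
    """Assign byte offsets to fields in declaration order, without padding."""
    offsets, offset = {}, 0
    for name, kind in fields:
        offsets[name] = offset
        offset += sizes[kind]
    return offsets


def lower_field_load(dst, base, field, offsets):
    """Emit a hypothetical IL load of base + constant offset into dst."""
    # Same instruction shape on every target; only the constant differs.
    return ("load", dst, base, offsets[field])


# Made-up record with an int, a pointer and another int.
record = [("x", "int"), ("next", "ptr"), ("y", "int")]
sizes_32 = {"int": 4, "ptr": 4}  # assumed 32-bit target
sizes_64 = {"int": 4, "ptr": 8}  # assumed 64-bit target

code_32 = lower_field_load("t1", "p", "y", field_offsets(record, sizes_32))
code_64 = lower_field_load("t1", "p", "y", field_offsets(record, sizes_64))
print(code_32)
print(code_64)
```

The structured type is gone from the intermediate code (only a basic value and an address plus offset remain), and that offset is exactly the part that ties the code to one machine.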