Revision Notes

Identification

Often (5 different papers) you will be asked to produce the symbol table, attribute records, and links between these, corresponding to a C program that has a lot of nested scope.

Remember; this is not the program running, it is the program being parsed by the compiler in the identification phase.

Best way to do this is probably start with the attribute records in the middle. This is a set of boxes labelled with the variable names. You start from the top of the program code and work your way down, drawing arrows from the code to the boxes. You will probably have multiple boxes with the same name, that’s fine (they’re all in different scopes).

Now start to draw the symbol table. You could do this by working back from wherever you are drawing the tables for (marked by /*here*/ in one exam paper). Variables inside a function are in a scope, variables in arguments are in a scope, functions in a global program are in a scope. Don’t forget to draw lines from the attribute records to the symbols in the symbol table. You’ll probably have plenty of lines that go to attribute records but not to the symbol table. That’s fine, that’s because the attribute records are done all at once for the program, and the symbol table is being produced as it goes through parsing (e.g. the compiler may have passed over a function already).

S-attributed and L-attributed grammars

An S-attributed grammar is also an L-attributed grammar. An attribute grammar is S-attributed if there are no inherited attributes. That is, children do not inherit anything from their parents. Inherited attributes are a problem for bottom-up parsing because the parents aren’t made yet. The example of the ‘simple arithmetic’ grammar we were given in the practicals was S-attributed because the exp.val value was synthesised (made up from child attributes).

L-attributed grammars are attribute grammars that can be evaluated in one go, from left to right. I’m guessing children can have inherited stuff, you can only inherit from the first parent though I guess? This is a bit confusing.

C- to TM

You’re probably going to be asked to convert a function to TM code. That means your code is going to need to start with something like:

ST 0,	-1(5);	store return address

Which stores the value in register 0 in the memory location determined by taking the value in 5 and subtracting 1. This is because 5 is where the FP is. and the FP for returning is at an offset of -1 of that. You just have to remember that, apparently. For some reason register 0 has the return address as well. This is probably something to do with how functions are called in TM or something. Or maybe there’s something weird about register 0, I don’t know.

Parameters are then at -2 etc. then local variables from then on.

So say you have to translate something like:

int f (int y) {
	if (y) return 1;
	return 0;
}

Simple right?

1: ST	0,	-1(5);	store return address

This is as above. Now we need to get hold of y, which as above will be at 5-2. We’ll reuse register 0:

2: LD	0,	-2(5);	load value of "y"

Now we need to check if it’s not 0. This is a TM opcode:

3: JNE	0,	2(7);	check if not 0. Jump 2 lines if it isn't

To return a value, load it into register 0, then adjust the PC based on the value we stored at the start:

4: LDC	0,	0(5);	Load 0 (the '5' is ignored)
5: LD 	7,	-1(5);	Load return address into PC
6: LDC	0,	1(5);	"y" wasn't 0, load 1...
7: LD	7,	-1(5);	and return.

In the answers, there is also an extra line that is a duplicate of line 7. Apparently this would have been produced by the compiler, but we weren’t expected to remember that.

Type equivalence

An untagged type does not equal a tagged type even if the structure is the same. This is because an untagged type is a unique anonymous type created at that point in the program.

In C, a * means pointer. ** means a pointer to a pointer (not the same as just a pointer).

Writing attribute grammars

Remember (when dealing with attribute grammars) that test expressions in if statements etc. need to return integers (in C- code anyway).

Also, don’t try too hard to implement functions, just pretend they exist already and add it as an explanation. For example if you need something that returns the maximum of two values, just use max(a, b).

Abstract syntax vs. Concrete syntax

Abstract syntax (and their CFG) can be ambiguous as it does not need to be parsed.
Abstract syntax can be devoid of implementation specific details.
Concrete syntax describes every character of the program (punctuation etc.).
Abstract syntax may describe invalid programs.

Sequential declaration vs. Recursive declaration structure

Sequential declaration can be done in a single depth-first program traversal.
Recursive declaration has to insert all declarations of a scope in the symbol table, then traverse the whole scope to lookup use occurrences.

I don’t really understand this. Googling doesn’t really help either.

Stack frames

In the 08-09 paper, the stack frames contained:

Dynamic Link: points to the previous frame on the stack.
Static Link: points to the most recent logical enclosing procedure frame.
Local variables: Which are accessed within the frame by adding an offset to the Frame Pointer.

The Frame Pointer is a pointer pointing to the middle of the current frame on the stack – it’s not part of the stack frames themselves.

Variables outside of a frame (e.g. variables that come from enclosing procedures) are accessed by following the static link.

When asked to draw all the stack frames, remember the variable values are the values at the specific point in time in the program (given in the question). So perhaps best to leave them blank until you’ve drawn all the stack frames, then work out what they would contain at that point in time.

The normal order for a stack frame (for Ada?) is:

Procedure argument variables
Dynamic link
Static link (unless first procedure)
Variables declared in this procedure
Return address

Mark Phase with Pointer Reversal

Mark Phase garbage collection is pretty simple. Go around marking records when you see them, and don’t traverse records if they are already marked (which solves any cyclic problems).

However it has an issue; it uses a DFS which is recursive and so requires a stack proportional to the longest path in the graph, which might be pretty big.

The solution is to use the current field x.f_i to point back to the record t from which x was reached. Then you just need a short integer done[x] in every record to indicate the current field.

That way, you’re kind of keeping a trail that you can traverse back using. When you are going back up, you put the pointer back to how it was first, then set the next pointer in the record to the previous record…If that makes sense. It helps to look at an invariant for a while.

Cost of Garbage Collection

The cost C of a GC algorithm is calculated ‘per amount of garbage reclaimed’. So chances are you will need to do:

C = (cost of all the actions of the algorithm) / (Heap - Reachable)

l-values and r-values

l-values and r-values refer to the left hand side and ride hand side of assignments. Basically the same ‘code’ can have different meanings depending on which side of the assignment it is. For example:

person.age := person.age + 1

Here, person.age on the LHS is referring to the memory address. person.age on the RHS is referring to the memory content at the address.

Abstract syntax trees

Despite the fact that Abstract syntax trees aren’t meant to show the nuances of the syntax, in all the exam papers it seems necessary to remember to include symbols like assignment (:=). Can always make a comment about it. Parenthesis don’t seem to always matter though.

Dealing with Overloading

Local resolution (C): Types of the arguments determines which function to use.
Global resolution (Ada): Full usage context taken into consideration e.g. mixing integers and floats (2 + 3) + 4.5. Must be a single resolution.
Runtime resolution (Haskell): Functions are passed dictionaries which have all the possible variants of the function; runtime then chooses which one to use.

Types of intermediate language

Not as hard to remember as you might think:

Tree-based languages, which are simplified but also like source.
Stack-based languages, which are difficult to re-order.
Three-address languages, which use temporary variables all the time and instructions are RISC-like. Maps well to register-based machines.

How do you compute ‘Liveness’?

out(n) = set of in(s) where s is the 'successive' nodes
         of n (note that because you work backwards, this
             is the 'previous' ones.

             The final node has no `out` bit.


    in(n)  = use(n) + (out(n) - def(n))

         (where + is union and - is intersection)

Repeat this in a table until it is stable.

Jalada