Garbage Collection 2

These are lecture notes from my Computer Science course.

Cost of Garbage Collection

I think basically if you have a very small reachable heap copying is good (Cheney’s Algorithm). But otherwise Mark-and-Sweep is pretty good.

Mark Phase with Pointer Reversal

Note: You’d do this because doing a DFS down a heap is a way of doing garbage collection; you mark everything on the way down following pointers, then you can remove anything that isn’t marked.

The problem with using a DFS algorithm for garbage collection is that it requires a stack proportional to the longest path in the graph. Therefore we might need a huge stack.

Solution to this; depth first search using pointer reversal. The pseudocode for this is in the lecture slides (number 98). It’s missing a key though, which is:

t = previous node
x = current node
y = next node
done[n] = num of fields visited

It works by advancing down into the nested pointers, and then retreating back up again. It is not recursive, it uses a while loop instead.

He drew a little graph animation thing too, which is available along with the annotated pseudocode.

Reference Counting

Reference Counting is a bit different to using DFS. Every record has a reference counter and it counts the number of ‘in-arrows’, the number of pointers into a cell. Before every assignment x <- y decrement counter of x and increment counter of y. When the counter is 0, add record to ‘free list’ and decrement counters of all records x points to.

However this has some problems:

  • Cycles of garbage cannot be reclaimed.
  • Incrementing and decrementing can be expensive.

Systems that use reference counting often have to have a second garbage collecting method that runs infrequently to find and get rid of the cycles mentioned above.

Here’s another chart thing he used.

Copying Collection

The problem with garbage collection also is that you might end up with fragmentation. You can solve that by compacting the data by copying it somewhere else.

Copying Collection divides up the heap space into from-space and to-space and you only allocate into one space at a time; once it’s full, you copy all the live reachable data into the other space. In doing so, you compact the data because you only copy the live reachable data.

Big issue with this is that you have to change all the references in the program which can take ages.

However it means that allocating memory is really cheap as you don’t have to do anything – just allocate memory on the end.

Cheney’s Breadth-First Copying Algorithm

This is one copying algorithm. You could use DFS again but again you have the same problem of eating up stack loads.

Slide 101 has pseudocode for this. It runs a bit like the pointer reversal method; you record the progress in something other than a stack. In this case you record it using a scan pointer, which is the beginning of the to-space. Here’s another animation set thing.

Locality of Reference

If you do breadth-first copying you end up with poor locality of reference. That is that records pointing to other records are far away from each other. This can slow things down as you have to keep getting different memory pages.

What about depth-first copying? It gives you better locality of reference but it’s more complicated; you need to use something like pointer reversal.

Solution: use a hybrid of both.

Generational Collection

  • Newly allocated records are likely to die soon.
  • Old records are likely to live long.

Therefore:

  • Divide heap into generations (G0, G1…Gn).
  • Collection only some generations: G0 + … + Gi.
  • Promote record from Gi to Gi+1 when it survived 2-3 collections of Gi.
  • Each collection expontentially larger than previous.

There is a problem if you garbage collect G0 and something from G1 is pointing to G0 and you don’t realise. There’s a solution, but I missed it. See the slides.

Incremental Collection

One problem is that you have to interrupt program execution for long periods with garbage collection. This is a big problem for interactive programs and real-time programs (really you’d just never garbage collect for real-time programs. It’ll ruin your time constraints randomly).

The solution is that you can interleave garbage collection and program execution. However you may miss reachable records as the program is modifying pointers ‘under your feet’ so to speak; e.g. as you’re marking things as reachable, something may change so you may end up losing something that is actually reachable. Solution?

Tricolour Marking

You can classify records as one of 3:

  • White; not (yet?) visited.
  • Grey; visited but children not yet examined.
  • Black; visited and all children visited.

The invariant is that no black records point to white records.

This means that additional work is required when the program execution modifies the reachability graph:

  • A write barrier algorithm; when program execution stores white pointer a in black object, it colours a grey.
  • A read barrier algorithm; when program execution reads a white pointer, it colours it grey.

There’s an example animation/chart/graph/thing though it doesn’t look like it progresses (it’s just one slide but in the lecture it was many; an animation). Still makes sense though I think.

Compiler Support for Garbage Collection

Most garbage collectors don’t have cooperation with the language or compiler, they’re often retrofitted onto a language. Therefore they have to ‘guess’ what is a pointer. Therefore it’s a ‘conservative’ collectors because you might end up retaining stuff that is just junk. That’s OK; it doesn’t do any harm to sit there a little longer.

This entry was posted in cgo, lecture. Bookmark the permalink.