Data Representation

These are lecture notes from my Computer Science course.

First, a summary of Run Time Organisation

Run time organisation deals with the general issues of mapping high-level programming language constructs onto low-level features of the target machine. We are going to look at:

  • Data representation.
  • Memory organisation.
  • Procedure call protocol.

Now, Data representation

We might want a constant-size representation: given a type, we should know how much space a value of that type takes up.

How would you represent that?

  • Direct: binary representation, very efficient.
  • Indirect: pointer to actual representation, constant size even if type is unknown (subtype, polymorphism), copying of pointer is efficient.
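To make the two options concrete, here is a rough sketch in C. The names Point, Boxed, box_value and box_free are made up for illustration and do not come from the slides. The point is only that a Boxed handle has the same size whatever it refers to, and that copying the handle copies a pointer rather than the data.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Direct representation: the bits of the value live in the variable itself. */
typedef struct {
    int x;
    int y;
} Point;                /* hypothetical example type */

/* Indirect representation: the variable holds only a pointer, so it has
   the same size no matter how large the pointed-to value is. */
typedef struct {
    void  *data;        /* address of the actual representation */
    size_t size;        /* size of that representation in bytes */
} Boxed;                /* hypothetical name */

/* Copy `size` bytes of `value` to the heap and return a handle to the copy.
   The handle's data field is NULL if the allocation fails. */
Boxed box_value(const void *value, size_t size) {
    Boxed b = { malloc(size), size };
    if (b.data != NULL) {
        memcpy(b.data, value, size);
    }
    return b;
}

/* Release the heap copy and reset the handle. */
void box_free(Boxed *b) {
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

Assigning one Boxed to another copies just the handle, so both end up referring to the same underlying value. That is the cheap copy mentioned above, and also the reason you have to be careful about who frees it.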

The slides then have lots of examples of type representations (e.g. integer is 4 bytes, enumeration is small integers).

You may have alignment restrictions on the target machine. For example, a 3-byte value usually ends up occupying 4 bytes, because the machine wants values to start at addresses that are multiples of 2, 4 or 8 bytes. So you often have to fill (pad) the unused areas between consecutive values. You might be able to reduce this padding by reordering the fields, but this is sometimes forbidden by the language specification (in C, the fields of a struct have to appear in memory in the order they were written in the program).
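A small sketch of the padding issue in C. The struct names Loose and Tight and the helper functions are my own. On a typical machine with 4-byte, 4-byte-aligned ints I would expect Loose to take 12 bytes and Tight 8, but the exact numbers depend on the compiler and target, so treat them as an assumption.

```c
#include <stddef.h>

/* Fields in this order usually force padding after each char so that the
   int (and the next struct in an array) starts on an aligned address. */
struct Loose {
    char a;
    int  b;
    char c;
};

/* Same fields, widest first: the two chars can sit next to each other, so
   less filler is needed. The compiler will not do this reordering for us
   in C; the programmer has to write the fields in this order. */
struct Tight {
    int  b;
    char a;
    char c;
};

/* Bytes of filler in each layout: total size minus the size of the fields. */
size_t loose_padding(void) {
    return sizeof(struct Loose) - (2 * sizeof(char) + sizeof(int));
}

size_t tight_padding(void) {
    return sizeof(struct Tight) - (2 * sizeof(char) + sizeof(int));
}
```

Comparing offsetof(struct Loose, b) with offsetof(struct Loose, c) also shows that the fields really do stay in declaration order.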

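A union takes up as much space as its largest member, because all of its members share the same storage. Watch out for data corruption: nothing records which member was written last, so if you read a different one you never know what is in it.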
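One common way to guard against this is to store a tag next to the union that says which member is currently valid. This is a sketch of that idea, not something from the slides; Tag, Payload, Value and the helper functions are hypothetical names.

```c
/* Which member of the union was written last. */
typedef enum { TAG_INT, TAG_DOUBLE } Tag;

/* Both members share the same storage, so the union is at least as big
   as its largest member. */
typedef union {
    int    i;
    double d;
} Payload;

/* A "tagged union": the tag travels with the payload. */
typedef struct {
    Tag     tag;
    Payload as;
} Value;

Value make_int(int i) {
    Value v;
    v.tag = TAG_INT;
    v.as.i = i;
    return v;
}

Value make_double(double d) {
    Value v;
    v.tag = TAG_DOUBLE;
    v.as.d = d;
    return v;
}

/* Store the int in *out and return 1 only if v really holds an int;
   otherwise leave *out alone and return 0. */
int get_int(const Value *v, int *out) {
    if (v->tag != TAG_INT) {
        return 0;
    }
    *out = v->as.i;
    return 1;
}
```

The cost is the extra space for the tag (plus any padding), which is the usual trade of space for safety.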

Arrays are mappings from a bounded range of indices to elements of one type. Elements are stored consecutively in memory. Fixed-size elements make indexing easier, since element n then sits at a fixed multiple of the element size from the start (though I guess this is not strictly necessary). Indexing code (code that finds array element n) should check the array bounds to be safe. Multi-dimensional arrays are just arrays of arrays. The array size may not be known at compile time.
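Here is roughly what the indexing code might look like if written out by hand in C. IntArray, int_array_at and row_major_offset are names I have invented, and I am assuming the array-of-arrays layout is row-major (each row stored one after the other), which is what C does.

```c
#include <stddef.h>

/* An array whose length is only known at run time: we keep the base
   address together with the element count so bounds can be checked. */
typedef struct {
    int   *base;     /* address of element 0 */
    size_t length;   /* number of elements */
} IntArray;

/* Return the address of element n, or NULL if n is out of bounds.
   In bytes the address is base + n * sizeof(int); C pointer arithmetic
   does the multiplication for us. */
int *int_array_at(IntArray a, size_t n) {
    if (n >= a.length) {
        return NULL;
    }
    return a.base + n;
}

/* Position of element [row][col] inside the flat storage of a
   two-dimensional array with `cols` elements per row. */
size_t row_major_offset(size_t cols, size_t row, size_t col) {
    return row * cols + col;
}
```

With 3 columns, element [1][2] lands at flat position 1 * 3 + 2 = 5, which is the last element of a 2 by 3 array.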

Pointers are represented as the address of the value referred to by the pointer. There is one machine-dependent size for all pointers (big enough to hold any memory address, I guess). Pointers are needed for recursive types (lists, trees) or just for indirect representation. The value referred to cannot be in a register; it must be in memory, because registers have no address.
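The standard example of a recursive type is a linked list. A sketch in C, with Node, list_push, list_sum and list_free as illustrative names of my own. A node cannot contain another node directly (it would be infinitely large), but it can contain a pointer to one, because the pointer has a fixed size.

```c
#include <stdlib.h>

/* A list node refers to the next node through a pointer. */
typedef struct Node {
    int          value;
    struct Node *next;   /* NULL marks the end of the list */
} Node;

/* Put a new node in front of `head` and return the new head.
   If allocation fails the old head is returned unchanged. */
Node *list_push(Node *head, int value) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Follow the pointers from node to node, adding up the values. */
int list_sum(const Node *head) {
    int total = 0;
    for (const Node *n = head; n != NULL; n = n->next) {
        total += n->value;
    }
    return total;
}

/* Free every node in the list. */
void list_free(Node *head) {
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```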

Functions are often just pointers to code entry points. There are more complex representations however and we will deal with them later.
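In C this simple representation is directly visible as a function pointer. A minimal sketch, with BinaryOp, add, multiply and apply as made-up names:

```c
/* A function value is just the address of the code to jump to. */
typedef int (*BinaryOp)(int, int);

int add(int a, int b)      { return a + b; }
int multiply(int a, int b) { return a * b; }

/* Call whatever code `op` points at. */
int apply(BinaryOp op, int a, int b) {
    return op(a, b);
}
```

Here apply(add, 2, 3) and apply(multiply, 2, 3) run different code through the same call site, because all that is passed around is an entry point.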

Objects are a bunch of pointers and values, as you might imagine. Object inheritance involves copying the layout of the parent object and tacking the extra bits on to the end (e.g. new instance variables or new methods), so code written for the parent still finds everything where it expects it.
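This layout trick can be imitated by hand in C, which makes it easier to see. This is only a sketch of the idea under my own assumptions, not how any particular compiler lays out objects. Counter and StepCounter are invented names, and the "method" is stored as a function pointer inside each object.

```c
/* "Parent class": one method (a pointer to code) and one instance variable. */
typedef struct Counter Counter;
struct Counter {
    int (*next)(Counter *self);   /* method */
    int count;                    /* instance variable */
};

static int counter_next(Counter *self) {
    self->count += 1;
    return self->count;
}

Counter counter_new(void) {
    Counter c = { counter_next, 0 };
    return c;
}

/* "Child class": the parent's layout comes first, unchanged, and the
   new instance variable is tacked on to the end. */
typedef struct {
    Counter base;
    int     step;
} StepCounter;

/* Overriding method. The cast is fine because `base` is the first
   member, so a StepCounter starts at the same address as its Counter. */
static int step_counter_next(Counter *self) {
    StepCounter *sc = (StepCounter *)self;
    self->count += sc->step;
    return self->count;
}

StepCounter step_counter_new(int step) {
    StepCounter s = { { step_counter_next, 0 }, step };
    return s;
}
```

Code that only knows about Counter can be handed a pointer to a StepCounter and call next through it. It finds the method and count at the same offsets as in the parent, and ends up running the child's version.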
