Recently, a user was asking on IRC why CHICKEN doesn't give proper line numbers for error call traces in the interpreter. This is indeed rather odd, because the compiler gives numbered traces just fine.
After a brief discussion, we figured that this was probably because line numbers are stored in a "database" (hash table) which maps the source expression to a line number. Because you can keep calling (eval) with fresh input, the number of expressions evaluated is potentially infinite.
This would lead to unbounded growth of the line number database, eventually eating up all available memory.
We'd like to fix this. A great solution for this problem would be weak references. A weak reference is a reference to a value that does not cause the garbage collector to hold on to that value. If other things still refer to the value, it is preserved and the weak reference is maintained. If only weak references refer to a value, it may be collected.
The line number database could then consist of such weak references to source expressions. If an expression is no longer held onto, it can be collected and the line number eventually removed. This would turn the database from a regular hash table into a weak hash table.
This requires hooking into the garbage collector so that it knows about these references and can drop them. Since our collector is a copying collector, addresses of objects change over time, and a weak reference needs to be updated if something is still hanging onto it, without itself causing the weakly held object to stick around.
A quick recap of garbage collection
To explain how we changed our garbage collector, I will start with a quick high-level recap of how the garbage collection works. I'll explain just enough to cover what is needed to understand weak references. For a more in-depth look at CHICKEN's garbage collection, see my post on the GC.
First, we have to realise that CHICKEN stores all its values in a machine word. It distinguishes between "immediate" values and non-immediate or "block" values. Immediate values can be either fixnums (bit 0 is set) or other "small" values (bit 1 is set). Non-immediates are recognised because the lower two bits are zero, which is very convenient, as a pointer to a word-aligned address happens to have the lower two bits cleared as well!
In other words, non-immediate values are simply C pointers, whereas the immediate values are encoded long integers. So, non-immediates are allocated in memory and represented by a pointer to them. At the pointer's address we find a header which encodes what exactly is stored there, and (optionally) the data, where we store the contents of compound objects.
typedef struct { C_word header; C_word data[]; /* Variable-length array: header determines length */ } C_SCHEME_BLOCK;
In typical Scheme objects, the data consists of slots. For example, the car and the cdr of a pair are slots, the elements of a vector are slots, etc. Each slot contains a Scheme value. In a graphical representation, a block object looks like this:
The data representation will be important to understand how we implement weak pairs, and to follow the garbage collection process.
If you want more details, see my post about data representation in CHICKEN.
A tale of two spaces
CHICKEN divides the heap up in two halves. While the program is running, only one half is in use. When this half fills up, we start the garbage collection process. We trace all the data that's still considered "live" (aka GC "roots") and copy it over to the other half.
Let's look at an example:
(cons 5 '()) ; unreferenced, may be collected (define x (cons (vector 42) '())) ; sticks around
The half that was full is called the fromspace and the half that starts out empty is called the tospace. Initially, the tospace is empty, and the data is only in the fromspace:
Then, we move the objects, one at a time, starting at the roots. So in the example, the pair that we named "x" gets moved first to tospace, while its slots still point into fromspace:
The changes are in yellow. After tracing the roots, we trace all the contents of the objects in fromspace and copy over the pointed-to objects:
Finally, we're done. Any remaining objects in fromspace are garbage and can be ignored, effectively "clearing" fromspace, and we flip the two spaces so that fromspace becomes tospace and vice versa:
Onward! I mean, forward!
This was a very simple case. A more complex case is when there are several objects with mutual references. We must somehow keep track of which objects got moved where, so that we don't copy the same object more than once.
Another example:
(define shr (vector 42)) (define x (cons shr 1) (define y (cons shr 2))
This would look like the following when visualized:
As I've explained, roots are copied first, so let's say shr gets copied first:
As you can see, this introduces something new. There's a forwarding pointer stored at the address where we originally found the header of the pair known as shr. This is done so that we don't have to traverse all the objects pointing to shr (which could even be cyclic!). Instead, whenever we find an object that holds an address where there now is a forwarding pointer, we derefence the pointer and change the object to point to the target address.
So, if x is copied next, we get the following picture:
We actually *always* leave behind a forwarding pointer when an object is copied, because the GC does not "know" whether anything references the object. I skipped that in the initial examples to keep it simple. And finally, y can be copied:
Now you can see, the forwarding pointers are still there in fromspace, but nothing references them anymore. The GC is now done. Of course, fromspace and tospace will be flipped (but I'm not showing that here).
How weak pairs work in the GC
Let's make a few changes to our example:
(define zero (weak-cons (vector 0) 0)) ;; new - car may be collected (weak-cons (vector 9) 9) ;; new - may be collected (define shr (vector 42)) (define x (cons shr 1) (define y (weak-cons shr 2)) ;; changed to be weak (set! shr #f) ;; new - dropping root for shr (define z (weak-cons (vector #f) 3)) ;; new - car may be collected
Now, y is changed to be a weak pair holding shr in its car, while x is still a normal pair with the same object in its car. The shr variable is also cleared, so that only x holds onto shr's old value strongly. We've added a few more weak pairs while we're at it.
This weak-cons procedure does not exist in CHICKEN 5.3.0; this is what we'll be adding. It creates a pair whose car field is a weak reference. The cdr field is a regular reference. The reason for having the cdr be a regular reference is that this allows us to build up lists of items which may be collected. The "spine" of the list will remain intact, but the list entries will be cleared (see below).
Let's take a look at our new initial situation:
The GC will start with the live objects, as usual. Let's start at the top, copying zero:
Since zero is a weak pair, we won't traverse the car slot, keeping the vector it points to in fromspace. This is intentional, as the GC treats weak pairs differently; only the cdr slot gets scanned, while the car gets skipped and left as a "dangling" pointer into fromspace.
We'll fix that later, as we don't want to keep this situation after the garbage collection process is done.
Normally we'd pick up shr next, but that is not a root anymore. Let's continue with x instead:
Then, as we scan over the contents of x, we encounter what used to be shr and move it, updating the pointer in x:
Next, we move y:
You'll notice that because it's a weak pair, its car slot does not get updated to the new location of what used to be shr, even though that vector has already been copied. As mentioned before, we'll fix this up later.
Finally, z gets moved:
I'm sure you've noticed that now we're left with a big spaghetti bowl of pointers, some of which are still pointing into fromspace. So let's investigate how we can fix this mess.
Fixing up the dangling weak pairs, naively
It just so happens that CHICKEN internally already has support for weak pairs. Those are pairs whose car field holds objects that may be collected. These are used solely in the symbol table, so that symbols aren't maintained indefinitely. This is important because it stops symbol table stuffing attacks. Those are a form of Denial of Service (DoS) by forcing the system to eat up all memory.
In this current implementation, the only weak pairs in existence are those in the symbol table's hash bucket chains. So, the solution there is very simple: traverse all the symbol table buckets and update weak pointers.
So we traverse the slots like during normal GC's mark operation, but with two exceptions:
- When we mark the slot, we don't cause the contents to be copied over into tospace, but only chase any forwarding pointers.
- When we encounter a pointer into fromspace, we replace the slot with a special "broken weak pair" marker.
That would yield this final picture:
As you can see, all the weak pairs have their car slots updated, even the ones that we could consider as garbage, like the two at the top. Those get the special value #!bwp, or "broken weak pointer" to indicate that the value originally stored there is no longer live.
Smarter fixing up of weak pairs
Traversing all the weak pairs is fine when we know every single one in the system. When only hash table buckets in the symbol table may be weak, that's the case. But now we want to expose weak pairs to the user, what do we do?
The simplest solution would be to add a table or list of all the weak pairs as they're constructed, or as the GC encounters them. This has a few problems:
- It is wasteful of memory, as the extra pointers in this table would be redundant.
- If done during garbage collection, we'll have to dynamically allocate during a GC, when memory is already tight (and it's slow).
- We'd end up traversing potentially loads and loads of dead weak pairs that would be themselves collected, as in the example above.
Alternatively, you could traverse the entire heap again after GC and mark the weak pairs. In fact, that's what Chez Scheme does. However, Chez has separate heaps for different types of objects (known as a BIBOP scheme). This has the advantage of traversing only the live weak pairs, instead of every single live object.
In CHICKEN, we'd be traversing every single object, as we store everything on a single heap. If the heap is large, this would be wasteful. But at least we wouldn't have to traverse dead weak pairs!
When looking into this problem, I decided to study the Schemes which have weak pointers. Those were Chez Scheme, Chibi Scheme and MIT Scheme. Of those, only MIT Scheme has a Cheney-style copying garbage collector with a single heap like CHICKEN does. The sources to MIT Scheme are well-documented, so I quickly found how they do it, and boy is it clever!
MIT Scheme's solution:
- Traverses only the still-live weak pairs,
- introduces only a single word of memory overhead,
- and is brilliantly simple once you get how it works.
Garbage collecting weak pairs, MIT style
Let's take a closer look at that end state of that last GC run. Remember that a forwarding pointer overwrites the original header of the object that used to live there. But the memory of the slots is still sitting there, allocated but unused!
The great idea in MIT Scheme is that we can "recycle" that unused memory and store a linked list of still-live weak pairs in that space. The extra word of overhead I mentioned before acts as the head of that list. Let's look at how that looks during the GC, by replaying that last run, but with MIT Scheme's algorithm for tracking weak pairs:
The initial state is the same, except we have the additional weak pair chain head. The GC again starts at the top, copying zero, same as before, but with one addition. When the forwarding pointer gets created, we notice that it's a weak pair. This causes the weak pair chain head to get modified so it points to the new forwarding pointer. The original value of the chain head (the empty list marker) gets stored where the old weak pair's car slot used to be.
Again, we continue with x, which is a regular pair, so nothing interesting happens to the space behind the forwarding pointer:
Then, as we scan over the contents of x, we again encounter what used to be shr and move it, updating the pointer in x. The forwarding pointer for shr is also not put in the chain because shr is not a weak pair:
Next, we move y. Because it is a weak pair, we again link up its forwarding pointer into the chain that we're building up. This means the pointer in the weak chain head (which pointed to zero's original address) gets put in the old car field of y, and the head itself now points to y's forwarding pointer:
Finally, z gets moved. Since it is also a weak pair, when making the forwarding pointer we also link it up into the chain:
Finally, we now only need to traverse the weak pairs by following the trail of pointers starting at the head of the live weak pair chain. For each of the forwarding pointers in the chain:
- Follow the forwarding pointer to its corresponding live weak pair,
- take that pair's car,
- check if the car refers to a forwarding pointer. If it does:
- update the car to the forwarding pointer's destination.
I'm not going to do this one by one as it's pretty obvious. I will show you the result of this walk:
You can see that we've only updated three out of four weak pairs - exactly the number of weak pairs that got copied during this GC and are still considered live. Because only weak pairs that exist in tospace have to be updated, we are doing the absolute minimum work necessary.
The nice thing about this is, that even if you produce lots of garbage weak pairs, they don't have any additional impact on GC performance. Only those weak pairs that survive garbage collection will be scanned and updated.
When implementing this improvement I noticed that our performance on benchmarks improved a little bit, even though they don't use weak pairs directly. This was all due to only scanning copied weak pairs and not having to scan the entire symbol table on every GC.
With the support for user-facing weak pairs in place, supporting line numbers in the interpreter was relatively easy. This means that debugging your code will be easier starting with the upcoming 5.4 release of CHICKEN!
Further reading
- At the Village CHICKENs event, I presented my findings about how to add weak references. Slides are here.
- Back in 2014, Ruby added symbol GC to stop symbol table stuffing attacks. It also improved performance because the GC had to do less work copying stale symbols. The flurry of interest in symbol table stuffing attacks in Ruby was also my inspiration for fixing and improving the symbol GC in CHICKEN.
- The manual to MIT Scheme has a section on weak pairs, and the manual to Chez Scheme has its own, slightly different API.
- MultiScheme: A Parallel Processing System Based on MIT Scheme by James S. Miller. This is the paper that introduces the weak-cons operation and explains in more detail how the MIT Scheme GC was modified, in the same way we modified CHICKEN's GC.
- Don't stop the BIBOP: Flexible and Efficient Storage Management for Dynamically Typed Languages by R. Kent Dybvig, David Eby, and Carl Bruggeman. This paper explains how Chez Scheme's allocation scheme works. We've toyed with the idea of implementing a BIBOP allocator and GC in CHICKEN as well, but it'd be a large undertaking without knowing for sure that it would improve performance.
- Data representations in PDP-10 MacLISP by Guy L. Steele describes the original BIBOP allocation scheme.