Is there any paper describing how this extends to objects containing N pointers without blowing up the number of bits needed?
Like say we are marking a vector containing pointers to objects. We need a state machine which tells us, on each visit to that object which tells us: "we have marked index [i]; now mark index [i+1]".
A list of 256 nodes doesn't have this problem: it has the marking bits in each node!
Also, how do we design the API for extending the heap with new kinds of objects. With the recursive traversal, you just provide a handler which traverses your objects and recurses down the pointers.
Perhaps we need some complicated protocol now whereby each object has methods like "init_traversal", "get_next_pointer", and "finish_traversal". In init_traversal, it receives a backpointer that it must store, and it returns a pointer that it would like the caller to retain (the one that is bumped to make space for the stashed one). In "finish_traversal", it gets back the retained pointer, in exchange for returning the stashed backpointer. Something like that.
Actually a protocol like this could perhaps clarify the algorithm.
I find the easiest way to think about this trick is to consider the CPS transformation of a depth-first traversal of a tree (in the case of a general graph, you use a DFS spanning tree). At each recursive call, you need to be able to produce and store the continuation upon return. Upon returning from a recursive call, you need to rematerialize the continuation.
The object pointer for the most recent continuation is stored in a global variable, and others are stored in a stack constructed by using pointers in objects that can now be inferred from the stacking nature of the recursion.
Part of the information for the code pointer for the most recent continuation is stored in the object pointer, i.e. from the object header you narrow the possibilities to process it. For an object containing N fields, you need log_2 N bits to store the field you were last processing. Since each object can appear in at most one stack frame of the recursive traversal (modulo where you put the marked check to bail out in the graph case), you need to put these log_2 N bits somewhere in the object.
You have a few options for where to put the bits:
- Find a few bits lying around where you can store them. This works pretty well for objects with few fields.
- Add an extra field with log_2 N bits to store them. This is the most straightforward solution for a larger object with an arbitrary number of fields.
- If you have an easy way of locating object headers from internal pointers, you can use internal pointers as your back pointers.
- Steal a few bits from every field that has a pointer value stored in it. This works because you actually only need log_2 M bits where M is the number of fields containing a pointer. The non-pointer fields can all be skipped.
This scheme can be generalized to implement arbitrary structural recursion on polynomial datatypes:
Like say we are marking a vector containing pointers to objects. We need a state machine which tells us, on each visit to that object which tells us: "we have marked index [i]; now mark index [i+1]".
A list of 256 nodes doesn't have this problem: it has the marking bits in each node!
Also, how do we design the API for extending the heap with new kinds of objects. With the recursive traversal, you just provide a handler which traverses your objects and recurses down the pointers.
Perhaps we need some complicated protocol now whereby each object has methods like "init_traversal", "get_next_pointer", and "finish_traversal". In init_traversal, it receives a backpointer that it must store, and it returns a pointer that it would like the caller to retain (the one that is bumped to make space for the stashed one). In "finish_traversal", it gets back the retained pointer, in exchange for returning the stashed backpointer. Something like that.
Actually a protocol like this could perhaps clarify the algorithm.