machine [2] is a modification of the
G-machine in order to allow parallel graph reduction of lazy ML on a
shared memory multiprocessor (a 15 processor Sequent Symmetry
machine). Both sequential and parallel versions are possible.
It uses stack frames (one small stack for each single
evaluation) in the heap.
Only stack frames need locking as there is less synchronisation
during evaluation of a node.
Heap allocation simply increments the heap pointer. Each processor
allocates a chunk of the global heap space when it needs more heap
space. When a processor cannot allocate more heap space, garbage
collection is performed.
Garbage collection requires all the processors to be
synchronised first. Sequential garbage collection is used (possible
system deterioration due to the garbage collector being called too often).
A frame node is used to represent a suspended function
application; it holds the arguments of the application, plus a pointer
to the code for the function, and some space for temporary variables
for intermediate results during reduction. It also has a dynamic link
field which holds a pointer to the frame node from which evaluation
was called. Recursive evaluation links together individual frame nodes
to form a stack frame of nodes. Several such frames correspond to the
evaluation by parallel processes. Together these stack frames form a
cactus stack.
Spark model is used for parallelism, with one point of
evaluation for each evaluating process. Sparks are advisory only. If
for some reason, evaluation has not started the process needing the
value does the evaluation itself.
The machine behaves like a stack machine. As in
G-machine, compilation rules follow compilation schemes. Instruction
transition rules deal with the instructions emitted by the compilation
schemes. The main difference between the G-machine and the machine is the use of individual stack frames to evaluate the
graph. In the parallel version of the machine there are many
reduction points where reductions take place concurrently. There is
only one reduction point in the sequential case. Also, in the parallel
case there is a spark pool of nodes waiting to be evaluated.
One point of reduction for each evaluating process. Time slicing
is used to guarantee that all evaluation points make progress.
Flag bit on a tag in the node for exclusive access using test
and set and busy wait until lock acquired.
A process suspension mechanism so that a process can be waiting
but the processor can continue doing something else is implemented.
Real speed ups (5 to 11) on standard G-machine evaluations on
LML ares reported.