Thread Creation and Scheduling

Next: GUM Up: GRIP Previous: GRIP

Thread Creation and Scheduling

An expression to be evaluated is represented by a graph of closures , each of which is held in a heap. Each closure consists of a pointer to its code , together with zero or more free-variable fields. Some closures are in head normal form , in which case their code is usually just a return instruction. Other closures represent unevaluated expressions, in which case their code will perform the evaluation. A closure is evaluated by jumping to the code it points to, leaving a pointer to the closure in a register so that the code can access the free variables; this is called entering the closure. When evaluation is complete, the closure is updated with (an indirection to) a closure representing its head normal form.
A thread is a sequential computation whose purpose is to reduce a particular sub-graph to normal form. At any moment, there may be many threads available for execution in a parallel graph reduction machine; this collection of threads is called the thread pool . A processor in search of work fetches a thread from the thread pool and executes it. A particular physical processor may execute a single thread at a time, or may split its time between a number of threads. During execution, a thread may encounter a closure whose value will be required in the future. In this case it has the option of placing a pointer to the closure in the thread pool, where it is available for execution by other processes - this is referred to as sparking a child thread.
Inter-thread communication and synchronisation occurs only when a parent thread at some stage requires the result of a sparked closure which has started execution. In that case, the parent becomes blocked , until the sparked closure completes evaluation. On receiving the evaluated result, the blocked parent thread resumes processing. An evaluated expression (in Normal Form) may be used by any thread simultaneously without any contention. This is referred to as evaluate-and-die model for blocking/resumption. A parent gets blocked only when the sparked thread has already started but has not completed yet. A sparked thread is ignored, and the parent itself evaluates the graph if it reaches the point when it needs the result and the sparked thread has not yet been allocated to a processor for execution. Such an orphaned sparked thread is removed at the next garbage collection.
The global address space holds the graph to be evaluated as variable sized graph objects - pools of executable threads. Local address space on each Processing Element (PE) is used for evaluating one or more threads acquired from the pool. Heap objects in the global address space are grouped into evaluated and unevaluated ones. An unevaluated object has a bit which is set when fetched by a PE. This locks the object and makes it unavailable by any other PE. When evaluation is completed the object is overwritten with its result and becomes an evaluated object. All evaluated objects are available for simultaneous access by all the PEs without contention. Global heap is accessible to each PE. Local heap is held in local memory of a PE and is private to it. Caches of closures are held there, while the code for currently evaluated closure is held in local memory. Local heap can be garbage collected independently and in parallel with graph reductions by other PEs. Global garbage collection will affect all the PEs, but is rarer. Closures are copied from the global heap to the local cache held by a PE when needed. An evaluated closure from a PE is `flushed' out to the global heap if it is a copy of a closure in the latter. The only other occasion when this `flushing' out from a PE to Global heap occurs is when the PE is overloaded or when the system load is low and othe PEs are idle. This ensures that the performance cost due to `flushing' is minimised.
GRIP results are mixed, and the approach is prototypical.

Next: GUM Up: GRIP Previous: GRIP

Ananda Amatya
2/16/1999