OEP

171

Title

Partial Barrier Synchronisation

Summary

Allow a subset of enrolled processes to complete a barrier.

Owner

Peter Welch <p.h.welch@kent.ac.uk>, Fred Barnes <f.r.m.barnes@kent.ac.uk>

Status

Proposed

Date-Proposed

2008-03-09

Keywords

EXTENDED partial BARRIER SYNC

Motivation

A normal occam-pi BARRIER requires all its enrolled processes to commit (SYNC) to it before it completes. Until then, synchronising processes are blocked. Once the barrier completes, all synchronising processes are released.

Sometimes we want only partial completeness of the barrier. Suppose there are n (> 0) processes enrolled on a barrier. Choose a number p, where 0 < p < n. A partial barrier with threshold p completes if, and only if, p of its enrolled processes commit (SYNC) to it. Until then, synchronising processes are blocked. Once the partial barrier completes, the p synchronising processes that completed it are released – this is atomic with the pth enrollment (i.e. if p + 1 processes committed to synchronise around the same time, the barrier would complete, p processes would be released and the remaining one would be left blocked).

The classic system illustrating this is Santa Claus and his Reindeer and Elves. Santa is awoken when any 3 of his 10 elves get together to consult with him. The elves enroll on a partial barrier with threshold 3.

Language Binding

We need to work with the dynamic nature of occam-pi BARRIERs, where the number of processes enrolled can vary over time – even without new processes appearing or old processes disappearing.

Let's try to do this without declaring special PARTIAL qualifiers on barriers. So, all barriers are potentially partial – by default, they are full.

Consider:

PROC SET.PARTIAL.THRESHOLD (BARRIER b, VAL INT p)

This is a compiler intrinsic that sets a partial threshold on a barrier. Actually, it sets a limit on that threshold. If the number of processes enrolled on the barrier grows above p, the barrier operates as a partial barrier with threshold p – as defined in the Motivation section above. Otherwise, the barrier is a normal full barrier, completing whenever its enrolled processes all commit to synchronise.

Normally, the process declaring (or constructing, in the case of MOBILEs) a barrier would invoke SET.PARTIAL.THRESHOLD once and that would be that.

However, any process enrolled on the barrier may decide to change that theshold at any time, so SET.PARTIAL.THRESHOLD needs to be safe for parallel invocation. [I can't see why they might want to do that though ...]

Implementation should be easy – just one more field for this partial threshold and fairly trivial management?

A Problem

Partial barriers, so far, are not particularly useful!

Consider the Santa Claus system mentioned earlier. Suppose all 10 elves decide they want to consult with Santa at the same time. They are enrolled on a partial barrier with threshold 3. So, 3 of them complete the partial barrier and are waved through ... closely followed by another 3 ... and closely followed by another 3. Each elf, having synchronised on that partial barrier, knows it is one of a group of 3 elves ready to consult with Santa – and off they go. But, in fact, there are 9 elves all thinking the same thing! They still need marshalling into groups-of-3 and those groups need stacking up, since only one group is supposed to approach Santa at a time. The partial barrier may as well not have been there – we would then have 10 elves that need marshalling, rather than 9, but the problem would be no worse.

An elegant solution uses extended partial barrier synchronisation.

Extended Partial Barrier Synchronisation

OEP/167 proposed extended synchronisation for full barriers. This proposal inherits that idea. No new syntax or compiler intrinsics are needed.

Two language bindings were proposed – mechanisms A and B. The discussion for that proposal referred to this one for a second reason in favour of Mechanism B.

Mechanism B proposed that the receiving end of a tail channel, b?, exists for any barrier, b. When enough processes synchronise to complete the barrier, a signal is sent down its tail and must be accepted by a handling process before the synchronising processes are released. The handling process may block that release by refusing this signal (e.g. because it was ALTing on that channel and is currently servicing some other event). If the handling process accepts the signal with an extended input rendezvous, that action must also complete before the synchronising processes are released. Finally, if there is no tail channel handling process (i.e. no reference to b?), then the tail does not exist and the barrier operates as normal (with no signal sent to a tail).

For full barrier synchronisation, Mechanism B can be obtained as a special pattern of Mechanism A. Mechanism A proposed an extended synchronisation block, similar to the rendezvous block for extended input. So, signalling a tail channel can be simply delegated to an extra process plugged into the barrier:

PROC barrier.ping (BARRIER b, CHAN BOOL ping!)
  WHILE TRUE
    EXTENDED SYNC b
      ping ! TRUE
:

However, this does not work for partial barriers!

The problem now, of course, is that a partial barrier can complete without that barrier.ping being involved.

What we need is a mechanism that insists that a specified process must be one of the subset of processes completing the partial barrier. We need a way to single out a favoured process ... and that's what the Mechanism B barrier-with-tail gives us! The process handling the tail has to be there and accept the signal for the partial barrier to complete.

Suppose we define and implement extended partial barriers using the tail mechanism.

Reconsider, now, the Santa Claus system. Make the elves synchronise on an extended partial barrier with threshold 3. For symmetry, make the reindeer synchronise on an extended partial barrier with threshold 9, otherwise known as an extended full barrier (there are 9 reindeer). Connect the two tails to Santa, who PRI ALTs on them, giving priority (as defined for the problem) to the tail from the reindeer barrier. Santa does not perform extended inputs on either of the tails.

Now, suppose that all 10 elves decide they want to consult with Santa at the same time. They commit to SYNC on their partial barrier. When the threshold (3) is reached, the signal on the barrier's tail is offered. But if Santa is busy delivering toys with the reindeer, all elves remain (correctly) blocked.

When/if Santa gets around to accepting that signal, 3 elves complete their barrier and are released to consult with Santa. Santa knows this and greets them.

The other 7 elves remain blocked at their barrier. Because 7 is more than 3, the signal on the barrier's tail is again being offered – but Santa is not looking.

When Santa finishes consulting with that first group of elves, they return to work and Santa loops round to his PRI ALT. So long as the reindeer are not back from holiday (and SYNCing on their barrier so that that tail is signalling), Santa will accept the elves' barrier signal and the next group of 3 elves are shown into his study.

Etc.

Marshalling the elves into groups of 3 and scheduling them for separate consultations with Santa is now completely (and simply) managed by the extended partial barrier. But it uses the barrier-with-tail Mechanism B from OEP/167. Can this be done with Mechanism A?

Implementation of Mechanism B

This should not be hard. We just need to add a (tail) channel synchronisation word to the barrier data structure. When enough processes SYNC (and are nicely queued), signal down the tail channel (leave a scheduling address back to this point in the kernel). This may block – no matter.

If this is a partial barrier, more SYNCing processes may turn up – just keep queuing them (only the pth arriving process in the queue triggers the signal on the tail channel, where p is the triggering threshold). When the tail channel signal is accepted, release the first p processes in the queue. If p or more are still left, signal the tail channel again.

If this is a full barrier, there is no more to do until the tail signal is accepted. When that happens, release the whole queue of barrier SYNCing processes.

That's about it!

OEP/171 (last edited 2008-03-10 14:04:45 by phw)