In the `something-for-the-weekend' posting (March 19th, 1999), I noted that we had managed to deadlock the JavaPP ALT algorithm: http://www.hensa.ac.uk/parallel/groups/wotug/java/discussion/4.html that was published some 2.5 years earlier and had survived rather well until then. It was left as a challenge to the reader to find the flaw (a race hazard) and correct it, saying I would publish my version at WoTUG-22. Unfortunately, there being so many other exciting things happening, I never got around to doing this - despite the exceedingly late nights ;-) Somewhat belatedly then, here is the `corrected' algorithm currently used in the JCSP library ... `correct', that is, unless Jeremy's FDR check tells us otherwise!!! Note: the `corrections' to the original version are: o removing the global synchronized from the Alternative.select method and imposing it only around the ALT WAIT operation; o making the Channel.disable synchronized. That may *sound* simple but it took over 4 days full-time thinking ... :-(. There were numerous ever-more-complex-and-desperate attempts to secure the race hazard before this relatively simple `solution' was realised. In fact, this new algorithm is closer to the transputer's micro-coded ALT instructions than the original :-) ... One point about this new algorithm: during the ENABLE sequence, there can be parallel *writes* to the same variable. The ALTing process may write `ready' to the `state' variable (currently holding `enabling') and so might a process writing to a channel that had been enabled earlier in the sequence. The latter write is synchronized on the `alt' object (see the `schedule' method) but the former is not - so both could commit to doing this. That could be prevented by synchronising the ALTing process' write on the `alt' object as well. But I don't believe that is necessary - parallel writes of the same value ought to be safe ... I could explain the original race hazard that lead to deadlock and why this new version avoids that ... but no time now. I have a fairly comfortable feeling, based on informal reasoning, that it is now correct ... but I had that before with the old one! Anyway, it passes the prolonged stress tests that broke the original. Hopefully, Jeremy will now be able to verify it. This logic is similar to the things Kevin Vella had to get to grips with for the SMP version of KRoC occam. It's very hard. Anyone working directly with Java threads at the level of synchronized/wait/notify *has* to get involved with this stuff ... and the complexity seems to grow exponentially with the number of synchronized/wait/notifies that are mutually dependent. I'm hoping not to have to do that any more ;-) ... Cheers, Peter. PS. Jeremy: note that the enclosed Channel is actually a many-1 channel. It may simplify things, initially, to make it a 1-1 channel by removing the write_monitor. PPS. Jeremy: note also that the `read' method below dispenses with the `local' variable used in the paranoia section on page 274 of WoTUG-21. Because I now know that JVMs do *not* release the object lock after invoking a notify (which I find a little peculiar), the concern expressed in the second paragraph on that page disappears and we can get rid of `local'. The CSP model of Java notify reflects this, so you should be able to re-verify the 1-1/many-1 channels with that simplification :-) The first paragraph, though, still describes a real problem - so we still need the extra notify/wait in the case when the reader is first to the channel. It would be interesting to apply FDR to the code on page 273 and see it fail ... PPS. Jeremy: please hold off verifying this ALT until I've replied to your posting from yesterday! I want to argue some more about the merits of the LEFT-only model for a JCSPCHANNEL versus the LEFT-RIGHT one. ============================================================================== public class Alternative { private static final int enabling = 0; private static final int waiting = 1; private static final int ready = 2; private static final int inactive = 3; private int state = inactive; public int select (Channel[] c) throws InterruptedException { int selected = -999999; // this value should *never* be returned! int i; state = enabling; // ALT START for (i = 0; i < c.length; i++) { if (c[i].enable (this)) { // ENABLE CHANNEL state = ready; selected = i; break; } } synchronized (this) { if (state == enabling) { // ALT WAIT state = waiting; wait (); state = ready; } } // assert : state == ready for (i--; i >= 0; i--) { if (c[i].disable ()) { // DISABLE CHANNEL selected = i; } } state = inactive; return selected; // ALT END } synchronized void schedule () { switch (state) { case enabling: state = ready; break; case waiting: state = ready; notify (); break; // case ready: case inactive: // break } } } /////////////////////////////////////////////////////////////////////////// public class Channel { private int channel_hold; // buffer (not detectable to users) private boolean channel_empty = true; // synchronisation flag private Object write_monitor = // all writers multiplex through this new Object (); private Alternative alt; // state of reader public synchronized int read () throws InterruptedException { if (channel_empty) { channel_empty = false; // first to the rendezvous wait (); // wait for the writer thread notify (); // schedule the writer to finish } else { channel_empty = true; // second to the rendezvous notify (); // schedule the waiting writer thread } return channel_hold; } public void write (int n) throws InterruptedException { synchronized (write_monitor) { synchronized (this) { channel_hold = n; if (channel_empty) { channel_empty = false; // first to the rendezvous if (alt != null) { // the reader is ALTing on this Channel alt.schedule (); } wait (); // wait for the reader thread } else { channel_empty = true; // second to the rendezvous notify (); // schedule the waiting reader thread wait (); // let the reader regain this monitor } } } } synchronized boolean enable (Alternative alt) { if (channel_empty) { this.alt = alt; return false; } else { return true; } } synchronized boolean disable () { alt = null; return !channel_empty; } }