//{{{}}} Dear All, I had a nasty Java shock this morning ... but it sorted itself out by lunchtime ... maybe. Please could you help by compiling and running the enclosed Java program on any Java systems to which you have access? Please mail me the output and say what platform and Java system you used. The problem I had was some grossly unfair scheduling of some JavaPP processes. The current implementation of JavaPP shared channels (multi-writer/multi-reader) relies on the threads kernel underlying the JVM to do just one sensible thing: queue any threads trying to synchronise on an object when another thread holds the monitor for that object. So, when the monitor is released, the thread that was waiting the longest to synchronise gets in next. If we don't have this property, we have no guarantee that a thread will ever be able to synchronise with an object (even when deadlock doesn't happen). This is thread starvation. The following program tests the mechanism supported by your Java system. Copy it into a single file -- say TestQueue.java: //{{{ TestQueue.java //{{{ class X_class { class X_class { int count = 0; //{{{ public synchronized void sync (int id) { public synchronized void sync (int id) { count++; System.out.println ("(" + System.currentTimeMillis() + ")" + " Phil " + id + " : syncing ... " + count); } //}}} //{{{ public synchronized void hold (int n) { public synchronized void hold (int n) { int seconds = 1000; System.out.println ("(" + System.currentTimeMillis() + ")" + " Holder : HOLDING for " + n + " seconds ... " + count); //{{{ wait for n seconds try {Thread.sleep (n*seconds);} catch (InterruptedException e) {} //}}} } //}}} } //}}} //{{{ class Phil extends Thread { class Phil extends Thread { //{{{ //{{{ parameters private int id; private X_class X; //}}} //{{{ constructor public Phil (int id, X_class X) { this.id = id; this.X = X; start (); } //}}} //{{{ run public void run () { int seconds = 1000; //{{{ wait for id seconds try {sleep (id*seconds);} catch (InterruptedException e) {} //}}} //{{{ say want to sync System.out.println ("(" + System.currentTimeMillis() + ")" + " Phil " + id + " : want to sync ... "); //}}} X.sync (id); } //}}} //}}} } //}}} //{{{ class Holder extends Thread { class Holder extends Thread { //{{{ //{{{ parameters private X_class X; //}}} //{{{ constructor public Holder (X_class X) { this.X = X; start (); } //}}} //{{{ run public void run () { int hold_period = 15; //{{{ say want to hold System.out.println ("(" + System.currentTimeMillis() + ")" + " Holder : want to hold ... "); //}}} X.hold (hold_period); } //}}} //}}} } //}}} //{{{ class College { class College { //{{{ //{{{ main public static void main (String argv[]) { //{{{ int n_philosophers = 10; X_class X = new X_class (); Holder holder = new Holder (X); Phil[] phil = new Phil[n_philosophers]; for (int i = 0; i < n_philosophers; i++) { phil[i] = new Phil (i + 1, X); } //}}} } //}}} //}}} } //}}} //}}} Compile and run it (e.g. javac TestQueue.java; java College). Do you get the following output? //{{{ from JDK 1.1.3 on SPARC/Solaris ash% javac TestQueue.java ash% java College (870279910759) Holder : want to hold ... (870279910869) Holder : HOLDING for 15 seconds ... 0 (870279911875) Phil 1 : want to sync ... (870279912876) Phil 2 : want to sync ... (870279913876) Phil 3 : want to sync ... (870279914876) Phil 4 : want to sync ... (870279915876) Phil 5 : want to sync ... (870279916876) Phil 6 : want to sync ... (870279917876) Phil 7 : want to sync ... (870279918877) Phil 8 : want to sync ... (870279919877) Phil 9 : want to sync ... (870279920877) Phil 10 : want to sync ... (870279925878) Phil 1 : syncing ... 1 (870279925880) Phil 2 : syncing ... 2 (870279925881) Phil 3 : syncing ... 3 (870279925882) Phil 4 : syncing ... 4 (870279925883) Phil 5 : syncing ... 5 (870279925885) Phil 6 : syncing ... 6 (870279925886) Phil 7 : syncing ... 7 (870279925887) Phil 8 : syncing ... 8 (870279925889) Phil 9 : syncing ... 9 (870279925890) Phil 10 : syncing ... 10 ash% //}}} If so, all is well. The numbers down the left are timestamps in milliseconds. At second 0, Holder grabs the X monitor for 15 seconds. At seconds 1 through 10, each of the philosophers arrive wanting to sync on X. At second 15, they all make it -- in the order in which they made their calls. Now Java doesn't specify that synchronising calls are queued. Thank goodness that they implemented them that way in the latest JDK (at least on our Solaris machine). I *was* working on a SunOS machine that had an earlier JDK (1.0.?) installed. This is the output it produced from the same program: //{{{ from JDK 1.0.? on SPARC/SunOS mint.ukc.ac.uk% javac TestQueue.java mint.ukc.ac.uk% java College (870261025501) Holder : want to hold ... (870261025601) Holder : HOLDING for 15 seconds ... 0 (870261026611) Phil 1 : want to sync ... (870261027612) Phil 2 : want to sync ... (870261028612) Phil 3 : want to sync ... (870261029612) Phil 4 : want to sync ... (870261030612) Phil 5 : want to sync ... (870261031612) Phil 6 : want to sync ... (870261032613) Phil 7 : want to sync ... (870261033613) Phil 8 : want to sync ... (870261034613) Phil 9 : want to sync ... (870261035613) Phil 10 : want to sync ... (870261040616) Phil 10 : syncing ... 1 (870261040620) Phil 9 : syncing ... 2 (870261040623) Phil 8 : syncing ... 3 (870261040626) Phil 7 : syncing ... 4 (870261040629) Phil 6 : syncing ... 5 (870261040631) Phil 5 : syncing ... 6 (870261040634) Phil 4 : syncing ... 7 (870261040637) Phil 3 : syncing ... 8 (870261040640) Phil 2 : syncing ... 9 (870261040643) Phil 1 : syncing ... 10 mint.ukc.ac.uk% //}}} where it is apparent that synchronising calls are stacked! Someone mailed me once to say that some implementors wanted to do that, but I thought that was a bit far fetched. The argument was that taking the *latest* call, when a monitor got released, would lead to more efficient computation ... because the latest caller would have the greatest chance of still having its data in cache! But that's such a silly scheme as there can be no guarantee that a thread making a synchronised call will *ever* get serviced. It's trivial to dream up an example where a calling thread is infinitely overtaken by two later calling threads. It's like work stacked up on your desk. The stuff on the bottom, that came in ages ago, never gets done because it's always covered up by later arriving work! I know that leads to tears ... Anyway, I'm curious as to what happens in PC/Mac/... environments, so would be very grateful to be told. Also, if anyone knows of any rationale from Sun as to why they made this (very sensible) change in JDK 1.1.3 and whether they are they going to mandate it, please say! If the Java threads library ever went back to stacking those calls, starvation would threaten everyone -- it's not a problem peculiar to JavaPP. Note that the current implementation of JavaPP doesn't care how the Java `wait-sets' are managed (i.e. do threads `wait' in a queue or stack or ... ?). It doesn't care because it never has more than one thread in any wait-set. So ... I think we should guarantee that JavaPP processes queue up in a FIFO (or, at least, a *weak* FIFO) when waiting to read or write on a shared channel. This looks pretty basic to me! It ought to be basic for Java itself. I'm told that a reason FIFOs may not have been mandated is the difficulty of managing them across multiprocessors. But, weak FIFOs (e.g. a queue of queues) can be so managed and cheaply -- for example, see the Distributed SHARED Channel in KRoC occam: http://www.hensa.ac.uk/parallel/occam/projects/occam-for-all/dsc/ Many thanks for your help, Peter Welch.