CSP for Java
(JCSP) 1.1-rc4

org.jcsp.lang
Class Sequence

java.lang.Object
  extended by org.jcsp.lang.Sequence
All Implemented Interfaces:
CSProcess

public class Sequence
extends Object
implements CSProcess

This constructor taks an array of CSProcesses and returns a CSProcess that is the sequential composition of its process arguments.

Shortcut to the Constructor and Method Summaries.

Description

The Sequence constructor taks an array of CSProcesses and returns a CSProcess that is the sequential composition of its process arguments.

Note: for those familiar with the occam programming language, the Sequence class gives the semantics of the SEQ construct.

This class is included for completeness. Sequential code is normally be handled by sequential Java code. If we need to run some CSProcesses in sequence, we can just execute their run methods in sequence. However, using this Sequence class lets us switch between sequential and parallel composition of processes with a single word change (replace Sequence by Parallel) in our code. For some applications (e.g. the example below), this may provide a useful comparison.

CSProcesses can be added to a Sequence object either via the constructor or the addProcess methods. If a call to addProcess is made while the run method is executing, the extra process(es) will not be executed until the next time run is invoked.

CSProcesses can be removed from Sequence object via the removeProcess or removeAllProcesses method. If a call to removeProcess or removeAllProcesses is made while the run method is executing, the process will not be removed from the network until the next time run is invoked.

Example - Matrix Multiply (in Sequence and in Parallel)

The following example illustrates a simple transformation of sequential code into parallel code. On a shared-memory parallel machine (e.g. a Symmetric Multi-Processor) and depending on the overheads for starting up and shutting down the concurrency and the scale of the problem to which it is applied, the parallel code will terminate earlier than its sequential equivalent.

Sequential Implementation

The example is simple matrix multiplication. The standard implementation consists of three nested for-loops, the innermost of which computes the dot-product of a row (of the first matrix) against a column (of the second) to produce one element of the result matrix. It is expressed here as a static method, which would belong to some larger collection of static methods providing a general matrix handling class:
   public static void multiply (final double[][] X,
                                final double[][] Y,
                                final double[][] Z) {
 
     ...  check (X[0].length == Y.length)
     ...  check (X.length == Z.length)
     ...  check (Y[0].length == Z[0].length)
 
     for (int i = 0; i < X.length; i++) {
       final double[] Xi = X[i];
       final double[] Zi = Z[i];
       for (int j = 0; j < Y[0].length; j++) {
         double sum = 0.0d;
         for (int k = 0; k < Y.length; k++) {
           sum += Xi[k]*Y[k][j];
         }
         Zi[j] = sum;
       }
     }
 
   }
 
The method computes the matrix multiplication of X by Y, leaving the result in Z. The (hidden) checks ensure that the supplied arrays have compatible dimensions for multiplication, throwing a
java.lang.RuntimeException if they are not. The abbreviations Xi and Zi are not strictly necessary, but for those brought up in an occam world are automatic good practice - the Xi[k] reference in the O(n3)-innermost loop saves an array index computation and check.

Parallel Implementation

The two outer for-loops of multiply set up the particular row and column pairs for the dot-product computed by the innermost loop. Unnecessarilly, they impose an ordering in which those dot-products are computed. Each dot-product computes one element of the result matrix and is independent of its sibling dot-products. The sequential constraint may, therefore, be removed and all elements of the result matrix may be computed in parallel.

If this were occam, all we would need to do is replace the two outer for-loops with par-loops. Since this is Java, we have to jump through some slightly higher syntactic hoops. First, we need to transform the body of the for-loop into a process and use the loop to assign each one to consecutive elements of a process array. Then, we simply run the array in parallel:

   public static void parMultiply (final double[][] X,
                                   final double[][] Y,
                                   final double[][] Z) {
 
     ...  check array dimensions are compatible
 
     final CSProcess[] rowProcess = new CSProcess[X.length];
 
     for (int i = 0; i < X.length; i++) {
       final int ii = i;
       rowProcess[ii] = new CSProcess () {
         public void run () {
           final double[] Xi = X[ii];
           final double[] Zi = Z[ii];
           final double[][] YY = Y;
           for (int j = 0; j < YY[0].length; j++) {
             double sum = 0.0d;
             for (int k = 0; k < YY.length; k++) {
               sum += Xi[k]*YY[k][j];
             }
             Zi[j] = sum;
           }
         }
       };
     }
 
     new Parallel (rowProcess).run ();
 
   }
 
[Note: anonymous inner classes, as in the above code, may only access global variables that are qualified as final - hence, the need to freeze the loop control index, i, as ii. The abreviation YY, of Y, is so that access to that matrix is via a variable local to the process. The global Y could have been used, but at the cost of needless run-time overhead.]

Back to Sequential Implementation

Having expressed this algorithm in terms of processes, we come to the motivation for this example. To run a collection of processes, we must decide whether to run them is parallel or sequence. JCSP makes either decision equally easy to implement. The above coding ran them in parallel. To revert to running them is sequence, all we need do is change the last line of executable code to:
     new Sequence (rowProcess).run ();
 
It would probably be good to name this new method seqMultiply (instead or parMultiply!). Of course, it has the same behaviour as the original sequential multiply and suffers only slightly increased overheads - negligible for matrices above a modest size.

Sequence versus Parallel

Beware that it is only in certain circumstances that Sequence and Parallel can be interchanged without upsetting the semantics of the system. The processes under their control must be finite (i.e. they must all terminate), they must engage in no synchronisations (e.g. channel communication) either between themselves or with other processes and they must obey CREW-parallel usage rules (i.e. if one process updates some data, no other process may look at it). These are fulfilled by the rowProcess array, each of whose processes computes a separate row (Zi) of the target result matrix (Z).

Exercise

The above parMultiply has transformed only the outermost for-loop of multiply into a par-loop. Current SMPs do not scale well above 4 or 8 processors, so we do not need very large matrices for this code to demonstrate high parallel efficiencies. Future architectures, however, may offer much higher levels of physical concurrency that efficiently support much finer granularities of process. For those machines and as an interesting exercise, parallelise the middle for-loop. To exploit even finer levels of parallel granularity, parallelise the innermost for-loop so that it computes its dot-product in O(log(n)) time - rather than the O(n) time of the sequential code.

Two Important Modifications on parMultiply

The first time it is run, the JCSP Parallel process creates a new Thread for all but one of its component processes, running its last component in the thread invoking the run. When (and if) all those component processes terminate, the Parallel run terminates but leaves the newly created threads parked for later use. Subsequent runs of the Parallel process resuse those parked threads. So the overhead of thread creation occurs only for the first run.

In parMultiply, however, the Parallel process is anonymous and local to the method and so can never be used again. Worse, the threads it created and parked are left parked - wasting space for the remainder of the program. Repeated invocations of parMultiply, therefore, will leak memory.

There are two ways to fix this. The first is to accept that the Parallel process created by parMultiply remains local to it (and, hence, un-reusable) and, explicitly, to unpark and terminate its unwanted threads. This can be done by invoking releaseAllThreads on the Parallel after it has been run - the memory associated with those threads will be released. A temporary name for the process needs to be assigned and the last executable line of parMultiply becomes the three lines:

     final Parallel par = new Parallel (rowProcess);
     par.run ();
     par.releaseAllThreads ();
 
The second way to improve things is to save the Parallel process for later runs. There is no easy way for doing this local to the class to which parMultiply belongs. parMultiply is static and may be invoked with matrix parameters of different dimensions. So, a static data structure would need to be set up and consulted each time parMultiply was invoked to see if a suitable Parallel process had already been saved.

Much better is to give the user of parMultiply the responsibilty of looking after - and directly using, reusing and (if necessary) terminating the threads in - the Parallel process. So, we change the method into something that manufactures a process that does parallel matrix multiplication. That multiplication will be specific to the three matrices given as parameters (i.e. it performs Z = X*Y on those matrices only):

   public static Parallel makeParMultiply (final double[][] X,
                                           final double[][] Y,
                                           final double[][] Z) {
 
     ...  check array dimensions are compatible                     // as before
 
     final CSProcess[] rowProcess = new CSProcess[X.length];        // as before
 
     for (int i = 0; i < X.length; i++) {                           // as before
       final int ii = i;                                            // as before
       rowProcess[ii] = new CSProcess () {                          // as before
         public void run () {                                       // as before
           final double[] Xi = X[ii];                               // as before
           final double[] Zi = Z[ii];                               // as before
           final double[][] YY = Y;                                 // as before
           for (int j = 0; j < YY[0].length; j++) {                 // as before
             double sum = 0.0d;                                     // as before
             for (int k = 0; k < YY.length; k++) {                  // as before
               sum += Xi[k]*YY[k][j];                               // as before
             }                                                      // as before
             Zi[j] = sum;                                           // as before
           }                                                        // as before
         }                                                          // as before
       };                                                           // as before
     }                                                              // as before
 
     return new Parallel (rowProcess);
 
   }
 
The invoker of makeParMultiply gets back a parallel process that can be invoked as often as needed and incurs a thread creation overhead only on its first use. The data for the matrix parameters may be freely changed between invocations - each invocation will work on whatever data is present at the time. All the user must remember is that the manufactured parallel process is good only for working with the matrix objects specified originally to makeParMultiply as its parameters. For example, if Matrix is the class to which makeParMultiply belongs:
     final Parallel par = Matrix.makeParMultiply (X, Y, Z);
 
     while (...) {
       ...  set up matrices X and Y
       par.run ();
       ...  do something with the result matrix Z
     }
 
     par.releaseAllThreads ();
 

Author:
P.D. Austin, P.H. Welch
See Also:
CSProcess, Parallel

Constructor Summary
Sequence()
          Construct a new Sequence object initially without any processes.
Sequence(CSProcess[] processes)
          Construct a new Sequence object with the processes specified.
 
Method Summary
 void addProcess(CSProcess process)
          Add the process to the Sequence object.
 void addProcess(CSProcess[] newProcesses)
          Add the array of processes to the Sequence object.
 void removeAllProcesses()
          Remove all processes from the Sequence object.
 void removeProcess(CSProcess process)
          Remove a process from the Sequence object.
 void run()
          Run the sequential composition of the processes registered with this Sequence object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Sequence

public Sequence()
Construct a new Sequence object initially without any processes.


Sequence

public Sequence(CSProcess[] processes)
Construct a new Sequence object with the processes specified.

Parameters:
processes - the processes to be executed in sequence
Method Detail

addProcess

public void addProcess(CSProcess process)
Add the process to the Sequence object. The extended network will be executed the next time run() is invoked.

Parameters:
process - The CSProcess to be added

addProcess

public void addProcess(CSProcess[] newProcesses)
Add the array of processes to the Sequence object. The extended network will be executed the next time run() is invoked.

Parameters:
newProcesses - the processes to be added

removeProcess

public void removeProcess(CSProcess process)
Remove a process from the Sequence object. The cut-down network will be executed the next time run() is invoked.

Parameters:
process - the process to be removed

removeAllProcesses

public void removeAllProcesses()
Remove all processes from the Sequence object. The cut-down network will not be executed until the next time run() is invoked.


run

public void run()
Run the sequential composition of the processes registered with this Sequence object.

Specified by:
run in interface CSProcess

CSP for Java
(JCSP) 1.1-rc4

Submit a bug or feature to jcsp-team@kent.ac.uk
Version 1.1-rc4 of the JCSP API Specification (Copyright 1997-2008 P.D.Austin and P.H.Welch - All Rights Reserved)
Java is a trademark or registered trademark of Sun Microsystems, Inc. in the US and other countries.