16

Pattern Matching

The notion of `pattern' plays an important role in Miranda.   There  are
three  contexts in which patterns can be used - in function definitions,
in  conformal  definitions,  and  on  the  left  of  the  `<-'  in  list
comprehensions.    We  first  explain  the  general  rules  for  pattern
formation, which are the same in all three contexts.

Patterns are built from variables  and  constants,  using  constructors.
Here are some simple examples.
        x
        3
        (x,y,3)
The  first pattern is just the variable x, the second is the constant 3,
the last example is built from two variables and a constant,  using  the
(,,)  constructor  for 3-tuples.  The components of a structured pattern
can themselves be arbitrary patterns, permitting  nested  structures  of
any depth.

A  pattern  can  also  contain  repeated  variables, e.g.  `(x,1,x)'.  A
pattern containing repeated variables matches  a  value  only  when  the
parts of the value corresponding to occurrences of the same variable are
equal.

The constructors which can be used in a pattern include those  of  tuple
formation  `(..,..)',  list formation `[..,..]', and the constructors of
any user defined Algebraic  Type  (see  separate  manual  section).   In
addition  there are special facilities for matching on lists and natural
numbers, as follows.

(Lists) The `:' operator can be used in patterns,  so  for  example  the
following three patterns are all equivalent (and will match any 2-list).
        a:b:[]
        a:[b]
        [a,b]
Note that `:' is right associative (see manual section on Operators).

(Natural numbers) It is permitted to write patterns of  the  form  `p+k'
where  p  is  a  pattern  and  k  is  a  literal integer constant.  This
construction will succeed in matching a value n, if and only if n is  an
integer  >=k,  and  in  this  case  p is bound to (n-k).  Example, `y+1'
matches  any   positive   integer,   and   `y'   gets   bound   to   the
integer-minus-one.

Note that the automatic coercion from integer to floating  point,  which
takes  place  in  expression  evaluation,  does  not  occur  in  pattern
matching.  An integer pattern such as `3' or `n+1' will  not  match  any
floating point number.  It is not permitted to write patterns containing
floating point constants.

Case analysis

The main use of pattern matching in Miranda is in the left hand side  of
function  definitions.  In the simplest case a pattern is used simply to
provide the right hand side of the function definition  with  names  for
subcomponents of a data structure.  For example, functions for accessing
the elements of a 2-tuple may be defined,
        fst_of_two (a,b) = a
        snd_of_two (a,b) = b

More generally  a  function  can  be  defined  by  giving  a  series  of
equations,  in which the use of different patterns on the left expresses
case analysis on the argument(s).  Some simple examples
        factorial 0 = 1
        factorial(n+1) = (n+1)*factorial n

        reverse [] = []
        reverse (a:x) = reverse x ++ [a]

        last [a] = a
        last (a:x) = last x, if x~=[]
        last [] = error "last of empty list"

Many more examples can be  found  in  the  definition  of  the  standard
environment  (see  separate manual section).  Note that pattern matching
can be combined with  the  use  of  guards  (see  last  example  above).
Patterns  in  a  case  analysis  do  not  have  to be mutually exclusive
(although as a matter of programming style that is good practice) -  the
rule  is that cases are tried in order from top to bottom, and the first
equation which `matches' is used.

Conformal definitions

Apart from the simple case where the pattern is a single  variable,  the
construction
        pattern = rhs

is called a `conformal definition'.  If the value of the right hand hand
side matches the structure of the given pattern, the  variables  on  the
left are bound to the corresponding components of the value.  Example
        [a,b,3] = [1,2,3]

defines  a  and b to have the values 1 and 2 respectively.  If the match
fails anywhere, all the  variables  on  the  left  are  undefined.   For
example, within the scope of the definition
        (x,x) = (1,2)

the  value of x is neither 1 nor 2, but undefined (i.e. an error message
will result if you try to access the value of x in any way).

As a constraint to prevent "nonsense" definitions, it is a rule that the
pattern  on the left hand side of a conformal definition must contain at
least one variable.  So e.g. `1  =  2'  is  not  a  syntactically  valid
definition.

Patterns on the left of generators

In a list comprehension (see separate manual entry) the bound entity  on
the  left  hand  side of an `<-' symbol can be any pattern.  We give two
simple examples - in both examples we assume x is a list of 2-tuples.

To  denote  a  similar  list but with the elements of each tuple swapped
over we can write
        [(b,a)|(a,b)<-x]

To extract from x all second elements of a 2-tuple whose first member is
17, we can write
        [ b |(17,b)<-x]

Irrefutable patterns (*)
 (Technical note, for people interested in denotational semantics)

DEFINITION:- an algebraic type having only one constructor and for which
that  constructor is non-nullary (ie has at least one field) is called a
product type.  The constructor of a product type is  called  a  `product
constructor'.

Each type of n-tuple (n~=0) is also defined to be a  product  type.   In
fact it should be clear that any user defined product type is isomorphic
to a tuple type.  Example,  if we define
        wibney ::= WIB num bool
then the type wibney is isomorphic to the tuple type (num,bool).

A pattern composed only of  product-constructors  and  identifiers,  and
containing  no  repeated  identifiers, is said to be "irrefutable".  For
example `WIB p q', `(x,y,z)' and `(a,(b,c))' are  irrefutable  patterns.
We show what this means by an example.  Suppose we define f, by

        f :: (num,num,bool) -> [char]
        f (x,y,z) = "bingo"

As a result of the constraints of strong typing, f can only  be  applied
to objects of type (num,num,bool) and given any actual parameter of that
type, the above equation for f MUST match.

Interestingly, this works even if the actual parameter is an  expression
which does not terminate, or contains an error.  (For example try typing
        f undef
and you will get "bingo", not an error message.)

This is because of  a  decision  about  the  denotational  semantics  of
algebraic  types  in  Miranda  -  namely  that product types (as defined
above) correspond to the domain construction DIRECT PRODUCT (as  opposed
to  lifted  product).  This means that the bottom element of a type such
as (num,num,bool) behaves indistinguishably from (bottom,bottom,bottom).

Note that singleton types such as the empty tuple type `()', or say,
        it ::= IT
are not product types under the above definition, and therefore patterns
containing  sui-generis  constants such as () or IT are not irrefutable.
This corresponds to a semantic decision that we do NOT wish to  identify
objects such as () or IT with the bottom element of their type.

For a more detailed discussion of  the  semantics  of  Miranda  see  the
formal language definition (in preparation).

------------------------------------------------------------------------
(*) A useful discussion of the semantics of pattern-matching,  including
the  issue  of  irrefutable patterns, can be found in (chapter 4 of) the
following
   S.  L.  Peyton-Jones ``The Implementation of Functional Programming
   Languages'', Prentice Hall International, March 1987.
   ISBN 0-13-453333-X