Regular Expressions and Automata

Regular expressions are patterns used to describe the lexical parts of languages, such as numbers and identifiers. Strings matching these expressions can be detected by non-deterministic finite automata (NFAs), which can be transformed to (more efficiently implementable) deterministic finite automata (DFAs) and indeed optimal forms of DFA.

A description of the material can be found in the following paper.

The paper begins with definitions of regular expressions, and how strings are matched to them; this also gives our first Miranda treatment also. After describing the abstract data type of sets we define non-deterministic finite automata, and their implementation in Miranda. We then show how to build an NFA corresponding to each regular expression, and how such a machine can be optimised, first by transforming it into a deterministic machine, and then by minimising the state space of the DFA. We conclude with a discussion of regular definitions, and show how recognisers for strings matching regular definitions can be built.

The material gives an illustration of many of the features of Miranda, including polymorphism (the states of an NFA can be represented by objects of any type); modularisation (the system is split across a number of modules); higher-order functions (used in finding limits of processes, for example) amongst other features.

The Miranda libraries implementing the material can be found in the tar files below.

To expand the tar file, type

tar xf regExp.tar

on your machine.


Written 26 April 1995.
Last revised 3 May 1995.