Regular Expressions and Automata

Regular expressions are patterns used to describe the lexical parts of languages, such as numbers and identifiers. Strings matching these expressions can be detected by non-deterministic finite automata (NFAs), which can be transformed to (more efficiently implementable) deterministic finite automata (DFAs) and indeed optimal forms of DFA.

A description of the material can be found in the following paper.

The paper begins with definitions of regular expressions, and how strings are matched to them; this also gives our first Haskell treatment also. After describing the abstract data type of sets we define non-deterministic finite automata, and their implementation in Haskell. We then show how to build an NFA corresponding to each regular expression, and how such a machine can be optimised, first by transforming it into a deterministic machine, and then by minimising the state space of the DFA. We conclude with a discussion of regular definitions, and show how recognisers for strings matching regular definitions can be built.

The material gives an illustration of many of the features of Haskell, including polymorphism (the states of an NFA can be represented by objects of any type); modularisation (the system is split across a number of modules); higher-order functions (used in finding limits of processes, for example); and type classes amongst other features.

The Haskell libraries implementing the material can be found in the file archive below.

Thanks very much to Oliver Salazar for finding a bug in the implementation.

Last modified 11 April 2003.