School of Computing

Regular expression matching with input compression and next state prediction.

Gerald Tripp

Technical Report 3-08, Computing Laboratory, University of Kent, October 2008.

Abstract

Automata based regular expression matching can often require large amounts of memory for its state transition tables, particularly when matching multiple complex regular expressions with the same automata. For systems with limited memory resources it is common to try to compress the state transition tables. One technique called row displacement with state marking does this by identifying default values for the next state and then packing the remaining information into a one dimensional array. Although this compression technique works well when matching multiple strings, it is not as effective when matching multiple complex regular expressions.

This paper describes a technique called next state prediction. This performs lossy compression of the current state and input values and uses these to select a likely next state from a prediction table. This is used in conjunction with a standard row displacement with state marking algorithm and leads to an overall reduction in the memory required for the various tables.

The algorithms have been tested with a number of different design parameters, and compared with a 'baseline version' where this technique is not used. When testing this system with a set of regular expressions from the Snort intrusion detection system, the memory required was around 46% of that required for the baseline version. The design has been modelled in VHDL for use within an FPGA and tested via simulation and operates at a search rate of 2.0 Gbps irrespective of the regular expressions being searched for or the input data being scanned.

Download publication 555 kbytes (PDF)

Bibtex Record

@techreport{2832,
author = {Gerald Tripp},
title = {Regular expression matching with input compression and next state prediction.},
month = {October},
year = {2008},
pages = {182-196},
keywords = {determinacy analysis, Craig interpolants},
note = {},
doi = {},
url = {http://www.cs.kent.ac.uk/pubs/2008/2832},
    publication_type = {techreport},
    submission_id = {15519_1224691269},
    institution = {Computing Laboratory, University of Kent},
    number = {3-08},
}

School of Computing, University of Kent, Canterbury, Kent, CT2 7NF

Enquiries: +44 (0)1227 824180 or contact us.

Last Updated: 21/03/2014