School of Computing

An analysis of sentence boundary detection systems for English and Portuguese documents

Carlos N. Silla Jr. and Celso A. A. Kaestner

In Fifth International Conference on Intelligent Text Processing and Computational Linguistics, volume 2945 of Lecture Notes in Computer Science, pages 182-196. Springer, February 2004. doi:10.1007/b95558.

Abstract

In this paper we present a study comparing the performance of different systems from the literature that automatically segment English documents into sentences. We also describe the difficulties encountered in adapting these systems to Portuguese documents and report the results obtained after the adaptation. We analyzed two systems that use a machine learning approach, MxTerminator and Satz, and a customized system based on fixed rules expressed as regular expressions. The results achieved by the Satz system were surprisingly positive for Portuguese documents.
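
To illustrate the general idea of a rule-based approach built on regular expressions, the following is a minimal sketch in Python. It is not the customized system evaluated in the paper: the pattern, the abbreviation list, and the function name `split_sentences` are all assumptions made for illustration only.

```python
import re

# Hypothetical abbreviation list for illustration; not taken from the paper.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "sr", "sra", "etc"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and an
# uppercase letter (including accented capitals common in Portuguese).
CANDIDATE = re.compile(r"([.!?])\s+(?=[A-ZÀ-Ý])")


def split_sentences(text):
    """Split text into sentences using a fixed rule plus an abbreviation check."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # Look at the word immediately before the punctuation mark.
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        # A period after a known abbreviation is not treated as a boundary.
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:match.end(1)].strip())
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


english = split_sentences("Dr. Smith arrived. He was late! Was he?")
portuguese = split_sentences("O Sr. Silva chegou. Ele estava atrasado.")
```

Adapting such a sketch to another language mostly means changing the abbreviation list and the set of capital letters, which hints at the kind of adaptation effort the abstract refers to.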

BibTeX Record

@inproceedings{2930,
author = {Carlos N. Silla Jr. and Celso A. A. Kaestner},
title = {An Analysis of Sentence Boundary Detection Systems for {E}nglish and {P}ortuguese Documents},
month = {February},
year = {2004},
pages = {182--196},
keywords = {determinacy analysis, Craig interpolants},
note = {},
doi = {10.1007/b95558},
url = {http://www.cs.kent.ac.uk/pubs/2004/2930},
publication_type = {inproceedings},
submission_id = {1384_1245729082},
other_year = {2004},
volume = {2945},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
refereed = {yes},
booktitle = {Fifth International Conference on Intelligent Text Processing and Computational Linguistics},
}

School of Computing, University of Kent, Canterbury, Kent, CT2 7NF

Last Updated: 21/03/2014