
resources.regex-splitter.external-split-patterns.txt


ANNIE is a general purpose information extraction system that provides the building blocks of many other GATE applications.

//These are patterns for sentence splits
//
// Valentin Tablan, 24 Aug 2007
//
//
// Lines starting with // are comments; empty lines are ignored

//two or more consecutive new lines of the same type (i.e. at least one blank line)
(?:[\u00A0\u2007\u202F\p{javaWhitespace}&&[^\n\r]])*(\n\r|\r\n|\n|\r)(?:(?:[\u00A0\u2007\u202F\p{javaWhitespace}&&[^\n\r]])*\1)+
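The multiple-newline pattern above uses Java regex syntax: a character-class intersection (`&&[^\n\r]`) limits the optional padding to horizontal whitespace, including the non-breaking spaces U+00A0, U+2007 and U+202F that `\p{javaWhitespace}` does not cover, and the backreference `\1` requires every further line break to be of the same type as the first. The sketch below compiles the pattern with plain `java.util.regex` (not GATE's own splitter code) to show which spans it matches. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiNewlineSplitDemo {
    // The multiple-newline pattern, copied verbatim from the file.
    // Every backslash is doubled because of Java string-literal escaping.
    static final Pattern MULTI_NEWLINE = Pattern.compile(
        "(?:[\\u00A0\\u2007\\u202F\\p{javaWhitespace}&&[^\\n\\r]])*(\\n\\r|\\r\\n|\\n|\\r)"
        + "(?:(?:[\\u00A0\\u2007\\u202F\\p{javaWhitespace}&&[^\\n\\r]])*\\1)+");

    // Hypothetical helper: returns the [start, end) offsets of every match.
    static List<int[]> splitSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = MULTI_NEWLINE.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        // The single line break inside the first paragraph is not matched;
        // the blank line (holding one space) between the paragraphs is.
        String text = "First paragraph.\nStill first.\n \nSecond paragraph.";
        for (int[] s : splitSpans(text)) {
            System.out.println("split at [" + s[0] + ", " + s[1] + ")"); // prints "split at [29, 32)"
        }
    }
}
```

Because of `\1`, a Windows-style `"\r\n\r\n"` is matched as one span, while a lone `"\n"` never is.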

//the end of the document is also an external split, so that there is no
//orphaned text
\s*\z
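The end-of-document pattern works because `\s*` may match the empty string, so `\s*\z` matches at the very end of any input, with or without trailing whitespace. This guarantees a final split after the last piece of text. The sketch below, again using plain `java.util.regex` with hypothetical class and method names, finds where that final split begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndOfDocumentSplitDemo {
    // The end-of-document pattern from the file: any trailing whitespace,
    // then \z, which anchors at the absolute end of the input.
    static final Pattern END_OF_DOCUMENT = Pattern.compile("\\s*\\z");

    // Hypothetical helper: offset where the trailing whitespace begins,
    // i.e. where the text before the final split ends.
    static int lastSplitStart(String text) {
        Matcher m = END_OF_DOCUMENT.matcher(text);
        // find() always succeeds: at worst \s* matches empty right at \z.
        if (!m.find()) {
            throw new IllegalStateException("unreachable");
        }
        return m.start();
    }

    public static void main(String[] args) {
        String text = "A final sentence with no full stop  \n";
        int start = lastSplitStart(text);
        // prints: text before final split: "A final sentence with no full stop"
        System.out.println("text before final split: \"" + text.substring(0, start) + "\"");
    }
}
```

Even a document that ends without punctuation or whitespace gets a zero-length split at its last offset, so its final fragment is not left orphaned.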



