edu.isi.nlp.StringNormalizer Maven / Gradle / Ivy
The newest version!
package edu.isi.nlp;
/**
* A string normalizer is any strategy for mapping a sequence of characters to another sequence of
* characters, where typically the output sequence represents some equivalence class over the
* inputs.
*
* This has several uses:
*
*
* - Unicode normalization
*
- Word shape features: A typical example would be a rule like: "Map all alphabetical
* characters to A, map all digits to D, keep all other characters the same, and collapse
* adjacent repeated characters." This would map 617-873-8000 to D-D-D and attorney-general to
* A-A.
*
*/
public interface StringNormalizer {
/** Map a string to its word shape. Neither the input nor the output may be {@code null}. */
String normalize(String input);
}
© 2015 - 2025 Weber Informatics LLC | Privacy Policy