




int_daisy_unicodeNormalizer



Transformer documentation: int_daisy_unicodeNormalizer

Transformer Purpose

Performs Unicode normalization on all XML documents in a fileset, using one of the four standard normalization forms defined by the Unicode Consortium.
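As a rough illustration of what the four forms do to a string, the sketch below uses the JDK's built-in java.text.Normalizer. This is an assumption made to keep the example self-contained: the transformer itself depends on icu4j, and the class and method names here (NormalizationFormsDemo, normalize) are hypothetical and not part of the transformer.

```java
import java.text.Normalizer;

public class NormalizationFormsDemo {

    /** Normalizes text with the form named by formName (NFD, NFKD, NFC or NFKC). */
    static String normalize(String text, String formName) {
        // Normalizer.Form.valueOf throws IllegalArgumentException for unknown names.
        return Normalizer.normalize(text, Normalizer.Form.valueOf(formName));
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        String composed = "\u00E9";    // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        String ligature = "\uFB01";    // U+FB01 LATIN SMALL LIGATURE FI

        // Canonical composition: two code points become one.
        System.out.println(normalize(decomposed, "NFC").equals(composed));
        // Canonical decomposition: one code point becomes two.
        System.out.println(normalize(composed, "NFD").equals(decomposed));
        // The canonical forms leave compatibility characters such as the ligature alone.
        System.out.println(normalize(ligature, "NFC").equals(ligature));
        // The compatibility forms replace the ligature with plain "fi".
        System.out.println(normalize(ligature, "NFKC").equals("fi"));
    }
}
```

Each of the four comparisons holds, so the program prints true four times. The canonical forms (NFC, NFD) only change how the same characters are encoded, while the compatibility forms (NFKC, NFKD) can also replace characters such as ligatures, which may alter the appearance of the text.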

For more information on the reasons for and practice of Unicode normalization, see:

Input Requirements

The transformer is written to work on any file/fileset that can be represented by the org.daisy.util.fileset package.

Normalization will only be done on XML members of the input fileset; all other types of members pass through untouched.

If no file in the fileset is of type XML, then the whole fileset will pass through untouched. It is therefore safe to place this transformer in contexts whose dataflow varies considerably.
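The pass-through behaviour described above can be pictured with a small sketch. Everything in it is an assumption for illustration only: the fileset is modelled as a plain map from member name to content, XML members are recognised by a ".xml" extension, and normalization is applied to the whole string with java.text.Normalizer. The real transformer relies on org.daisy.util.fileset to classify members, and presumably works on parsed XML rather than on raw strings.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FilesetPassThroughSketch {

    /** Hypothetical XML check by extension; stands in for the fileset package's own typing. */
    static boolean isXml(String memberName) {
        return memberName.toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    /**
     * Returns a new map in which XML members are normalized and all
     * other members are copied through unchanged.
     */
    static Map<String, String> process(Map<String, String> members, Normalizer.Form form) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> member : members.entrySet()) {
            String content = member.getValue();
            if (isXml(member.getKey())) {
                content = Normalizer.normalize(content, form);
            }
            result.put(member.getKey(), content);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fileset = new LinkedHashMap<>();
        fileset.put("book.xml", "<p>e\u0301</p>"); // XML member: will be normalized
        fileset.put("cover.txt", "e\u0301");       // non-XML member: passes through
        Map<String, String> out = process(fileset, Normalizer.Form.NFC);
        System.out.println(out.get("book.xml").length());  // shorter after composition
        System.out.println(out.get("cover.txt").length()); // unchanged
    }
}
```

With a fileset that contains no XML members at all, process returns a copy whose contents are identical to the input, which mirrors the "whole fileset passes through untouched" case.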

Output

On success

A file/fileset whose XML members have been normalized using the selected Unicode normalization form (see Parameters).

On error

No specific recovery scheme. On error, this transformer will send a fatal message, then throw an exception and abort.

Configuration/Customization

Parameters (tdf)

input
pathspec of the manifest member of the input fileset
output
pathspec of the output directory
textnodesOnly
If true, only element text nodes are normalized (attribute values and other types of value-carrying nodes are left untouched). Default: false.
normalizationForm
Selects the normalization form to use. Allowed values: NFD|NFKD|NFC|NFKC. Default: NFC, which is the form recommended in Character Model for the World Wide Web.
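To make the effect of textnodesOnly concrete, here is a minimal sketch over a W3C DOM tree. It is not the transformer's actual implementation: it uses the JDK's java.text.Normalizer and DOM parser instead of icu4j, the class and method names are hypothetical, and it assumes that "other value-carrying nodes" means attribute values and comments.

```java
import java.io.StringReader;
import java.text.Normalizer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodesOnlySketch {

    /** Recursively normalizes the subtree rooted at node, in place. */
    static void normalize(Node node, Normalizer.Form form, boolean textNodesOnly) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE) {
            // Element text is normalized regardless of the flag.
            node.setNodeValue(Normalizer.normalize(node.getNodeValue(), form));
        } else if (!textNodesOnly && type == Node.COMMENT_NODE) {
            node.setNodeValue(Normalizer.normalize(node.getNodeValue(), form));
        } else if (!textNodesOnly && type == Node.ELEMENT_NODE) {
            // Attributes are not child nodes, so they are visited separately.
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                attribute.setNodeValue(Normalizer.normalize(attribute.getNodeValue(), form));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            normalize(child, form, textNodesOnly);
        }
    }

    /** Parses an XML string; checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Both the attribute value and the text are 'e' + U+0301 (two code points).
        Document doc = parse("<p title=\"e\u0301\">e\u0301</p>");
        normalize(doc, Normalizer.Form.valueOf("NFC"), true);
        System.out.println(doc.getDocumentElement().getTextContent().length());
        System.out.println(doc.getDocumentElement().getAttribute("title").length());
    }
}
```

With textNodesOnly set to true, only the element text is composed into a single code point and the title attribute keeps its two code points; with false, both are composed.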

Extended configurability

None.

Further development

No known refactoring wishes at the time of writing.

Dependencies

  • IBM icu4j (at time of writing: icu4j_3_4_4.jar)

Author

Markus Gylling, Daisy Consortium

Licensing

LGPL




