
Pipeline Script: Unicode Normalizer
Overview
Performs Unicode normalization on all XML documents in a fileset using one
of the four standard normalization forms (NFC, NFD, NFKC, NFKD) defined by the Unicode Consortium.
For more information on the reasons for and practice of Unicode normalization, see:
- http://www.w3.org/TR/charmod-norm/
- http://www.unicode.org/reports/tr15/
- http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/Normalizer.html
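As a rough illustration of what the four forms do, the sketch below uses Python's standard `unicodedata` module rather than the ICU4J `Normalizer` linked above; the sample strings and the helper name `normalize_all` are assumptions made for this example, not part of the Pipeline script.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (two code points).
decomposed = "e\u0301"

# The four standard normalization forms described in UAX #15.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_all(text):
    """Hypothetical helper: map each normalization form to the normalized text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


results = normalize_all(decomposed)

# Show the code points produced by each form.
for form, value in results.items():
    print(form, [hex(ord(ch)) for ch in value])

# The compatibility forms (NFKC, NFKD) additionally replace compatibility
# characters, such as the "fi" ligature U+FB01, with their plain equivalents.
ligature = "\ufb01"
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature))
```

NFC and NFKC compose the sample into the single code point U+00E9, while NFD and NFKD keep the base letter and the combining mark separate. Only the compatibility forms change the ligature.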
Configuration
- Input file
- The manifest member of the input fileset (NCC, OPF, DTBook file, etc)
- Output directory
- Directory to store the result in. Although not recommended, this can be the same directory as the input directory.
- Normalization form
- Select the normalization form to use. The default is NFC. See also Character Model for the World Wide Web.
- Textnodes only
- Whether to normalize only element text nodes, and not attribute values etc.
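The effect of the "Normalization form" and "Textnodes only" options can be sketched as follows. This is a minimal illustration on a single XML string using Python's standard library, not the script's actual implementation; the function `normalize_xml` and its parameters `form` and `text_nodes_only` are hypothetical names chosen to mirror the options above.

```python
import unicodedata
import xml.etree.ElementTree as ET


def normalize_xml(xml_text, form="NFC", text_nodes_only=True):
    """Hypothetical helper: normalize character data in an XML string.

    form            -- one of "NFC", "NFD", "NFKC", "NFKD" (NFC by default)
    text_nodes_only -- if True, attribute values are left untouched
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree exposes text nodes as .text (before the first child)
        # and .tail (after the element's end tag).
        if elem.text is not None:
            elem.text = unicodedata.normalize(form, elem.text)
        if elem.tail is not None:
            elem.tail = unicodedata.normalize(form, elem.tail)
        if not text_nodes_only:
            for name, value in list(elem.attrib.items()):
                elem.set(name, unicodedata.normalize(form, value))
    return ET.tostring(root, encoding="unicode")


# Both the attribute and the text contain "e" + combining acute accent.
sample = '<p title="cafe\u0301">cafe\u0301</p>'

text_only = normalize_xml(sample)                          # attributes untouched
everything = normalize_xml(sample, text_nodes_only=False)  # attributes normalized too
```

With `text_nodes_only=True` only the element content is composed to "café" with a single U+00E9, while the `title` attribute keeps its decomposed form. This sketch ignores comments and processing instructions and does not walk a fileset.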
Appendix: List of Transformers used
The documents linked below are parts of the Transformer technical documentation. They are aimed at developers and systems administrators.