




	
Pipeline Script: Unicode Normalizer




Overview
Configuration
Appendix: List of Transformers used



Overview

Performs Unicode normalization on all XML documents in a fileset, using one
of the four standard normalization forms defined by the Unicode Consortium
(NFC, NFD, NFKC or NFKD).

For more information on the reasons for and practice of Unicode normalization, see:

	http://www.w3.org/TR/charmod-norm/
	http://www.unicode.org/reports/tr15/
	http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/Normalizer.html
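
As a rough illustration of what the four normalization forms do to a string, here is a minimal sketch that uses Python's standard unicodedata module. This is not the script's own implementation (the last link above points to the ICU4J Normalizer); the sample string and the helper name normalize_all_forms are chosen only for this example.

```python
import unicodedata

# "e" followed by a combining acute accent (U+0065 U+0301),
# then the "fi" ligature (U+FB01). Illustrative sample only.
sample = "e\u0301\ufb01"


def normalize_all_forms(text):
    """Return the text normalized to each of the four standard forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


results = normalize_all_forms(sample)
for form, value in results.items():
    # Print code points so visually identical strings can be told apart.
    print(form, [f"U+{ord(ch):04X}" for ch in value])
```

NFC and NFD differ only in whether the accented letter is stored composed or decomposed. NFKC and NFKD additionally replace compatibility characters such as the ligature with their plain equivalents, which changes the text more aggressively.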


Configuration

	
		Input file
		The manifest member of the input fileset (NCC, OPF, DTBook file, etc.).

		Output directory
		Directory in which to store the result. Although not recommended, this can be the same directory as the input directory.

		Normalization form
		
		Select the normalization form to use. The default is NFC.
		For details, see Character Model for the World Wide Web (http://www.w3.org/TR/charmod-norm/).
		

		Textnodes only
		
			Whether to normalize only the text nodes of elements, leaving attribute values and other non-text content unchanged.
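
The effect of the "Textnodes only" option can be sketched as follows. This is a hedged illustration in Python, not the transformer's actual code: the function normalize_xml, its parameters and the sample document are assumptions made for this example, and the sketch treats "attribute values etc" as attribute values only.

```python
import unicodedata
import xml.etree.ElementTree as ET


def normalize_xml(xml_text, form="NFC", text_nodes_only=True):
    """Normalize an XML string; hypothetical helper for illustration."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree stores character data as .text (before the first
        # child) and .tail (after the element's end tag).
        if element.text is not None:
            element.text = unicodedata.normalize(form, element.text)
        if element.tail is not None:
            element.tail = unicodedata.normalize(form, element.tail)
        if not text_nodes_only:
            # Copy the items first so the mapping is not read while updated.
            for name, value in list(element.attrib.items()):
                element.set(name, unicodedata.normalize(form, value))
    return ET.tostring(root, encoding="unicode")


# Both the attribute and the text use a decomposed "e" + U+0301.
doc = '<p title="cafe\u0301">cafe\u0301 <b>x</b>e\u0301</p>'
only_text = normalize_xml(doc)
everything = normalize_xml(doc, text_nodes_only=False)
```

With text_nodes_only=True the title attribute keeps its decomposed sequence while the element text is composed. With text_nodes_only=False both are composed. Note that ElementTree's default parser discards comments and does not preserve a DOCTYPE declaration, so a real fileset would need a more careful XML round trip than this sketch provides.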
				

		
Appendix: List of Transformers used
The documents linked below are part of the Transformer technical documentation and are aimed at developers and system administrators.

	Unicode Normalizer