







Transformer documentation: int_daisy_unicodeNormalizer




Transformer Purpose
Input Requirements
Output
	
		On success
		On error
		
	
Configuration/Customization
	
		Parameters (tdf)
		Extended configurability
	
	
Further development
Dependencies
Author
Licensing



Transformer Purpose

Performs Unicode normalization on all XML documents in a fileset, using one 
of the four standard normalization forms (NFC, NFD, NFKC, NFKD) defined by the Unicode Consortium.

For more information on the reasons for and practice of Unicode normalization, see:

	http://www.w3.org/TR/charmod-norm/
	http://www.unicode.org/reports/tr15/
	http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/Normalizer.html
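
To make the difference between the four forms concrete, here is a minimal sketch. It is an illustration only: it uses the JDK's java.text.Normalizer as a stand-in for the ICU4J Normalizer that this transformer depends on, and the class and method names (NormalizationFormsDemo, normalize, codePoints) are hypothetical and not part of the transformer.

```java
import java.text.Normalizer;

// Illustration only: uses the JDK normalizer, not the ICU4J class the transformer depends on.
final class NormalizationFormsDemo {

    /** Normalizes text with the form named by formName (NFC, NFD, NFKC or NFKD). */
    static String normalize(String text, String formName) {
        // Normalizer.Form.valueOf throws IllegalArgumentException for any other name.
        return Normalizer.normalize(text, Normalizer.Form.valueOf(formName));
    }

    /** Renders a string as a space-separated list of code points, e.g. "U+0065 U+0301". */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 'e' followed by a combining acute accent, a space, then the "fi" ligature.
        String input = "e\u0301 \uFB01";
        for (String form : new String[] {"NFC", "NFD", "NFKC", "NFKD"}) {
            System.out.println(form + ": " + codePoints(normalize(input, form)));
        }
    }
}
```

The canonical forms (NFC, NFD) only compose or decompose equivalent sequences, while the compatibility forms (NFKC, NFKD) additionally replace compatibility characters such as the ligature with their plain equivalents.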


Input Requirements

The transformer is written to work on any file/fileset that can be represented by the org.daisy.util.fileset package.
Normalization will only be done on XML members of the input fileset; all other types of members pass through untouched. 
If no file in the fileset is of type XML, then the whole fileset will pass through untouched. It is therefore safe to place this transformer in contexts whose dataflow varies considerably.
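
The pass-through behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not the transformer's code: it does not use the org.daisy.util.fileset API, fileset members are modelled as a map from member name to text content, XML members are recognised by a hypothetical file-extension check (looksLikeXml), and the whole member string is normalized rather than individual XML nodes.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Simplified model: member name -> content. The real fileset package is not used here.
final class FilesetPassThroughSketch {

    /** Hypothetical stand-in for the fileset's own type detection. */
    static boolean looksLikeXml(String memberName) {
        String lower = memberName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".xml") || lower.endsWith(".smil") || lower.endsWith(".ncx");
    }

    /** Returns a copy of the fileset in which only XML members have been normalized. */
    static Map<String, String> process(Map<String, String> members, Normalizer.Form form) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : members.entrySet()) {
            String content = e.getValue();
            // Non-XML members are copied unchanged.
            out.put(e.getKey(),
                    looksLikeXml(e.getKey()) ? Normalizer.normalize(content, form) : content);
        }
        return out;
    }
}
```

If no member passes the XML check, the returned map is content-identical to the input, which mirrors the statement that a fileset without XML members passes through untouched.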

Output

On success

A file/fileset whose XML members have been normalized using one of the four Unicode normalization forms. See Parameters (tdf).

On error

No specific recovery scheme. On error, this transformer will send a fatal message, then throw an exception and abort.

Configuration/Customization

	Parameters (tdf)
	
	
		input
		pathspec of the manifest member of the input fileset

		output
		pathspec of the output directory

		textnodesOnly
		If set to true, only element text nodes are normalized (not attribute values or other types of value-carrying nodes). Default: false.

		normalizationForm
		
			Selects the normalization form to use. Allowed values: NFD|NFKD|NFC|NFKC. Default: NFC, the form recommended in Character Model for the World Wide Web.
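
As an illustration of how textnodesOnly and normalizationForm might interact, here is a minimal DOM-based sketch. It is not the transformer's implementation: it uses the JDK's java.text.Normalizer and DOM parser in place of ICU4J and the fileset classes, the class and method names are hypothetical, and the choice of attributes and comments as the "other" value-carrying nodes is an assumption, since the documentation does not list them.

```java
import java.io.StringReader;
import java.text.Normalizer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the two parameters; not the transformer's own code.
final class TextNodesOnlySketch {

    /** Recursively normalizes the subtree rooted at node, in place. */
    static void normalize(Node node, Normalizer.Form form, boolean textNodesOnly) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Element text is always normalized.
            node.setNodeValue(Normalizer.normalize(node.getNodeValue(), form));
        } else if (!textNodesOnly && type == Node.COMMENT_NODE) {
            // Assumption: comments count as value-carrying nodes.
            node.setNodeValue(Normalizer.normalize(node.getNodeValue(), form));
        } else if (!textNodesOnly && type == Node.ELEMENT_NODE) {
            // Attribute values are normalized only when textNodesOnly is false.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                attr.setNodeValue(Normalizer.normalize(attr.getNodeValue(), form));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            normalize(child, form, textNodesOnly);
        }
    }

    /** Parses an XML string into a DOM document, wrapping any parser failure. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

A call such as normalize(doc, Normalizer.Form.valueOf("NFC"), false) corresponds to the documented defaults: form NFC, with attribute values included.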
		
		
	
	Extended configurability
	
	None.
	
Further development

No known refactoring wishes at the time of writing.

Dependencies


	IBM icu4j (at time of writing: icu4j_3_4_4.jar)


Author

Markus Gylling, Daisy Consortium

Licensing

LGPL