Package overview for net.sf.saxon.charcode




This package provides classes for handling different output character sets.

The sole function of these classes is to determine whether a particular character is present in the character set: if it is not, Saxon has to replace it with a character reference.
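
The membership test and the fallback to a character reference can be sketched in Java as follows. This is a minimal illustration, not Saxon's code. The interface CharsetMembership, the classes AsciiMembership and CharRefEscaper, and the use of hexadecimal references are all assumptions made for the example. Escaping of markup characters such as < and & is a separate concern and is left out.

```java
/** Hypothetical stand-in for a character-set membership test; not Saxon's actual API. */
interface CharsetMembership {
    /** Returns true if the code point can be written directly in this encoding. */
    boolean inCharset(int codePoint);
}

/** Example membership test: US-ASCII covers code points 0 to 127. */
final class AsciiMembership implements CharsetMembership {
    @Override
    public boolean inCharset(int codePoint) {
        return codePoint < 0x80;
    }
}

final class CharRefEscaper {
    /** Copies text, replacing each character outside the charset with a hex character reference. */
    static String escape(String text, CharsetMembership charset) {
        StringBuilder out = new StringBuilder(text.length());
        // Iterate by code point so that supplementary characters become one reference, not two.
        text.codePoints().forEach(cp -> {
            if (charset.inCharset(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "caf\u00E9 \u20AC5" is "café €5"; both é and € fall outside US-ASCII.
        System.out.println(escape("caf\u00E9 \u20AC5", new AsciiMembership()));
    }
}
```

Keeping the membership test behind a small interface means a new encoding only has to supply inCharset(); the escaping loop stays the same.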

The actual translation of Unicode characters into the selected encoding is left to the Java run-time library. (Note that different versions of Java support different sets of encodings, and there is no easy way to find out which encodings are supported in a given installation.)
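
Because the byte-level translation is delegated to Java, it helps to see what Java does with a character that the target encoding cannot represent. The sketch below (the class RuntimeEncodingDemo is hypothetical) uses the standard String.getBytes(Charset) method, which silently substitutes the charset's replacement bytes for unmappable characters; for US-ASCII the replacement is '?'. This illustrates why a membership test ahead of encoding is useful: the runtime does not flag the lost character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RuntimeEncodingDemo {
    /** Leaves the Unicode-to-bytes translation entirely to the Java runtime. */
    static byte[] encode(String text, Charset charset) {
        // String.getBytes(Charset) never throws on unmappable input: it substitutes
        // the charset's default replacement bytes instead.
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // U+00E9 (é) has no US-ASCII mapping, so it becomes '?' (byte 63).
        byte[] bytes = encode("caf\u00E9", StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(bytes)); // [99, 97, 102, 63]
    }
}
```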

It is possible to configure Saxon to support additional character sets by writing an implementation of the PluggableCharacterSet interface, and registering this class as the value of the system property whose name is given by the expression:

OutputKeys.ENCODING + "." + encoding

where "encoding" is the name of the encoding as used in <xsl:output>, for example iso-8859-10.
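
As a sketch of the registration step: the property name is the value of javax.xml.transform.OutputKeys.ENCODING (the string "encoding") joined to the encoding name with a dot. The class CharsetRegistration and the implementation class name below are hypothetical; the latter stands for your own PluggableCharacterSet implementation. The sketch assumes the property is set before Saxon serializes any output, and it does not cover any further requirements Saxon may place on the implementation class.

```java
import javax.xml.transform.OutputKeys;

final class CharsetRegistration {
    /** Builds the property name OutputKeys.ENCODING + "." + encoding. */
    static String propertyName(String encoding) {
        return OutputKeys.ENCODING + "." + encoding;
    }

    public static void main(String[] args) {
        // Hypothetical class name: replace with your own PluggableCharacterSet implementation.
        String implementation = "com.example.charcode.Iso8859_10CharacterSet";
        System.setProperty(propertyName("iso-8859-10"), implementation);
        System.out.println(propertyName("iso-8859-10")); // encoding.iso-8859-10
    }
}
```

Passing -Dencoding.iso-8859-10=<class name> to the JVM creates the same system property without any code.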

If an output encoding is requested that Saxon does not recognize but the Java platform does, Saxon attempts to determine which characters the encoding can represent, so that unsupported characters can be written as numeric character references. The logic for this is in the CharacterSetFactory class, which uses one of two approaches.

Where possible, Saxon uses the UnknownCharacterSet class, which tests whether individual characters are available by calling the Java method CharsetEncoder.canEncode(). However, some encodings do not implement this method reliably. Saxon attempts to detect this, and represents such encodings using the BuggyCharacterSet class instead. This class attempts to encode each character and relies on catching an exception when encoding fails: this is expensive, but it happens only once for any given character.
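
The two approaches can be sketched with the standard java.nio.charset API. CanEncodeCharset and TrialEncodeCharset are hypothetical names and are not the source of Saxon's UnknownCharacterSet and BuggyCharacterSet; the HashMap cache is an assumption standing in for however Saxon remembers its results. How Saxon decides that an encoding's canEncode() is unreliable is not shown.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Approach 1 (hypothetical name): ask the encoder whether it can encode the character. */
final class CanEncodeCharset {
    private final CharsetEncoder encoder;

    CanEncodeCharset(Charset charset) {
        this.encoder = charset.newEncoder();
    }

    boolean inCharset(int codePoint) {
        // canEncode(CharSequence) handles supplementary characters given as surrogate pairs.
        return encoder.canEncode(new String(Character.toChars(codePoint)));
    }
}

/** Approach 2 (hypothetical name): trial-encode, treat an exception as "not representable". */
final class TrialEncodeCharset {
    private final CharsetEncoder encoder;
    private final Map<Integer, Boolean> cache = new HashMap<>();

    TrialEncodeCharset(Charset charset) {
        // REPORT makes encode() throw instead of silently substituting replacement bytes.
        this.encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    boolean inCharset(int codePoint) {
        // The expensive trial encoding runs at most once per code point; later calls hit the cache.
        return cache.computeIfAbsent(codePoint, cp -> {
            try {
                encoder.encode(CharBuffer.wrap(Character.toChars(cp)));
                return true;
            } catch (CharacterCodingException e) {
                return false;
            }
        });
    }
}

final class EncodingProbeDemo {
    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        CanEncodeCharset direct = new CanEncodeCharset(latin1);
        TrialEncodeCharset trial = new TrialEncodeCharset(latin1);
        // U+00E9 is in ISO-8859-1; U+20AC (the euro sign) is not.
        for (int cp : new int[] {0xE9, 0x20AC}) {
            System.out.printf("U+%04X direct=%b trial=%b%n", cp, direct.inCharset(cp), trial.inCharset(cp));
        }
    }
}
```

CharsetEncoder instances are not thread-safe, so each of these sketch objects should be confined to a single thread.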


Michael H. Kay
Saxonica Limited
9 February 2005
