







Transformer documentation: se_tpb_charsetSwitcher

Transformer Purpose

Switches the character set of all XML files in a fileset to the desired encoding. The transcoding is performed using XSLT, so any encoding supported by the XSLT processor (currently saxon8) should be supported. Characters that cannot be represented in the output encoding are converted to numeric character references. For example, '√' will be converted to &#8730; if it is not supported by the output character set.
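The idea of transcoding through an XSLT serializer can be sketched with the standard JAXP API. This is a minimal illustration, not the transformer's actual code: the class and method names are hypothetical, and it uses whichever XSLT processor JAXP finds on the classpath rather than assuming saxon8.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
final class TranscodeSketch {

    /** Re-serializes an XML document in the requested output encoding. */
    static byte[] transcode(String xml, String encoding) throws Exception {
        // A Transformer created without a stylesheet performs an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // The serializer writes the result, including the XML declaration, in this encoding.
        identity.setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }
}
```

Calling `TranscodeSketch.transcode("<r>\u221A</r>", "ISO-8859-1")` should produce bytes in which the square root sign appears as a numeric character reference, since ISO-8859-1 has no such character. Whether the reference is written in decimal or hexadecimal depends on the serializer in use.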

It is also possible to specify whether the XML documents should use unix, dos or mac line breaks.
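A rough sketch of what the line break choice amounts to is given here. The helper is hypothetical, and treating the value default as "leave the text unchanged" is an assumption, since the documentation does not define what default does.

```java
// Hypothetical helper class, for illustration only.
final class LineBreakSketch {

    /** Rewrites every line break in the text to the style named by breaks. */
    static String convertBreaks(String text, String breaks) {
        final String separator;
        switch (breaks) {
            case "unix":
                separator = "\n";    // LF
                break;
            case "dos":
                separator = "\r\n";  // CR LF
                break;
            case "mac":
                separator = "\r";    // CR (classic Mac OS)
                break;
            case "default":
                return text;         // assumption: keep whatever breaks are present
            default:
                throw new IllegalArgumentException("Unknown breaks value: " + breaks);
        }
        // CR LF is listed first so that a DOS break is not counted as two breaks.
        return text.replaceAll("\r\n|\r|\n", separator);
    }
}
```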

Input Requirements

The transformer is written to work on any file/fileset that can be represented by the org.daisy.util.fileset package.

Character set transcoding will only be done on XML members of the input fileset; all other types of members pass through untouched.

If no file in the fileset is of type XML, then the whole fileset will pass through untouched. It is therefore safe to place this transformer in contexts whose dataflow varies considerably.
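The pass-through behaviour described above can be pictured as a simple selection step. The sketch below is an illustration under stated assumptions: it does not use the org.daisy.util.fileset API, the class and method names are hypothetical, and it recognises XML members by a .xml file extension only, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class FilesetRoutingSketch {

    /** Simplified check: a member counts as XML if its name ends in .xml. */
    static boolean looksLikeXml(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    /** Returns the members that would be transcoded; all others are copied as-is. */
    static List<String> membersToTranscode(List<String> members) {
        List<String> xmlMembers = new ArrayList<>();
        for (String member : members) {
            if (looksLikeXml(member)) {
                xmlMembers.add(member);
            }
        }
        return xmlMembers;
    }
}
```

An empty result corresponds to the case where the whole fileset passes through untouched.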

Output

On success

A file/fileset whose XML members have been transcoded and, optionally, have had certain characters substituted by replacement strings. See Parameters.

On error

No specific recovery scheme. On error, this transformer will send a fatal message, then throw an exception and abort.

Configuration/Customization

Parameters (tdf)

input
The input XML file (standalone or manifest)
output
The output directory
encoding
Character set encoding of the output file(s). If not set, utf-8 is used as default.
breaks
The type of line breaks to use in the output files. Possible values are unix, dos, mac and default. The default value is (unsurprisingly) default.
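The defaults listed above can be summarised in a short sketch. The class and method are hypothetical and are not part of the transformer. Rejecting unknown breaks values is an assumption about how invalid input might be handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class ParameterDefaultsSketch {

    private static final List<String> BREAKS_VALUES =
            Arrays.asList("unix", "dos", "mac", "default");

    /** Returns a copy of the supplied parameters with the documented defaults filled in. */
    static Map<String, String> resolve(Map<String, String> supplied) {
        Map<String, String> resolved = new HashMap<>(supplied);
        resolved.putIfAbsent("encoding", "utf-8");  // documented default
        resolved.putIfAbsent("breaks", "default");  // documented default

        // Assumption: values outside the documented set are treated as errors.
        if (!BREAKS_VALUES.contains(resolved.get("breaks"))) {
            throw new IllegalArgumentException(
                    "breaks must be one of " + BREAKS_VALUES + ": " + resolved.get("breaks"));
        }
        return resolved;
    }
}
```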

Further development

Most of the functionality of this transformer could also be performed using the int_daisy_unicodeTranscoder. This transformer can probably be deprecated when some third party packages used by the int_daisy_unicodeTranscoder become more stable.

Dependencies

Currently the Saxon8 XSLT processor is used to perform the actual transcoding.

Author

Linus Ericson, TPB

Licensing

LGPL




