
se_tpb_charsetSwitcher
Transformer documentation: se_tpb_charsetSwitcher
Transformer Purpose
Switches the character set of all XML files in a fileset to the
desired encoding. The transcoding is performed using XSLT, so
any encoding supported by the XSLT processor (currently Saxon8)
should be supported. Characters that can't be represented in the
output encoding are converted to numeric character references. For
example, '√' (U+221A) is written as a reference such as &#8730; if
the output character set does not include it.
It is also possible to specify whether the XML documents
should use unix, dos or mac line breaks.
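The fallback to numeric character references can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree serializer rather than XSLT and Saxon8, so it is not the transformer's implementation, and the function name transcode is hypothetical. It only shows the same idea: the document is re-serialized in the target encoding, and characters the encoding cannot hold are written as references.

```python
import xml.etree.ElementTree as ET


def transcode(xml_text: str, encoding: str = "utf-8") -> bytes:
    """Re-serialize an XML string in the requested encoding (illustrative only)."""
    root = ET.fromstring(xml_text)
    # ElementTree writes characters that the target encoding cannot
    # represent as decimal character references instead of failing.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


# "\u00e9" (e with acute accent) exists in ISO-8859-1 and is kept as one byte;
# "\u221a" (square root sign) does not exist there and becomes &#8730;.
latin1 = transcode("<p>caf\u00e9 \u221a2</p>", "iso-8859-1")
print(latin1)
```

With utf-8 as the target, both characters are representable and no references are needed.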
Input Requirements
The transformer is written to work on any file/fileset that can
be represented by the org.daisy.util.fileset package.
Character set transcoding will only be done on XML members of the
input fileset; all other types of members pass through untouched.
If no file in the fileset is of type XML, then the whole fileset
will pass through untouched. It is therefore safe to place this
transformer in contexts whose dataflow varies considerably.
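The pass-through behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the transformer's code: the real transformer identifies XML members through the org.daisy.util.fileset package, whereas this sketch guesses from a hypothetical suffix list (XML_SUFFIXES), and the function name switch_charset is invented. The ElementTree round trip also drops DOCTYPE declarations and comments, which an XSLT-based transcoding need not do.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: a member counts as XML if it has one of these suffixes.
XML_SUFFIXES = {".xml", ".smil", ".ncx", ".opf"}


def switch_charset(src_dir: Path, out_dir: Path, encoding: str = "utf-8") -> None:
    """Transcode XML files from src_dir into out_dir; copy everything else as-is."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for member in src_dir.iterdir():
        if not member.is_file():
            continue
        target = out_dir / member.name
        if member.suffix.lower() in XML_SUFFIXES:
            # XML member: parse and re-serialize in the target encoding.
            tree = ET.parse(str(member))
            tree.write(str(target), encoding=encoding, xml_declaration=True)
        else:
            # Any other member passes through byte for byte.
            shutil.copyfile(member, target)
```

If the directory contains no XML members at all, the loop only ever takes the copy branch, so the whole fileset arrives in the output directory unchanged.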
Output
On success
A file/fileset whose XML members have been transcoded, and
optionally have had certain characters substituted by replacement
strings. See the parameters below.
On error
No specific recovery scheme. On error, this transformer will send
a fatal message, then throw an exception and abort.
Configuration/Customization
Parameters (tdf)
- input
- The input XML file (standalone or manifest)
- output
- The output directory
- encoding
- Character set encoding of the output file(s). If not set, utf-8 is used as default.
- breaks
- The type of line breaks to use in the output files. Possible values are unix, dos, mac and default. The default value is (unsurprisingly) default.
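How the breaks values might map to line terminators can be sketched as below. The mapping of unix, dos and mac to LF, CRLF and CR follows the usual meaning of those names; the treatment of default as "leave the output unchanged" is an assumption, since this page does not define it. The names LINE_BREAKS and apply_breaks are hypothetical.

```python
# Usual line terminators for the three named platforms.
LINE_BREAKS = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def apply_breaks(data: bytes, breaks: str = "default") -> bytes:
    """Rewrite line terminators according to a breaks value (illustrative only)."""
    if breaks == "default":
        # Assumption: "default" leaves the serialized output untouched.
        return data
    if breaks not in LINE_BREAKS:
        raise ValueError(f"unsupported breaks value: {breaks!r}")
    # Normalize everything to LF first so mixed input is handled consistently.
    # Working on bytes like this is only safe for ASCII-compatible encodings.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", LINE_BREAKS[breaks])


print(apply_breaks(b"a\r\nb\rc\n", "dos"))
```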
Further development
Most of the functionality of this transformer could also be performed using the int_daisy_unicodeTranscoder.
This transformer can probably be deprecated once some third-party packages used by the int_daisy_unicodeTranscoder
become more stable.
Dependencies
Currently the Saxon8 XSLT processor is used to perform the actual transcoding.
Author
Linus Ericson, TPB
Licensing
LGPL