
Speechgen2 multi-language support
Speechgen2 multi-language TTS configuration
This documentation covers the per-language configuration of TTS engines in the se_tpb_speechgen2 transformer. It targets Pipeline users who want to refine the configuration by manually editing the internal XML configuration files.
- Overview
- Structure of the ttsbuilder.xml configuration file
- Voice selection mechanism
- Configuration on Windows
- Configuration on Mac OS X
- Configuration on Linux
Overview
The Pipeline uses a set of declarative rules to associate TTS voices with language codes.
The Narrator automatically selects TTS voices depending on the value of the xml:lang attributes found in the DTBook.
If there is no specific rule for a given language, the Narrator will fall back to the default system voice.
The TTS-related configuration is actually part of the se_tpb_speechgen2 transformer.
It uses a simple factory/builder to get hold of TTS implementations, configured in an XML file named ttsbuilder.xml in the se_tpb_speechgen2/tts/ directory.
Structure of the ttsbuilder.xml configuration file
The configuration consists of operating-system-specific sections; within each of these OS sections are
language-specific sub-sections, each containing the declaration of a single TTS engine to use for that language:
<ttsbuilder>
  <os>
    <property name="os.name" match="[Ww]indows.*" />
    <lang lang="__">
      <tts default="true">...</tts>
    </lang>
    <lang lang="en">
      <tts>...</tts>
    </lang>
    <lang lang="fr">
      <tts>...</tts>
    </lang>
  </os>
  <os>
    <property name="os.name" match="[Ll]inux.*" />
    <lang lang="en">
      <tts default="true">...</tts>
    </lang>
    ...
  </os>
  <os>
    <property name="os.name" match="Mac OS X" />
    <lang lang="en">
      <tts default="true">...</tts>
    </lang>
    ...
  </os>
</ttsbuilder>
For each OS, there can be one (and only one) descendant tts element with the attribute default="true", which is used as the fallback.
Note that this default TTS can be configured in a "dummy" language section (with a fake language code), as is done for the Windows
section in the example above.
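The structure above can be checked with a short script. The following Python sketch is not part of the Pipeline; it only illustrates how an OS section might be located from the os.name pattern and how the default TTS of that section can be found. The helper names (find_os_section, default_languages) are hypothetical, and treating the match attribute as a regular expression that must match the whole OS name is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# A trimmed-down ttsbuilder.xml, with empty <tts> elements for brevity.
SAMPLE = """
<ttsbuilder>
  <os>
    <property name="os.name" match="[Ww]indows.*" />
    <lang lang="__"><tts default="true" /></lang>
    <lang lang="en"><tts /></lang>
  </os>
  <os>
    <property name="os.name" match="[Ll]inux.*" />
    <lang lang="en"><tts default="true" /></lang>
  </os>
</ttsbuilder>
""".strip()


def find_os_section(root, os_name):
    """Return the first <os> element whose os.name pattern matches os_name.

    Assumption: the pattern must match the whole OS name.
    """
    for os_elem in root.findall("os"):
        prop = os_elem.find("property[@name='os.name']")
        if prop is not None and re.fullmatch(prop.get("match", ""), os_name):
            return os_elem
    return None


def default_languages(os_elem):
    """Return the lang codes of the sections whose <tts> has default="true"."""
    return [
        lang.get("lang")
        for lang in os_elem.findall("lang")
        if any(tts.get("default") == "true" for tts in lang.findall("tts"))
    ]


root = ET.fromstring(SAMPLE)
windows = find_os_section(root, "Windows 10")
# A well-formed OS section yields at most one entry here.
print(default_languages(windows))
```

A list with more than one entry would indicate an OS section that declares several default TTS engines.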
Voice selection mechanism
When the Narrator must generate the audio for a DTBook element, it first looks at the value of the xml:lang
attribute of the element or of its closest ancestor.
It then tries to instantiate a TTS engine based on the configuration in the tts element found in the language section corresponding to the xml:lang value, within the OS section corresponding to the user's OS. For instance, if the document locale is en-US, it will pick the best match in this order:
- the section with the lang attribute equal to 'en_US'
- the section with the lang attribute equal to 'en'
- the first section with the lang attribute starting with 'en_'
- the section with the default attribute set to 'true'
Note that the configuration uses underscores to separate the language and country codes, as done by the java.util.Locale#toString() method.
Note that this multi-language support can be disabled with the script parameter named "Multi-language support". If this option is disabled, the TTS engine configured in
the default section will always be used.
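The matching order described in this section can be sketched in a few lines of Python. This is only an illustration of the documented order, not the transformer's actual code; the function name pick_lang_section and its arguments are hypothetical, and the sketch assumes the lang codes are compared case-sensitively.

```python
def pick_lang_section(xml_lang, section_langs, default_lang):
    """Return the lang code of the section to use for an xml:lang value.

    section_langs: lang attributes of the current OS section, in document order.
    default_lang:  lang attribute of the section holding the default TTS.
    """
    # The configuration uses underscores: en-US becomes en_US.
    locale = xml_lang.replace("-", "_")
    language = locale.split("_")[0]

    if locale in section_langs:        # 1. exact match, e.g. 'en_US'
        return locale
    if language in section_langs:      # 2. language only, e.g. 'en'
        return language
    prefix = language + "_"
    for lang in section_langs:         # 3. first section starting with 'en_'
        if lang.startswith(prefix):
            return lang
    return default_lang                # 4. section with default="true"


print(pick_lang_section("en-US", ["__", "en", "fr"], "__"))
```

With only 'en' and 'fr' sections configured, an en-US element resolves to the 'en' section, while a 'de' element falls back to the default section.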
Configuration on Windows
On Windows, the actual voice selection is by default delegated to the Microsoft Speech API (SAPI5), which means that only SAPI-compliant TTS engines can be used.
The text sent to the default SAPI TTS adapter is wrapped in a voice SAPI XML tag with the selection criteria declared in the sapiVoiceSelection parameter of the tts section in ttsbuilder.xml.
For instance, if the TTS configuration contains the following section:
<lang lang="en">
  <tts>
    <param name="class" value="se_tpb_speechgen2.external.win.DefaultSapiTTS"/>
    <param name="sapiVoiceSelection" value="Language=409"/>
    ...
  </tts>
</lang>
The text "This is a sentence." is transformed into the following SAPI XML tag before being sent to the TTS:
<voice optional="Language=409">This is a sentence.</voice>
The default ttsbuilder.xml configuration uses Microsoft language codes to select the voice matching a language section, but note that it is possible to
refine the selection with queries such as "Gender=Female;Age!=Child;Language=409". It is even possible to explicitly name the voice to use for a
particular language section. For more information on the TTS selection attributes, refer to the Microsoft XML TTS tutorial and to the list of language codes.
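The wrapping step described in this section can be sketched as follows. This Python snippet is an illustration only: the function name wrap_in_sapi_voice is hypothetical, and escaping the text and the attribute value is an assumption about what a careful adapter would do, not a statement about DefaultSapiTTS.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_in_sapi_voice(text, sapi_voice_selection):
    """Wrap text in a SAPI <voice> tag using the configured selection criteria."""
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape protects '&', '<' and '>' in the spoken text (assumed behavior).
    return "<voice optional=%s>%s</voice>" % (
        quoteattr(sapi_voice_selection),
        escape(text),
    )


print(wrap_in_sapi_voice("Hello world.", "Gender=Female;Age!=Child;Language=409"))
```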
Configuration on Mac OS X
On Mac OS X, the TTS engine is selected directly by the name of the voice specified in the voice parameter of the TTS section of the language.
The parameter accepts a comma-separated list of voice names, and the first name corresponding to a voice existing on the user's system is selected.
For instance, if the TTS configuration contains the following section:
<lang lang="en">
  <tts>
    <param name="class" value="se_tpb_speechgen2.external.MacOS.MacSayTTS"/>
    <param name="voice" value="Alex, Vicky"/>
    ...
  </tts>
</lang>
The voice selected to speak English content will be Apple's Alex voice on Mac OS X 10.5 Leopard and later, and Apple's Vicky voice on Mac OS X 10.4 Tiger (where Alex is not available).
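The "first available voice" rule can be sketched as below. This is an illustrative Python snippet, not the MacSayTTS implementation; the function name pick_voice and the installed_voices argument are hypothetical, and trimming the whitespace around each name is an assumption suggested by the "Alex, Vicky" example.

```python
def pick_voice(voice_param, installed_voices):
    """Return the first name in the comma-separated list that is installed."""
    for name in voice_param.split(","):
        name = name.strip()  # assumed: spaces around commas are ignored
        if name in installed_voices:
            return name
    return None  # no configured voice exists on this system


# On a system without Alex, the second name is used.
print(pick_voice("Alex, Vicky", {"Vicky", "Fred"}))
```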
Configuration on Linux
On Linux, the default TTS adapter for the eSpeak engine selects the voice with a two-letter language code configured in the eSpeakVoiceFile parameter of the TTS section of the language.
For instance, for English the TTS configuration would be:
<lang lang="en">
  <tts default="true">
    <param name="class" value="se_tpb_speechgen2.external.linux.ESpeakTTS"/>
    <param name="eSpeakVoiceFile" value="en"/>
    ...
  </tts>
</lang>
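To check that a given eSpeakVoiceFile value is usable on a system, the same voice can be tried directly with the espeak command line, whose -v option selects the voice and -w option writes a WAV file. The Python sketch below only builds such a command; the function name espeak_command is hypothetical, and this is not necessarily how the ESpeakTTS adapter invokes the engine.

```python
def espeak_command(voice, text, wav_path):
    """Build an espeak command line for a quick manual test of a voice."""
    # -v selects the voice (e.g. "en"), -w writes the audio to a WAV file.
    return ["espeak", "-v", voice, "-w", wav_path, text]


# The list can be passed to subprocess.run() on a machine with espeak installed.
print(espeak_command("en", "This is a test.", "test.wav"))
```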