



	
	



Transformer documentation: se_tpb_speechgenerator




Transformer Purpose
Input Requirements
Output
	
		On success
		On error
		
	
Configuration/Customization
	
		Parameters (tdf)
		Extended configurability
		TTS Java wrappers
			
				Loquendo
			
			
	
	
Further development
Dependencies
Author
Licensing



Transformer Purpose

	Generates audio for a full-text dtbook file. Typically a TTS
	system generates the audio, but this is not a requirement. For example, a silent template audio file could be
	associated with each synch point in order to make an "empty" book ready for
	
		se_tpb_filesetcreator without the need for a potentially time-consuming TTS process.
	
	
	Regardless of the kind of audio, attributes are placed on the elements that represent synch points.
	Those attributes are smil:clipBegin, smil:clipEnd and smil:src with
		namespace URI http://www.w3.org/2001/SMIL20/.
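
	As an illustration of how a downstream tool might read these attributes, here is a minimal Java sketch. It is not part of the transformer. The class name, the file name and the clock values in the sample fragment are made up for the example; only the attribute names and the namespace URI come from the text above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of se_tpb_speechgenerator.
class SmilAttributeDemo {
    static final String SMIL_NS = "http://www.w3.org/2001/SMIL20/";

    /** Parses XML with namespace awareness and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getAttributeNS lookups
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse manuscript fragment", e);
        }
    }

    /** Formats the three audio attributes of a synch point as "src [begin - end]". */
    static String describe(Element syncPoint) {
        return syncPoint.getAttributeNS(SMIL_NS, "src") + " ["
                + syncPoint.getAttributeNS(SMIL_NS, "clipBegin") + " - "
                + syncPoint.getAttributeNS(SMIL_NS, "clipEnd") + "]";
    }

    public static void main(String[] args) {
        // Sample fragment; the file name and clock values are invented.
        String xml = "<sent xmlns:smil=\"" + SMIL_NS + "\""
                + " smil:src=\"audio0001.wav\""
                + " smil:clipBegin=\"0:00:00.000\" smil:clipEnd=\"0:00:02.150\">Hello.</sent>";
        System.out.println(describe(parseRoot(xml)));
    }
}
```

	The lookup is by namespace URI and local name, so it keeps working if a manuscript binds the SMIL namespace to a prefix other than smil.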
	
		
Input Requirements

	This transformer is written to work with a manuscript, that is, a dtbook-2005-1 or dtbook-2005-2
	 document, possibly enriched with elements and attributes from other namespaces.
	The input document must be "synch point normalized"; see
	
		se_tpb_syncPointNormalizer
	 for that transformation.
	
	
	
		Some elements are supposed to be announced audibly. Those elements must have attributes
		holding the say-before and say-after text strings. 
		
			se_tpb_annonsator
		 can be used to add those attributes to a dtbook document. Since those
		attribute names are configurable, make sure they match whatever se_tpb_annonsator
		uses.
	
	
	
	
Output

On success

	Given the expected input, the transformer outputs a manuscript, that is, a dtbook-2005-1/2
		document enriched with, among other things, attributes indicating the corresponding audio.
		Those attributes, smil:clipBegin, smil:clipEnd and smil:src, namespace URI 
		http://www.w3.org/2001/SMIL20/, point out which elements should be represented 
		by audio in the generated talking book.
		The output also includes the generated audio files referenced by the smil attributes. 
		
	
	Synchronization at the sent (sentence) level should be used. The level is configurable, 
		but other settings have not been tested.
	

	
On error

No specific recovery scheme. On error, this transformer will send a fatal message, 
then throw an exception and abort.

	
	
Configuration/Customization

	Parameters (tdf)
	
	
	inputFilename
	required="true"

	The input manuscript file.

	Example: /home/books/manuscript.xml
	
		
	outputDirectory
	required="true"

	Path to the output directory

	Example: /home/books/audio
	
	
	outputFilename
	required="true"

	The desired name of the output manuscript.

	Example: /home/books/audio/speechgen-manuscript.xml
		
		
	concurrentAudioMerge
	required="false"

	Whether the audio should be merged concurrently with the speech
		generation. Because of licensing restrictions, some TTS systems spend most of 
		their time sleeping just to avoid being too efficient; Loquendo
		is an example. In that case the waiting time can be put to use
		merging the small audio clips. Parallel
		threads will be spawned to merge the audio.

		Possible enum values:
		
			true
			false
		
		Default: true
		
		
	mp3Output
	required="false"

		Whether mp3 is the preferred audio output format. The default
	format is wav.

	Possible enum values:
		
			true
			false
		
		Default: false	
	
		
	sgConfigFilename
	required="false"

	Speech generator configuration file.
	See Speech Generator Configuration for details.

	Example: /home/config/file.xml

	Default: ${transformer_dir}/config/sgConfig.xml	
	
	ttsBuilderConfig
	required="false"

	The tts builder configuration file. 
	See TTS Builder Configuration for details.

	Example: /home/ttsbfiles/file.xml

	Default: ${transformer_dir}/ttsbuilder.xml	
	
		
	ttsBuilderRNG
	required="false"

	Tests for the tts builder configuration file, written in RELAX NG with embedded Schematron.

	Example: /home/ttsbfiles/file.rng

	Default: ${transformer_dir}/ttsbuilder-configtest.rng	
	
		
	

	Extended configurability
	
	Speech Generator Configuration
	The file pointed to by the tdf parameter sgConfigFilename makes it 
		possible to affect how the document is processed. Settings such as 
		which elements to synchronize on and where to merge audio are configured there. 
		A description of the options follows, together with a short example:

	
		/sgConfig/absoluteSynch/item
		The names of the elements that should be synch points, no matter where they are.
		
		/sgConfig/containsSynch/item
		The name of the element for synch point level.
		
		/sgConfig/announceAttributes/item
		Elements of this type show which attributes contain
			announcements. Two elements of this kind are allowed, with the id values before 
			(identifying the attribute that contains "say-before" content) and
		after (identifying the attribute that contains "say-after" content). 
			Each of those elements must carry three attributes in addition to id:
		
			uri: the namespace uri of the announce-attributes.
			prefix: the namespace prefix.
			local: the attribute's local name.
		
			The element body is empty.
		
		
		/sgConfig/mergeAudio/item
		Elements at which to divide the audio into different files. 
			Each value is an element-only XPath tail, 
			that is, level/hd rather than //level/hd.
		
		
		/sgConfig/silence
		Extra silence can be added before and/or after certain
		events in the talking book. Silence is always added at the end of a synch point, 
		never at the beginning. In the current implementation, the duration of the 
		silence is given in milliseconds, and extra silence can be added
		at five different events:
		
			afterLast:
				After the last phrase in an audio clip. Typically used 
			when audio is merged at a heading: this adds silence 
			just before the heading.
			
			afterFirst:
				After the first phrase in an audio clip. Typical usage would be
			just after a heading.
			
			beforeAnnouncement:
				Before an audible announcement.
			
			afterAnnouncement:
				After an audible announcement.
			
			afterRegularPhrase:
				After every regular phrase that's generated.
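
		To make the millisecond durations concrete, the following Java sketch shows one way a duration could be converted into silent audio data. It assumes uncompressed signed PCM, where zero-valued samples are silent. The class name and the 22050 Hz, 16-bit mono format are assumptions for the example, not something the transformer documents.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper, not part of se_tpb_speechgenerator.
class SilenceSketch {
    /** Number of PCM bytes for the given duration, rounded down to whole frames. */
    static int silenceByteCount(AudioFormat format, long millis) {
        long frames = (long) ((double) format.getFrameRate() * millis / 1000.0);
        return (int) (frames * format.getFrameSize());
    }

    /** Returns a zero-filled buffer; zero samples are silent in signed PCM. */
    static byte[] silence(AudioFormat format, long millis) {
        return new byte[silenceByteCount(format, millis)];
    }

    public static void main(String[] args) {
        // Assumed format: 22050 Hz, 16 bit, mono, signed, little-endian.
        AudioFormat format = new AudioFormat(22050f, 16, 1, true, false);
        System.out.println("afterRegularPhrase (200 ms): "
                + silenceByteCount(format, 200) + " bytes");
        System.out.println("afterLast (2000 ms): "
                + silence(format, 2000).length + " bytes");
    }
}
```

		Rounding down to whole frames keeps the buffer aligned with the frame size, so it can be appended to existing PCM data without shifting samples.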
		
		
		
	
	An example sgConfig file follows:
	<?xml version="1.0" encoding="utf-8"?>
<sgConfig>

	<absoluteSynch>
		<item>pagenum</item>
		<item>noteref</item>
		<item>annoref</item>
		<item>linenum</item>
	</absoluteSynch>
	
	<containsSynch>
		<item>sent</item>
	</containsSynch>
	
	<announceAttributes>
		<item id="before" uri="http://www.daisy.org/ns/pipeline/annon" prefix="annon" local="before"/>
		<item id="after" uri="http://www.daisy.org/ns/pipeline/annon" prefix="annon" local="after"/>
	</announceAttributes>
	
	<mergeAudio>
		<item>h1</item>
		<item>h2</item>
		<item>h3</item>
		<item>h4</item>
		<item>h5</item>
		<item>h6</item>
		<item>level/hd</item>
	</mergeAudio>
	
	<silence>
		<afterLast>2000</afterLast>
		<afterFirst>800</afterFirst>
		<beforeAnnouncement>300</beforeAnnouncement>
		<afterAnnouncement>300</afterAnnouncement>
		<afterRegularPhrase>200</afterRegularPhrase>
	</silence>
	
</sgConfig>
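
	The "element-only xpath tail" used by mergeAudio can be read as follows: an element matches level/hd if it is named hd and its parent is named level, wherever that pair sits in the document. The Java sketch below illustrates that reading. It is an interpretation of the description above, not the transformer's actual code, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of se_tpb_speechgenerator.
class MergePointMatcher {
    /** Parses XML with namespace awareness and returns the root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() is populated
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** True if the element's ancestor-or-self name path ends with the tail, e.g. "level/hd". */
    static boolean matchesTail(Node element, String tail) {
        String[] names = tail.split("/");
        Node current = element;
        // Walk upwards from the element, comparing names from right to left.
        for (int i = names.length - 1; i >= 0; i--) {
            if (current == null
                    || current.getNodeType() != Node.ELEMENT_NODE
                    || !names[i].equals(current.getLocalName())) {
                return false;
            }
            current = current.getParentNode();
        }
        return true;
    }

    public static void main(String[] args) {
        Element root = parse("<level><hd>Title</hd><list><hd>List heading</hd></list></level>");
        Node heading = root.getFirstChild();
        Node listHeading = root.getLastChild().getFirstChild();
        System.out.println(matchesTail(heading, "level/hd"));
        System.out.println(matchesTail(listHeading, "level/hd"));
    }
}
```

	With this reading, a single-name item such as h1 matches every h1 element, while level/hd matches only those hd elements that are direct children of a level.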
	
	TTS Builder Configuration	
	
	se_tpb_speechgenerator uses a simple factory to get hold of TTS implementations.
	The factory must be configured properly, since it cannot locate TTS systems on its own.
	The configuration consists of operating system specific sections, each of which 
		contains language specific subsections. Each language must contain no more than
		one TTS system. At runtime, the TTS Builder configuration file is validated using
		RELAX NG and Schematron, but since a DTD is a compact way of showing a document's structure, 
		here's one:
	
	
	<!DOCTYPE ttsbuilder [
   <!ELEMENT ttsbuilder (os+)>
   <!ELEMENT os (property*, lang*)>
   <!ELEMENT property EMPTY>
   <!ELEMENT lang (tts)>
   <!ELEMENT tts (param+)>
   <!ELEMENT param EMPTY>
   
   <!ATTLIST property name  CDATA #REQUIRED>
   <!ATTLIST property match CDATA #REQUIRED>
   <!ATTLIST lang lang CDATA #REQUIRED>
   <!ATTLIST tts default (true) #IMPLIED>
   <!ATTLIST param name  CDATA #REQUIRED>
   <!ATTLIST param value CDATA #REQUIRED>
]>

	
	Besides the rules expressible in a DTD, a few others are asserted using Schematron:
	
		The length of the lang attribute value must be 2. 
			This follows the ISO 639 two-letter lower-case codes used in Java.
		
		
		
		Sibling lang elements must not have the same lang attribute value.
		
		For each os, no more than one descendant tts may have the
		attribute default="true", which marks it as the fallback.
		
		For each tts, no two descendant params may have
		the same value for the name attribute.
		
		For each tts, there must be a param whose name attribute
		has the value class.
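
	For readers who want to pre-check a configuration outside the Pipeline, the rules above can be approximated with plain DOM code. The Java sketch below is a hypothetical stand-in for the Schematron assertions, not the shipped validation. The class name and the error messages are invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical checker, not part of se_tpb_speechgenerator.
class TtsBuilderRules {
    /** Returns one message per violated rule; an empty list means all rules hold. */
    static List<String> check(String xml) {
        List<String> errors = new ArrayList<>();
        NodeList osList = parse(xml).getElementsByTagName("os");
        for (int i = 0; i < osList.getLength(); i++) {
            Element os = (Element) osList.item(i);
            Set<String> langs = new HashSet<>();
            int defaults = 0;
            NodeList langList = os.getElementsByTagName("lang");
            for (int j = 0; j < langList.getLength(); j++) {
                Element lang = (Element) langList.item(j);
                String code = lang.getAttribute("lang");
                if (code.length() != 2) errors.add("lang value must have length 2: " + code);
                if (!langs.add(code)) errors.add("duplicate lang within os: " + code);
                NodeList ttsList = lang.getElementsByTagName("tts");
                for (int k = 0; k < ttsList.getLength(); k++) {
                    Element tts = (Element) ttsList.item(k);
                    if ("true".equals(tts.getAttribute("default"))) defaults++;
                    Set<String> names = new HashSet<>();
                    NodeList params = tts.getElementsByTagName("param");
                    for (int m = 0; m < params.getLength(); m++) {
                        String name = ((Element) params.item(m)).getAttribute("name");
                        if (!names.add(name)) errors.add("duplicate param name: " + name);
                    }
                    if (!names.contains("class")) errors.add("tts lacks a class param");
                }
            }
            if (defaults > 1) errors.add("more than one default tts in an os section");
        }
        return errors;
    }

    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ttsbuilder><os><lang lang=\"eng\"><tts>"
                + "<param name=\"binary\" value=\"a\"/></tts></lang></os></ttsbuilder>";
        for (String error : check(xml)) System.out.println(error);
    }
}
```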
	
	
	
		Configuration of a TTS mainly consists of parameters for a certain TTS wrapper, such as the Java class name or
		the path to a binary TTS program.
		Each TTS system needs its own Java wrapper, and hence their 
		configurations can differ extensively. The wrapper communicates with the TTS system of your choice.
		The properties read from the TTS Builder Configuration are passed to the TTS Java 
		wrapper constructor (if there is one taking a java.util.Map as parameter), and from 
		there it is up to the wrapper to decide
		what to do. This gives a developer a lot of freedom when creating a TTS wrapper 
		and its configuration. If the Java wrapper extends se_tpb_speechgenerator.ExternalTTS, 
		some functionality is available out of the box. When void setParamMap(java.util.Map) is called,
		the superclass attempts to read the following parameters:
	
	
	
		
			generalRegexFilename 
			- general regexes that are always applied. Absolute path to file.
		
		
		
			specificRegexFilename 
			- book specific regexes. Absolute path to file.
		
		
		
			characterSubstitutionTables 
			- a comma separated list of absolute file paths to character substitution tables. 
			If this parameter is present, the program will look for the following two:
			
				
					characterExcludeFromSubstitution 
					- name of character set to exclude from substitution.
				
				
				
					characterFallbackStates
					- what to do if no mapping is found, the following values are valid:
					
						fallbackToNonSpacingMarkRemovalTransliteration
						Determines whether a character substitution attempt should fall back 
						to a nonspacing mark removal transliteration (typically removal of accents) 
						if a replacement text was not found in the user provided substitution 
						table(s).
						
						
						fallbackToLatinTransliteration
						Determines whether a character substitution attempt should fall back 
						to a transliteration to Latin if a replacement text was not 
						found in the user provided substitution table(s).
						
						
						fallbackToUCD
						Determines whether a character substitution attempt should fall back to 
						names in the UCD table if a replacement text was not found in the user provided 
						substitution table(s).
						
					 
				
			
		
		yearFilename. Absolute path to file.
		xsltFilename. Absolute path to file.
	
	With these parameters set, the following superclass methods do useful work:
	
		String xsltFilter(Document) 
			- Applies an XSLT transformation to the small DOM representing the synch point.
		
		String regexFilter(String)
			- Performs regex search-replace.
		
		String yearFilter(String)
			- Performs year-specific regex search-replace, replacing numerals with text.
		
		String replaceUChars(String) and void replaceUChars(Node) (recursive)
			- Replaces characters, most likely configured (by you) to filter characters your TTS can't handle.
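
	The idea behind regex search-replace filtering can be sketched in a few lines of Java. This is not the ExternalTTS implementation: the class, the method names and the in-memory rule maps are hypothetical, and the on-disk format of the regex files is not shown here. The sketch assumes that book specific rules run before the general ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of ordered search-replace rules.
class RegexFilterSketch {
    /** Applies each pattern -> replacement rule in insertion order. */
    static String apply(Map<String, String> rules, String text) {
        String result = text;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = Pattern.compile(rule.getKey()).matcher(result).replaceAll(rule.getValue());
        }
        return result;
    }

    /** Runs the book specific rules first, then the general rules. */
    static String filter(Map<String, String> specific, Map<String, String> general, String text) {
        return apply(general, apply(specific, text));
    }

    public static void main(String[] args) {
        Map<String, String> specific = new LinkedHashMap<>();
        specific.put("Dr\\.", "Doctor");       // in this book, "Dr." is a title

        Map<String, String> general = new LinkedHashMap<>();
        general.put("\\bDr\\b", "Drive");      // elsewhere, "Dr" is a street suffix
        general.put("\\s*&\\s*", " and ");

        System.out.println(filter(specific, general, "Dr. Smith & co"));
    }
}
```

	The order matters: if the general rules ran first here, "Dr" would already have been rewritten before the book specific rule had a chance to match.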
	
	
	
	An example of the TTS Builder Configuration follows:
<?xml version="1.0" encoding="UTF-8"?>

<!-- the Java class parameter must be supplied -->
<!-- ${transformer_dir} variable will be evaluated to the directory where se_tpb_speechgenerator resides. -->

<ttsbuilder>
	<!--******************************************************************************
	Windows
	*******************************************************************************-->
	<os>
		<!-- all properties must match java's System.getProperties()-properties.
			Standard regex match for an os to be usable in this program. -->
		<property name="os.name" match="[Ww]indows.*" />
		
		<lang lang="en">
			<!-- since xml:lang determines which tts to use when in 
			this program, provide only one tts per language! -->			
			
			<!-- this is configuration for one tts impl. the "default" attribute 
			should be set to true for one configuration for each os. -->
			<tts default="true">
				<!-- the Java class name -->
				<param name="class" value="se_tpb_speechgenerator.SAPIImpl"/>
				
				<!-- the binary SAPI-talking program used for tts conversion -->
				<param 
				    name="binary" 
				    value="${transformer_dir}/tts/SimpleCommandLineTTS/SimpleCommandLineTTS.exe"/>				
		
				<!-- an xml file containing simple search-replace regex rules. -->
				<param name="generalRegexFilename" value="${transformer_dir}/regex/richard.xml"/>
				
				<!-- book specific regexes, will be applied before "generalRegexFilename". -->
				<param 
				    name="specificRegexFilename" 
				    value="${transformer_dir}/regex/someBookSpecific-re.xml"/>
	
				<!-- xslt applied on each synchpoint -->
				<param name="xsltFilename" value="${transformer_dir}/xslt/transform.xsl"/>
	
				<!-- an xml file containing simple search-replace regex rules.
				    Those rules specifically replace years in digits with text. -->
				<param name="yearFilename" value="${transformer_dir}/config/year_en.xml"/>
	
				<!-- SAPI specific parameter: The value will be used to embed the text in
					SAPI's xml-like way. This value will result in the following tags
					surrounding the input text: 
				<voice optional="Gender=Male"></voice>
				Where the starting point is <voice optional=""></voice>.
				More on SAPI xml codes: 
				http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp
				-->
				<param name="sapiVoiceSelection" value="Gender=Male"/>
	
				<!-- An ability to filter characters and replace them with custom strings. -->
				<param 
				    name="characterSubstitutionTables" 
				    value="${transformer_dir}/character-translation-table.xml"/>
	
				<!-- What to do when no mapping is found in the substitution table(s). -->
				<param name="characterFallbackStates" value="fallbackToLatinTransliteration"/>		
			</tts>
		</lang>
		
		<lang lang="sv">
			<tts>
				<param name="class" value="se_tpb_speechgenerator.SAPIImpl"/>
				<param name="binary" value="${transformer_dir}/tts/SimpleCommandLineTTS/SimpleCommandLineTTS.exe"/> 
				<param name="generalRegexFilename" value="${transformer_dir}/regex/richard.xml"/>
				<param name="xsltFilename" value="${transformer_dir}/xslt/transform.xsl"/>
				<param name="yearFilename" value="${transformer_dir}/config/year_se.xml"/>
				<param name="sapiVoiceSelection" value="Language=41D"/>
				<param name="characterSubstitutionTables" value="${transformer_dir}/character-translation-table.xml"/>
				<param name="characterFallbackStates" value="fallbackToLatinTransliteration"/>
			</tts>
		</lang>
	</os>
	
	
	<!--******************************************************************************
	Linux
	*******************************************************************************-->
	<os>
		<property name="os.name" match="[Ll]inux.*" />
		
		<lang lang="en">
			<tts id="loquendo" default="true">
				<param name="class" value="se_tpb_speechgenerator.LoquendoImpl"/>
				<param name="binary" value="${transformer_dir}/../../../narratorLoquendo"/>
				<param name="generalRegexFilename" value="${transformer_dir}/regex/richard.xml"/>
				<param name="ttsProperties" value="${transformer_dir}/config/loquendo.xml"/>
				<param name="xsltFilename" value="${transformer_dir}/xslt/loquendo-en.xsl"/>
				<param name="yearFilename" value="${transformer_dir}/config/year_en.xml"/>
			</tts>
		</lang>
	</os>
</ttsbuilder>
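
	How the factory picks an os section and a tts entry is not spelled out above, so the following Java sketch is only one plausible reading. It assumes that every property regex must match the whole system property value, that xml:lang values are reduced to their first two letters, and that the tts marked default="true" is used when no language matches. All names in the sketch are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration, not the factory used by se_tpb_speechgenerator.
class TtsSelectionSketch {
    /** True if every configured regex matches the corresponding system property. */
    static boolean osMatches(Map<String, String> propertyPatterns, Properties system) {
        for (Map.Entry<String, String> entry : propertyPatterns.entrySet()) {
            String actual = system.getProperty(entry.getKey());
            if (actual == null || !actual.matches(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Returns the class configured for the language, or the default language's class. */
    static String chooseClass(Map<String, String> classByLang, String defaultLang, String xmlLang) {
        String lang = xmlLang == null ? "" : xmlLang.toLowerCase(Locale.ROOT);
        if (lang.length() > 2) {
            lang = lang.substring(0, 2); // assumption: "sv-SE" is treated as "sv"
        }
        String configured = classByLang.get(lang);
        return configured != null ? configured : classByLang.get(defaultLang);
    }

    public static void main(String[] args) {
        Map<String, String> windows = new HashMap<>();
        windows.put("os.name", "[Ww]indows.*");
        System.out.println("Windows section usable here: "
                + osMatches(windows, System.getProperties()));

        Map<String, String> classByLang = new HashMap<>();
        classByLang.put("en", "se_tpb_speechgenerator.SAPIImpl");
        System.out.println(chooseClass(classByLang, "en", "fi"));
    }
}
```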

	
	
	
TTS Java wrappers
	
		If you need to use a TTS system other than SAPI, you must develop your
		own TTS Java wrapper. One way of doing that is to develop a class from scratch
		implementing se_tpb_speechgenerator.TTS, but the easiest way 
		is to extend se_tpb_speechgenerator.ExternalTTS.
		The class is abstract, leaving three methods to implement:
	
	
		long se_tpb_speechgenerator.ExternalTTS.sayImpl(Document syncPoint, File outputFile)
		long se_tpb_speechgenerator.ExternalTTS.sayImpl(String syncPoint, File outputFile)
		void se_tpb_speechgenerator.TTS.close()
	
	The parameters configured in the TTS Builder Configuration are passed to a constructor accepting a 
		java.util.Map as its single parameter if the wrapper has one; otherwise they are passed to the wrapper by a call to
		se_tpb_speechgenerator.ExternalTTS.setParamMap(java.util.Map). See the javadoc for 
		more details. This lets you use the TTS system, and any inter-process communication, of your choice. 
		Once you have set up a proper TTS Builder Configuration, your new TTS wrapper is ready to run.
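
	The "Map constructor first, setParamMap otherwise" behaviour described above can be sketched with plain reflection. This is an illustration of the pattern, not the Pipeline's factory code. ParamAware, MapWrapper and SetterWrapper are hypothetical stand-ins for the real se_tpb_speechgenerator types, so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the instantiation strategy.
class WrapperFactorySketch {
    /** Stand-in for the setParamMap contract of ExternalTTS. */
    interface ParamAware {
        void setParamMap(Map<String, String> params);
    }

    /** Prefers a single-Map constructor; falls back to no-arg constructor plus setParamMap. */
    static Object create(String className, Map<String, String> params) {
        try {
            Class<?> type = Class.forName(className);
            try {
                return type.getDeclaredConstructor(Map.class).newInstance(params);
            } catch (NoSuchMethodException noMapConstructor) {
                Object instance = type.getDeclaredConstructor().newInstance();
                if (instance instanceof ParamAware) {
                    ((ParamAware) instance).setParamMap(params);
                }
                return instance;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create TTS wrapper " + className, e);
        }
    }

    /** Example wrapper that receives its parameters through the constructor. */
    static class MapWrapper {
        final Map<String, String> params;
        MapWrapper(Map<String, String> params) { this.params = params; }
    }

    /** Example wrapper that receives its parameters through setParamMap. */
    static class SetterWrapper implements ParamAware {
        Map<String, String> params;
        SetterWrapper() { }
        @Override
        public void setParamMap(Map<String, String> params) { this.params = params; }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("binary", "/opt/tts/mytts"); // invented path
        Object a = create(MapWrapper.class.getName(), params);
        Object b = create(SetterWrapper.class.getName(), params);
        System.out.println(a.getClass().getSimpleName() + ", " + b.getClass().getSimpleName());
    }
}
```

	In both cases the wrapper sees the same parameter map, so the choice between the two styles is mostly a matter of whether you extend ExternalTTS.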
	
	
	Loquendo
	
		At TPB we have been using a simple Java wrapper for the Linux version of the Loquendo TTS. 
		Work has been done to make the TTS work better for us. Some pre-processing rules have been 
		developed using regexes, and those may come in handy for anyone using 
		the SAPI version of the Loquendo TTS together with Narrator. 
		Read more about what has been done: 
		loquendo-preproc.html.
	

Further development


	Refactoring: Instead of letting se_tpb_speechgenerator figure out which elements represent
		synch points by searching for certain element structures with text nodes, an attribute should 
		be required on those elements, making the synch point search trivial. 
		Identifying synch points should be the responsibility of a single 
		transformer, and a possible candidate in the Narrator transformer chain would be 
		se_tpb_syncPointNormalizer.
	
	Validate the Speech Generator Configuration file with RNG/Schematron before running.
	Generate audio for non-empty dc:Creator and dc:Title. Since the fileset creator
	uses those elements in the absence of docAuthor and docTitle, it would be nice to
	be able to supply audio for them as well.

	
Dependencies

May need to access some TTS system, which is not part of the Daisy Pipeline.
	
Author

Martin Blomberg, TPB.
	
Licensing

	LGPL