Daisy Pipeline Transformer authoring guide

This document summarizes the process of authoring and deploying a Pipeline Transformer.

Target audience: developers

Most developers who have created their own Transformers and/or Scripts report that they were able to do so on their own, by studying previously existing Transformer and Script code. Reading this guide is still recommended, as it reveals some details that may not be immediately evident.

Markus Gylling

Latest update: 2007-11-23

Transformer development step-by-step guide

The typical steps taken (chronologically) during development of a Pipeline Transformer are:

  1. Talk about your idea (make sure you are not duplicating effort)
  2. Consider the distribution and availability model
  3. Start development
  4. Complete development, abiding by the code of conduct
  5. Perform Real-world testing
  6. Refine (if needed) and finalize the Script
  7. Document
  8. Deploy

Talk about your idea (make sure you are not duplicating effort)

You need a Transformer to perform a particular task. Before starting development, make sure to discuss your plans on the Pipeline lists ([email protected] and [email protected]) first: someone else may already be thinking about, or actually developing, something similar to what you want.

Consider the distribution and availability model

The Transformer you are about to develop may be of interest to your own organisation only, or it may be of interest to many. Similarly, there may be restrictions on availability and licensing, and there may be commercial incentives involved in the development. Remember that a Transformer can be:

  • Freely available, open source
  • Freely available, closed source
  • Only available within a particular organisation
  • Commercially available

All these cases are perfectly fine, and will obviously have an impact on how your Transformer code and binaries are hosted and distributed. Transformers that are intended to be freely and globally available under an open source license are typically hosted at the Pipeline SourceForge SVN repository.

However, you can also choose to have the code somewhere else, and provide your Transformer (and associated Script, see below) to users for example via the Pipeline GUI import feature.

If you are developing a Transformer that is to be used within your own organisation only, or if you are developing a commercial Transformer, you will not be using the Pipeline SourceForge SVN repository. The way you host the code is up to you. In the case where your distribution model is not based on the Pipeline GUI import feature, you might want to build and maintain a separate version of the Pipeline Ant Build Script that combines the different source repositories.

Start development

If your discussions with the developer community did not reveal any risk of duplicated effort, the first step is to get a running version of the Pipeline Core. A separate document describes the setup process. You can get a version of the RCP GUI running as well, although this is not needed for development: typically the CommandLineUI class or the test framework is used instead.

Understand and define the Transformer Contract

Look at a Pipeline Transformer as a physical manifestation of the principle of singular-task encapsulation. The principle is that if a particular atomic task can be reused in different contexts, then that singular task should be the only thing the Transformer does. The result is that the same Transformer can potentially be reused by different scripts. Avoid compound tasks in Transformers. If you need several tasks performed to achieve a desired end result, consider writing several transformers instead.

Have a look at the Narrator and OPS Creator scripts for examples of how a sequence of singular-task Transformers interact to create an end result (look at the "list of Transformers used" at the end of these documents).

For reusability and maintenance purposes, the Transformer contract needs to be clearly defined; see Transformer Description File and Documentation below. Think of the Transformer contract as an interface: the underlying implementation can vary, but once published, the interface (contract) does not change.

Choose the most appropriate coding language

To maximize portability of functionality beyond the Pipeline context, we use the following priority order when choosing the language in which to execute the Transformer contract:

  1. XSLT. If you can achieve your task in plain XSLT (1.0 or 2.0), that is optimal. But don't push the boundaries of XSLT just to achieve this (an overly extended or complex XSLT is difficult to maintain and often processor-dependent).
  2. XSLT + minimal Java. Distribute the execution between XSLT and Java. Keep the amount of Java minimal, so as to encourage people to reuse the XSLT and just swiftly port the Java parts to whatever language they want to use. (Why would someone not want to use Java? Afraid of getting hooked?)
  3. Java. If XSLT doesn't cover your needs, use Java.
  4. Any other language. If Java doesn't cover your needs, or you simply want to use something else, you can do that.
    Use either the org.daisy.pipeline.transformers.ExeRunner class or a simple Java wrapper in your transformer directory to execute an external process (see the sketch after this list).
    The drawbacks here are that:
    • Many dependencies and additional executables or runtimes create setup and deployment complexity
    • You will not get access to the Pipeline EventBus, Localization framework and other handy features.
    There are already Transformers available that use Python, Perl and Tcl in this way. Remember to maintain cross-platform support when doing this! You can write a Transformer that runs only on a particular operating system (and you will also declare this in the TDF), but please avoid it.
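Where a Java wrapper around an external process is needed (option 4 above), the skeleton below shows one possible shape. This is a minimal sketch for illustration only: the class and method names are assumptions, and it does not show the actual ExeRunner API or the Transformer superclass that a real wrapper would extend.

    import java.io.File;
    import java.io.IOException;

    // Hypothetical wrapper: names and structure are illustrative, not part of
    // the Pipeline API. A real Transformer would extend the Transformer
    // superclass and report progress through the EventBus.
    public class ExternalToolWrapper {

        // Runs an external command in the given directory and returns true
        // on a zero exit code.
        public static boolean run(File workingDir, String... command)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.directory(workingDir);
            pb.redirectErrorStream(true); // merge stderr into stdout
            Process process = pb.start();
            // Real code should consume process.getInputStream() here to
            // avoid blocking when the output buffer fills up.
            return process.waitFor() == 0;
        }
    }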

In some cases (such as when the whole Transformer contract can be covered with a plain vanilla XSLT implementation), the generic Transformer executors available in the org.daisy.pipeline.transformers package suffice to execute the Transformer. In these cases, no local Java subclass is needed. (At the time of writing, the Transformers se_tpb_dtbook2latex and dk_dbb_dtbook2rtf are examples of genericized XSLT execution.)

A recommendation is to browse through the existing Transformer collection on the SVN to get a perspective on how different types of problems have been solved before.

Create a Transformer shell, and a Transformer Description File (TDF)

On the SVN, the code and binaries of each Transformer live in a subdirectory of the trunk/dmfc/transformers/ directory.

Transformer Naming

Transformer identity is expressed as a three-part underscore separated string, where the first part is a country code, the second part is an organizational acronym, and the third part a local name.

The sole purpose of this scheme is to achieve uniqueness in a given set of Transformers.

Examples: int_daisy_validator, uk_rnib_odf2dtbook, int_org_example

When the Transformer is distributed as a JAR, the identity is expressed in the JAR file name (int_org_example.jar). When the transformer is not JARred, the identity is expressed in the name of the filesystem directory in which the Transformer's files reside (/transformers/int_org_example/*).

Create a Transformer Description File

Each Daisy Pipeline Transformer has a Transformer Description File ('TDF') associated with it.

The TDF file can be seen as a manifest of the transformer; in this file, the contract of the Transformer is defined. The contract includes what type of content the Transformer accepts as input, and what type of content it will give as output. Further, the TDF defines additional input parameters that the Transformer can take to customize its behavior.

The TDF is an XML grammar. A compound RelaxNG+Schematron schema [1] exists to validate it. Each TDF file is parsed (and validated) by the Pipeline core at initialization time.

The filename of the TDF file is fixed to transformer.tdf. Previously, the filename restriction was *.tdf, but this has been deprecated in order to support locating TDF files inside JARs.
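To give a feel for the overall shape of a TDF, here is a purely illustrative sketch. The element and attribute names below are assumptions made for illustration, not the authoritative grammar; always validate your TDF against the RelaxNG+Schematron schema [1].

    <!-- Illustrative sketch only: element and attribute names are assumptions,
         not the authoritative TDF grammar. Validate against the schema [1]. -->
    <transformer version="1.1">
      <name>Example Transformer</name>
      <description>Converts input format A to output format B.</description>
      <classname>int_org_example.ExampleTransformer</classname>
      <parameters>
        <parameter required="true" direction="in">
          <name>input</name>
          <description>URI of the input document</description>
        </parameter>
      </parameters>
    </transformer>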

Create a Script to run the Transformer

Transformers are not executable in themselves; they need a Script to be run. A Script combines one or several Transformers into a sequence. A script can also take parameters.

See Daisy Pipeline Taskscript Grammar version 2.0 for details.
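As a purely illustrative sketch of the idea (element names below are assumptions; the Taskscript Grammar document is authoritative), a Script is essentially an ordered list of Transformer invocations, with Script-level parameters passed down to the individual tasks:

    <!-- Illustrative sketch only: element names are assumptions; see the
         Taskscript Grammar 2.0 document for the authoritative format. -->
    <taskScript version="2.0" name="exampleScript">
      <parameter name="input" value="${input}"/>
      <task name="int_daisy_validator"/>
      <task name="int_org_example">
        <parameter name="input" value="${input}"/>
      </task>
    </taskScript>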

On the SVN, the Scripts live in a subdirectory of the trunk/dmfc/scripts/ directory. While under development, you should use the /scripts/_dev/ directory for your script to clearly mark that this is not ready for primetime.

Create a PipelineTest testcase to run the Script

An efficient way to develop in the Pipeline is to do so against one or several test cases.

  • Locate sample input data (and, if your work is to be publicly available, put that in the trunk/dmfc/samples/ directory).
  • Create an extension of the abstract org.daisy.pipeline.test.PipelineTest class and place it in the org.daisy.pipeline.test.impl package. The simplest way to do this is to clone one of the existing classes in the org.daisy.pipeline.test.impl package, give it a meaningful new name, and then modify the supportsScript method so that it matches your script name, and the getParameters() method so that it matches your script parameters (a sketch follows below).
  • Import and add your new test class in org.daisy.pipeline.test.PipelineTestDriver. Disable all other tests by commenting them out.
  • If using Eclipse, add a Run Profile for org.daisy.pipeline.test.PipelineTestDriver using the parameters '${project_loc}/samples ${project_loc}/scripts'. (More info in the javadoc of PipelineTestDriver.)

This Run Profile can now be used throughout your development process, and the test case can be reused later on when doing automated stability tests of a Pipeline distribution.
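A minimal sketch of such a test class follows, assuming the supportsScript and getParameters hooks described above. The exact signatures of the abstract class may differ from what is shown; cloning an existing class in org.daisy.pipeline.test.impl gives you the authoritative shape.

    package org.daisy.pipeline.test.impl;

    import java.util.ArrayList;
    import java.util.List;

    import org.daisy.pipeline.test.PipelineTest;

    // Sketch only: method signatures are assumed, not verified against the
    // abstract class; use an existing test in this package as the template.
    public class ExampleScriptTest extends PipelineTest {

        // Match this test to the Script it exercises.
        public boolean supportsScript(String scriptName) {
            return "exampleScript".equals(scriptName);
        }

        // Supply the parameters the Script expects; the parameter name and
        // sample path below are invented for illustration.
        public List<String> getParameters() {
            List<String> parameters = new ArrayList<String>();
            parameters.add("--input=samples/example/input.xml");
            return parameters;
        }
    }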

Complete development, abiding by the code of conduct

Now, the time has come to keep coding until the defined Transformer contract is fulfilled.

Please refer to the Transformer Authors Code of Conduct for details on coding style and other rules and recommendations that apply.

Notes on the use of Libraries

Apart from the obvious JRE runtime classes, you will most probably be using several libraries in your code.

In the /lib/ directory, the third party libraries that are currently used by the Pipeline core or other Transformers are available. If you need to add another third party library to get the job done, consult with a project admin (or post to the daisymfc-developer list) first. We are trying to keep the distribution size down, and therefore need to be restrictive on new lib additions where possible.

Similarly, the org.daisy.util package has been created to provide a collection of reusable services for Transformer developers. org.daisy.util does not contain Pipeline-specific code; it is a utility library that can be used in any Java-based project. A recommendation is to go through the org.daisy.util package carefully before you start coding, to make sure you don't end up developing services that are already available there.

Perform Real-world testing

If you are targeting desktop usage, you can create a separate test distribution of your Script and Transformer(s) and have testers import this into the Pipeline GUI for testing and evaluation.

Sometimes, you will also be able to distribute your work for testing to the public via an official Pipeline release. In this case, the Script is clearly marked as beta, and the release notes indicate that input and bug reports are welcome.

Refine (if needed) and finalize the Script

The exposure to users may have revealed that more runtime customization of the Script is needed. Now is the time to add any such Script parameters, and to implement support for them locally in the Transformer.

Document

Each Transformer has an XHTML file providing development oriented documentation. These files live in the /doc/transformers/ directory. A template exists in the /doc/templates directory.

Each Script has an XHTML file providing usage oriented documentation. These files live in the /doc/scripts/ directory. A template exists in the /doc/templates directory. Remember that Script documentation needs to be end-user oriented, as opposed to Transformer documentation.

Deploy

If your Transformer and Script are on the Pipeline SVN and should be provided in an official Pipeline release, all you need to do is make sure that the Ant Build Script does not exclude your packages, and that your Script is moved out of the _dev script directory to a more appropriate location (one of the sibling directories of _dev).

An alternative channel for distributing binaries to desktop users is to use the Pipeline GUI Import feature.

Transformer Authors Code of Conduct

Inter-transformer dependencies

To avoid a dependency nightmare, it is strictly forbidden to use classes, XSLTs, etcetera from other Transformer packages. A Transformer needs to be written completely unaware of what other Transformers exist around it. As a Transformer developer you can help enforce this by reducing class visibility to package level.

If a certain function or service ends up being reimplemented by many transformers, the typical solution to avoid code duplication is to move it to the org.daisy.util package. Consult with a project admin (or post to the daisymfc-developer list) if you have candidate code for inclusion there.

Coding style

See separate document Java coding conventions.

Contributing to the test case collection

It is strongly recommended that your delivered transformer includes an implementation of the org.daisy.pipeline.test.PipelineTest class, coupled with appropriate input sample(s), placed in the SVN /samples/*/ directory.

Tests live in the org.daisy.pipeline.test.impl package.

Localization

Transformer authors are expected to externalize all strings that carry information intended for consumption by users.

For this reason, each transformer may include an external message bundle for localization. This bundle must comply with one of the formats defined by java.util.Properties.

The message bundle must reside in the same directory/at the same package level as the local Transformer subclass and/or the Transformer Description File.

The base message bundle name must be in the form identity.messages, where identity is the three-part package name discussed above. Example: int_org_example.messages.

Localized bundles follow the locale suffix convention (int_org_example_fr.messages) defined by Java's ResourceBundle.

The abstract Transformer superclass already has a generic message bundle registered (org.daisy.pipeline.core/pipeline.messages). In some instances, it may be enough to utilize the messages in the generic bundle, in which case no local message bundle is needed.

Transformer authors are encouraged to use the XML form of Java properties (see http://java.sun.com/dtd/properties.dtd).
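As a reminder of what that format looks like (the keys and message texts below are invented for illustration; the format itself is the standard java.util.Properties XML form):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <!-- int_org_example.messages: keys and texts are illustrative only -->
    <properties>
      <entry key="EXAMPLE_TRANSFORMING_FILE">Transforming {0} ...</entry>
      <entry key="EXAMPLE_DONE">Transformation complete.</entry>
    </properties>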

Jar packaging

We want transformers to be deployable as JAR files. Here are the ground rules for creating jarifiable transformers:

TDF file name
The name of the .tdf file must be transformer.tdf in order to be found inside the JAR file.
Do not use getTransformerDirectory()
Calling the deprecated getTransformerDirectory() method when running a transformer from a JAR file will throw an IllegalStateException.
Use URL instead of File to reference a local resource
To reference any resource within the Transformer package, use java.net.URL, not java.io.File (see the sketch below).
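A minimal sketch of the URL-based approach (the class and resource name are illustrative):

    import java.net.URL;

    // Sketch: resolving a resource that ships next to the Transformer class.
    // Class.getResource() works both when running from a plain directory and
    // from inside a JAR, whereas a java.io.File path breaks in the JAR case.
    public class ResourceLookupExample {
        public URL getStylesheet() {
            // "example.xsl" is an illustrative name for a resource assumed
            // to live in the same package/directory as this class.
            return getClass().getResource("example.xsl");
        }
    }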

Notes

[1] The TDF schema lives in the org.daisy.pipeline.core.transformer package. If you do not have local access to the source code, browse the SVN online.




