Daisy Pipeline Transformer authoring guide

This document summarizes the process of authoring and deploying a Pipeline Transformer.

Target audience: developers

Most developers who have created their own Transformers and/or Scripts report that they were able to do so on their own, by studying previously existing Transformer and Script code. Reading this guide is still recommended, as it reveals some details that may not be immediately evident.

Markus Gylling

Latest update: 2007-11-23

Transformer development step-by-step guide

The typical steps taken (chronologically) during development of a Pipeline Transformer are:

  1. Talk about your idea (make sure you are not duplicating effort)
  2. Consider the distribution and availability model
  3. Start development
  4. Complete development, abiding by the code of conduct
  5. Perform Real-world testing
  6. Refine (if needed) and finalize the Script
  7. Document
  8. Deploy

Talk about your idea (make sure you are not duplicating effort)

You need a Transformer to perform a particular task. Before starting development, make sure to discuss your plans on the Pipeline lists ([email protected] and [email protected]) first: someone else may already be thinking about, or actually developing, something similar to what you want.

Consider the distribution and availability model

The Transformer you are about to develop may be of interest to your own organisation only, or it may be of interest to many. Similarly, there may be restrictions on availability and licensing, and there may be commercial incentives involved in the development. Remember that a Transformer can be:

  • Freely available, open source
  • Freely available, closed source
  • Only available within a particular organisation
  • Commercially available

All these cases are perfectly fine, and will obviously have an impact on how your Transformer code and binaries are hosted and distributed. Transformers that are intended to be freely and globally available under an open source license are typically hosted at the Pipeline SourceForge SVN repository.

However, you can also choose to have the code somewhere else, and provide your Transformer (and associated Script, see below) to users for example via the Pipeline GUI import feature.

If you are developing a Transformer that is to be used within your own organisation only, or if you are developing a commercial Transformer, you will not be using the Pipeline SourceForge SVN repository. The way you host the code is up to you. In the case where your distribution model is not based on the Pipeline GUI import feature, you might want to build and maintain a separate version of the Pipeline Ant Build Script that combines the different source repositories.

Start development

If your discussions with the developer community did not reveal any risk of duplicated effort, the first step is to get a running version of the Pipeline Core. A separate document describes the setup process. You can get a version of the RCP GUI running as well, although this is not needed for development: typically the CommandLineUI class or the test framework is used instead.

Understand and define the Transformer Contract

Look at a Pipeline Transformer as a physical manifestation of the principle of singular-task encapsulation. The principle is that if a particular atomic task can be reused in different contexts, then that singular task should be the only thing the Transformer does. The result is that the same Transformer can potentially be reused by different scripts. Avoid compound tasks in Transformers. If you need several tasks performed to achieve a desired end result, consider writing several transformers instead.

Have a look at the Narrator and OPS Creator scripts for examples of how a sequence of singular-task Transformers interact to create an end result (look at the "list of Transformers used" at the end of these documents).

For reusability and maintenance purposes, the Transformer contract needs to be clearly defined; see Transformer Description File and Documentation below. Think of the Transformer contract as an interface: the underlying implementation can vary, but once published, the interface (contract) does not change.

Choose the most appropriate coding language

To maximize portability of functionality beyond the Pipeline context, we use the following priority order when choosing the language in which to execute the Transformer contract:

  1. XSLT. If you can achieve your task in plain XSLT (1.0 or 2.0), that is optimal. But don't push the boundaries of XSLT just to achieve this (an overly extended or complex XSLT is difficult to maintain and often processor-dependent).
  2. XSLT + minimal Java. Distribute the execution between XSLT and Java. Keep the amount of Java minimal, so as to encourage people to reuse the XSLT and just swiftly port the Java parts to whatever language they want to use. (Why would someone not want to use Java? Afraid of getting hooked?)
  3. Java. If XSLT doesn't cover your needs, use Java.
  4. Any other language. If Java doesn't cover your needs, or you simply want to use something else, you can do that.
    Use either the org.daisy.pipeline.transformers.ExeRunner class or a simple Java wrapper in your transformer directory to execute an external process (see the sketch after this list).
    The drawbacks here are that:
    • Many dependencies and additional executables or runtimes create setup and deployment complexity
    • You will not get access to the Pipeline EventBus, Localization framework and other handy features.
    There are already Transformers available that use Python, Perl and Tcl in this way. Remember to maintain cross-platform support when doing this! You can write a Transformer that runs only on a particular operating system (and you will also declare this in the TDF), but please avoid it.
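Where a Java wrapper around an external process is needed (option 4 above), the skeleton below shows one possible shape. This is a minimal sketch for illustration only: the class and method names are assumptions, and it does not show the actual ExeRunner API or the Transformer superclass that a real wrapper would extend.

    import java.io.File;
    import java.io.IOException;

    // Hypothetical wrapper: names and structure are illustrative, not part of
    // the Pipeline API. A real Transformer would extend the Transformer
    // superclass and report progress through the EventBus.
    public class ExternalToolWrapper {

        // Runs an external command in the given directory and returns true
        // on a zero exit code.
        public static boolean run(File workingDir, String... command)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.directory(workingDir);
            pb.redirectErrorStream(true); // merge stderr into stdout
            Process process = pb.start();
            // Real code should consume process.getInputStream() here to
            // avoid blocking when the output buffer fills up.
            return process.waitFor() == 0;
        }
    }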

In some cases (such as when the whole Transformer contract can be covered with a plain vanilla XSLT implementation), the generic Transformer executors available in the org.daisy.pipeline.transformers package suffice to execute the Transformer. In these cases, no local Java subclass is needed. (At the time of writing, the Transformers se_tpb_dtbook2latex and dk_dbb_dtbook2rtf are examples of genericized XSLT execution.)

A recommendation is to browse through the existing Transformer collection on the SVN to get a perspective on how different types of problems have been solved before.

Create a Transformer shell, and a Transformer Description File (TDF)

On the SVN, the code and binaries of each Transformer live in a subdirectory of the trunk/dmfc/transformers/ directory.

Transformer Naming

Transformer identity is expressed as a three-part underscore separated string, where the first part is a country code, the second part is an organizational acronym, and the third part a local name.

The sole purpose of this scheme is to achieve uniqueness in a given set of Transformers.

Examples: int_daisy_validator, uk_rnib_odf2dtbook, int_org_example

When the Transformer is distributed as a JAR, the identity is expressed in the JAR file name (int_org_example.jar). When the transformer is not JARred, the identity is expressed in the name of the filesystem directory in which the Transformer's files reside (/transformers/int_org_example/*).

Create a Transformer Description File

Each Daisy Pipeline Transformer has a Transformer Description File ('TDF') associated with it.

The TDF file can be seen as a manifest of the transformer; in this file, the contract of the Transformer is defined. The contract includes what type of content the Transformer accepts as input, and what type of content it will give as output. Further, the TDF defines additional input parameters that the Transformer can take to customize its behavior.

The TDF is an XML grammar. A compound RelaxNG+Schematron schema [1] exists to validate it. Each TDF file is parsed (and validated) by the Pipeline core at initialization time.

The filename of the TDF file is fixed to transformer.tdf. Previously, the filename restriction was *.tdf, but this has been deprecated in order to support locating TDF files inside JARs.
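To give a feel for the overall shape of a TDF, here is a purely illustrative sketch. The element and attribute names below are assumptions made for illustration, not the authoritative grammar; always validate your TDF against the RelaxNG+Schematron schema [1].

    <!-- Illustrative sketch only: element and attribute names are assumptions,
         not the authoritative TDF grammar. Validate against the schema [1]. -->
    <transformer version="1.1">
      <name>Example Transformer</name>
      <description>Converts input format A to output format B.</description>
      <classname>int_org_example.ExampleTransformer</classname>
      <parameters>
        <parameter required="true" direction="in">
          <name>input</name>
          <description>URI of the input document</description>
        </parameter>
      </parameters>
    </transformer>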

Create a Script to run the Transformer

Transformers are not executable in themselves; they need a Script to be run. A Script combines one or several Transformers into a sequence. A script can also take parameters.

See Daisy Pipeline Taskscript Grammar version 2.0 for details.
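As a purely illustrative sketch of the idea (element names below are assumptions; the Taskscript Grammar document is authoritative), a Script is essentially an ordered list of Transformer invocations, with Script-level parameters passed down to the individual tasks:

    <!-- Illustrative sketch only: element names are assumptions; see the
         Taskscript Grammar 2.0 document for the authoritative format. -->
    <taskScript version="2.0" name="exampleScript">
      <parameter name="input" value="${input}"/>
      <task name="int_daisy_validator"/>
      <task name="int_org_example">
        <parameter name="input" value="${input}"/>
      </task>
    </taskScript>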

On the SVN, the Scripts live in a subdirectory of the trunk/dmfc/scripts/ directory. While under development, you should use the /scripts/_dev/ directory for your script to clearly mark that this is not ready for primetime.

Create a PipelineTest testcase to run the Script

An efficient way to develop in the Pipeline is to do so against one or several test cases.

  • Locate sample input data (and, if your work is to be publicly available, put that in the trunk/dmfc/samples/ directory).
  • Create an extension of the abstract org.daisy.pipeline.test.PipelineTest class and place it in the org.daisy.pipeline.test.impl package. The simplest way to do this is to clone one of the existing classes in the org.daisy.pipeline.test.impl package, give it a meaningful new name, and then modify the supportsScript method so that it matches your script name, and the getParameters() method so that it matches your script parameters (a sketch follows below).
  • Import and add your new test class in org.daisy.pipeline.test.PipelineTestDriver. Disable all other tests by commenting them out.
  • If using Eclipse, add a Run Profile for org.daisy.pipeline.test.PipelineTestDriver using the parameters '${project_loc}/samples ${project_loc}/scripts'. (More info in the javadoc of PipelineTestDriver.)

This Run Profile can now be used throughout your development process, and the test case can be reused later on when doing automated stability tests of a Pipeline distribution.
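A minimal sketch of such a test class follows, assuming the supportsScript and getParameters hooks described above. The exact signatures of the abstract class may differ from what is shown; cloning an existing class in org.daisy.pipeline.test.impl gives you the authoritative shape.

    package org.daisy.pipeline.test.impl;

    import java.util.ArrayList;
    import java.util.List;

    import org.daisy.pipeline.test.PipelineTest;

    // Sketch only: method signatures are assumed, not verified against the
    // abstract class; use an existing test in this package as the template.
    public class ExampleScriptTest extends PipelineTest {

        // Match this test to the Script it exercises.
        public boolean supportsScript(String scriptName) {
            return "exampleScript".equals(scriptName);
        }

        // Supply the parameters the Script expects; the parameter name and
        // sample path below are invented for illustration.
        public List<String> getParameters() {
            List<String> parameters = new ArrayList<String>();
            parameters.add("--input=samples/example/input.xml");
            return parameters;
        }
    }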

Complete development, abiding by the code of conduct

Now, the time has come to keep coding until the defined Transformer contract is fulfilled.

Please refer to the Transformer Authors Code of Conduct for details on coding style and other rules and recommendations that apply.

Notes on the use of Libraries

Apart from the obvious JRE runtime classes, you will most probably be using several libraries in your code.

In the /lib/ directory, the third party libraries that are currently used by the Pipeline core or other Transformers are available. If you need to add another third party library to get the job done, consult with a project admin (or post to the daisymfc-developer list) first. We are trying to keep the distribution size down, and therefore need to be restrictive on new lib additions where possible.

Similarly, the org.daisy.util package has been created to provide a collection of reusable services for Transformer developers. org.daisy.util does not contain Pipeline-specific code; it is a utility library that can be used in any Java-based project. A recommendation is to go through the org.daisy.util package carefully before you start coding, to make sure you don't end up developing services that are already available there.

Perform Real-world testing

If you are targeting desktop usage, you can create a separate test distribution of your Script and Transformer(s) and have testers import this into the Pipeline GUI for testing and evaluation.

Sometimes, you will also be able to distribute your work for testing to the public via an official Pipeline release. In this case, the Script is clearly marked as beta, and the release notes indicate that input and bug reports are welcome.

Refine (if needed) and finalize the Script

The exposure to users may have revealed that more runtime customization of the Script is needed. Now is the time to add any such Script parameters, and to implement support for them locally in the Transformer.

Document

Each Transformer has an XHTML file providing development oriented documentation. These files live in the /doc/transformers/ directory. A template exists in the /doc/templates directory.

Each Script has an XHTML file providing usage oriented documentation. These files live in the /doc/scripts/ directory. A template exists in the /doc/templates directory. Remember that Script documentation needs to be end-user oriented, as opposed to Transformer documentation.

Deploy

If your Transformer and Script are on the Pipeline SVN and should be provided in an official Pipeline release, all you need to do is make sure that the Ant Build Script does not exclude your packages, and that your Script is moved out of the _dev script directory to a more appropriate location (one of the sibling directories of _dev).

An alternative channel for distributing binaries to desktop users is to use the Pipeline GUI Import feature.

Transformer Authors Code of Conduct

Inter-transformer dependencies

To avoid a dependency nightmare, it is strictly forbidden to use classes, XSLTs, etcetera from other Transformer packages. A Transformer needs to be written completely unaware of what other Transformers exist around it. As a Transformer developer you can help enforce this by reducing class visibility to package level.

If a certain function or service ends up being reimplemented by many transformers, the typical solution to avoid code duplication is to move it to the org.daisy.util package. Consult with a project admin (or post to the daisymfc-developer list) if you have candidate code for inclusion there.

Coding style

See separate document Java coding conventions.

Contributing to the test case collection

It is strongly recommended that your delivered transformer includes an implementation of the org.daisy.pipeline.test.PipelineTest class, coupled with appropriate input sample(s), placed in the SVN /samples/*/ directory.

Tests live in the org.daisy.pipeline.test.impl package.

Localization

Transformer authors are expected to externalize all strings that carry information intended for consumption by users.

For this reason, each transformer may include an external message bundle for localization. This bundle must comply with one of the formats defined by java.util.Properties.

The message bundle must reside in the same directory/at the same package level as the local Transformer subclass and/or the Transformer Description File.

The base message bundle name must be in the form identity.messages, where identity is the three-part package name discussed above. Example: int_org_example.messages.

Localized bundles follow the locale suffix convention (int_org_example_fr.messages) defined by Java's ResourceBundle.

The abstract Transformer superclass already has a generic message bundle registered (org.daisy.pipeline.core/pipeline.messages). In some instances, it may be enough to utilize the messages in the generic bundle, in which case no local message bundle is needed.

Transformer authors are encouraged to use the XML form of Java properties (see http://java.sun.com/dtd/properties.dtd).
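As a reminder of what that format looks like (the keys and message texts below are invented for illustration; the format itself is the standard java.util.Properties XML form):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
    <!-- int_org_example.messages: keys and texts are illustrative only -->
    <properties>
      <entry key="EXAMPLE_TRANSFORMING_FILE">Transforming {0} ...</entry>
      <entry key="EXAMPLE_DONE">Transformation complete.</entry>
    </properties>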

Jar packaging

We want transformers to be deployable as JAR files. Here are the ground rules for creating jarifiable transformers:

TDF file name
The name of the .tdf file must be transformer.tdf in order to be found inside the JAR file.
Do not use getTransformerDirectory()
Calling the deprecated getTransformerDirectory() method when running a transformer from a JAR file will throw an IllegalStateException.
Use URL instead of File to reference a local resource
To reference any resource within the Transformer package, use java.net.URL, not java.io.File (see the sketch below).
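A minimal sketch of the URL-based approach (the class and resource name are illustrative):

    import java.net.URL;

    // Sketch: resolving a resource that ships next to the Transformer class.
    // Class.getResource() works both when running from a plain directory and
    // from inside a JAR, whereas a java.io.File path breaks in the JAR case.
    public class ResourceLookupExample {
        public URL getStylesheet() {
            // "example.xsl" is an illustrative name for a resource assumed
            // to live in the same package/directory as this class.
            return getClass().getResource("example.xsl");
        }
    }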

Notes

[1] The TDF schema lives in the org.daisy.pipeline.core.transformer package. If you do not have local access to the source code, browse the SVN online.




