



	
	int_daisy_validator
	


Transformer documentation: int_daisy_validator




	Transformer Purpose
	Input Requirements
	Output
	
		On success
		On error
	
	
	Configuration/Customization
	
		Parameters (tdf)
		Extended configurability
	
	
	Further development
	Dependencies
	Author
	Licensing



Transformer Purpose

An input- and validation-type-agnostic validator that supports multi-layered validation.
The main idea is that the transformer can be fed "any" type of content; if 
a validator for that content type is available on the system, the content will be 
validated through a standardized API common to all implementations.

This agnosticism can be 'hidden' from users by providing parallel scripts with more specifically stated purposes.
To limit the validator to accepting only a certain input type, use the requireInputType parameter.


The second main idea is to be able to apply multi-layered validation in one pass: 
for example, validate a DTBook document against the canonical DTD and RelaxNG schema (from ZedVal), 
and then also against application- and organization-specific subset rules. All of this can use a schema type of your liking: RelaxNG, Schematron, XSD, 
or named implementations of org.daisy.util.fileset.validation.delegate.ValidatorDelegate.

Benefits of this approach are:

neither users nor pipelines have to "know" what kind of content is currently being handled,
... which also allows content to be heterogeneous in a particular pipeline,
implementations can be changed without modifying code (a property inherited from the factory pattern).


The programmatic flow of this transformer in summary:

Try to create an org.daisy.util.fileset.Fileset instance on the input file (with DTD validation turned on).
If Fileset can represent this type of fileset, check whether org.daisy.util.fileset.validation.ValidatorFactory can produce a Validator for it.
If a fileset validator can be produced, run a Fileset validation. If ValidatorDelegates were supplied as input parameters, attach and execute these delegates.
If a Fileset instance could not be created (because the input type is not supported by the Fileset package), and the input is XML and has a DTD (prolog identifiers), run a standard DTD validation pass.
If one or more additional schema resources (RNG, XSD, SCH) were supplied as input parameters, or if inline (non-DTD) schemas are present, attempt an anonymous JAXP validation run using javax.xml.validation.SchemaFactory.
Inform the user of what kind of validation was actually done, and what the result was.
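
The schema validation step of the flow above can be illustrated with the plain JRE API. This is only a minimal sketch, not the transformer's own code: the class JaxpValidationSketch, its methods and the tiny XSD are hypothetical, and it uses the W3C XML Schema support that ships with the JRE rather than the Pipeline's RelaxNG and Schematron extensions. The level names are borrowed from the report format described under Output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper for illustration; not part of the Pipeline code base. */
final class JaxpValidationSketch {

    /** Tiny illustrative schema: a book element holding exactly one title. */
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='book'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Validates xml against xsd and returns one line per reported problem. */
    static List<String> validate(String xsd, String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            // The JRE itself only provides a factory for W3C XML Schema.
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add(describe("Warning", e));
                }
                @Override public void error(SAXParseException e) {
                    problems.add(describe("Error", e));
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // stop validating; recorded in the catch block below
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            problems.add(describe("Severe error", e));
        } catch (SAXException | IOException e) {
            problems.add("Severe error: " + e.getMessage());
        }
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " (line " + e.getLineNumber() + ", col "
            + e.getColumnNumber() + "): " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(validate(XSD, "<book><title>Example</title></book>"));
        System.out.println(validate(XSD, "<book><author>Example</author></book>"));
    }
}
```

Recoverable problems are collected so that one pass can report several of them; a fatal (well-formedness) error ends the run, since the parser cannot continue reliably after it.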
		

Input Requirements

This transformer is input agnostic; it will throw an exception if it cannot validate the given content type.

Output

This transformer can be configured to output the validation results to an XML file. 
The XML report follows a simple scheme and is divided into three parts: head, body and foot.

Head
The file starts with a head section containing the elements 
pipelineVersion and javaVersion. They state
which versions of the Daisy Pipeline and Java are being used. The
Pipeline version is the result of the call org.daisy.pipeline.Version.getVersion();
the Java version is fetched from Java's system properties.

Body
The body section contains all messages reported by the transformer: both
validation errors and any exceptions thrown. The message element, used for validation errors, has no
child text node; instead, five attributes carry the information:


file
A URI to the file in which the error/warning occurred.

level
Indicates the error level; possible values are Severe error, Error and Warning.

msg
A message describing the error.

line
The line number of the end of the text where the error occurred.

col
The column number of the end of the text where the error occurred.


The exception element is used in a similar way to indicate that an exception 
was thrown during validation. It has no child text node, but instead the following attributes:

level
Indicates the error level; always Severe error for exceptions.

msg
The exception message.

str
The exception stacktrace.



The message and exception elements may occur in any order inside the body element.

Foot
The foot section contains the element executionTime,
which shows the validator execution time in h:mm:ss.ms format.
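
As a small illustration of the h:mm:ss.ms format, the following sketch turns a duration in milliseconds into such a string. The class and method names are hypothetical, and padding the millisecond part to three digits is an assumption based on the value 0:00:16.141 that appears in the report example; the transformer's own formatting code may differ.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the Pipeline code base. */
final class ExecutionTimeFormatSketch {

    /** Formats a non-negative duration in milliseconds as h:mm:ss.ms. */
    static String format(long millis) {
        long hours = millis / 3_600_000;
        long minutes = (millis / 60_000) % 60;
        long seconds = (millis / 1_000) % 60;
        long ms = millis % 1_000;
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%d:%02d:%02d.%03d", hours, minutes, seconds, ms);
    }

    public static void main(String[] args) {
        System.out.println(format(16_141L)); // prints 0:00:16.141
    }
}
```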

Short Example
Putting the three parts together gives the following example:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="validator.xsl"?>
<validator xmlns="http://www.daisy.org/ns/pipeline/validator/">
    <head>
        <pipelineVersion>2006-10-17</pipelineVersion>
        <javaVersion>1.5.0_05</javaVersion>
    </head>
    
    <body>
        <message file="file://d:/foo.xml" line="-1" col="-1" msg="File not found: dinewfeat02.jpg" level="Severe error" />
        <message file="file://d:/foo.xml" msg="File not found: dicd01.jpg" col="-1" line="-1" level="Severe error" />
        <message file="file://d:/foo.xml" col="41" level="Error" msg="bad character content for element" line="49526" />
    </body>
    
    <foot>
        <executionTime>0:00:16.141</executionTime>
    </foot>
</validator>
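
A report of this shape is easy to post-process with the standard DOM API. The sketch below counts the message and exception entries per level. It is only an illustration: the class ReportSummarySketch and its method are hypothetical, and the sample string is a shortened copy of the report above with the head and foot content left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration; not part of the Pipeline code base. */
final class ReportSummarySketch {

    /** Namespace of the report, as used in the example above. */
    static final String NS = "http://www.daisy.org/ns/pipeline/validator/";

    /** Shortened copy of the example report (head and foot content omitted). */
    static final String SAMPLE =
        "<validator xmlns='" + NS + "'><head/><body>"
        + "<message file='file://d:/foo.xml' line='-1' col='-1'"
        + " msg='File not found: dinewfeat02.jpg' level='Severe error'/>"
        + "<message file='file://d:/foo.xml' line='-1' col='-1'"
        + " msg='File not found: dicd01.jpg' level='Severe error'/>"
        + "<message file='file://d:/foo.xml' line='49526' col='41'"
        + " msg='bad character content for element' level='Error'/>"
        + "</body><foot/></validator>";

    /** Counts message and exception elements per value of their level attribute. */
    static Map<String, Integer> countByLevel(String reportXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the report elements are namespaced
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(reportXml)));
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String name : new String[] {"message", "exception"}) {
                NodeList nodes = doc.getElementsByTagNameNS(NS, name);
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element element = (Element) nodes.item(i);
                    counts.merge(element.getAttribute("level"), 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read report", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countByLevel(SAMPLE));
    }
}
```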



See also Parameters.


On success



On error

On error, this transformer sends a fatal message, then throws an exception and aborts.

If no validator can be located for the input content type, 
it likewise sends a fatal message, then throws an exception and aborts.

See also Parameters.





Configuration/Customization

Parameters (tdf)


	input
	Path to input file/fileset manifest to be validated
	
	requireInputType	
	
		A string naming the single input type that is required; any other input type generates an error.
		The allowed strings are enumerated in the transformer TDF. Each is either a Fileset nicename or an XML document root element name.
		If the value is set to "off", no input type requirement is enforced, i.e. validation is attempted regardless of input type.
	
		
	schemas
	
				Comma-separated list of schema identifiers to validate the input document against. 
				Identifiers may be expressed as file paths, public IDs or system IDs.
				RelaxNG, W3C Schema, Schematron and Compound are the allowed schema types.
				Schemas that occur inline in the validated document do not need to be listed here.
	
	
	
	delegates
	
		Comma-separated list of delegates (implementations of org.daisy.util.fileset.validation.delegate.ValidatorDelegate)
	
	
	
	forceImplementation
	
	A fully qualified name of an implementation of org.daisy.util.fileset.validation.Validator. 
	Use this parameter to force the validator to use the named implementation (overriding the default assignment).
	
	
	generateContextInfo
	
	Sets whether to attempt to generate more information than that provided by a standard javax.xml.stream.Location.
	This is a grammar-specific process in some parts. New grammars are added by modifying org.daisy.util.xml.stax.ExtendedLocationTokens.xml.
	New types of information can be added by extending the InformationType enum in org.daisy.util.xml.stax.ExtendedLocationProvider.
	
	
	abortThreshold
	The validation error severity level at which to perform a Transformer abort
			
	abortOnException
	Whether to perform a Transformer abort when an exception is caught
	
	xmlReport
	The destination of the generated XML report/output.
	
	xmlStylesheet
	The value of the xml-stylesheet processing instruction in the generated XML output.
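
To illustrate the requireInputType parameter described above, here is a minimal sketch of the root element name case. It is not the transformer's own code: the class and method names are hypothetical, the Fileset nicename case is not covered, the comparison is assumed to be case-sensitive, and "dtbook" is only an illustrative value (the real values are those enumerated in the TDF).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper for illustration; not part of the Pipeline code base. */
final class RequireInputTypeSketch {

    /** Returns the local name of the document's root element. */
    static String rootElementName(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Reading the root name does not need the DTD or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName(); // first start tag is the root
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        throw new IllegalArgumentException("Input has no root element");
    }

    /** "off" accepts any input; otherwise the root element name must match. */
    static boolean accepts(String requireInputType, String xml) {
        return "off".equals(requireInputType)
            || requireInputType.equals(rootElementName(xml));
    }

    public static void main(String[] args) {
        System.out.println(accepts("dtbook", "<dtbook><book/></dtbook>"));
        System.out.println(accepts("dtbook", "<html/>"));
        System.out.println(accepts("off", "<html/>"));
    }
}
```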


Extended configurability


Further development

The transformer is basically a wrapper around the abstract factory and discovery patterns: given an arbitrary input content
type, the transformer uses factories to attempt to produce a validator that can validate the content.

The two main factory implementations used are:

org.daisy.util.fileset.validation.ValidatorFactory, which is a content-centric producer of validators for various types of filesets (DTBs, well-known document types, etc.)
The org.daisy.util.xml.validation.jaxp.SchemaFactory package, which adds RelaxNG and Schematron extensions to the base XSD support provided by the JRE's javax.xml.validation package.


These two factories can be extended to support more types of filesets and more schema languages, respectively.
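
The discovery idea behind these factories can be sketched in a few lines. Everything in this example is hypothetical and serves only to show the pattern: the ContentValidator interface, the NonEmptyValidator class and the registry are stand-ins, and do not mirror the actual signatures of the Pipeline's ValidatorFactory or Validator types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical registry for illustration; not the Pipeline's ValidatorFactory. */
final class ValidatorDiscoverySketch {

    /** Hypothetical stand-in; NOT the Pipeline's own Validator interface. */
    interface ContentValidator {
        boolean supports(String contentType);
        List<String> validate(String content);
    }

    /** Example implementation: flags empty content for one content type. */
    static final class NonEmptyValidator implements ContentValidator {
        private final String contentType;

        NonEmptyValidator(String contentType) {
            this.contentType = contentType;
        }

        @Override public boolean supports(String type) {
            return contentType.equals(type);
        }

        @Override public List<String> validate(String content) {
            return content.isEmpty()
                ? Collections.singletonList("Error: content is empty")
                : Collections.<String>emptyList();
        }
    }

    private final List<ContentValidator> registered = new ArrayList<>();

    /** Supporting a new content type means registering one more implementation. */
    void register(ContentValidator validator) {
        registered.add(validator);
    }

    /** Returns the first registered validator that supports the type, if any. */
    Optional<ContentValidator> newValidator(String contentType) {
        for (ContentValidator candidate : registered) {
            if (candidate.supports(contentType)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ValidatorDiscoverySketch factory = new ValidatorDiscoverySketch();
        factory.register(new NonEmptyValidator("dtbook"));
        System.out.println(factory.newValidator("dtbook").isPresent());
        System.out.println(factory.newValidator("unknown").isPresent());
    }
}
```

Callers only ask the registry for a validator by content type, so new types can be supported without touching the calling code. An empty result corresponds to the case where the transformer cannot locate a validator and aborts.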

Output of Report documents to given locations is another possible enhancement.

Dependencies


	zedval.jar


Author

Markus Gylling, Daisy Consortium

Licensing

LGPL