
int_daisy_validator
Transformer documentation: int_daisy_validator
Transformer Purpose
An input- and validation-type-agnostic validator that supports multi-layered validation.
The main idea is to be able to feed the transformer "any" type of content; if
there is a validator for the particular content type on the system, the content will be
validated using a standardized API common to all implementations.
This agnosticism can be hidden from users by providing parallel scripts with more specifically stated purposes.
To limit the validator to accepting only a certain input type, use the requireInputType parameter.
The second main idea is to be able to apply multi-layered validation in one pass:
for example, validate a DTBook document against the canonical DTD and RelaxNG (from ZedVal),
and then also against application- and organization-specific subset rules, all using a schema type of your liking: RelaxNG, Schematron, XSD,
or explicitly named implementations of org.daisy.util.fileset.validation.delegate.ValidatorDelegate.
Benefits of this approach are:
- neither users nor pipelines have to "know" what kind of content is currently being handled,
- ... which also allows content to be heterogeneous in a particular pipeline,
- implementations can be changed without modifying code (a property inherited from the factory pattern).
The programmatic flow of this transformer in summary:
- Try to create an org.daisy.util.fileset.Fileset instance on the input file (with DTD validation turned on).
- If Fileset can represent this type of fileset, check whether org.daisy.util.fileset.validation.ValidatorFactory can produce a Validator for the type of fileset.
- If a fileset validator can be produced, run a Fileset validation. If ValidatorDelegates were supplied as inparams, attach and execute these delegates.
- If a fileset instance could not be created (because input type was not supported by the Fileset package), and if input is xml and has a DTD (prolog identifiers), run a standard DTD validation pass.
- If additional schema (RNG, XSD, SCH) resources (one or several) were supplied as inparam, or if inline (non-DTD) schemas were present, attempt an anonymous jaxp.validation run using javax.xml.validation.SchemaFactory.
- Inform the user on what kind of validation was actually done, and what the result was.
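The flow above can be read as a small decision procedure. The following Java sketch is illustrative only: the class ValidationPlan, the Pass enum and the boolean inputs are hypothetical names, not part of the transformer's API. It models which passes would run, not the validation itself, and it assumes that the schema pass runs independently of the fileset and DTD passes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the pass selection described above; not the transformer's real code. */
final class ValidationPlan {

    /** The kinds of validation pass named in the flow summary. */
    enum Pass { FILESET, DELEGATES, DTD, JAXP_SCHEMAS }

    /**
     * Decides which passes would run.
     *
     * @param filesetCreated     a Fileset instance could be created for the input
     * @param validatorAvailable the factory could produce a Validator for that fileset type
     * @param delegatesSupplied  ValidatorDelegates were supplied as parameters
     * @param xmlWithDtd         the input is XML and its prolog identifies a DTD
     * @param schemasAvailable   schemas were supplied as parameters or occur inline
     */
    static List<Pass> plan(boolean filesetCreated, boolean validatorAvailable,
                           boolean delegatesSupplied, boolean xmlWithDtd,
                           boolean schemasAvailable) {
        List<Pass> passes = new ArrayList<>();
        if (filesetCreated && validatorAvailable) {
            passes.add(Pass.FILESET);
            if (delegatesSupplied) {
                passes.add(Pass.DELEGATES);
            }
        } else if (!filesetCreated && xmlWithDtd) {
            // Fallback: plain DTD validation when the Fileset package does not know the type.
            passes.add(Pass.DTD);
        }
        if (schemasAvailable) {
            // Assumption: the schema pass is attempted regardless of the passes above.
            passes.add(Pass.JAXP_SCHEMAS);
        }
        if (passes.isEmpty()) {
            // Mirrors the documented behaviour of failing when nothing can validate the input.
            throw new IllegalStateException("no validator available for this input type");
        }
        return passes;
    }
}
```

For example, ValidationPlan.plan(false, false, false, true, false) yields only the DTD pass, which corresponds to an XML document of a type the Fileset package does not support.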
Input Requirements
This transformer is input agnostic; it throws an exception if it cannot validate the given content type.
Output
This transformer can be configured to output the validation results to an xml file.
The xml report follows a simple scheme and is divided into three parts: head, body and foot.
Head
The file starts with a head section containing the elements
pipelineVersion and javaVersion. They record
which versions of the Daisy Pipeline and Java are being used. The
Pipeline version is the result of the call org.daisy.pipeline.Version.getVersion();
the Java version is fetched from Java's system properties.
Body
The body section contains all messages reported from the transformer, both
validation errors and any exceptions thrown. The message element, used for validation errors, has no
child text node, but instead five attributes containing the information:
- file
- A URI to the file in which the error/warning occurred.
- level
- Indicates the error level, possible values are Severe error, Error and Warning.
- msg
- A message describing the error.
- line
- The line number of the end of the text where the error occurred.
- col
- The column number of the end of the text where the error occurred.
The exception element is used in a similar way to indicate that an exception
was thrown during validation. It has no child text node, but instead the following attributes:
- level
- Indicates the error level, always Severe error when it comes to exceptions.
- msg
- The exception message.
- str
- The exception stacktrace.
These two elements may occur in any order inside the body element.
Foot
The foot section contains the element executionTime
which shows the validator execution time using a h:mm:ss.ms format.
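As an illustration of the h:mm:ss.ms format, here is a minimal Java sketch that formats a duration in milliseconds. The class and method names are hypothetical, and the zero-padding (two digits for minutes and seconds, three for milliseconds) is an assumption inferred from the sample value 0:00:16.141, not something this document specifies.

```java
import java.util.Locale;

/** Hypothetical helper; the name and the padding rules are assumptions, not Pipeline code. */
final class ExecutionTimeFormat {

    /** Formats a non-negative duration in milliseconds as h:mm:ss.ms. */
    static String format(long millis) {
        long hours = millis / 3_600_000;
        long minutes = (millis / 60_000) % 60;
        long seconds = (millis / 1_000) % 60;
        long ms = millis % 1_000;
        // Locale.ROOT keeps the digits ASCII regardless of the platform locale.
        return String.format(Locale.ROOT, "%d:%02d:%02d.%03d", hours, minutes, seconds, ms);
    }
}
```

Under these assumptions, ExecutionTimeFormat.format(16141) returns 0:00:16.141.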
Short Example
Putting the three parts together gives the following example:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type="text/xsl" href="validator.xsl"?>
<validator xmlns="http://www.daisy.org/ns/pipeline/validator/">
  <head>
    <pipelineVersion>2006-10-17</pipelineVersion>
    <javaVersion>1.5.0_05</javaVersion>
  </head>
  <body>
    <message file="file://d:/foo.xml" line="-1" col="-1" msg="File not found: dinewfeat02.jpg" level="Severe error" />
    <message file="file://d:/foo.xml" msg="File not found: dicd01.jpg" col="-1" line="-1" level="Severe error" />
    <message file="file://d:/foo.xml" col="41" level="Error" msg="bad character content for element" line="49526" />
  </body>
  <foot>
    <executionTime>0:00:16.141</executionTime>
  </foot>
</validator>
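A consumer of such a report might want to tally the entries by level. The sketch below shows one possible way to do that with the JDK's DOM parser. The class name ReportSummary and the method countByLevel are hypothetical; the namespace, element and attribute names are the ones shown in the example above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical report reader; not part of the Pipeline. */
final class ReportSummary {

    /** Namespace of the report, as declared on the validator root element. */
    static final String NS = "http://www.daisy.org/ns/pipeline/validator/";

    /** Counts message and exception elements per value of their level attribute. */
    static Map<String, Integer> countByLevel(String reportXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the namespace-qualified lookups below
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(reportXml)));
            Map<String, Integer> counts = new TreeMap<>();
            for (String name : new String[] {"message", "exception"}) {
                NodeList nodes = doc.getElementsByTagNameNS(NS, name);
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element e = (Element) nodes.item(i);
                    counts.merge(e.getAttribute("level"), 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read report", e);
        }
    }
}
```

Applied to the example report, this would count two entries at level Severe error and one at level Error.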
See further Parameters.
On success
On error
On error, this transformer will send
a fatal message, then throw an exception and abort.
On inability to locate a validator for the input content type,
this transformer will send a fatal message, then throw an exception and abort.
See further Parameters.
Configuration/Customization
Parameters (tdf)
- input
- Path to input file/fileset manifest to be validated
- requireInputType
- A string naming the one input type that is required; any other input type generates an error.
The strings are enumerated in the transformer TDF. Each is either a Fileset nicename or an XML document root element name.
If the value is set to "off", no input type requirement is enforced, i.e. validation is attempted regardless of input type.
- schemas
- Comma-separated list of schema identifiers to validate the input document against.
Identifiers may be expressed as file paths, public IDs or system IDs.
RelaxNG, W3C Schema, Schematron and Compound are the allowed schema types.
Schemas that occur inline in the validated document do not need to be listed here.
- delegates
- Comma-separated list of delegates (implementations of org.daisy.util.fileset.validation.delegate.ValidatorDelegate).
- forceImplementation
- The fully qualified name of an implementation of org.daisy.util.fileset.validation.Validator.
Use this parameter to force the validator to use the named implementation (overriding the default assignment).
- generateContextInfo
- Sets whether to attempt to generate more information than that provided in a standard javax.xml.stream.Location.
This is a grammar-specific process in some parts. New grammars are added by modifying org.daisy.util.xml.stax.ExtendedLocationTokens.xml.
New types of information can be added by extending the InformationType enum in org.daisy.util.xml.stax.ExtendedLocationProvider.
- abortThreshold
- The validation error severity level at which to perform a Transformer abort.
- abortOnException
- Whether to perform a Transformer abort when an exception is caught.
- xmlReport
- The destination of the generated xml report/output.
- xmlStylesheet
- The value of the xml-stylesheet processing instruction in the generated xml output.
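To make the abortThreshold idea concrete, here is a minimal Java sketch of a severity comparison. It is an assumption-laden illustration: the class AbortPolicy and its members are hypothetical, the level labels are borrowed from the xml report (Warning, Error, Severe error), and the rule "abort when the reported level is at or above the threshold" is an assumed reading. The actual parameter values are enumerated in the transformer TDF and are not reproduced here.

```java
/** Hypothetical abort policy; the names and the comparison rule are assumptions. */
final class AbortPolicy {

    /** Levels in ascending order of severity, labelled as in the xml report. */
    enum Level {
        WARNING("Warning"), ERROR("Error"), SEVERE("Severe error");

        final String label;

        Level(String label) {
            this.label = label;
        }

        /** Looks a level up by its report label, ignoring case and surrounding spaces. */
        static Level fromLabel(String label) {
            for (Level level : values()) {
                if (level.label.equalsIgnoreCase(label.trim())) {
                    return level;
                }
            }
            throw new IllegalArgumentException("unknown level: " + label);
        }
    }

    /** Assumed semantics: abort when the reported level is at or above the threshold. */
    static boolean shouldAbort(Level reported, Level threshold) {
        return reported.compareTo(threshold) >= 0;
    }
}
```

With a threshold of Error, this policy would let warnings pass and abort on errors and severe errors.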
Extended configurability
Further development
The transformer is basically a wrapper around abstract factory and discovery patterns: given an arbitrary input content
type, the transformer will use factories to attempt to produce a validator that can validate the content.
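The factory idea can be sketched in a few lines of Java. Everything in this sketch is hypothetical (ValidatorRegistry, ContentValidator and the string content-type keys do not exist in the Pipeline); it only illustrates how a caller can ask for a validator by content type without knowing which implementation it gets.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration of the factory idea; none of these names exist in the Pipeline. */
final class ValidatorRegistry {

    /** Minimal stand-in for a validator: returns true if the content is considered valid. */
    interface ContentValidator {
        boolean validate(String content);
    }

    private final Map<String, Supplier<ContentValidator>> suppliers = new LinkedHashMap<>();

    /** Registers an implementation for a content type; callers never name the class directly. */
    void register(String contentType, Supplier<ContentValidator> supplier) {
        suppliers.put(contentType, supplier);
    }

    /** Returns a new validator for the content type, or empty if none is registered. */
    Optional<ContentValidator> newValidator(String contentType) {
        Supplier<ContentValidator> supplier = suppliers.get(contentType);
        if (supplier == null) {
            return Optional.empty();
        }
        return Optional.of(supplier.get());
    }
}
```

Swapping an implementation then means changing a registration, not the calling code, which is the benefit the factory pattern is credited with above.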
The two main factory implementations used are:
- org.daisy.util.fileset.validation.ValidatorFactory, which is a content-centric producer of validators for various types of filesets (DTBs, well-known document types, etc.)
- The org.daisy.util.xml.validation.jaxp.SchemaFactory package, which contains RelaxNG and Schematron extensions to the base XSD support that the javax.xml.validation package in the JRE provides.
These two factories can be extended to support more types of filesets and more schema languages, respectively.
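For reference, the base XSD support in javax.xml.validation can be exercised as in the following Java sketch. It uses only standard JDK classes and does not involve the Daisy extensions; the class name XsdCheck, the in-memory string inputs and the mapping of fatal parser errors to the label "Severe error" are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical helper showing plain JAXP validation against a W3C XML Schema. */
final class XsdCheck {

    /** Validates xml against xsd and returns one line per reported problem (empty if valid). */
    static List<String> validate(String xsd, String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Collect problems instead of stopping at the first one.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add(describe("Warning", e));
                }
                @Override public void error(SAXParseException e) {
                    problems.add(describe("Error", e));
                }
                @Override public void fatalError(SAXParseException e) {
                    // Assumed mapping of fatal errors to the report's "Severe error" label.
                    problems.add(describe("Severe error", e));
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("Exception: " + e.getMessage());
        }
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at " + e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

The line and column values come from the SAXParseException and play the same role as the line and col attributes in the xml report.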
Output of Report documents to given locations is another possible enhancement.
Dependencies
- zedval.jar
Author
Markus Gylling, Daisy Consortium
Licensing
LGPL