


Pipeline Script: DTBook Fix

Overview

This script will attempt to repair and tidy a suboptimal DTBook document. The script is primarily intended to address structural problems that occur in files that are output from automated conversion processes, such as the WordML to DTBook Script.

The actual manipulation routines performed are described in the DTBookFix Categories section below.

Input Requirements

A DTBook document. Unusually for the Pipeline, the document need not be valid.

Note that the manipulations performed depend heavily on which version of DTBook is used (2005-1, 2005-2, etc.). If the version of your input document is not supported by some or all of the manipulation routines, those routines are disabled and warnings are issued.

Output

Depending on the input document version and your settings, the output document may have undergone anywhere from zero to several structural modifications. Note that the output is not guaranteed to be valid.

The resulting document is validated at the end of the process, so watch the validation messages issued at that stage.

If you keep encountering DTBook documents with recurring problems that are not fixed by this Script, please contact the Pipeline development team.

Configuration

Input file
Required. Select the input DTBook file.
Output directory
Required. Select where to store the output result.
Active Categories
Select the type of manipulation to be performed by activating one or several Categories.
Read more on what is included in each category in the DTBookFix Categories section below.
Force Execution
Optional. When checked, DTBookFix runs all selected categories regardless of the input document's state (by default, the Repair category runs only if the document is invalid, and the Tidy category runs only if the document is valid).
Simplify heading layout
This is an optional routine within the Tidy category. Check the box to simplify the level structure by removing redundant levels.
See further Level cleanup below.
Tidy inline whitespace
This is an optional routine within the Tidy category. Check the box to move leading and trailing whitespace outside of em, strong, sub, sup, noteref and pagenum elements.
See further Tidy inline whitespace below.
Fix Character set
This is an optional routine within the Repair category. Check the box to attempt to fix an incorrectly stated character set. See further Character Set recoder below.
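The default gating described under Force Execution amounts to a small decision rule. The following is a minimal Python sketch of that rule only; the function name `should_run` and the lowercase category strings are hypothetical and are not part of the Pipeline's configuration interface.

```python
def should_run(category: str, document_is_valid: bool, force: bool) -> bool:
    """Decide whether a selected category runs (hypothetical helper).

    Mirrors the documented defaults: Repair runs only on invalid input,
    Tidy runs only on valid input, and Force Execution overrides both.
    """
    if force:
        return True
    if category == "repair":
        return not document_is_valid
    if category == "tidy":
        return document_is_valid
    raise ValueError(f"unknown category: {category!r}")


# A valid document with both categories selected and Force Execution unchecked.
print([c for c in ("repair", "tidy") if should_run(c, True, False)])  # prints ['tidy']
```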

DTBookFix Categories

This section gives a technical summary of the manipulations that are done within each DTBookFix manipulation category.

The Repair Category

Level splitter
Splits a level1-6 element into several levels if it contains several headings of the same level.
Level 1-6 repair
Inserts level1-6 elements where needed to meet the requirements on proper nesting.
Illegal heading removal
Changes an illegal heading (for example, an h3 element inside a level2 element) into a p element. The p element will have the class attribute value of the original heading element name (e.g. <p class="h3">).
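Illegal heading removal can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under assumptions, not the Transformer's implementation: only headings that are direct children of a level1 to level6 element are checked, namespaces are handled by comparing local names, and `demote_illegal_headings` and `local` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

LEVEL = re.compile(r"level([1-6])")
HEADING = re.compile(r"h([1-6])")


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so DTBook namespaced tags compare by local name."""
    return tag.rsplit("}", 1)[-1]


def demote_illegal_headings(root: ET.Element) -> None:
    """Turn hN children of a levelM element (N != M) into <p class="hN">."""
    for level in root.iter():
        m = LEVEL.fullmatch(local(level.tag))
        if not m:
            continue
        for child in level:
            name = local(child.tag)
            h = HEADING.fullmatch(name)
            if h and h.group(1) != m.group(1):
                namespace = child.tag[: len(child.tag) - len(name)]
                child.tag = namespace + "p"  # keep the element in the same namespace
                child.set("class", name)     # remember the original heading name


doc = ET.fromstring("<level2><h2>Chapter</h2><h3>Aside</h3></level2>")
demote_illegal_headings(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints <level2><h2>Chapter</h2><p class="h3">Aside</p></level2>
```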
Flatten redundant nesting
Removes redundant nesting of p elements (p inside p).
Complete structure
Adds an empty paragraph if the last element in the level is a heading.
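The Complete structure rule can be sketched in the same style. Assumptions: level1 to level6 and the generic level element count as levels, h1 to h6 and hd count as headings, and `complete_structure` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Local name of a possibly namespaced tag."""
    return tag.rsplit("}", 1)[-1]


def complete_structure(root: ET.Element) -> None:
    """Append an empty p to any level whose last child is a heading."""
    for level in list(root.iter()):  # snapshot, since we append while walking
        name = local(level.tag)
        if not re.fullmatch(r"level[1-6]?", name) or len(level) == 0:
            continue
        if re.fullmatch(r"h[1-6]|hd", local(level[-1].tag)):
            namespace = level.tag[: len(level.tag) - len(name)]
            ET.SubElement(level, namespace + "p")


doc = ET.fromstring("<level1><h1>Title</h1><level2><h2>Empty</h2></level2></level1>")
complete_structure(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints <level1><h1>Title</h1><level2><h2>Empty</h2><p /></level2></level1>
```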
List repair
  • Wraps a list in li when the parent of the list is another list.
  • Adds a type attribute if missing (default value is "pl")
  • Corrects the depth attribute if it is incorrect
  • Removes the enum attribute if the list is not ordered
  • Removes the start attribute if the list is not ordered
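The List repair rules can be sketched as one recursive pass. Assumptions: `depth` means the number of enclosing list elements (1 for a top-level list), only an existing depth attribute is corrected, any type other than "ol" counts as unordered, and `repair_lists` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Local name of a possibly namespaced tag."""
    return tag.rsplit("}", 1)[-1]


def repair_lists(elem: ET.Element, depth: int = 0) -> None:
    """Apply the List repair rules; depth counts the list elements enclosing elem."""
    is_list = local(elem.tag) == "list"
    if is_list:
        depth += 1
        if "type" not in elem.attrib:
            elem.set("type", "pl")  # documented default
        if "depth" in elem.attrib:
            elem.set("depth", str(depth))  # overwrite with the actual nesting depth
        if elem.get("type") != "ol":
            elem.attrib.pop("enum", None)
            elem.attrib.pop("start", None)
    for i, child in enumerate(list(elem)):
        if is_list and local(child.tag) == "list":
            # A list directly inside a list: wrap it in an li of the same namespace.
            namespace = child.tag[: len(child.tag) - len(local(child.tag))]
            wrapper = ET.Element(namespace + "li")
            wrapper.tail, child.tail = child.tail, None
            elem[i] = wrapper
            wrapper.append(child)
        repair_lists(child, depth)


doc = ET.fromstring(
    '<list type="ul" depth="3" enum="1"><li>a</li><list depth="1"><li>b</li></list></list>'
)
repair_lists(doc)
print(ET.tostring(doc, encoding="unicode"))
```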
Character Set recoder
This runs a character set detection algorithm on the input file, disregarding any stated character set, and then recodes the entire file. Only enable this if you have explicit problems with character display, or when your document is reported to be malformed for reasons such as "invalid byte sequence".
IDREF repair
  • Adds the idref attribute to noteref and annoref elements if missing
  • Estimates a value for the idref attribute if it is empty
  • Adds a hash mark to the beginning of every idref attribute that does not contain one
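Only the last IDREF rule is sketched here, since the source does not describe how missing or empty values are estimated. The function name `repair_idref_hashes` is hypothetical; empty values are left untouched on the assumption that the estimation step handles them.

```python
import xml.etree.ElementTree as ET


def repair_idref_hashes(root: ET.Element) -> None:
    """Prefix '#' to every non-empty idref attribute value that contains no '#'."""
    for el in root.iter():
        idref = el.get("idref")
        if idref and "#" not in idref:
            el.set("idref", "#" + idref)


doc = ET.fromstring('<p><noteref idref="note1">1</noteref><annoref idref="#anno1">*</annoref></p>')
repair_idref_hashes(doc)
print([el.get("idref") for el in doc])  # prints ['#note1', '#anno1']
```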
Empty elements remover
Removes empty/whitespace elements that must have children.
Page number type repair

Changes the page attribute (the page number type) of the pagenum element to match the contents of the element (i.e. the page number value).

Incorrect "normal" page numbers will be changed to "front" if the content contains roman numerals and the element is located in the frontmatter of the book. Incorrect "front" page numbers will be changed to "normal" if the content contains arabic numbers. Otherwise, the page attribute will be changed to "special" if it is incorrect.
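The decision in the paragraph above can be sketched as a pure function over the current page attribute value and the pagenum text. The source does not define "incorrect", so this sketch assumes that "normal" is correct only for arabic numbers, "front" only for roman numerals, and "special" always; the function names are hypothetical.

```python
import re

ROMAN = re.compile(r"[ivxlcdm]+", re.IGNORECASE)
ARABIC = re.compile(r"[0-9]+")


def is_consistent(page_type: str, value: str) -> bool:
    """Assumed notion of a 'correct' page attribute for a given page number value."""
    if page_type == "normal":
        return ARABIC.fullmatch(value) is not None
    if page_type == "front":
        return ROMAN.fullmatch(value) is not None
    return page_type == "special"


def repair_page_type(page_type: str, value: str, in_frontmatter: bool) -> str:
    """Return the repaired value of the pagenum page attribute."""
    value = value.strip()
    if page_type == "normal" and in_frontmatter and ROMAN.fullmatch(value):
        return "front"
    if page_type == "front" and ARABIC.fullmatch(value):
        return "normal"
    if not is_consistent(page_type, value):
        return "special"
    return page_type


print(repair_page_type("normal", "iv", in_frontmatter=True))    # prints front
print(repair_page_type("front", "12", in_frontmatter=False))    # prints normal
print(repair_page_type("normal", "A-3", in_frontmatter=False))  # prints special
```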

Metadata repair
  • Fixes Dublin Core metadata name case errors (e.g. dc:title is changed to dc:Title)
  • Removes unknown Dublin Core metadata (e.g. dc:Hello)
  • Adds a dtb:uid from dc:Identifier, if missing
  • Adds a dc:Title from the first doctitle element of the book, if missing
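The Metadata repair rules can be sketched over a plain list of (name, content) pairs instead of real meta elements. That data model, the name `repair_metadata`, and treating exactly the 15 elements of the Dublin Core Metadata Element Set as "known" are all assumptions of this sketch.

```python
from typing import List, Optional, Tuple

DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)
CANONICAL = {name.lower(): name for name in DC_ELEMENTS}


def repair_metadata(
    meta: List[Tuple[str, str]], doctitle: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Return a repaired copy of meta, a list of (name, content) pairs."""
    repaired = []
    for name, content in meta:
        prefix, sep, term = name.partition(":")
        if sep and prefix.lower() == "dc":
            canonical = CANONICAL.get(term.lower())
            if canonical is None:
                continue  # unknown Dublin Core term such as dc:Hello: drop it
            name = "dc:" + canonical  # fix case, e.g. dc:title -> dc:Title
        repaired.append((name, content))

    names = {n for n, _ in repaired}
    if "dtb:uid" not in names:
        identifier = next((c for n, c in repaired if n == "dc:Identifier"), None)
        if identifier is not None:
            repaired.append(("dtb:uid", identifier))
    if "dc:Title" not in names and doctitle:
        repaired.append(("dc:Title", doctitle))  # taken from the first doctitle
    return repaired


print(repair_metadata([("dc:title", "My Book"), ("dc:Hello", "x"), ("dc:identifier", "id-1")]))
# prints [('dc:Title', 'My Book'), ('dc:Identifier', 'id-1'), ('dtb:uid', 'id-1')]
```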

The Tidy Category

Level cleanup
Simplifies the level structure by removing redundant levels (subordinate levels will be moved upwards). Note that the headings of the affected levels will also change, which will alter the appearance of the layout.
Pagenum mover
  • Pagenums in headings are placed before the heading
  • Pagenums in words are placed after the word.
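The first Pagenum mover rule can be sketched as follows. Assumptions: only pagenum elements that are direct children of an h1 to h6 element are moved, the text that followed the pagenum inside the heading is rejoined with the text before it, and `detach` and `move_pagenums_before_headings` are hypothetical names. The rule for pagenums inside words is not sketched, because the source does not say how word boundaries are determined.

```python
import re
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Local name of a possibly namespaced tag."""
    return tag.rsplit("}", 1)[-1]


def detach(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent, keeping the text that followed it (child.tail)."""
    idx = list(parent).index(child)
    tail = child.tail or ""
    if idx == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)
    child.tail = None


def move_pagenums_before_headings(root: ET.Element) -> None:
    """Move pagenum children of headings to just before the heading."""
    for parent in list(root.iter()):
        for heading in list(parent):
            if not re.fullmatch(r"h[1-6]", local(heading.tag)):
                continue
            for pagenum in [c for c in heading if local(c.tag) == "pagenum"]:
                detach(heading, pagenum)
                parent.insert(list(parent).index(heading), pagenum)


doc = ET.fromstring(
    '<level1><h1>Intro<pagenum page="normal">5</pagenum>duction</h1><p>Text</p></level1>'
)
move_pagenums_before_headings(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints <level1><pagenum page="normal">5</pagenum><h1>Introduction</h1><p>Text</p></level1>
```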
Change inline pagenum to block
Removes otherwise empty p or li elements around pagenum (except p in td).
Empty elements remover
Removes empty/whitespace elements (p, em, strong, sub, sup), unless required for validity. For example, an empty p that is preceded by a heading and followed only by other empty p elements is not removed.
Author and Title addition
Inserts docauthor and doctitle elements into the frontmatter using Dublin Core metadata.
Tidy inline whitespace
Moves leading and trailing whitespace outside of em, strong, sub, sup and pagenum elements. For example: "this is an<em> example </em>of what<strong> Tidy inline whitespace </strong>does" will change to: "this is an <em>example</em> of what <strong>Tidy inline whitespace</strong> does". This is a requirement for accurate braille rendering.
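A sketch of Tidy inline whitespace for mixed content in `xml.etree.ElementTree`, where character data lives in an element's `text` and in each child's `tail`. It uses the element list from this paragraph (the Configuration section also mentions noteref), compares local names only, and `tidy_inline_whitespace` is a hypothetical name; it is not the Transformer's implementation.

```python
import xml.etree.ElementTree as ET

INLINE = {"em", "strong", "sub", "sup", "pagenum"}


def local(tag: str) -> str:
    """Local name of a possibly namespaced tag."""
    return tag.rsplit("}", 1)[-1]


def tidy_inline_whitespace(parent: ET.Element) -> None:
    """Move leading/trailing whitespace of inline children outside the element."""
    prev = None
    for child in list(parent):
        tidy_inline_whitespace(child)  # handle nested inline elements first
        if local(child.tag) in INLINE:
            # Leading whitespace moves to the text just before the element.
            if child.text:
                stripped = child.text.lstrip()
                lead = child.text[: len(child.text) - len(stripped)]
                if lead:
                    child.text = stripped
                    if prev is None:
                        parent.text = (parent.text or "") + lead
                    else:
                        prev.tail = (prev.tail or "") + lead
            # Trailing whitespace sits in the last child's tail, or in text if no children.
            holder, attr = (child[-1], "tail") if len(child) else (child, "text")
            value = getattr(holder, attr) or ""
            stripped = value.rstrip()
            if stripped != value:
                setattr(holder, attr, stripped)
                child.tail = value[len(stripped):] + (child.tail or "")
        prev = child


p = ET.fromstring(
    "<p>this is an<em> example </em>of what"
    "<strong> Tidy inline whitespace </strong>does</p>"
)
tidy_inline_whitespace(p)
print(ET.tostring(p, encoding="unicode"))
# prints <p>this is an <em>example</em> of what <strong>Tidy inline whitespace</strong> does</p>
```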
Indenter
Performs a "pretty print" of the XML elements in the document.

Appendix: List of Transformers used

The documents linked below are parts of the Transformer technical documentation. These are developer and systems-administrator centric documents.

  1. se_tpb_dtbookFix



