de.intarsys.pdf.cos.package.html
This package implements the low-level document and its data types according to
the PDF specification.
Every PDF document is ultimately composed of primitive and composite COS objects.
These objects are aggregated into a COSDocument.
The available primitives are:
- Boolean: true or false.
- Name: A unique, string-like object, most often used as a key in dictionaries.
- Fixed: A fixed-point real number.
- Integer: An integer number.
- String: A string object.
Composite COS objects are built using:
- Array: A dynamic, indexed collection.
- Dictionary: An associative (Map) data structure.
A kind of hybrid structure is:
- Stream: A sequence of bytes, combined with a dictionary holding additional
information about the stream itself.
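To make the object model above concrete, here is a minimal sketch that composes a page-like structure from primitives, an array, a dictionary and a stream. It uses plain java.util collections as stand-ins; `CosModelSketch`, `ToyStream` and `buildPage` are hypothetical names chosen for illustration and are not classes of this package.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in model built on java.util; not the package's own classes.
class CosModelSketch {
    /** Stand-in for a stream: the raw bytes plus a dictionary describing them. */
    static final class ToyStream {
        final Map<String, Object> dict = new LinkedHashMap<>();
        final byte[] bytes;

        ToyStream(byte[] bytes) {
            this.bytes = bytes.clone();
            // The dictionary carries information about the stream, here its length.
            dict.put("Length", this.bytes.length);
        }
    }

    /** Builds a dictionary that holds a name, an array of integers and a stream. */
    static Map<String, Object> buildPage() {
        List<Object> mediaBox = new ArrayList<>();
        mediaBox.add(0);
        mediaBox.add(0);
        mediaBox.add(612);
        mediaBox.add(792);

        Map<String, Object> page = new LinkedHashMap<>();
        page.put("Type", "Page");
        page.put("MediaBox", mediaBox);
        page.put("Contents", new ToyStream("BT ET".getBytes(StandardCharsets.US_ASCII)));
        return page;
    }

    public static void main(String[] args) {
        Map<String, Object> page = buildPage();
        ToyStream contents = (ToyStream) page.get("Contents");
        System.out.println(page.get("Type") + " has " + contents.dict.get("Length") + " content bytes");
    }
}
```

In the real package each of these stand-ins would be a dedicated COS type rather than a raw Java collection; the sketch only shows how the pieces nest.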
This implementation uses a special object, COSNull, to represent "null". A lookup
in a composite never returns an ordinary Java null but always COSNull, so you are
never forced to check (obj == null || obj.isFoo()).
Another useful convention is "marshalling" via the "as<COSType>" flavor of
methods. These methods return either Java "null" or an instance of the requested
type. Together, these conventions help you cope with the sometimes sloppily
implemented data structures of the PDF documents found in the wild.
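The two conventions can be illustrated with a small self-contained model. This is a sketch under stated assumptions: all classes below (`ToyObject`, `ToyNull`, `ToyInteger`, `ToyName`, `ToyDictionary`, `NullConventionSketch`) are hypothetical toys that imitate the behavior described above, and their method names are not taken from this package.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy classes that mimic the two conventions; not the package's own API.
abstract class ToyObject {
    boolean isNull() { return false; }
    // "as" methods: return this object as the requested type, or Java null.
    ToyInteger asInteger() { return null; }
    ToyName asName() { return null; }
}

final class ToyNull extends ToyObject {
    static final ToyNull NULL = new ToyNull();
    private ToyNull() { }
    @Override boolean isNull() { return true; }
}

final class ToyInteger extends ToyObject {
    final int value;
    ToyInteger(int value) { this.value = value; }
    @Override ToyInteger asInteger() { return this; }
}

final class ToyName extends ToyObject {
    final String text;
    ToyName(String text) { this.text = text; }
    @Override ToyName asName() { return this; }
}

final class ToyDictionary extends ToyObject {
    private final Map<String, ToyObject> entries = new HashMap<>();

    void put(String key, ToyObject value) { entries.put(key, value); }

    // A lookup never returns Java null; a missing key yields the null object.
    ToyObject get(String key) {
        ToyObject value = entries.get(key);
        return value == null ? ToyNull.NULL : value;
    }
}

class NullConventionSketch {
    // Reads the "Rotate" entry, tolerating a missing entry and an entry of the wrong type.
    static int rotationOf(ToyDictionary page) {
        ToyInteger rotate = page.get("Rotate").asInteger();
        return rotate == null ? 0 : rotate.value;
    }

    public static void main(String[] args) {
        ToyDictionary page = new ToyDictionary();
        System.out.println(rotationOf(page));        // missing entry
        page.put("Rotate", new ToyName("Sideways"));
        System.out.println(rotationOf(page));        // malformed entry
        page.put("Rotate", new ToyInteger(90));
        System.out.println(rotationOf(page));        // well-formed entry
    }
}
```

Because the lookup always yields an object, the call chain in `rotationOf` is safe without a prior null check, and the single check on the "as" result covers both the missing and the malformed case.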
A COS structure can reference other standalone objects and substructures. Such a
reference is represented by a COSIndirectObject. The standard accessors and
iterators of composite COS objects never return an indirect object; you always
receive the dereferenced COS object. To access the reference itself, you have to
use the "basic" flavor of methods.
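The difference between the standard and the "basic" accessors can be sketched as follows. The model is a hypothetical toy: `Node`, `Text`, `Reference`, `Container` and the method name `basicGet` are assumptions made for illustration, following the naming hint in the text above, and are not the signatures of this package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of reference handling; not the package's own classes.
class IndirectionSketch {
    static class Node {
        // Direct objects resolve to themselves.
        Node dereference() { return this; }
    }

    static final class Text extends Node {
        final String value;
        Text(String value) { this.value = value; }
    }

    static final class Reference extends Node {
        final Node target;
        Reference(Node target) { this.target = target; }
        @Override Node dereference() { return target.dereference(); }
    }

    static final class Container extends Node {
        private final List<Node> slots = new ArrayList<>();

        void add(Node node) { slots.add(node); }

        // Standard accessor: callers only ever see the dereferenced object.
        Node get(int index) { return slots.get(index).dereference(); }

        // "Basic" accessor: returns exactly what is stored, possibly a reference.
        Node basicGet(int index) { return slots.get(index); }
    }

    public static void main(String[] args) {
        Text shared = new Text("shared");
        Container array = new Container();
        array.add(new Reference(shared));
        System.out.println(array.get(0) == shared);                  // true
        System.out.println(array.basicGet(0) instanceof Reference);  // true
    }
}
```

Client code that only reads content stays on the standard accessor; code that needs to know whether an entry is shared, for example when copying a structure, would use the "basic" one.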
The meaningful data types such as "Rectangle", "Page" and so on are defined on top
of these purely technical objects, which have no PDF domain-specific behavior.
This is reflected in the framework provided by COSBasedObject, the superclass
of all PDF domain objects. Only a few of them are defined directly at the COS level;
most are found in the "pd" package.
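The layering described above, a domain object that adds meaning to a purely technical value, can be sketched like this. `ToyBasedObject` and `ToyRectangle` are hypothetical classes written for this illustration; they only imitate the idea of a COS-based domain object, using a `List<Double>` as a stand-in for a low-level array, and do not reproduce the actual COSBasedObject framework.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the wrapper idea; not the package's own framework.
class DomainWrapperSketch {
    /** A domain object that keeps its state in a wrapped low-level value. */
    static abstract class ToyBasedObject<T> {
        private final T lowLevel;

        ToyBasedObject(T lowLevel) { this.lowLevel = lowLevel; }

        // Gives access to the underlying technical object.
        T lowLevel() { return lowLevel; }
    }

    /** A rectangle stored as four numbers: two x/y pairs for opposite corners. */
    static final class ToyRectangle extends ToyBasedObject<List<Double>> {
        ToyRectangle(List<Double> corners) {
            super(corners);
            if (corners.size() != 4) {
                throw new IllegalArgumentException("a rectangle needs exactly four numbers");
            }
        }

        // Domain behavior is computed from the low-level data on demand.
        double width() { return Math.abs(lowLevel().get(2) - lowLevel().get(0)); }
        double height() { return Math.abs(lowLevel().get(3) - lowLevel().get(1)); }
    }

    public static void main(String[] args) {
        ToyRectangle letter = new ToyRectangle(Arrays.asList(0.0, 0.0, 612.0, 792.0));
        System.out.println(letter.width() + " x " + letter.height());
    }
}
```

Keeping all state in the wrapped value means the domain object never goes out of sync with what would be written to the document.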
While this may not seem too complicated, the implementation is quite complex
and powerful, supporting features such as
- update propagation
- lazy reading
- swapping
- state management (for example for simple undo)
- preservation of COS invariants, such as constraints on containment and
identity, whose violation in most implementations leads to hard-to-debug
failures in the resulting documents