de.intarsys.pdf.cos.package.html

This is a fork of http://sourceforge.net/projects/jpodlib/, as development there seems to be frozen. We provide some bug fixes along with deployments to Maven.

There is a newer version: 2.0


This package implements the low-level document model and its data types according to
the PDF specification.

Every PDF document is ultimately composed of COS primitive and composite objects. These objects are aggregated into a COSDocument. The available primitives are

  • Boolean: true or false.
  • Name: A special, unique, string-like object, most often used as a key for dictionaries.
  • Fixed: A fixed-point real number.
  • Integer: An integer number.
  • String: A string object.

Composite COS objects are built using

  • Array: A dynamic, indexed collection.
  • Dictionary: An associative (Map) data structure.

A hybrid of the two kinds is the

  • Stream: A sequence of bytes, combined with a dictionary holding additional information about the stream itself.
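
To make the shape of this object model concrete, here is a minimal, self-contained sketch of a few of the types listed above. All class names (SketchObject, SketchStream and so on) are hypothetical stand-ins invented for this illustration and are not the classes of this package. The only PDF detail assumed is that a stream dictionary carries a Length entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the jPod classes.
abstract class SketchObject {
}

// Primitive: an integer number.
final class SketchInteger extends SketchObject {
    final int value;

    SketchInteger(int value) {
        this.value = value;
    }
}

// Primitive: a string-like object, typically used as a dictionary key.
final class SketchName extends SketchObject {
    final String name;

    SketchName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchName && ((SketchName) other).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

// Composite: a dynamic, indexed collection.
final class SketchArray extends SketchObject {
    final List<SketchObject> items = new ArrayList<>();
}

// Composite: an associative structure keyed by names.
final class SketchDictionary extends SketchObject {
    final Map<SketchName, SketchObject> entries = new LinkedHashMap<>();
}

// Hybrid: raw bytes plus a dictionary that describes them.
final class SketchStream extends SketchObject {
    final SketchDictionary dict = new SketchDictionary();
    final byte[] bytes;

    SketchStream(byte[] bytes) {
        this.bytes = bytes.clone();
        // Record the byte count in the describing dictionary.
        dict.entries.put(new SketchName("Length"), new SketchInteger(bytes.length));
    }
}
```

The stream keeps its payload and its describing dictionary side by side, which is what makes it a hybrid rather than a plain primitive or composite.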

This implementation uses a special object representing "null", COSNull. A lookup in a composite never returns an ordinary Java null but always COSNull, so you are never forced to check (obj == null || obj.isFoo()). Another useful convention is "marshalling" using the "as<COSType>" flavor of methods. These methods return either Java "null" or an instance of the requested type. Both conventions help when working around the sometimes sloppily implemented data structures of the PDF documents available out there.
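
The two conventions can be sketched in a few lines of plain Java. This is a self-contained illustration of the pattern only: Node, NullNode, IntNode, TextNode, DictNode and the helper intOrDefault are hypothetical names made up for this sketch, not the API of this package.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration; it mirrors the ideas, not the jPod API.
abstract class Node {
    boolean isNull() {
        return false;
    }

    // "as<Type>" marshalling: the requested type, or plain Java null.
    IntNode asInteger() {
        return null;
    }

    TextNode asText() {
        return null;
    }
}

// The null object: a singleton that stands in for "nothing here".
final class NullNode extends Node {
    static final NullNode NULL = new NullNode();

    private NullNode() {
    }

    @Override
    boolean isNull() {
        return true;
    }
}

final class IntNode extends Node {
    final int value;

    IntNode(int value) {
        this.value = value;
    }

    @Override
    IntNode asInteger() {
        return this;
    }
}

final class TextNode extends Node {
    final String text;

    TextNode(String text) {
        this.text = text;
    }

    @Override
    TextNode asText() {
        return this;
    }
}

final class DictNode extends Node {
    private final Map<String, Node> entries = new HashMap<>();

    void put(String key, Node value) {
        entries.put(key, value);
    }

    // Never returns Java null: a missing key yields the NULL singleton.
    Node get(String key) {
        Node value = entries.get(key);
        return value == null ? NullNode.NULL : value;
    }
}

final class NullObjectDemo {
    // Reads an integer entry, falling back when it is absent or of another type.
    static int intOrDefault(DictNode dict, String key, int fallback) {
        IntNode n = dict.get(key).asInteger(); // safe: get() never returns null
        return n == null ? fallback : n.value;
    }
}
```

Because the lookup always yields an object, the call chain dict.get(key).asInteger() cannot fail with a NullPointerException. The single null check left over is on the marshalled result, which covers both a missing entry and an entry of the wrong type.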

In a COS structure, other standalone objects and substructures can be referenced. This is represented using COSIndirectObject. An indirect object is never returned by the standard accessors or iterators of composite COS objects; you always receive the dereferenced COS object. To access the reference itself, you have to use the "basic" flavor of methods.
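
The difference between the standard and the "basic" accessors can be illustrated with a small self-contained model. The classes Value, Reference and ValueArray, and the method names get and basicGet, are assumptions chosen for this sketch; consult the package's own classes for the real signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration; NOT the jPod classes.
class Value {
    final String label;

    Value(String label) {
        this.label = label;
    }

    // A direct object is its own dereferenced form.
    Value dereference() {
        return this;
    }

    boolean isIndirect() {
        return false;
    }
}

// Stands in for an indirect object: a numbered reference to another object.
final class Reference extends Value {
    final int objectNumber;
    private final Value target;

    Reference(int objectNumber, Value target) {
        super("reference to object " + objectNumber);
        this.objectNumber = objectNumber;
        this.target = target;
    }

    @Override
    Value dereference() {
        return target.dereference();
    }

    @Override
    boolean isIndirect() {
        return true;
    }
}

final class ValueArray {
    private final List<Value> slots = new ArrayList<>();

    void add(Value value) {
        slots.add(value);
    }

    // Standard accessor: always hands out the dereferenced object.
    Value get(int index) {
        return slots.get(index).dereference();
    }

    // "Basic" accessor: hands out the stored slot as is, possibly a reference.
    Value basicGet(int index) {
        return slots.get(index);
    }
}
```

Client code that only cares about content uses get and never sees a reference. Code that needs the reference itself, for example to read the object number, uses basicGet.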

The meaningful data types such as "Rectangle" and "Page" are defined on top of these purely technical objects, which have no PDF domain-specific behavior of their own. This is reflected in the framework provided by COSBasedObject, the superclass of all PDF domain objects. Only a few domain objects are defined directly at the COS level itself; most of them are found in the "pd" package.
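
The layering idea is that a domain object holds no data of its own and only interprets an underlying technical object. The following self-contained sketch shows this under simplifying assumptions: BasedObject and Rect are hypothetical names, and a plain list of four numbers stands in for the underlying COS array of a PDF rectangle.

```java
import java.util.List;

// Hypothetical illustration of the wrapper idea; NOT the jPod classes.
abstract class BasedObject<T> {
    private final T base;

    BasedObject(T base) {
        this.base = base;
    }

    // The underlying technical object that carries the actual data.
    T base() {
        return base;
    }
}

// Domain view on four numbers naming two diagonally opposite corners.
final class Rect extends BasedObject<List<Double>> {
    Rect(List<Double> corners) {
        super(corners);
        if (corners.size() != 4) {
            throw new IllegalArgumentException("a rectangle needs exactly 4 numbers");
        }
    }

    // Computed from the base on every call, so the view never goes stale.
    double width() {
        return Math.abs(base().get(2) - base().get(0));
    }

    double height() {
        return Math.abs(base().get(3) - base().get(1));
    }
}
```

Since the wrapper computes everything from its base object, a change to the underlying data is immediately visible through the domain view.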

While this may not seem too complicated, the implementation is quite complex and powerful, supporting features such as

  • update propagation
  • lazy reading
  • swapping
  • state management (for example for simple undo)
  • preservation of COS invariants, such as constraints on containment and identity, whose violation in most implementations leads to failures in the resulting documents that are hard to debug
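
As a rough idea of what one of these features, lazy reading, means in general, the sketch below defers the work of producing an object until it is first requested. It is a generic illustration and makes no claim about how this package implements the feature; LazySlot and its methods are hypothetical names.

```java
import java.util.function.Supplier;

// Generic illustration of lazy reading; NOT the jPod implementation.
final class LazySlot<T> {
    private Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazySlot(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader on first access only and caches the result.
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null; // the loader is no longer needed once the value exists
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

In a PDF setting the loader would typically parse an object from its file offset, so objects that are never touched are never parsed. This sketch is not thread-safe.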




