HPSF
Processes streams in the Horrible Property Set Format (HPSF) in POI
filesystems. Microsoft Office documents, i.e. POI filesystems, usually
contain metadata such as author, title, and time of last save. These items
are called properties and are stored in
property set streams along with the document itself. These
streams are commonly named \005SummaryInformation and
\005DocumentSummaryInformation. However, a POI filesystem may
contain further property sets with other names or types.
In order to extract the properties from a POI filesystem, a property set
stream's contents must be parsed into a {@link
loci.poi.hpsf.PropertySet} instance. Its subclasses {@link
loci.poi.hpsf.SummaryInformation} and {@link
loci.poi.hpsf.DocumentSummaryInformation} deal with the well-known
property set streams \005SummaryInformation and
\005DocumentSummaryInformation. (However, the streams' names are
irrelevant. What counts is the property set's first section's format ID -
see below.)
The factory method {@link loci.poi.hpsf.PropertySetFactory#create}
creates a {@link loci.poi.hpsf.PropertySet} instance. This method
always returns the most specific property set: if it
identifies the stream data as a Summary Information or a Document
Summary Information property set, it returns an instance of the
corresponding class; otherwise it returns the general
{@link loci.poi.hpsf.PropertySet}.
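The selection by format ID can be modelled without the library. The following is a minimal, self-contained sketch and not the code of PropertySetFactory: the classes PropertySetModel, SummaryInfoModel, DocSummaryInfoModel and FactoryModel and the method mostSpecific are hypothetical stand-ins. The two format IDs are the values Microsoft publishes for the Summary Information and Document Summary Information property sets, written here as java.util.UUID for readability.

```java
import java.util.UUID;

// Hypothetical stand-in for the general property set.
class PropertySetModel {
    final UUID firstSectionFormatId;

    PropertySetModel(UUID firstSectionFormatId) {
        this.firstSectionFormatId = firstSectionFormatId;
    }
}

// Hypothetical stand-in for the Summary Information subclass.
class SummaryInfoModel extends PropertySetModel {
    SummaryInfoModel(UUID id) { super(id); }
}

// Hypothetical stand-in for the Document Summary Information subclass.
class DocSummaryInfoModel extends PropertySetModel {
    DocSummaryInfoModel(UUID id) { super(id); }
}

class FactoryModel {
    // Published format IDs of the two well-known property sets.
    static final UUID SUMMARY_INFORMATION =
        UUID.fromString("f29f85e0-4ff9-1068-ab91-08002b27b3d9");
    static final UUID DOCUMENT_SUMMARY_INFORMATION =
        UUID.fromString("d5cdd502-2e9c-101b-9397-08002b2cf9ae");

    // Picks the most specific class from the first section's format ID.
    // The stream name plays no role in the decision.
    static PropertySetModel mostSpecific(UUID firstSectionFormatId) {
        if (SUMMARY_INFORMATION.equals(firstSectionFormatId)) {
            return new SummaryInfoModel(firstSectionFormatId);
        }
        if (DOCUMENT_SUMMARY_INFORMATION.equals(firstSectionFormatId)) {
            return new DocSummaryInfoModel(firstSectionFormatId);
        }
        return new PropertySetModel(firstSectionFormatId);
    }
}
```

Callers of the real factory can use the same idea in reverse: test the returned object with instanceof before casting it to one of the two subclasses.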
A {@link loci.poi.hpsf.PropertySet} contains a list of {@link
loci.poi.hpsf.Section}s which can be retrieved with {@link
loci.poi.hpsf.PropertySet#getSections}. Each {@link
loci.poi.hpsf.Section} contains a {@link
loci.poi.hpsf.Property} array which can be retrieved with {@link
loci.poi.hpsf.Section#getProperties}. Since the vast majority of
{@link loci.poi.hpsf.PropertySet}s contain only a single {@link
loci.poi.hpsf.Section}, the convenience method {@link
loci.poi.hpsf.PropertySet#getProperties} returns the properties of that
single {@link loci.poi.hpsf.Section}. It throws a {@link
loci.poi.hpsf.NoSingleSectionException} if the {@link
loci.poi.hpsf.PropertySet} contains more or fewer than one
{@link loci.poi.hpsf.Section}.
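The single-section rule can be illustrated with a small model. This is a sketch under stated assumptions, not the library's implementation: SectionSketch and PropertySetSketch are hypothetical stand-ins, properties are reduced to plain names, and IllegalStateException takes the place of NoSingleSectionException.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a section: just a list of property names.
class SectionSketch {
    final List<String> properties;

    SectionSketch(String... properties) {
        this.properties = Arrays.asList(properties);
    }
}

// Hypothetical stand-in for a property set holding a list of sections.
class PropertySetSketch {
    final List<SectionSketch> sections;

    PropertySetSketch(SectionSketch... sections) {
        this.sections = Arrays.asList(sections);
    }

    // Models the convenience method: valid only for exactly one section.
    // IllegalStateException stands in for NoSingleSectionException.
    List<String> getProperties() {
        if (sections.size() != 1) {
            throw new IllegalStateException(
                "expected exactly 1 section, found " + sections.size());
        }
        return sections.get(0).properties;
    }
}
```

Code that may encounter property sets with several sections would iterate over the section list instead of relying on the convenience method.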
Each {@link loci.poi.hpsf.Property} has an ID, a
type, and a value which can be retrieved
with {@link loci.poi.hpsf.Property#getID}, {@link
loci.poi.hpsf.Property#getType}, and {@link
loci.poi.hpsf.Property#getValue}, respectively. The value's class
depends on the property's type. The current implementation
does not yet support all property types and restricts the values' classes
to {@link java.lang.String}, {@link java.lang.Integer} and {@link
java.util.Date}. A value of an unsupported type is returned as a byte array
containing the value's original bytes from the property set stream.
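Because the value arrives as a plain Object, calling code has to branch on its runtime class. The following sketch shows one way to do that for the four cases named above. The class ValueKinds and the method describe are hypothetical helpers, not part of the library, and the value is assumed to have been obtained from a property's getValue call beforehand.

```java
import java.util.Date;

class ValueKinds {
    // Returns a short description of a property value based on its class.
    // Covers the classes named in the text plus the raw-byte fallback.
    static String describe(Object value) {
        if (value instanceof String) {
            return "text: " + value;
        }
        if (value instanceof Integer) {
            return "integer: " + value;
        }
        if (value instanceof Date) {
            return "timestamp (ms since epoch): " + ((Date) value).getTime();
        }
        if (value instanceof byte[]) {
            // Unsupported property type: only the original bytes are available.
            return "raw bytes: " + ((byte[]) value).length;
        }
        return "unexpected: "
            + (value == null ? "null" : value.getClass().getName());
    }
}
```

The final branch is a safeguard for values outside the documented set, for example when a later version of the library supports further types.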
To retrieve the value of a specific {@link loci.poi.hpsf.Property},
use {@link loci.poi.hpsf.Section#getProperty} or {@link
loci.poi.hpsf.Section#getPropertyIntValue}.
The {@link loci.poi.hpsf.SummaryInformation} and {@link
loci.poi.hpsf.DocumentSummaryInformation} classes provide convenience
methods for retrieving well-known properties. For example, an application
that wants to retrieve a document's title string just calls {@link
loci.poi.hpsf.SummaryInformation#getTitle} instead of going through
the hassle of first finding out what the title's property ID is and then
using this ID to get the property's value.
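What such a convenience method saves the caller can be sketched as follows. This is an illustrative model, not the code of SummaryInformation: the class SummaryModel and its methods are hypothetical, and a HashMap stands in for the parsed section. The property ID 2 is the value Microsoft documents for the title (PIDSI_TITLE) in the Summary Information property set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parsed Summary Information property set.
class SummaryModel {
    // Documented property ID of the title (PIDSI_TITLE).
    static final long PID_TITLE = 2L;

    private final Map<Long, Object> valuesById = new HashMap<>();

    void put(long id, Object value) {
        valuesById.put(id, value);
    }

    // Generic lookup: the caller must know the property ID.
    Object getProperty(long id) {
        return valuesById.get(id);
    }

    // Convenience lookup: hides the ID and the cast.
    // Returns null if the title is absent or not a string.
    String getTitle() {
        Object value = getProperty(PID_TITLE);
        return value instanceof String ? (String) value : null;
    }
}
```

With the generic lookup the caller writes getProperty(2L) and casts the result; with the convenience method a call to getTitle() is enough.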
Writing properties can be done with the classes
{@link loci.poi.hpsf.MutablePropertySet}, {@link
loci.poi.hpsf.MutableSection}, and {@link
loci.poi.hpsf.MutableProperty}.
Public documentation from Microsoft can be found in the appropriate section of the MSDN Library.
History
- 2003-09-11: {@link loci.poi.hpsf.PropertySetFactory#create(InputStream)} no
  longer throws an
  {@link loci.poi.hpsf.UnexpectedPropertySetTypeException}.
To Do
The following is still left to be implemented. Sponsoring could speed up
work on these items considerably.
- Convenience methods for setting summary information and document
  summary information properties
- Better codepage support
- Support for more property (variant) types
@author Rainer Klute
@version $Id: package.html 496526 2007-01-15 22:46:35Z markt $
@since 2002-02-09