
com.sun.xml.tree.package.html
This package supports in-memory XML documents in the form of a parse tree
compliant with the W3C DOM Level 1 Core Recommendation, with extensions
including support for XML Namespaces as defined by the current XML
proposed recommendation. (Only the DOM Core APIs are used for XML; there
are additional HTML-specific features, which are optional.) The
package is extended with support for printing XML and for customizing
DOM Documents used as parse trees with DOM element subclasses.
The normal
navigational metaphor for these documents is that of a tree, with
array-like accessors available for child nodes. Documents are
factories for the nodes which may be stored within them, for use by
programs which construct documents node by node rather than parsing
them.
DOM methods are not defined as being suited for multithreaded
use without external application-specific synchronization policies.
For example, if an application treats a document as readonly, then
no synchronization problems will exist; or multiple threads could
synchronize on each node's ownerDocument while accessing
or modifying a given document.
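The second policy, locking on each node's owner document, can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard org.w3c.dom and javax.xml.parsers classes from the JDK rather than this package's own classes, and the class and method names (SharedDocumentSketch, lockFor, appendItem, countChildren) are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SharedDocumentSketch {

    /** Picks the lock object: the owner document, or the node itself when it is the document. */
    static Object lockFor(Node node) {
        Document owner = node.getOwnerDocument(); // null when the node is a Document
        return owner != null ? owner : node;
    }

    /** Appends a new child element while holding the document-wide lock. */
    static void appendItem(Element parent, String tagName) {
        synchronized (lockFor(parent)) {
            parent.appendChild(parent.getOwnerDocument().createElement(tagName));
        }
    }

    /** Reads the child count under the same lock, so readers never see a half-made edit. */
    static int countChildren(Node parent) {
        synchronized (lockFor(parent)) {
            return parent.getChildNodes().getLength();
        }
    }

    /** Lets several threads append to one element and returns the final child count. */
    static int demo(int threads, int perThread) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> {
                    for (int i = 0; i < perThread; i++) {
                        appendItem(root, "item");
                    }
                });
                workers[t].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            return countChildren(root);
        } catch (ParserConfigurationException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 100));
    }
}
```

The policy only works if every thread that touches the document uses the same lock; a single unsynchronized writer defeats it.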
Note that not all implementation classes are exposed here.
To create nodes whose classes are not exposed, such as Text and
Comment nodes, you must use the DOM methods (typically a factory
method on an XmlDocument instance). Only node types which need to be
public for purposes of subclassing or access to extended
functionality are currently exposed.
- Reading an XML Document
- Writing XML Documents and Nodes
- Navigating a Document
- Constructing a Document Programmatically
- Custom Element Classes in a Document
- XML Namespaces
Note that this package supports various extensions to the DOM
Level 1 core specification, as required for "real" applications.
Look at the specifications of the interfaces in this package to get
the best overview of that functionality.
Reading an XML Document
The XmlDocument class may be thought of as the root of a tree
of XML data. It's easy to get one of these either by parsing an XML
document, or by instantiating an empty one directly. The document
has a single ElementNode, optionally preceded and followed
by CommentNode and PINode values. Documents may
also have a Document Type Definition (DTD), and may optionally be
validated as they are parsed.
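XmlDocument document;
Element rootElement;
// Parse the document at the given URI; "false" means do not validate.
document = XmlDocument.createXmlDocument (
"http://www.w3.org/TR/1998/REC-xml-19980210.xml",
false);
// The root element is obtained from the document itself.
rootElement = (ElementNode) document.getDocumentElement ();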
The most flexible way to create an XML document involves direct use of
the XmlDocumentBuilder class with a SAX parser. It is a SAX document
handler, which constructs documents from parser callbacks.
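For readers without this package at hand, the same reading flow can be sketched with the standard JAXP classes in the JDK. This is a stand-in, not the XmlDocument or XmlDocumentBuilder API: the ReadDocumentSketch class and its methods are hypothetical, and the document is parsed from a string to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ReadDocumentSketch {

    /** Parses XML text into a DOM document, optionally validating against its DTD. */
    static Document parse(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Returns the tag name of the single root element. */
    static String rootName(String xml) {
        return parse(xml, false).getDocumentElement().getTagName();
    }

    /** Counts the document's direct children: the root element plus any comments and PIs around it. */
    static int topLevelCount(String xml) {
        return parse(xml, false).getChildNodes().getLength();
    }

    public static void main(String[] args) {
        String xml = "<!-- header --><memo><to>Ann</to></memo><?app done?>";
        System.out.println(rootName(xml));
        System.out.println(topLevelCount(xml));
    }
}
```

The sample input has a comment before the root element and a processing instruction after it, matching the document shape described above.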
Writing XML Documents and Nodes
To save a document or node, get a Writer, preferably one using
an efficient loss-free encoding such as UTF-8. Then just use the
write(Writer) method to save that document; all the node types
in this package support such methods, so you can write each node and any
of its children with a single method call.
The XmlDocument class has two additional methods. If you
write using an OutputStream it is automatically encoded using
the UTF-8 encoding. Or you may describe the character encoding being
used with your Writer, to ensure that the XML declaration written out
is described as using that encoding.
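The relationship between the output encoding and the XML declaration can be illustrated with the JDK's identity Transformer. This is a sketch of the idea only; it does not use this package's write methods, and the WriteEncodingSketch class and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class WriteEncodingSketch {

    /** Builds a tiny document to serialize. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("greeting")).setTextContent("hello");
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an identity transformer that declares the given encoding. */
    private static Transformer serializer(String encoding) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, encoding);
        return identity;
    }

    /** Writes to a byte stream; the serializer both encodes the bytes and declares the encoding. */
    static byte[] toBytes(Document doc, String encoding) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            serializer(encoding).transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes to a Writer; the caller states which encoding that Writer will eventually use. */
    static String toText(Document doc, String writerEncoding) {
        try {
            StringWriter out = new StringWriter();
            serializer(writerEncoding).transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toText(sample(), "ISO-8859-1"));
    }
}
```

When writing through a Writer, the declared name is only a label: it is up to the caller to make sure the Writer really encodes characters that way.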
If you want to write a document or node using some output format
other than XML, you can override its write(Writer) method.
The implementation of such methods involves calls to writeXml
methods. You can customize your tree to include only nodes that know
how to write themselves as HTML, or some other output format.
XML text is normally pretty-printed. This facilitates human
use of the text, such as diagnosing problems that could be masked
by documents consisting of a single line of text. To avoid
pretty-printing, use writeXml with a write context configured
to disable it.
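The difference between pretty-printed and compact output can be seen with the JDK serializer, used here as a stand-in for a write context. The PrettyPrintSketch class is hypothetical, and the exact indentation produced in pretty mode depends on the JDK version.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PrettyPrintSketch {

    /** Builds list/item/item with no whitespace text nodes between the elements. */
    static Document sample() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element list = doc.createElement("list");
        doc.appendChild(list);
        for (String text : new String[] {"a", "b"}) {
            Element item = doc.createElement("item");
            item.setTextContent(text);
            list.appendChild(item);
        }
        return doc;
    }

    /** Serializes the sample with or without line breaks added by the serializer. */
    static String serialize(boolean pretty) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.INDENT, pretty ? "yes" : "no");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(sample()), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(false)); // one line of markup
        System.out.println(serialize(true));  // elements on separate lines
    }
}
```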
Navigating a Document
A number of classes are used to represent nodes in a document. These
conform with current DOM APIs, in some cases providing additional
methods. Many applications will use only the XmlDocument,
ElementNode, and TextNode classes. The class which
represents an XML "Processing Instruction" (PINode) is also
used by some XML applications to control their processing.
All nodes support the notion of siblings and parents. In addition,
element (and document, and the editor-oriented document fragment)
nodes also support children. You can access
children using an array-like model, or by calling getFirstChild and
then traversing the siblings with getNextSibling. Of
course, the array-like model is not stable if you're editing the tree,
because the indices are subject to change. However, it is efficient
and very convenient to use when that's not an issue.
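Both traversal styles are part of the standard DOM interfaces, so they can be shown with the JDK's org.w3c.dom classes. The NavigationSketch class and its helper names below are hypothetical; only the DOM calls themselves (getChildNodes, item, getFirstChild, getNextSibling) come from the DOM API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NavigationSketch {

    /** Parses XML text and returns its root element. */
    static Element rootOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Array-like model: index into the child list. Indices shift if the tree is edited. */
    static List<String> namesByIndex(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    /** Sibling model: start at the first child and follow the next-sibling links. */
    static List<String> namesBySibling(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = rootOf("<r><a/>text<b/><c/></r>");
        System.out.println(namesByIndex(root));
        System.out.println(namesBySibling(root));
    }
}
```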
Constructing a Document Programmatically
Once you create an XML Document, you use it as a factory to create
the nodes a parser would, such as DOM ProcessingInstruction,
Text, Comment, CDATASection, and
of course Element. As described below, you may configure
the XmlDocument class (or potentially a subclass) to return
element nodes which add application-specific behaviors.
After you create nodes, you will normally use the DOM
Element.appendChild method to append the node to some element.
Other primitives also exist, and you may delete nodes from the tree,
or insert them before other nodes.
If you wish the document to be written out in a form that is
relatively readable by humans, you may wish to insert text nodes
with whitespace to perform simple formatting. For example, you might append a
Text node containing a newline after each element.
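A minimal sketch of building a tree node by node, using the document as the node factory. It relies on the standard DOM factory methods as implemented in the JDK rather than on XmlDocument, and the BuildDocumentSketch class, element names, and helper methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class BuildDocumentSketch {

    /** Builds an "order" element using append, insert-before, and remove. */
    static Element buildOrder() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element order = doc.createElement("order");
        doc.appendChild(order);

        // Every node is created by the document, then attached to a parent.
        Comment draft = doc.createComment(" draft ");
        order.appendChild(draft);
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode("widget"));
        order.appendChild(item);
        order.appendChild(doc.createTextNode("\n")); // newline after the item

        // Insert a customer element, and a newline after it, ahead of the item.
        Element customer = doc.createElement("customer");
        order.insertBefore(customer, item);
        order.insertBefore(doc.createTextNode("\n"), item);

        // Delete the comment again.
        order.removeChild(draft);
        return order;
    }

    /** Joins the node names of the direct children, in document order. */
    static String childNames(Element parent) {
        StringBuilder names = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(child.getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(childNames(buildOrder())); // customer,#text,item,#text
    }
}
```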
Custom Element Classes in a Document
You can configure XmlDocumentBuilder (and also
XmlDocument instances) with factories returning element
classes that are specialized for a given element type. This lets
you easily transform between externalized XML data formats and
in-memory data structures which:
- Implement behaviors specific to the application task
being performed with the data;
- Enforce semantic constraints;
- Match applications' object model requirements better
than the tree model of DOM.
For example, the classes could support the HTML DOM
methods, or provide methods used to drive an XSL implementation
(using the namespace-aware factory infrastructure).
Your classes could implement interfaces used to integrate
with frameworks for your server side web application; or implement
a model to be viewed with Swing; or they could automatically convert from
older external formats to the most up-to-date internal one. They could
also be used to bind XML nodes to existing components, including "legacy"
business data and objects in an existing application kernel. Such objects
might require use of the Java Native Interface (JNI) to call them from Java.
Subclassing ElementNode
Since DOM does not support mixing classes from different implementations,
custom element classes must be associated with a particular implementation
of the DOM core classes. For this implementation, that means that
only ElementNode is permitted as
a base class for custom element nodes. (You must provide a publicly
accessible default constructor.)
Customized Element classes can intercept parsing events reported
through the XmlReader interface.
For example, a node might normalize whitespace, or might
convert some attributes or elements to object properties. In general, such
nodes can transform a data model exposed in XML to one that better
matches an application's modeling requirements, and vice versa.
Customized Element classes may need to change how they handle the
writeXml method, perhaps writing out their XML start and
end tags specially or controlling how ElementNode.writeChildrenXml
presents child nodes.
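The whitespace normalization mentioned above can be approximated without subclassing, as a pass over an already-parsed tree. This sketch is a swapped-in technique: it uses the JDK's DOM classes and runs after parsing, whereas a customized ElementNode would do the work while parsing events arrive. The WhitespaceSketch class and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WhitespaceSketch {

    /** Trims and collapses whitespace in text nodes; drops text nodes left empty. */
    static void collapseWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                String collapsed = child.getNodeValue().trim().replaceAll("\\s+", " ");
                if (collapsed.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(collapsed);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collapseWhitespace(child);
            }
            child = next;
        }
    }

    /** Parses the text, normalizes it, and returns the concatenated text content. */
    static String normalizedText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            collapseWhitespace(root);
            return root.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizedText("<p>\n  hello   world \n<b> x </b>\n</p>"));
    }
}
```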
Element Factories
Element factories create new elements based on element tag names,
optionally considering the XML Namespace associated with the element.
There are two basic ways to use element factories:
- Provide a custom factory. If you do this, you may
not need to make your custom element classes public. The simplest
custom factories have nested "if" statements, first checking the
namespace URI and then instantiating a class based on the element
name within that namespace. More complex ones could be driven by
configuration data and their environment.
- Configure a standard factory. In many cases this
is the preferred approach. The
SimpleElementFactory class can be configured through tables,
and supports namespace-aware mappings.
In the future, a declarative syntax for configuring the
standard factory could be supported. Such a syntax would be embeddable
in XML documents, so that documents themselves may optionally be the
source of such bindings.
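The nested "if" structure of a simple custom factory might look like the following. Everything here is hypothetical and independent of this package's factory interfaces: the namespace URIs, the element classes, and the createElement signature are assumptions made for illustration, and the classes stand in for ElementNode subclasses with public default constructors.

```java
public class ElementFactorySketch {

    // Hypothetical namespace URIs.
    static final String INVOICE_NS = "urn:example:invoice";
    static final String NOTE_NS = "urn:example:note";

    /** Stand-in for the generic element class used when nothing more specific applies. */
    public static class GenericElement {
        public GenericElement() { }
        public String role() { return "generic"; }
    }

    public static class InvoiceElement extends GenericElement {
        public InvoiceElement() { }
        @Override public String role() { return "invoice"; }
    }

    public static class LineItemElement extends GenericElement {
        public LineItemElement() { }
        @Override public String role() { return "line item"; }
    }

    public static class NoteElement extends GenericElement {
        public NoteElement() { }
        @Override public String role() { return "note"; }
    }

    /** Checks the namespace URI first, then the element name within that namespace. */
    static GenericElement createElement(String namespaceUri, String localName) {
        if (INVOICE_NS.equals(namespaceUri)) {
            if ("invoice".equals(localName)) {
                return new InvoiceElement();
            }
            if ("line".equals(localName)) {
                return new LineItemElement();
            }
        } else if (NOTE_NS.equals(namespaceUri)) {
            if ("note".equals(localName)) {
                return new NoteElement();
            }
        }
        return new GenericElement(); // unknown names fall back to the generic class
    }

    public static void main(String[] args) {
        System.out.println(createElement(INVOICE_NS, "line").role());
        System.out.println(createElement(NOTE_NS, "line").role());
    }
}
```

Note that the same local name ("line") maps to different classes depending on its namespace, which is the point of checking the namespace URI first.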
Since the mappings between XML element types and classes are not
necessarily part of the document, you can use different mappings in
different environments or when different roles are required. The behaviors
of a message sender, for example, will usually differ from those of the
recipient. Clients often need to support interactions based on graphical
user interfaces, which aren't appropriate on servers. Such differences
can be controlled by using different mappings in different environments.
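To make the idea of environment-specific mappings concrete, here is a sketch of a table-driven factory configured differently for a client and a server. It does not use SimpleElementFactory; the MappingTableSketch class, its TableFactory, the namespace URI, and the element classes are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class MappingTableSketch {

    static final String ORDER_NS = "urn:example:order"; // hypothetical namespace URI

    public static class PlainElement {
        public String describe() { return "plain"; }
    }

    /** Client-side class: would add behavior for graphical editing. */
    public static class EditableOrder extends PlainElement {
        @Override public String describe() { return "client order"; }
    }

    /** Server-side class: would add behavior for validation and processing. */
    public static class ValidatedOrder extends PlainElement {
        @Override public String describe() { return "server order"; }
    }

    /** Maps (namespace URI, local name) pairs to constructors. */
    static class TableFactory {
        private final Map<String, Supplier<PlainElement>> table = new HashMap<>();

        TableFactory map(String namespaceUri, String localName, Supplier<PlainElement> constructor) {
            table.put(key(namespaceUri, localName), constructor);
            return this;
        }

        PlainElement create(String namespaceUri, String localName) {
            Supplier<PlainElement> constructor = table.get(key(namespaceUri, localName));
            return constructor != null ? constructor.get() : new PlainElement();
        }

        private static String key(String namespaceUri, String localName) {
            return "{" + namespaceUri + "}" + localName;
        }
    }

    // The same element type is bound to different classes in each environment.
    static TableFactory clientFactory() {
        return new TableFactory().map(ORDER_NS, "order", EditableOrder::new);
    }

    static TableFactory serverFactory() {
        return new TableFactory().map(ORDER_NS, "order", ValidatedOrder::new);
    }

    public static void main(String[] args) {
        System.out.println(clientFactory().create(ORDER_NS, "order").describe());
        System.out.println(serverFactory().create(ORDER_NS, "order").describe());
    }
}
```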
Subclassing XmlDocument
You may also wish to subclass XmlDocument
in order to provide specialized behaviors. Such behaviors could
include using some particular factory configuration by default
(e.g. to support all of the HTML DOM interfaces),
interpreting particular processing instructions (albeit without access
to any current element), and more.
If you do this, it will be important to also define a
subclass of XmlDocumentBuilder
which returns an instance of this class when it parses documents.
To do so, override the createDocument method.
XML Namespaces
By default, an XmlDocumentBuilder supports the
November 1998 version of XML namespaces during parsing. Element and
attribute names may be explicitly or implicitly qualified according to
the naming context (here called "namespace") in which they are bound.
A natural model is lexical scoping, as declared in a document's DTD.
Responsibility for enforcing namespace constraints rests entirely
with this builder. Accordingly, you should use a Sun XML parser, which
reports additional DTD events required to enforce the additional
requirements of the XML namespaces spec.
Use the setParser method to establish the bidirectional
linkage between the parser and builder.
You can disable namespace error checking during parsing if you wish,
through the disableNamespaces property on the builder. You
only need to do this if you are working with documents which use colons
in their names ("reserved for namespace experiments") but do not conform to
the syntax defined in the XML namespace draft.
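The effect of turning namespace processing on or off can be observed with the JDK's JAXP parser, used here as a stand-in for XmlDocumentBuilder. The setNamespaceAware switch plays a role comparable to the disableNamespaces property, but it is a different API; the NamespaceParsingSketch class and its methods are hypothetical, and the JDK parser follows the final Namespaces recommendation rather than the November 1998 draft.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NamespaceParsingSketch {

    /** Parses XML text with namespace processing switched on or off. */
    static Element parseRoot(String xml, boolean namespaceAware) throws SAXException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reports "namespace URI|local name|qualified name" for a namespace-aware parse. */
    static String describeRoot(String xml) {
        try {
            Element root = parseRoot(xml, true);
            return root.getNamespaceURI() + "|" + root.getLocalName() + "|" + root.getNodeName();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the text parses in the given mode, false if the parser rejects it. */
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            parseRoot(xml, namespaceAware);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeRoot("<inv:invoice xmlns:inv='urn:example:invoice'/>"));
        // A colon in a name with no matching declaration is only accepted
        // when namespace processing is off.
        System.out.println(parses("<x:y/>", true));
        System.out.println(parses("<x:y/>", false));
    }
}
```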