/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.htmlunit.xpath.xml.dtm;
import org.htmlunit.xpath.objects.XString;
/**
* {@code DTM} is an XML document model expressed as a table rather than an object tree. It
* attempts to provide an interface to a parse tree that has very little object creation. (DTM
* implementations may also support incremental construction of the model, but that's hidden from
* the DTM API.)
*
* Nodes in the DTM are identified by integer "handles". A handle must be unique within a
* process, and carries both node identification and document identification. It must be possible to
* compare two handles (and thus their nodes) for identity with "==".
*
* <p>Namespace URLs, local-names, and expanded-names can all be represented by and tested as
* integer ID values. An expanded name represents (and may or may not directly contain) a
* combination of the URL ID, and the local-name ID. Note that the namespace URL id can be 0, which
* should have the meaning that the namespace is null. For consistency, zero should not be used for
* a local-name index.
*
* <p>Text content of a node is represented by an index and length, permitting efficient storage
* such as a shared FastStringBuffer.
*
* <p>The model of the tree, as well as the general navigation model, is that of XPath 1.0, for the
* moment. The model will eventually be adapted to match the XPath 2.0 data model, XML Schema, and
* InfoSet.
*
* <p>DTM does _not_ directly support the W3C's Document Object Model. However, it attempts to come
* close enough that an implementation of DTM can be created that wraps a DOM and vice versa.
*
* <p>Please Note: The DTM API is still Subject To Change. This
* wouldn't affect most users, but might require updating some extensions.
*
* <p>The largest change being contemplated is a reconsideration of the Node Handle representation.
* We are still not entirely sure that an integer packed with two numeric subfields is really the
* best solution. It has been suggested that we move up to a Long, to permit more nodes per document
* without having to reduce the number of slots in the DTMManager. There's even been a proposal that
* we replace these integers with "cursor" objects containing the internal node id and a pointer to
* the actual DTM object; this might reduce the need to continuously consult the DTMManager to
* retrieve the latter, and might provide a useful "hook" back into normal Java heap management. But
* changing this datatype would have huge impact on Xalan's internals -- especially given Java's
* lack of C-style typedefs -- so we won't cut over unless we're convinced the new solution really
* would be an improvement!
*/
public interface DTM {
/** Null node handles are represented by this value. */
int NULL = -1;
// These nodeType mnemonics and values are deliberately the same as those
// used by the DOM, for convenient mapping
//
// %REVIEW% Should we actually define these as initialized to,
// eg. org.w3c.dom.Document.ELEMENT_NODE?
/** The node is an {@code Element}. */
short ELEMENT_NODE = 1;
/** The node is an {@code Attr}. */
short ATTRIBUTE_NODE = 2;
/** The node is a {@code Text} node. */
short TEXT_NODE = 3;
/** The node is a {@code CDATASection}. */
short CDATA_SECTION_NODE = 4;
/** The node is an {@code EntityReference}. */
short ENTITY_REFERENCE_NODE = 5;
/** The node is an {@code Entity}. */
short ENTITY_NODE = 6;
/** The node is a {@code ProcessingInstruction}. */
short PROCESSING_INSTRUCTION_NODE = 7;
/** The node is a {@code Comment}. */
short COMMENT_NODE = 8;
/** The node is a {@code Document}. */
short DOCUMENT_NODE = 9;
/** The node is a {@code DocumentType}. */
short DOCUMENT_TYPE_NODE = 10;
/** The node is a {@code DocumentFragment}. */
short DOCUMENT_FRAGMENT_NODE = 11;
/** The node is a {@code Notation}. */
short NOTATION_NODE = 12;
/**
* The node is a {@code namespace node}. Note that this is not currently a node type defined
* by the DOM API.
*/
short NAMESPACE_NODE = 13;
/** The number of valid nodetypes. */
short NTYPES = 14;
// ========= Document Navigation Functions =========
/**
* This returns a stateless "traverser", that can navigate over an XPath axis, though not in
* document order.
*
* @param axis One of Axes.ANCESTORORSELF, etc.
* @return A DTMAxisTraverser, or null if the given axis isn't supported.
*/
DTMAxisTraverser getAxisTraverser(final int axis);
/**
* This is a shortcut to the iterators that implement XPath axes. Returns a bare-bones iterator
* that must be initialized with a start node (using iterator.setStartNode()).
*
* @param axis One of Axes.ANCESTORORSELF, etc.
* @return A DTMAxisIterator, or null if the given axis isn't supported.
*/
DTMAxisIterator getAxisIterator(final int axis);
/**
* Given a node handle, get the handle of the node's first child.
*
* @param nodeHandle int Handle of the node.
* @return int DTM node-number of first child, or DTM.NULL to indicate none exists.
*/
int getFirstChild(int nodeHandle);
/**
* Given a node handle, get the handle of the node's last child.
*
* @param nodeHandle int Handle of the node.
* @return int Node-number of last child, or DTM.NULL to indicate none exists.
*/
int getLastChild(int nodeHandle);
/**
* Retrieves an attribute node by local name and namespace URI.
*
* <p>%TBD% Note that we currently have no way to support the DOM's old getAttribute() call, which
* accesses only the qname.
*
* @param elementHandle Handle of the node upon which to look up this attribute.
* @param namespaceURI The namespace URI of the attribute to retrieve, or null.
* @param name The local name of the attribute to retrieve.
* @return The attribute node handle with the specified name ({@code nodeName}) or
* {@code DTM.NULL} if there is no such attribute.
*/
int getAttributeNode(int elementHandle, String namespaceURI, String name);
/**
* Given a node handle, get the handle of the node's first attribute.
*
* @param nodeHandle int Handle of the node.
* @return Handle of first attribute, or DTM.NULL to indicate none exists.
*/
int getFirstAttribute(int nodeHandle);
/**
* Given a node handle, get the handle of the node's first namespace node.
*
* @param nodeHandle handle to node, which should probably be an element node, but need not be.
* @param inScope true if all namespaces in scope should be returned, false if only the node's own
* namespace declarations should be returned.
* @return handle of first namespace, or DTM.NULL to indicate none exists.
*/
int getFirstNamespaceNode(int nodeHandle, boolean inScope);
/**
* Given a node handle, advance to its next sibling.
*
* @param nodeHandle int Handle of the node.
* @return int Node-number of next sibling, or DTM.NULL to indicate none exists.
*/
int getNextSibling(int nodeHandle);
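// Usage sketch (illustrative only, not part of the original source): assuming "dtm" is some
// DTM implementation and "parent" is a valid node handle (both names are hypothetical), the
// children of a node might be visited with getFirstChild and getNextSibling like this:
//
//   for (int child = dtm.getFirstChild(parent);
//       child != DTM.NULL;
//       child = dtm.getNextSibling(child)) {
//     String name = dtm.getNodeName(child);
//   }
//
// The loop ends when DTM.NULL is returned, which is how the methods above signal "no such node".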
/**
* Given a node handle, find its preceding sibling. WARNING: DTM implementations may be
* asymmetric; in some, this operation has been resolved by search, and is relatively expensive.
*
* @param nodeHandle the id of the node.
* @return int Node-number of the previous sibling, or DTM.NULL to indicate none exists.
*/
int getPreviousSibling(int nodeHandle);
/**
* Given a node handle, advance to the next attribute. If an element, we advance to its first
* attribute; if an attr, we advance to the next attr of the same element.
*
* @param nodeHandle int Handle of the node.
* @return int DTM node-number of the resolved attr, or DTM.NULL to indicate none exists.
*/
int getNextAttribute(int nodeHandle);
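// Usage sketch (illustrative only, not part of the original source): assuming "dtm" is some
// DTM implementation and "element" is the handle of an element node (both names are
// hypothetical), its attributes might be enumerated with getFirstAttribute and
// getNextAttribute like this:
//
//   for (int attr = dtm.getFirstAttribute(element);
//       attr != DTM.NULL;
//       attr = dtm.getNextAttribute(attr)) {
//     String value = dtm.getNodeValue(attr);
//   }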
/**
* Given a namespace handle, advance to the next namespace in the same scope (local or
* local-plus-inherited, as selected by getFirstNamespaceNode).
*
* @param baseHandle handle of the original node from which the first namespace node was obtained
* (needed to return nodes in document order).
* @param namespaceHandle handle to node which must be of type NAMESPACE_NODE.
* @param inScope true if all namespaces in scope should be returned, false if only the node's own
* namespace declarations should be returned.
* @return handle of next namespace, or DTM.NULL to indicate none exists.
*/
int getNextNamespaceNode(int baseHandle, int namespaceHandle, boolean inScope);
/**
* Given a node handle, find its parent node.
*
* @param nodeHandle the id of the node.
* @return int Node handle of parent, or DTM.NULL to indicate none exists.
*/
int getParent(int nodeHandle);
/**
* Given a DTM which contains only a single document, find the Node Handle of the Document node.
* Note that if the DTM is configured so it can contain multiple documents, this call will return
* the Document currently under construction -- but may return DTM.NULL if it's between documents.
* Generally, you should use getOwnerDocument(nodeHandle) or getDocumentRoot(nodeHandle) instead.
*
* @return int Node handle of document, or DTM.NULL if a shared DTM cannot tell us which Document
* is currently active.
*/
int getDocument();
/**
* Given a node handle, find the owning document node. This version mimics the behavior of the DOM
* call by the same name.
*
* @param nodeHandle the id of the node.
* @return int Node handle of owning document, or DTM.NULL if the node was a Document.
* @see #getDocumentRoot(int nodeHandle)
*/
int getOwnerDocument(int nodeHandle);
/**
* Given a node handle, find the owning document node.
*
* @param nodeHandle the id of the node.
* @return int Node handle of owning document, or the node itself if it was a Document. (Note
* difference from DOM, where getOwnerDocument returns null for the Document node.)
* @see #getOwnerDocument(int nodeHandle)
*/
int getDocumentRoot(int nodeHandle);
/**
* Get the string-value of a node as an XString object (see ... for the definition of a node's
* string-value).
*
* @param nodeHandle The node ID.
* @return A string object that represents the string-value of the given node.
*/
XString getStringValue(int nodeHandle);
/**
* Given a node handle, return an ID that represents the node's expanded name.
*
* @param nodeHandle The handle to the node in question.
* @return the expanded-name id of the node.
*/
int getExpandedTypeID(int nodeHandle);
/**
* Given an expanded name, return an ID. If the expanded-name does not exist in the internal
* tables, the entry will be created, and the ID will be returned. Any additional nodes that are
* created that have this expanded name will use this ID.
*
* @param namespace the namespace of the expanded name.
* @param localName the local name of the expanded name.
* @param type the node type.
* @return the expanded-name id of the node.
*/
int getExpandedTypeID(String namespace, String localName, int type);
/**
* Given a node handle, return its DOM-style node name. This will include names such as #text or
* #document.
*
* @param nodeHandle the id of the node.
* @return String Name of this node, which may be an empty string. %REVIEW% Document when empty
* string is possible...
*/
String getNodeName(int nodeHandle);
/**
* Given a node handle, return the XPath node name. This should be the name as described by the
* XPath data model, NOT the DOM-style name.
*
* @param nodeHandle the id of the node.
* @return String Name of this node.
*/
String getNodeNameX(int nodeHandle);
/**
* Given a node handle, return its DOM-style localname. (As defined in Namespaces, this is the
* portion of the name after the prefix, if present, or the whole node name if no prefix exists)
*
* @param nodeHandle the id of the node.
* @return String Local name of this node.
*/
String getLocalName(int nodeHandle);
/**
* Given a namespace handle, return the prefix that the namespace decl is mapping. Given a node
* handle, return the prefix used to map to the namespace. (As defined in Namespaces, this is the
* portion of the name before any colon character).
*
* <p>%REVIEW% Are you sure you want "" for no prefix?
*
* @param nodeHandle the id of the node.
* @return String prefix of this node's name, or "" if no explicit namespace prefix was given.
*/
String getPrefix(int nodeHandle);
/**
* Given a node handle, return its DOM-style namespace URI (As defined in Namespaces, this is the
* declared URI which this node's prefix -- or default in lieu thereof -- was mapped to.)
*
* @param nodeHandle the id of the node.
* @return String URI value of this node's namespace, or null if no namespace was resolved.
*/
String getNamespaceURI(int nodeHandle);
/**
* Given a node handle, return its node value. This is mostly as defined by the DOM, but may
* ignore some conveniences.
*
* @param nodeHandle The node id.
* @return String Value of this node, or null if not meaningful for this node type.
*/
String getNodeValue(int nodeHandle);
/**
* Given a node handle, return its DOM-style node type.
*
* <p>%REVIEW% Generally, returning short is false economy. Return int?
*
* @param nodeHandle The node id.
* @return int Node type, as per the DOM's Node._NODE constants.
*/
short getNodeType(int nodeHandle);
// ============== Document query functions ==============
/**
* Returns the {@code Element} whose {@code ID} is given by {@code elementId}. If no such
* element exists, returns {@code DTM.NULL}. Behavior is not defined if more than one element
* has this {@code ID}. Attributes (including those with the name "ID") are not of type ID
* unless so defined by DTD/Schema information available to the DTM implementation.
* Implementations that do not know whether attributes are of type ID or not are expected to
* return {@code DTM.NULL}.
*
* <p>%REVIEW% Presumably IDs are still scoped to a single document, and this operation searches
* only within a single document, right? Wouldn't want collisions between DTMs in the same
* process.
*
* @param elementId The unique {@code id} value for an element.
* @return The handle of the matching element.
*/
int getElementById(String elementId);
// ============== Boolean methods ================
/**
* Figure out whether secondNodeHandle should be considered as being later in the document than
* firstNodeHandle, in Document Order as defined by the XPath model. This may not agree with the
* ordering defined by other XML applications.
*
* <p>There are some cases where ordering isn't defined, and neither are the results of this
* function -- though we'll generally return true.
*
* <p>%REVIEW% Make sure this does the right thing with attribute nodes!!!
*
* <p>%REVIEW% Consider renaming for clarity. Perhaps isDocumentOrder(a,b)?
*
* @param firstNodeHandle Handle of the first node in the position comparison.
* @param secondNodeHandle Handle of the second node in the position comparison.
* @return false if secondNode comes before firstNode, otherwise return true. You can think of
* this as {@code (firstNode.documentOrderPosition <= secondNode.documentOrderPosition)}.
*/
boolean isNodeAfter(int firstNodeHandle, int secondNodeHandle);
/**
* Return a DOM node for the given node.
*
* @param nodeHandle The node ID.
* @return A node representation of the DTM node.
*/
org.w3c.dom.Node getNode(int nodeHandle);
// ==== Construction methods (may not be supported by some implementations!)
// =====
}