org.w3c.dom.ls.LSParser Maven / Gradle / Ivy
/*
* Copyright (c) 2004 World Wide Web Consortium,
*
* (Massachusetts Institute of Technology, European Research Consortium for
* Informatics and Mathematics, Keio University). All Rights Reserved. This
* work is distributed under the W3C(r) Software License [1] in the hope that
* it will be useful, but WITHOUT ANY WARRANTY; without even the implied
* warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
*
* [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
*/
package org.w3c.dom.ls;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
/**
* An interface to an object that is able to build, or augment, a DOM tree
* from various input sources.
* LSParser
provides an API for parsing XML and building the
* corresponding DOM document structure. A LSParser
instance
* can be obtained by invoking the
* DOMImplementationLS.createLSParser()
method.
*
As specified in [DOM Level 3 Core]
* , when a document is first made available via the LSParser:
*
* - there will
* never be two adjacent nodes of type NODE_TEXT, and there will never be
* empty text nodes.
*
* - it is expected that the
value
and
* nodeValue
attributes of an Attr
node initially
* return the XML 1.0
* normalized value. However, if the parameters "
* validate-if-schema" and "
* datatype-normalization" are set to true
, depending on the attribute normalization
* used, the attribute values may differ from the ones obtained by the XML
* 1.0 attribute normalization. If the parameters "
* datatype-normalization" is set to false
, the XML 1.0 attribute normalization is
* guaranteed to occur, and if the attributes list does not contain
* namespace declarations, the attributes
attribute on
* Element
node represents the property [attributes] defined in [XML Information Set]
* .
*
*
* Asynchronous LSParser
objects are expected to also
* implement the events::EventTarget
interface so that event
* listeners can be registered on asynchronous LSParser
* objects.
*
Events supported by asynchronous LSParser
objects are:
*
* - load
* -
* The
LSParser
finishes to load the document. See also the
* definition of the LSLoadEvent
interface.
* - progress
* - The
*
LSParser
signals progress as data is parsed. This
* specification does not attempt to define exactly when progress events
* should be dispatched. That is intentionally left as
* implementation-dependent. Here is one example of how an application might
* dispatch progress events: Once the parser starts receiving data, a
* progress event is dispatched to indicate that the parsing starts. From
* there on, a progress event is dispatched for every 4096 bytes of data
* that is received and processed. This is only one example, though, and
* implementations can choose to dispatch progress events at any time while
* parsing, or not dispatch them at all. See also the definition of the
* LSProgressEvent
interface.
*
* Note: All events defined in this specification use the
* namespace URI "http://www.w3.org/2002/DOMLS"
.
*
While parsing an input source, errors are reported to the application
* through the error handler (LSParser.domConfig
's "
* error-handler" parameter). This specification does in no way try to define all possible
* errors that can occur while parsing XML, or any other markup, but some
* common error cases are defined. The types (DOMError.type
) of
* errors and warnings defined by this specification are:
*
* -
*
"check-character-normalization-failure" [error]
* - Raised if
* the parameter "
* check-character-normalization" is set to true and a string is encountered that fails normalization
* checking.
* "doctype-not-allowed" [fatal]
* - Raised if the
* configuration parameter "disallow-doctype" is set to
true
* and a doctype is encountered.
* "no-input-specified" [fatal]
* -
* Raised when loading a document and no input is specified in the
*
LSInput
object.
* -
*
"pi-base-uri-not-preserved" [warning]
* - Raised if a processing
* instruction is encountered in a location where the base URI of the
* processing instruction can not be preserved. One example of a case where
* this warning will be raised is if the configuration parameter "
* entities" is set to
false
and the following XML file is parsed:
*
* <!DOCTYPE root [ <!ENTITY e SYSTEM 'subdir/myentity.ent' ]>
* <root> &e; </root>
* And subdir/myentity.ent
* contains:
* <one> <two/> </one> <?pi 3.14159?>
* <more/>
*
* "unbound-prefix-in-entity" [warning]
* - An
* implementation dependent warning that may be raised if the configuration
* parameter "
* namespaces" is set to
true
and an unbound namespace prefix is
* encountered in an entity's replacement text. Raising this warning is not
* enforced since some existing parsers may not recognize unbound namespace
* prefixes in the replacement text of entities.
* -
*
"unknown-character-denormalization" [fatal]
* - Raised if the
* configuration parameter "ignore-unknown-character-denormalizations" is
* set to
false
and a character is encountered for which the
* processor cannot determine the normalization properties.
* -
*
"unsupported-encoding" [fatal]
* - Raised if an unsupported
* encoding is encountered.
* "unsupported-media-type" [fatal]
* -
* Raised if the configuration parameter "supported-media-types-only" is set
* to
true
and an unsupported media type is encountered.
*
* In addition to raising the defined errors and warnings, implementations
* are expected to raise implementation specific errors and warnings for any
* other error and warning cases such as IO errors (file not found,
* permission denied,...), XML well-formedness errors, and so on.
*
See also the Document Object Model (DOM) Level 3 Load
and Save Specification.
*/
public interface LSParser {
/**
* The DOMConfiguration
object used when parsing an input
* source. This DOMConfiguration
is specific to the parse
* operation. No parameter values from this DOMConfiguration
* object are passed automatically to the DOMConfiguration
* object on the Document
that is created, or used, by the
* parse operation. The DOM application is responsible for passing any
* needed parameter values from this DOMConfiguration
* object to the DOMConfiguration
object referenced by the
* Document
object.
*
In addition to the parameters recognized in on the
* DOMConfiguration interface defined in [DOM Level 3 Core]
* , the DOMConfiguration
objects for LSParser
* add or modify the following parameters:
*
* -
*
"charset-overrides-xml-encoding"
* -
*
* true
* - [optional] (default) If a higher level protocol such as HTTP [IETF RFC 2616] provides an
* indication of the character encoding of the input stream being
* processed, that will override any encoding specified in the XML
* declaration or the Text declaration (see also section 4.3.3,
* "Character Encoding in Entities", in [XML 1.0]).
* Explicitly setting an encoding in the
LSInput
overrides
* any encoding from the protocol.
* false
* - [required] The parser ignores any character set encoding information from
* higher-level protocols.
*
* "disallow-doctype"
* -
*
* -
*
true
* - [optional] Throw a fatal "doctype-not-allowed" error if a doctype node is found while parsing the document. This is
* useful when dealing with things like SOAP envelopes where doctype
* nodes are not allowed.
* false
* - [required] (default) Allow doctype nodes in the document.
*
* -
*
"ignore-unknown-character-denormalizations"
* -
*
* -
*
true
* - [required] (default) If, while verifying full normalization when [XML 1.1] is
* supported, a processor encounters characters for which it cannot
* determine the normalization properties, then the processor will
* ignore any possible denormalizations caused by these characters.
* This parameter is ignored for [XML 1.0].
* -
*
false
* - [optional] Report an fatal "unknown-character-denormalization" error if a character is encountered for which the processor cannot
* determine the normalization properties.
*
* "infoset"
* - See
* the definition of
DOMConfiguration
for a description of
* this parameter. Unlike in [DOM Level 3 Core]
* , this parameter will default to true
for
* LSParser
.
* "namespaces"
* -
*
* true
* - [required] (default) Perform the namespace processing as defined in [XML Namespaces]
* and [XML Namespaces 1.1]
* .
* false
* - [optional] Do not perform the namespace processing.
*
* -
*
"resource-resolver"
* - [required] A reference to a
LSResourceResolver
object, or null. If
* the value of this parameter is not null when an external resource
* (such as an external XML entity or an XML schema location) is
* encountered, the implementation will request that the
* LSResourceResolver
referenced in this parameter resolves
* the resource.
* "supported-media-types-only"
* -
*
* -
*
true
* - [optional] Check that the media type of the parsed resource is a supported media
* type. If an unsupported media type is encountered, a fatal error of
* type "unsupported-media-type" will be raised. The media types defined in [IETF RFC 3023] must always
* be accepted.
* false
* - [required] (default) Accept any media type.
*
* "validate"
* - See the definition of
*
DOMConfiguration
for a description of this parameter.
* Unlike in [DOM Level 3 Core]
* , the processing of the internal subset is always accomplished, even
* if this parameter is set to false
.
* -
*
"validate-if-schema"
* - See the definition of
*
DOMConfiguration
for a description of this parameter.
* Unlike in [DOM Level 3 Core]
* , the processing of the internal subset is always accomplished, even
* if this parameter is set to false
.
* -
*
"well-formed"
* - See the definition of
*
DOMConfiguration
for a description of this parameter.
* Unlike in [DOM Level 3 Core]
* , this parameter cannot be set to false
.
*
*/
public DOMConfiguration getDomConfig();
/**
* When a filter is provided, the implementation will call out to the
* filter as it is constructing the DOM tree structure. The filter can
* choose to remove elements from the document being constructed, or to
* terminate the parsing early.
*
The filter is invoked after the operations requested by the
* DOMConfiguration
parameters have been applied. For
* example, if "
* validate" is set to true
, the validation is done before invoking the
* filter.
*/
public LSParserFilter getFilter();
/**
* When a filter is provided, the implementation will call out to the
* filter as it is constructing the DOM tree structure. The filter can
* choose to remove elements from the document being constructed, or to
* terminate the parsing early.
*
The filter is invoked after the operations requested by the
* DOMConfiguration
parameters have been applied. For
* example, if "
* validate" is set to true
, the validation is done before invoking the
* filter.
*/
public void setFilter(LSParserFilter filter);
/**
* true
if the LSParser
is asynchronous,
* false
if it is synchronous.
*/
public boolean getAsync();
/**
* true
if the LSParser
is currently busy
* loading a document, otherwise false
.
*/
public boolean getBusy();
/**
* Parse an XML document from a resource identified by a
* LSInput
.
* @param input The LSInput
from which the source of the
* document is to be read.
* @return If the LSParser
is a synchronous
* LSParser
, the newly created and populated
* Document
is returned. If the LSParser
is
* asynchronous, null
is returned since the document
* object may not yet be constructed when this method returns.
* @exception DOMException
* INVALID_STATE_ERR: Raised if the LSParser
's
* LSParser.busy
attribute is true
.
* @exception LSException
* PARSE_ERR: Raised if the LSParser
was unable to load
* the XML document. DOM applications should attach a
* DOMErrorHandler
using the parameter "
* error-handler" if they wish to get details on the error.
*/
public Document parse(LSInput input)
throws DOMException, LSException;
/**
* Parse an XML document from a location identified by a URI reference [IETF RFC 2396]. If the URI
* contains a fragment identifier (see section 4.1 in [IETF RFC 2396]), the
* behavior is not defined by this specification, future versions of
* this specification may define the behavior.
* @param uri The location of the XML document to be read.
* @return If the LSParser
is a synchronous
* LSParser
, the newly created and populated
* Document
is returned, or null
if an error
* occured. If the LSParser
is asynchronous,
* null
is returned since the document object may not yet
* be constructed when this method returns.
* @exception DOMException
* INVALID_STATE_ERR: Raised if the LSParser.busy
* attribute is true
.
* @exception LSException
* PARSE_ERR: Raised if the LSParser
was unable to load
* the XML document. DOM applications should attach a
* DOMErrorHandler
using the parameter "
* error-handler" if they wish to get details on the error.
*/
public Document parseURI(String uri)
throws DOMException, LSException;
// ACTION_TYPES
/**
* Append the result of the parse operation as children of the context
* node. For this action to work, the context node must be an
* Element
or a DocumentFragment
.
*/
public static final short ACTION_APPEND_AS_CHILDREN = 1;
/**
* Replace all the children of the context node with the result of the
* parse operation. For this action to work, the context node must be an
* Element
, a Document
, or a
* DocumentFragment
.
*/
public static final short ACTION_REPLACE_CHILDREN = 2;
/**
* Insert the result of the parse operation as the immediately preceding
* sibling of the context node. For this action to work the context
* node's parent must be an Element
or a
* DocumentFragment
.
*/
public static final short ACTION_INSERT_BEFORE = 3;
/**
* Insert the result of the parse operation as the immediately following
* sibling of the context node. For this action to work the context
* node's parent must be an Element
or a
* DocumentFragment
.
*/
public static final short ACTION_INSERT_AFTER = 4;
/**
* Replace the context node with the result of the parse operation. For
* this action to work, the context node must have a parent, and the
* parent must be an Element
or a
* DocumentFragment
.
*/
public static final short ACTION_REPLACE = 5;
/**
* Parse an XML fragment from a resource identified by a
* LSInput
and insert the content into an existing document
* at the position specified with the context
and
* action
arguments. When parsing the input stream, the
* context node (or its parent, depending on where the result will be
* inserted) is used for resolving unbound namespace prefixes. The
* context node's ownerDocument
node (or the node itself if
* the node of type DOCUMENT_NODE
) is used to resolve
* default attributes and entity references.
*
As the new data is inserted into the document, at least one
* mutation event is fired per new immediate child or sibling of the
* context node.
*
If the context node is a Document
node and the action
* is ACTION_REPLACE_CHILDREN
, then the document that is
* passed as the context node will be changed such that its
* xmlEncoding
, documentURI
,
* xmlVersion
, inputEncoding
,
* xmlStandalone
, and all other such attributes are set to
* what they would be set to if the input source was parsed using
* LSParser.parse()
.
*
This method is always synchronous, even if the
* LSParser
is asynchronous (LSParser.async
is
* true
).
*
If an error occurs while parsing, the caller is notified through
* the ErrorHandler
instance associated with the "
* error-handler" parameter of the DOMConfiguration
.
*
When calling parseWithContext
, the values of the
* following configuration parameters will be ignored and their default
* values will always be used instead: "
* validate", "
* validate-if-schema", and "
* element-content-whitespace". Other parameters will be treated normally, and the parser is expected
* to call the LSParserFilter
just as if a whole document
* was parsed.
* @param input The LSInput
from which the source document
* is to be read. The source document must be an XML fragment, i.e.
* anything except a complete XML document (except in the case where
* the context node of type DOCUMENT_NODE
, and the action
* is ACTION_REPLACE_CHILDREN
), a DOCTYPE (internal
* subset), entity declaration(s), notation declaration(s), or XML or
* text declaration(s).
* @param contextArg The node that is used as the context for the data
* that is being parsed. This node must be a Document
* node, a DocumentFragment
node, or a node of a type
* that is allowed as a child of an Element
node, e.g. it
* cannot be an Attribute
node.
* @param action This parameter describes which action should be taken
* between the new set of nodes being inserted and the existing
* children of the context node. The set of possible actions is
* defined in ACTION_TYPES
above.
* @return Return the node that is the result of the parse operation. If
* the result is more than one top-level node, the first one is
* returned.
* @exception DOMException
* HIERARCHY_REQUEST_ERR: Raised if the content cannot replace, be
* inserted before, after, or as a child of the context node (see also
* Node.insertBefore
or Node.replaceChild
in [DOM Level 3 Core]
* ).
*
NOT_SUPPORTED_ERR: Raised if the LSParser
doesn't
* support this method, or if the context node is of type
* Document
and the DOM implementation doesn't support
* the replacement of the DocumentType
child or
* Element
child.
*
NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a
* read only node and the content is being appended to its child list,
* or if the parent node of the context node is read only node and the
* content is being inserted in its child list.
*
INVALID_STATE_ERR: Raised if the LSParser.busy
* attribute is true
.
* @exception LSException
* PARSE_ERR: Raised if the LSParser
was unable to load
* the XML fragment. DOM applications should attach a
* DOMErrorHandler
using the parameter "
* error-handler" if they wish to get details on the error.
*/
public Node parseWithContext(LSInput input,
Node contextArg,
short action)
throws DOMException, LSException;
/**
* Abort the loading of the document that is currently being loaded by
* the LSParser
. If the LSParser
is currently
* not busy, a call to this method does nothing.
*/
public void abort();
}