
org.w3c.dom.ls.LSParser Maven / Gradle / Ivy


Xerces2 provides high performance, fully compliant XML parsers in the Apache Xerces family. This new version of Xerces continues to build upon the Xerces Native Interface (XNI), a complete framework for building parser components and configurations that is extremely modular and easy to program. The Apache Xerces2 parser is the reference implementation of XNI but other parser components, configurations, and parsers can be written using the Xerces Native Interface. For complete design and implementation documents, refer to the XNI Manual. Xerces2 provides fully conforming XML Schema 1.0 and 1.1 processors. An experimental implementation of the "XML Schema Definition Language (XSD): Component Designators (SCD) Candidate Recommendation (January 2010)" is also provided for evaluation. For more information, refer to the XML Schema page. Xerces2 also provides a complete implementation of the Document Object Model Level 3 Core and Load/Save W3C Recommendations and provides a complete implementation of the XML Inclusions (XInclude) W3C Recommendation. It also provides support for OASIS XML Catalogs v1.1. Xerces2 is able to parse documents written according to the XML 1.1 Recommendation, except that it does not yet provide an option to enable normalization checking as described in section 2.13 of this specification. It also handles namespaces according to the XML Namespaces 1.1 Recommendation, and will correctly serialize XML 1.1 documents if the DOM level 3 load/save APIs are in use.

There is a newer version: 2.12.2
/*
 * Copyright (c) 2004 World Wide Web Consortium,
 *
 * (Massachusetts Institute of Technology, European Research Consortium for
 * Informatics and Mathematics, Keio University). All Rights Reserved. This
 * work is distributed under the W3C(r) Software License [1] in the hope that
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.Document;
import org.apache.xerces.dom3.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * An interface to an object that is able to build, or augment, a DOM tree
 * from various input sources.
 * <p> <code>LSParser</code> provides an API for parsing XML and building the
 * corresponding DOM document structure. A <code>LSParser</code> instance
 * can be obtained by invoking the
 * <code>DOMImplementationLS.createLSParser()</code> method.
 * <p> As specified in [DOM Level 3 Core], when a document is first made
 * available via the <code>LSParser</code>:
 * <ul>
 * <li> there will never be two adjacent nodes of type NODE_TEXT, and there
 * will never be empty text nodes.
 * </li>
 * <li> it is expected that the <code>value</code> and
 * <code>nodeValue</code> attributes of an <code>Attr</code> node initially
 * return the XML 1.0 normalized value. However, if the parameters
 * "validate-if-schema" and "datatype-normalization" are set to
 * <code>true</code>, depending on the attribute normalization used, the
 * attribute values may differ from the ones obtained by the XML 1.0
 * attribute normalization. If the parameter "datatype-normalization" is
 * set to <code>false</code>, the XML 1.0 attribute normalization is
 * guaranteed to occur, and if the attributes list does not contain
 * namespace declarations, the <code>attributes</code> attribute on
 * <code>Element</code> node represents the property [attributes] defined
 * in [XML Information Set].
 * </li>
 * </ul>
 * <p> Asynchronous <code>LSParser</code> objects are expected to also
 * implement the <code>events::EventTarget</code> interface so that event
 * listeners can be registered on asynchronous <code>LSParser</code>
 * objects.
 * <p> Events supported by asynchronous <code>LSParser</code> objects are:
 * <dl>
 * <dt>load</dt>
 * <dd>
 * The <code>LSParser</code> finishes loading the document. See also the
 * definition of the <code>LSLoadEvent</code> interface. </dd>
 * <dt>progress</dt>
 * <dd> The <code>LSParser</code> signals progress as data is parsed. This
 * specification does not attempt to define exactly when progress events
 * should be dispatched; that is intentionally left implementation
 * dependent. Here is one example of how an application might dispatch
 * progress events: once the parser starts receiving data, a progress event
 * is dispatched to indicate that parsing has started, and from then on a
 * progress event is dispatched for every 4096 bytes of data that is
 * received and processed. This is only one example, though, and
 * implementations can choose to dispatch progress events at any time while
 * parsing, or not dispatch them at all. See also the definition of the
 * <code>LSProgressEvent</code> interface. </dd>
 * </dl>
 * <p> Note: All events defined in this specification use the
 * namespace URI <code>"http://www.w3.org/2002/DOMLS"</code>.
 * <p> While parsing an input source, errors are reported to the application
 * through the error handler (<code>LSParser.domConfig</code>'s
 * "error-handler" parameter). This specification in no way tries to define
 * all possible errors that can occur while parsing XML, or any other
 * markup, but some common error cases are defined. The types
 * (<code>DOMError.type</code>) of errors and warnings defined by this
 * specification are:
 * <dl>
 * <dt><code>"check-character-normalization-failure" [error]</code></dt>
 * <dd> Raised if the parameter "check-character-normalization" is set to
 * <code>true</code> and a string is encountered that fails normalization
 * checking. </dd>
 * <dt><code>"doctype-not-allowed" [fatal]</code></dt>
 * <dd> Raised if the configuration parameter "disallow-doctype" is set to
 * <code>true</code> and a doctype is encountered. </dd>
 * <dt><code>"no-input-specified" [fatal]</code></dt>
 * <dd>
 * Raised when loading a document and no input is specified in the
 * <code>LSInput</code> object. </dd>
 * <dt><code>"pi-base-uri-not-preserved" [warning]</code></dt>
 * <dd> Raised if a processing instruction is encountered in a location
 * where the base URI of the processing instruction cannot be preserved.
 * One example of a case where this warning will be raised is if the
 * configuration parameter "entities" is set to <code>false</code> and the
 * following XML file is parsed:
 * <pre>
 * &lt;!DOCTYPE root [ &lt;!ENTITY e SYSTEM 'subdir/myentity.ent' ]&gt;
 * &lt;root&gt; &amp;e; &lt;/root&gt;</pre>
 * And <code>subdir/myentity.ent</code> contains:
 * <pre>&lt;one&gt; &lt;two/&gt; &lt;/one&gt; &lt;?pi 3.14159?&gt;
 * &lt;more/&gt;</pre>
 * </dd>
 * <dt><code>"unbound-prefix-in-entity" [warning]</code></dt>
 * <dd> An implementation dependent warning that may be raised if the
 * configuration parameter "namespaces" is set to <code>true</code> and an
 * unbound namespace prefix is encountered in an entity's replacement text.
 * Raising this warning is not enforced since some existing parsers may not
 * recognize unbound namespace prefixes in the replacement text of
 * entities. </dd>
 * <dt><code>"unknown-character-denormalization" [fatal]</code></dt>
 * <dd> Raised if the configuration parameter
 * "ignore-unknown-character-denormalizations" is set to <code>false</code>
 * and a character is encountered for which the processor cannot determine
 * the normalization properties. </dd>
 * <dt><code>"unsupported-encoding" [fatal]</code></dt>
 * <dd> Raised if an unsupported encoding is encountered. </dd>
 * <dt><code>"unsupported-media-type" [fatal]</code></dt>
 * <dd>
 * Raised if the configuration parameter "supported-media-types-only" is set
 * to <code>true</code> and an unsupported media type is encountered. </dd>
 * </dl>
 * <p> In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation specific errors and warnings for any
 * other error and warning cases such as IO errors (file not found,
 * permission denied, ...), XML well-formedness errors, and so on.
 */
public interface LSParser {
    /**
     * The <code>DOMConfiguration</code> object used when parsing an input
     * source. This <code>DOMConfiguration</code> is specific to the parse
     * operation and no parameter values from this
     * <code>DOMConfiguration</code> object are passed automatically to the
     * <code>DOMConfiguration</code> object on the <code>Document</code>
     * that is created, or used, by the parse operation. The DOM application
     * is responsible for passing any needed parameter values from this
     * <code>DOMConfiguration</code> object to the
     * <code>DOMConfiguration</code> object referenced by the
     * <code>Document</code> object.
     * <br> In addition to the parameters recognized on the
     * <code>DOMConfiguration</code> interface defined in
     * [DOM Level 3 Core], the <code>DOMConfiguration</code> objects for
     * <code>LSParser</code> add or modify the following parameters:
     * <dl>
     * <dt><code>"charset-overrides-xml-encoding"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[optional] (default) If a higher level protocol such as HTTP
     * [IETF RFC 2616] provides an indication of the character encoding of
     * the input stream being processed, that will override any encoding
     * specified in the XML declaration or the Text declaration (see also
     * section 4.3.3, "Character Encoding in Entities", in [XML 1.0]).
     * Explicitly setting an encoding in the <code>LSInput</code> overrides
     * any encoding from the protocol. </dd>
     * <dt><code>false</code></dt>
     * <dd>[required] The parser ignores any character set encoding
     * information from higher-level protocols. </dd>
     * </dl></dd>
     * <dt><code>"disallow-doctype"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[optional] Throw a fatal "doctype-not-allowed" error if a doctype
     * node is found while parsing the document. This is useful when
     * dealing with things like SOAP envelopes where doctype nodes are not
     * allowed. </dd>
     * <dt><code>false</code></dt>
     * <dd>[required] (default) Allow doctype nodes in the document. </dd>
     * </dl></dd>
     * <dt><code>"ignore-unknown-character-denormalizations"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[required] (default) If, while verifying full normalization when
     * [XML 1.1] is supported, a processor encounters characters for which
     * it cannot determine the normalization properties, then the processor
     * will ignore any possible denormalizations caused by these
     * characters. This parameter is ignored for [XML 1.0]. </dd>
     * <dt><code>false</code></dt>
     * <dd>[optional] Report a fatal "unknown-character-denormalization"
     * error if a character is encountered for which the processor cannot
     * determine the normalization properties. </dd>
     * </dl></dd>
     * <dt><code>"infoset"</code></dt>
     * <dd> See the definition of <code>DOMConfiguration</code> for a
     * description of this parameter. Unlike in [DOM Level 3 Core], this
     * parameter will default to <code>true</code> for
     * <code>LSParser</code>. </dd>
     * <dt><code>"namespaces"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[required] (default) Perform the namespace processing as defined
     * in [XML Namespaces] and [XML Namespaces 1.1]. </dd>
     * <dt><code>false</code></dt>
     * <dd>[optional] Do not perform the namespace processing. </dd>
     * </dl></dd>
     * <dt><code>"resource-resolver"</code></dt>
     * <dd>[required] A reference to a <code>LSResourceResolver</code>
     * object, or <code>null</code>. If the value of this parameter is not
     * <code>null</code> when an external resource (such as an external XML
     * entity or an XML schema location) is encountered, the implementation
     * will request that the <code>LSResourceResolver</code> referenced in
     * this parameter resolves the resource. </dd>
     * <dt><code>"supported-media-types-only"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[optional] Check that the media type of the parsed resource is a
     * supported media type. If an unsupported media type is encountered, a
     * fatal error of type "unsupported-media-type" will be raised. The
     * media types defined in [IETF RFC 3023] must always be accepted. </dd>
     * <dt><code>false</code></dt>
     * <dd>[required] (default) Accept any media type. </dd>
     * </dl></dd>
     * </dl>
     * <br> The parameter "well-formed" cannot be set to <code>false</code>.
     */
    public DOMConfiguration getDomConfig();

    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, if "validate" is set to <code>true</code>, the validation is
     * done before invoking the filter.
     */
    public LSParserFilter getFilter();

    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, if "validate" is set to <code>true</code>, the validation is
     * done before invoking the filter.
     */
    public void setFilter(LSParserFilter filter);

    /**
     * <code>true</code> if the <code>LSParser</code> is asynchronous,
     * <code>false</code> if it is synchronous.
     */
    public boolean getAsync();

    /**
     * <code>true</code> if the <code>LSParser</code> is currently busy
     * loading a document, otherwise <code>false</code>.
     */
    public boolean getBusy();

    /**
     * Parse an XML document from a resource identified by a
     * <code>LSInput</code>.
     * @param input The <code>LSInput</code> from which the source of the
     *   document is to be read.
     * @return If the <code>LSParser</code> is a synchronous
     *   <code>LSParser</code>, the newly created and populated
     *   <code>Document</code> is returned. If the <code>LSParser</code> is
     *   asynchronous, <code>null</code> is returned since the document
     *   object may not yet be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the <code>LSParser</code>'s
     *   <code>LSParser.busy</code> attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML document. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "error-handler"
     *   if they wish to get details on the error.
     */
    public Document parse(LSInput input)
                          throws DOMException, LSException;

    /**
     * Parse an XML document from a location identified by a URI reference
     * [IETF RFC 2396]. If the URI contains a fragment identifier (see
     * section 4.1 in [IETF RFC 2396]), the behavior is not defined by this
     * specification; future versions of this specification may define the
     * behavior.
     * @param uri The location of the XML document to be read.
     * @return If the <code>LSParser</code> is a synchronous
     *   <code>LSParser</code>, the newly created and populated
     *   <code>Document</code> is returned, or <code>null</code> if an error
     *   occurred. If the <code>LSParser</code> is asynchronous,
     *   <code>null</code> is returned since the document object may not yet
     *   be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML document. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "error-handler"
     *   if they wish to get details on the error.
     */
    public Document parseURI(String uri)
                             throws DOMException, LSException;

    // ACTION_TYPES
    /**
     * Append the result of the parse operation as children of the context
     * node. For this action to work, the context node must be an
     * <code>Element</code> or a <code>DocumentFragment</code>.
     */
    public static final short ACTION_APPEND_AS_CHILDREN = 1;
    /**
     * Replace all the children of the context node with the result of the
     * parse operation. For this action to work, the context node must be an
     * <code>Element</code>, a <code>Document</code>, or a
     * <code>DocumentFragment</code>.
     */
    public static final short ACTION_REPLACE_CHILDREN   = 2;
    /**
     * Insert the result of the parse operation as the immediately preceding
     * sibling of the context node. For this action to work the context
     * node's parent must be an <code>Element</code> or a
     * <code>DocumentFragment</code>.
     */
    public static final short ACTION_INSERT_BEFORE      = 3;
    /**
     * Insert the result of the parse operation as the immediately following
     * sibling of the context node. For this action to work the context
     * node's parent must be an <code>Element</code> or a
     * <code>DocumentFragment</code>.
     */
    public static final short ACTION_INSERT_AFTER       = 4;
    /**
     * Replace the context node with the result of the parse operation. For
     * this action to work, the context node must have a parent, and the
     * parent must be an <code>Element</code> or a
     * <code>DocumentFragment</code>.
     */
    public static final short ACTION_REPLACE            = 5;

    /**
     * Parse an XML fragment from a resource identified by a
     * <code>LSInput</code> and insert the content into an existing document
     * at the position specified with the <code>context</code> and
     * <code>action</code> arguments.
     * <br> When parsing the input stream, the context node (or its parent,
     * depending on where the result will be inserted) is used for resolving
     * unbound namespace prefixes. The context node's
     * <code>ownerDocument</code> node (or the node itself if the node is of
     * type <code>DOCUMENT_NODE</code>) is used to resolve default
     * attributes and entity references.
     * <br> As the new data is inserted into the document, at least one
     * mutation event is fired per new immediate child or sibling of the
     * context node.
     * <br> If the context node is a <code>Document</code> node and the
     * action is <code>ACTION_REPLACE_CHILDREN</code>, then the document
     * that is passed as the context node will be changed such that its
     * <code>xmlEncoding</code>, <code>documentURI</code>,
     * <code>xmlVersion</code>, <code>inputEncoding</code>,
     * <code>xmlStandalone</code>, and all other such attributes are set to
     * what they would be set to if the input source was parsed using
     * <code>LSParser.parse()</code>.
     * <br> This method is always synchronous, even if the
     * <code>LSParser</code> is asynchronous (<code>LSParser.async</code> is
     * <code>true</code>).
     * <br> If an error occurs while parsing, the caller is notified through
     * the <code>DOMErrorHandler</code> instance associated with the
     * "error-handler" parameter of the <code>DOMConfiguration</code>.
     * <br> When calling <code>parseWithContext</code>, the values of the
     * following configuration parameters will be ignored and their default
     * values will always be used instead: "validate",
     * "validate-if-schema", and "element-content-whitespace". Other
     * parameters will be treated normally, and the parser is expected to
     * call the <code>LSParserFilter</code> just as if a whole document was
     * parsed.
     * @param input The <code>LSInput</code> from which the source document
     *   is to be read. The source document must be an XML fragment, i.e.
     *   anything except a complete XML document (except in the case where
     *   the context node is of type <code>DOCUMENT_NODE</code>, and the
     *   action is <code>ACTION_REPLACE_CHILDREN</code>), a DOCTYPE
     *   (internal subset), entity declaration(s), notation declaration(s),
     *   or XML or text declaration(s).
     * @param contextArg The node that is used as the context for the data
     *   that is being parsed. This node must be a <code>Document</code>
     *   node, a <code>DocumentFragment</code> node, or a node of a type
     *   that is allowed as a child of an <code>Element</code> node, e.g. it
     *   cannot be an <code>Attribute</code> node.
     * @param action This parameter describes which action should be taken
     *   between the new set of nodes being inserted and the existing
     *   children of the context node. The set of possible actions is
     *   defined in <code>ACTION_TYPES</code> above.
     * @return Return the node that is the result of the parse operation. If
     *   the result is more than one top-level node, the first one is
     *   returned.
     * @exception DOMException
     *   HIERARCHY_REQUEST_ERR: Raised if the content cannot replace, be
     *   inserted before, after, or as a child of the context node (see also
     *   <code>Node.insertBefore</code> or <code>Node.replaceChild</code> in
     *   [DOM Level 3 Core]).
     *   <br> NOT_SUPPORTED_ERR: Raised if the <code>LSParser</code> doesn't
     *   support this method, or if the context node is of type
     *   <code>Document</code> and the DOM implementation doesn't support
     *   the replacement of the <code>DocumentType</code> child or
     *   <code>Element</code> child.
     *   <br> NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a
     *   read-only node and the content is being appended to its child list,
     *   or if the parent node of the context node is a read-only node and
     *   the content is being inserted in its child list.
     *   <br> INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML fragment. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "error-handler"
     *   if they wish to get details on the error.
     */
    public Node parseWithContext(LSInput input,
                                 Node contextArg,
                                 short action)
                                 throws DOMException, LSException;

    /**
     * Abort the loading of the document that is currently being loaded by
     * the <code>LSParser</code>. If the <code>LSParser</code> is currently
     * not busy, a call to this method does nothing.
     */
    public void abort();

}
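
The interface above can be exercised with a short synchronous parse. The following is a minimal sketch, not part of the Xerces distribution: it assumes the DOM Level 3 Load/Save implementation bundled with a current JDK (where `getDomConfig()` returns `org.w3c.dom.DOMConfiguration` rather than the `org.apache.xerces.dom3` type imported in this older listing). The class name `LSParserSketch` and the helper `parseString` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class LSParserSketch {

    /**
     * Parses an XML string with a synchronous LSParser (hypothetical helper).
     * Every problem reported through "error-handler" is appended to problems.
     */
    static Document parseString(String xml, List<String> problems) throws Exception {
        // Look up an implementation that supports the "LS" feature.
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");

        // null schema type: build a plain DOM without a schema language.
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        // Errors are reported through the "error-handler" configuration parameter.
        DOMErrorHandler handler = error -> {
            problems.add(error.getType() + ": " + error.getMessage());
            return true; // ask the parser to continue where it is able to
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        // An LSInput can wrap string data, a byte stream, or a system ID.
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        List<String> problems = new ArrayList<>();
        Document doc = parseString("<root><child/></root>", problems);
        System.out.println("root element: " + doc.getDocumentElement().getNodeName());

        try {
            // Not well-formed: the specification says parse() raises PARSE_ERR.
            parseString("<root><unclosed></root>", problems);
        } catch (LSException e) {
            System.out.println("parse failed with code " + e.code);
        }
        System.out.println("reported problems: " + problems.size());
    }
}
```

Collecting messages in a list keeps the handler free of side effects on the parse itself; an application would more likely log them or map `DOMError.getSeverity()` to its own error levels.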

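The filter hook described for `getFilter()` and `setFilter()` can be illustrated in the same way. This sketch again assumes the JDK's bundled Load/Save implementation; `FilterSketch`, `rejecting`, and `parseFiltered` are hypothetical names, and the exact moment at which an implementation consults `getWhatToShow()` is left to the implementation.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class FilterSketch {

    /** Builds a filter that drops every element with the given name (hypothetical helper). */
    static LSParserFilter rejecting(String elementName) {
        return new LSParserFilter() {
            @Override
            public short startElement(Element element) {
                // Rejecting at start-element time also discards the subtree.
                return elementName.equals(element.getNodeName())
                        ? LSParserFilter.FILTER_REJECT
                        : LSParserFilter.FILTER_ACCEPT;
            }

            @Override
            public short acceptNode(Node node) {
                return LSParserFilter.FILTER_ACCEPT; // keep everything else
            }

            @Override
            public int getWhatToShow() {
                return NodeFilter.SHOW_ELEMENT; // only elements are of interest
            }
        };
    }

    /** Parses an XML string synchronously with the given filter installed. */
    static Document parseFiltered(String xml, LSParserFilter filter) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(filter);

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFiltered(
                "<root><keep/><secret><inner/></secret></root>",
                rejecting("secret"));
        System.out.println("secret elements left: "
                + doc.getElementsByTagName("secret").getLength());
    }
}
```

Returning `FILTER_INTERRUPT` from either callback is the way a filter terminates parsing early, as the Javadoc above mentions.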


