<?xml version='1.0' encoding='UTF-8'?>
<!--
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
-->
<!DOCTYPE s1 SYSTEM 'dtd/document.dtd'>
<s1 title='XNI Design Details'>
 <s2 title='Overview'>
  <p>
   A parser written to conform to the Xerces Native Interface (XNI)
   framework is configured as a pipeline of parser components. The
   document's "streaming" information set flows through this pipeline
   of components to produce some sort of programming interface as the
   output. For example, the pipeline could produce a W3C Document
   Object Model (DOM) or a series of Simple API for XML (SAX) events.
  </p>
  <p>
   The core XNI interfaces provide a mechanism for the document
   information to flow from component to component. However, beyond 
   the basic information interfaces, XNI also defines a framework for 
   constructing these pipelines and parser configurations. This
   document gives an overview of that framework and of what a parser
   written to conform to the Xerces Native Interface looks like. The
   two parts of the framework are described in the following sections:
  </p>
  <ul>
   <li><link anchor='pipeline'>Pipeline</link></li>
   <li><link anchor='configuration'>Configuration</link></li>
  </ul>
  <p>
   For more detailed information, refer to the following documents:
  </p>
  <ul>
   <li><link idref='xni-core'>Core Interfaces</link></li>
   <li><link idref='xni-config'>Parser Configuration</link></li>
   <li><link idref='xni-xerces2'>Xerces2 Parser Components</link></li>
  </ul>
 </s2>
 <anchor name='pipeline'/>
 <s2 title='Pipeline'>
  <p>
   The XNI parser pipeline is any combination of components that
   are either capable of producing XNI events, consuming XNI events,
   or both. All pipelines consist of a source, zero or more filters,
   and a target. The source is typically the XML scanner; common
   filters include DTD and XML Schema validators and a namespace
   binder; and the target is the parser that consumes the XNI events
   and produces a common programming interface such as DOM or SAX.
   The following diagram illustrates the basic pipeline configuration.
  </p>
  <p>
   <img alt='Basic Pipeline Configuration' src='xni-pipeline-basic.gif'/>
  </p>
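  <p>
   The source, filter, and target roles can be sketched in a few lines
   of Java. This is a minimal illustration under stated assumptions,
   not the XNI API: every interface and class name in it
   (EventHandler, EventSource, EventFilter, DemoScanner, PrefixFilter,
   RecordingTarget, PipelineDemo) is a hypothetical stand-in, and the
   events are reduced to element names. In XNI itself, the
   corresponding roles are played by the XMLDocumentHandler,
   XMLDocumentSource, and XMLDocumentFilter interfaces.
  </p>
<source>
```java
// Hypothetical stand-ins for the pipeline roles; not the XNI API.
interface EventHandler {
    void startElement(String name);
    void endElement(String name);
}

interface EventSource {
    void setHandler(EventHandler next);
}

// A filter consumes events and produces events for the next stage.
interface EventFilter extends EventHandler, EventSource {
}

// Source: stands in for the scanner. The names are treated as a
// chain of nested elements.
class DemoScanner implements EventSource {
    private EventHandler next;

    public void setHandler(EventHandler next) {
        this.next = next;
    }

    void scan(String[] names) {
        for (String name : names) {
            next.startElement(name);
        }
        for (int i = names.length - 1; i >= 0; i--) {
            next.endElement(names[i]);
        }
    }
}

// Filter: a toy stage that rewrites each name before passing it on.
class PrefixFilter implements EventFilter {
    private final String prefix;
    private EventHandler next;

    PrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    public void setHandler(EventHandler next) {
        this.next = next;
    }

    public void startElement(String name) {
        next.startElement(prefix + name);
    }

    public void endElement(String name) {
        next.endElement(prefix + name);
    }
}

// Target: consumes events and records them as text.
class RecordingTarget implements EventHandler {
    private final StringBuilder out = new StringBuilder();

    public void startElement(String name) {
        out.append('+').append(name).append(' ');
    }

    public void endElement(String name) {
        out.append('-').append(name).append(' ');
    }

    String output() {
        return out.toString().trim();
    }
}

class PipelineDemo {
    // Connects source, filter, and target, then runs the source.
    static String run(String[] names) {
        DemoScanner scanner = new DemoScanner();
        PrefixFilter filter = new PrefixFilter("ns:");
        RecordingTarget target = new RecordingTarget();
        scanner.setHandler(filter);
        filter.setHandler(target);
        scanner.scan(names);
        return target.output();
    }
}
```
</source>
  <p>
   Calling PipelineDemo.run(new String[] {"a", "b"}) returns
   "+ns:a +ns:b -ns:b -ns:a". Because each stage only knows the
   handler that follows it, filters can be added or removed without
   changing the source or the target.
  </p>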
  <p>
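   However, this is a simplified view of the pipeline configuration.
   The Xerces Native Interface actually defines two pipelines, one for
   document information and one for DTD information, using three
   handler interfaces: one for document information
   (XMLDocumentHandler) and two for DTD information (XMLDTDHandler
   and XMLDTDContentModelHandler).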
  </p>
  <p>
   The Xerces2 parser, the reference implementation of XNI,
   contains more components than the basic pipeline configuration 
   diagram shows. The following diagram shows the Xerces2 pipeline
   configuration. The arrow going from left to right on the top of the
   image represents the flow of document information and the arrows on 
   the bottom of the image represent the DTD information flowing through
   the parser pipeline.
  </p>
  <p>
   <img alt='Xerces2 Pipeline Configuration' src='xni-pipeline-detailed.gif'/>
  </p>
  <p>
   As the diagram shows, the "Document Scanner" is the source for
   document information and the "DTD Scanner" is the source for DTD
   information. Both document and DTD information generated by the
   scanners flow into the "DTD Validator", where structure and content
   are validated according to the DTD grammar, if present. From here,
   the validated document information, with possible augmentations such
   as default attribute values and attribute value normalization, flows
   to the "Namespace Binder", which applies the namespace information to
   elements and attributes. The newly namespace-bound document
   information then flows to the "Schema Validator" for
   validation based on the XML Schema, if present. Finally, the
   document and DTD information flow to the "Parser" which generates
   a programming interface such as DOM or SAX.
  </p>
  <p>
   XNI defines the document information using a number of core
   interfaces. (These interfaces are described in more detail in the
   <link idref='xni-api-core'>Core API</link> documentation.) But XNI 
   also defines a set of interfaces to build parser configurations 
   that assemble the pipelines in order to parse documents. The next
   section gives a general overview of the parser configuration
   framework provided by XNI.
  </p>
 </s2>
 <anchor name='configuration'/>
 <s2 title='Configuration'>
  <p>
   A parser implementation written using the Xerces Native Interface
   can be seen as a collection of components, some of which are
   connected together to form the pipelines for document and DTD
   information. All of the components in the parser are managed by
   a "Component Manager" that does the following:
  </p>
  <ul>
   <li>Keeps track of parser settings and options,</li>
   <li>
    Instantiates and configures the various components in the parser, and
   </li>
   <li>Assembles the parsing pipeline and initiates parsing of documents.</li>
  </ul>
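  <p>
   The three responsibilities listed above can be sketched as follows.
   This is a simplified illustration, not the XNI API: DemoComponent,
   DemoComponentManager, and DemoValidator are hypothetical names, and
   the SAX validation feature identifier is used here only as a
   lookup key. In XNI, comparable roles are defined by interfaces such
   as XMLComponentManager and XMLComponent.
  </p>
<source>
```java
// Hypothetical stand-ins; not the XNI API.
interface DemoComponent {
    // Called before each parse so the component can read the settings.
    void reset(DemoComponentManager manager);
}

// A component whose behavior depends on a parser setting.
class DemoValidator implements DemoComponent {
    boolean validating;

    public void reset(DemoComponentManager manager) {
        validating = manager.getFeature("http://xml.org/sax/features/validation");
    }
}

class DemoComponentManager {
    // 1. Keeps track of parser settings and options.
    private final java.util.Properties features = new java.util.Properties();

    // 2. Instantiates the components it manages.
    final DemoValidator validator = new DemoValidator();
    private final DemoComponent[] components = { validator };

    void setFeature(String featureId, boolean state) {
        features.setProperty(featureId, String.valueOf(state));
    }

    boolean getFeature(String featureId) {
        // Unknown features default to false in this sketch.
        return Boolean.parseBoolean(features.getProperty(featureId, "false"));
    }

    // 3. Configures every component, then initiates parsing.
    void parse() {
        for (DemoComponent component : components) {
            component.reset(this);
        }
        // A real configuration would now connect the components into
        // a pipeline and start the scanner on the input document.
    }
}
```
</source>
  <p>
   In this sketch a setting changed through setFeature only reaches a
   component on the next call to parse, because components read their
   settings when they are reset.
  </p>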
  <p>
   The following diagram represents a typical parser configuration
   that has a component manager and various components such as a
   "Symbol Table", "Scanner", etc.
  </p>
  <p>
   <img alt='Generic Parser Configuration' src='xni-components-overview.gif'/>
  </p>
  <p>
   Some of the components in a configuration are configurable and others
   are not. The details of component configuration are covered in the
   <link idref='xni-config'>XNI Parser Configuration</link> document;
   for now, it is sufficient to understand the basic structure of a
   parser configuration.
  </p>
  <p>
   The XNI parser configuration framework provides an easy and
   convenient way to construct different kinds of parser configurations.
   By separating the configuration from the API generation (in each
   specific parser object), different parser configurations can be used to
   build a DOM tree or emit SAX events without re-implementing the DOM or
   SAX code. The following diagram shows this separation. Notice how the
   document information flows through the pipeline in the parser 
   configuration and then to the parser object which generates different
   APIs.
  </p>
  <p>
   <img alt='Configuration and Parser Separation' src='xni-parser-configuration.gif'/>
  </p>
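  <p>
   The separation between the configuration and the parser object can
   be illustrated with a short Java sketch. It is a simplified model
   under stated assumptions, not the XNI API: DocHandler,
   DemoConfiguration, EventParser, and OutlineParser are hypothetical
   names, and a document is reduced to a chain of nested element
   names. One configuration instance drives both an event-style parser
   and a structure-building parser.
  </p>
<source>
```java
// Hypothetical stand-ins; not the XNI API.
interface DocHandler {
    void startElement(String name);
    void endElement(String name);
}

// The configuration owns the pipeline and delivers its output to
// whichever handler is currently registered.
class DemoConfiguration {
    private DocHandler handler;

    void setDocumentHandler(DocHandler handler) {
        this.handler = handler;
    }

    // Stands in for scanning: names are treated as nested elements.
    void parse(String[] names) {
        for (String name : names) {
            handler.startElement(name);
        }
        for (int i = names.length - 1; i >= 0; i--) {
            handler.endElement(names[i]);
        }
    }
}

// Parser object that reports events as they arrive (SAX-like).
class EventParser implements DocHandler {
    private final DemoConfiguration config;
    private final StringBuilder log = new StringBuilder();

    EventParser(DemoConfiguration config) {
        this.config = config;
    }

    void parse(String[] names) {
        config.setDocumentHandler(this);
        config.parse(names);
    }

    public void startElement(String name) {
        log.append("start:").append(name).append(' ');
    }

    public void endElement(String name) {
        log.append("end:").append(name).append(' ');
    }

    String events() {
        return log.toString().trim();
    }
}

// Parser object that builds a structure from the events (DOM-like);
// here the structure is just an indented outline.
class OutlineParser implements DocHandler {
    private final DemoConfiguration config;
    private final StringBuilder outline = new StringBuilder();
    private int depth;

    OutlineParser(DemoConfiguration config) {
        this.config = config;
    }

    void parse(String[] names) {
        config.setDocumentHandler(this);
        config.parse(names);
    }

    public void startElement(String name) {
        for (int i = depth; i > 0; i--) {
            outline.append("  ");
        }
        outline.append(name).append('\n');
        depth++;
    }

    public void endElement(String name) {
        depth--;
    }

    String outline() {
        return outline.toString();
    }
}
```
</source>
  <p>
   Neither parser class contains any scanning logic, so replacing
   DemoConfiguration with a different configuration would not require
   changes to the code that generates either output.
  </p>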
 </s2>
</s1>