/**
*
* Handlers for filtering one or several parts of markup during parsing,
* in a fast and efficient way.
*
*
*
* Handler Implementations
*
*
* There are two main handlers (implementations of {@link org.attoparser.IMarkupHandler}) for
* markup selection in this package:
*
*
* - {@link org.attoparser.select.BlockSelectorMarkupHandler}
* -
* For selecting entire blocks of markup (i.e.
* elements and all the nodes in their subtrees). This can be used, for example, to extract
* fragments of markup while the document is being parsed, so that discarded markup
* never reaches higher layers of the document processing infrastructure.
*
* - {@link org.attoparser.select.NodeSelectorMarkupHandler}
* -
* For selecting only specific nodes in markup (i.e. not including their subtrees). This can be used
* for modifying specific tags in markup during parsing, for example by
* adding attributes to them that are not present in the original parsed markup.
*
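* The difference between the two selection modes can be sketched without the library. The
* following is a minimal, self-contained illustration, not attoparser code: the class
* SelectionSketch, its Node type and its methods are hypothetical names, and matching is
* done by element name only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature document model; none of these types belong to attoparser.
final class SelectionSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Block selection: each matching element plus every node in its subtree.
    static List<String> selectBlocks(Node root, String name) {
        List<String> out = new ArrayList<>();
        collect(root, name, false, true, out);
        return out;
    }

    // Node selection: only the matching elements themselves, without their subtrees.
    static List<String> selectNodes(Node root, String name) {
        List<String> out = new ArrayList<>();
        collect(root, name, false, false, out);
        return out;
    }

    private static void collect(Node node, String name, boolean insideBlock,
                                boolean includeSubtree, List<String> out) {
        boolean matches = node.name.equals(name);
        if (matches || insideBlock) {
            out.add(node.name);
        }
        // Descendants are carried along only in block mode.
        boolean childInside = includeSubtree && (matches || insideBlock);
        for (Node child : node.children) {
            collect(child, name, childInside, includeSubtree, out);
        }
    }

    // Models <html><div><p><b/></p></div><span/></html>
    static Node sample() {
        return new Node("html",
                new Node("div", new Node("p", new Node("b"))),
                new Node("span"));
    }
}
```

* With the sample tree, selecting "div" in block mode yields div, p and b, while node mode
* yields only div.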
*
*
*
* Markup Selector Syntax
*
*
* Markup selectors used by handlers in this package use a specific syntax with features borrowed from
* XPath, CSS and jQuery selectors, in order to provide ease of use for most users. There are often several
* ways to express the same selector, depending on the user's preferences.
*
*
* For example, all the following equivalent selectors will select every <div> with class
* content, in any position in markup:
*
*
* //div[class='content']
* //div[@class='content']
* div[class='content']
* div[@class='content']
* //div.content
* div.content
*
*
* These are the different operations this syntax allows:
*
*
*
* Basic selectors
*
*
*
* - x
*   //x
* -
* Both are equivalent, and mean descendants of the current node with name x (i.e. children at any
* depth in markup). If a reference resolver is being used, they will also be equivalent to
* %x (see below).
*
*
* - /x
* -
* Means direct children of the current node with name x.
*
*
* - x/y
* -
* Means direct children with name y of elements with name x, where the parent
* x elements can be at any level in markup.
*
*
* - x//y
* -
* Means children (at any level) with name y of elements with name x, where the parent
* x elements can also be at any level in markup.
*
*
* - text()
*   comment()
*   cdata()
*   doctype()
*   xmldecl()
*   procinstr()
* -
* These can be used like x (in the same places), but instead of selecting elements (i.e. tags)
* they select, respectively: text nodes, comments, CDATA sections, DOCTYPE clauses, XML Declarations and
* Processing Instructions.
*
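* The path rules above can be sketched as plain predicates. This is only an illustration
* under stated assumptions, not the library's matching code: PathSelectorSketch and its
* methods are hypothetical names, and "path" is assumed to be the list of element names
* leading from the first child of the current node down to the candidate element.

```java
import java.util.List;

// Hypothetical predicates mirroring the basic selector forms; not attoparser code.
final class PathSelectorSketch {

    // x or //x : an element named x at any depth.
    static boolean anyDepth(List<String> path, String x) {
        return !path.isEmpty() && path.get(path.size() - 1).equals(x);
    }

    // /x : an element named x that is a direct child of the current node.
    static boolean directChild(List<String> path, String x) {
        return path.size() == 1 && path.get(0).equals(x);
    }

    // x/y : an element named y whose parent is named x, at any level.
    static boolean childOf(List<String> path, String x, String y) {
        int n = path.size();
        return n >= 2 && path.get(n - 1).equals(y) && path.get(n - 2).equals(x);
    }

    // x//y : an element named y with some ancestor named x, at any level.
    static boolean descendantOf(List<String> path, String x, String y) {
        int n = path.size();
        return n >= 2 && path.get(n - 1).equals(y) && path.subList(0, n - 1).contains(x);
    }
}
```

* For the path html, body, div, p, the selector div/p matches but body/p does not, whereas
* body//p matches because body is an ancestor of p at some level.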
*
*
*
* Attribute matching
*
*
*
* - x[z='v']
*   x[z="v"]
*   x[@z='v']
*   x[@z="v"]
* -
* All four are equivalent, and mean elements with name x and an attribute called z with value
* v. Note that attribute values can be surrounded by single or double quotes, and attribute names
* can be specified with a leading @ (as in XPath) or without it (more similar to jQuery). For
* the sake of simplicity, only the single-quoted, no-@ syntax will be used for the rest of
* the examples below.
*
*
* - [z='v']
*   //[z='v']
* -
* Means any elements with an attribute called z with value v.
*
*
* - x[z]
* -
* Means elements with name x and an attribute called z, with any value.
*
*
* - x[!z]
* -
* Means elements with name x and no attribute called z.
*
*
* - x[z1='v1' and z2='v2']
* -
* Means elements with name x and attributes z1 and z2 with values
* v1 and v2, respectively.
*
*
* - x[z1='v1' or z2='v2']
* -
* Means elements with name x and either an attribute z1 with value
* v1 or an attribute z2 with value v2.
*
*
* - x[z1='v1' and (z2='v2' or z3='v3')]
* -
* Selects according to the specified complex attribute expression. As can be seen, these expressions
* can be parenthesized to establish a specific evaluation order.
*
*
* - x[z!='v']
*   x[z^='v']
*   x[z$='v']
*   x[z*='v']
* -
* Similar to x[z='v'] but applying different operators to attribute matching instead of
* equality (=). Respectively: not equal (!=),
* starts with (^=), ends with ($=) and
* contains (*=).
*
*
* - x.z
*   x[class='z']
* -
* When parsing in HTML mode (and only then), these two selectors are completely equivalent. In this case
* the selector also looks for an x element that has the z class among its classes, given that
* HTML's class attribute allows several classes to be specified, separated by white space. So
* something like <x class="z y w"> will be matched by this selector.
*
*
* - x#z
*   x[id='z']
* -
* When parsing in HTML mode (and only then), these two selectors will be completely equivalent, matching those
* x elements that have an ID with value z.
*
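* The attribute operators and the HTML-mode class rule described above can be illustrated
* with two small helpers. This is a sketch only, not attoparser's implementation:
* AttributeMatchSketch and its methods are hypothetical names, and the sketch assumes the
* attribute is present (how a missing attribute interacts with != is not covered here).

```java
import java.util.Arrays;

// Hypothetical helpers mirroring the attribute operators; not attoparser code.
final class AttributeMatchSketch {

    // Compares an existing attribute value against the expected one using the given operator.
    static boolean matches(String actual, String operator, String expected) {
        switch (operator) {
            case "=":
                return actual.equals(expected);
            case "!=":
                return !actual.equals(expected);
            case "^=":
                return actual.startsWith(expected);
            case "$=":
                return actual.endsWith(expected);
            case "*=":
                return actual.contains(expected);
            default:
                throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }

    // HTML-mode x.z : the class attribute is a white-space-separated list of class names.
    static boolean hasClass(String classAttribute, String className) {
        return Arrays.asList(classAttribute.trim().split("\\s+")).contains(className);
    }
}
```

* For instance, hasClass("z y w", "z") is true, while hasClass("zy w", "z") is false because
* class names are compared as whole tokens, not as substrings.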
*
*
*
*
* Index-based matching
*
*
* - x[i]
* -
* Means elements with name x positioned at index i among their siblings.
* Sibling here means a child node of the same parent element that matches the same
* conditions (in this case "having x as name"). Note that indexes start at
* 0.
*
*
* - x[z='v'][i]
* -
* Means elements with name x, attribute z with value v and positioned at
* index i among their siblings (same name, same attribute with that value).
*
*
* - x[even()]
*   x[odd()]
* -
* Means elements with name x positioned at an even (or odd) index among their siblings.
* Note that even() includes index 0.
*
*
* - x[>i]
*   x[<i]
* -
* Means elements with name x positioned at an index greater than (or less than) i
* among their siblings.
*
*
* - text()[i]
*   comment()[>i]
* -
* Applies the specified index-based matching operations to nodes of types other than elements: texts,
* comments, CDATA sections, etc.
*
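* The key point of index-based matching is that the index counts only siblings that satisfy
* the same conditions, starting at 0. A minimal sketch of that counting follows. It is not
* the library's code: IndexMatchSketch and select are hypothetical names, and the only
* condition considered is the element name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper illustrating sibling indexes; not attoparser code.
final class IndexMatchSketch {

    // Returns the positions (within the sibling list) of the elements called "name"
    // whose index among the same-named siblings passes indexTest.
    static List<Integer> select(List<String> siblings, String name, IntPredicate indexTest) {
        List<Integer> selected = new ArrayList<>();
        int matchIndex = 0; // advances only for siblings that match the name; starts at 0
        for (int pos = 0; pos < siblings.size(); pos++) {
            if (siblings.get(pos).equals(name)) {
                if (indexTest.test(matchIndex)) {
                    selected.add(pos);
                }
                matchIndex++;
            }
        }
        return selected;
    }
}
```

* Given the siblings p, div, p, p, span, p, the p elements have indexes 0 to 3. A test such
* as i -> i == 1 then models p[1], i -> i % 2 == 0 models p[even()], and i -> i > 1 models
* p[>1].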
*
*
*
*
* Reference-based matching
*
*
*
* - x%ref
* -
* Means elements with name x that match the markup selector reference
* with value ref. These markup selector references usually have a user-defined
* meaning and are resolved to a markup selector without references by means of an instance of the
* {@link org.attoparser.select.IMarkupSelectorReferenceResolver} interface passed to the selecting
* markup handlers ({@link org.attoparser.select.BlockSelectorMarkupHandler} and
* {@link org.attoparser.select.NodeSelectorMarkupHandler}) during construction.
* For example, a reference resolver could be
* configured that converts (resolves) %someref into
* div[class='someref' or id='someref']. Also, the
* Thymeleaf template engine uses this mechanism
* for resolving %fragmentName (or simply fragmentName, as explained below) into
* //[th:fragment='fragmentName' or data-th-fragment='fragmentName'].
*
*
* - %ref
* -
* Means any elements (whatever their name) matching the reference with value ref.
*
*
* - ref
* -
* Equivalent to %ref. When a markup selector reference resolver has been configured,
* ref can mean both "element with name ref" and
* "element matching reference ref" (both will match).
*
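* The two resolution examples mentioned above can be sketched as simple string functions.
* This is an illustration only: ReferenceResolverSketch, CLASS_OR_ID, FRAGMENT and expand
* are hypothetical names, java.util.function.UnaryOperator is used as a stand-in for a
* reference resolver, and the sketch assumes the resolver receives the reference value
* without its leading % sign. Only selectors of the form %ref are handled.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for reference resolvers; not attoparser or Thymeleaf code.
final class ReferenceResolverSketch {

    // Resolves a reference into a selector on class or id, as in the first example above.
    static final UnaryOperator<String> CLASS_OR_ID =
            ref -> "div[class='" + ref + "' or id='" + ref + "']";

    // Resolves a reference into a fragment-attribute selector, as in the second example above.
    static final UnaryOperator<String> FRAGMENT =
            ref -> "//[th:fragment='" + ref + "' or data-th-fragment='" + ref + "']";

    // Expands a selector of the form %ref; any other selector is returned unchanged.
    static String expand(String selector, UnaryOperator<String> resolver) {
        if (selector.startsWith("%")) {
            return resolver.apply(selector.substring(1));
        }
        return selector;
    }
}
```

* Under these assumptions, expanding %someref with CLASS_OR_ID produces
* div[class='someref' or id='someref'], a selector that no longer contains references.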
*
*
*
*
*
*/
package org.attoparser.select;