
The basic API classes which will be used by most developers when working with
the HTML Parser.
The {@link org.htmlparser.Parser} class is the main high level class that
provides simplified access to the contents of an HTML page.
A wide range of methods is available to customize the operation of the Parser,
as well as to access specific pieces of the page as
{@link org.htmlparser.Node Nodes}.
The {@link org.htmlparser.NodeFactory} interface specifies what a developer
must implement for the Parser or Lexer to generate nodes. Three types of
nodes are required: {@link org.htmlparser.Text}, {@link org.htmlparser.Remark}
and {@link org.htmlparser.Tag Tags}. Tags contain lists
of child nodes and {@link org.htmlparser.Attribute attributes}.
The only provided implementation of the NodeFactory interface
is the {@link org.htmlparser.PrototypicalNodeFactory}, which
operates by holding example nodes and cloning them as needed to satisfy the
Parser's requests for nodes. By default, a Lexer is its own NodeFactory,
returning new {@link org.htmlparser.nodes.TextNode},
{@link org.htmlparser.nodes.RemarkNode} and undifferentiated
{@link org.htmlparser.nodes.TagNode TagNodes} (see the
{@link org.htmlparser.nodes nodes} package), but when the Parser uses a Lexer
it replaces this behaviour with a PrototypicalNodeFactory to return a rich
set of specific tags (see the {@link org.htmlparser.tags tags} package).
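The "hold example nodes and clone them" idea described above can be illustrated with a minimal, self-contained sketch. It does not use the library's classes: SketchTag, SketchLinkTag and SketchPrototypeFactory are hypothetical stand-ins declared here so the example compiles on its own, and the real PrototypicalNodeFactory may differ in its details.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a generic tag node; NOT the library's TagNode.
class SketchTag implements Cloneable {
    private final String name;

    SketchTag(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Shallow copy that preserves the runtime type of subclasses.
    @Override
    public SketchTag clone() {
        try {
            return (SketchTag) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: the class is Cloneable
        }
    }
}

// Hypothetical specific tag, analogous in spirit to classes in the tags package.
class SketchLinkTag extends SketchTag {
    SketchLinkTag() {
        super("A");
    }
}

// Holds one example node per tag name and clones it on request.
class SketchPrototypeFactory {
    private final Map<String, SketchTag> prototypes = new HashMap<>();

    void registerTag(SketchTag prototype) {
        prototypes.put(prototype.getName().toUpperCase(Locale.ROOT), prototype);
    }

    SketchTag createTag(String name) {
        String key = name.toUpperCase(Locale.ROOT);
        SketchTag prototype = prototypes.get(key);
        if (prototype == null) {
            return new SketchTag(key); // no prototype: undifferentiated tag
        }
        return prototype.clone(); // specific tag, copied from the example node
    }
}

class PrototypeFactoryDemo {
    public static void main(String[] args) {
        SketchPrototypeFactory factory = new SketchPrototypeFactory();
        factory.registerTag(new SketchLinkTag());
        System.out.println(factory.createTag("a").getClass().getSimpleName());
        System.out.println(factory.createTag("div").getClass().getSimpleName());
    }
}
```

In this sketch, a request for a registered name yields a fresh copy of the specific tag type, while an unregistered name falls back to a generic tag, which parallels the contrast drawn above between the rich set of tags and undifferentiated TagNodes.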
The {@link org.htmlparser.NodeFilter} interface is used by the filtering
code to determine whether a node meets certain criteria. Some generic examples of
filters can be found in the {@link org.htmlparser.filters filters} package.
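As a rough illustration of the filtering contract, the sketch below shows a predicate-style filter applied to a list of nodes. SketchNode, SketchNodeFilter and ContainsTextFilter are hypothetical stand-ins declared locally so the example runs without the library; the single accept method is assumed to mirror the shape of the real NodeFilter interface, and the matching rule is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node; NOT org.htmlparser.Node.
interface SketchNode {
    String getText();
}

// Hypothetical stand-in for the filter contract: one yes/no decision per node.
interface SketchNodeFilter {
    boolean accept(SketchNode node);
}

// Example filter: accepts nodes whose text contains a given substring.
class ContainsTextFilter implements SketchNodeFilter {
    private final String needle;

    ContainsTextFilter(String needle) {
        this.needle = needle;
    }

    @Override
    public boolean accept(SketchNode node) {
        return node.getText().contains(needle);
    }
}

class FilterDemo {
    // Returns the nodes accepted by the filter, preserving their order.
    static List<SketchNode> keepMatching(List<SketchNode> nodes, SketchNodeFilter filter) {
        List<SketchNode> kept = new ArrayList<>();
        for (SketchNode node : nodes) {
            if (filter.accept(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = new ArrayList<>();
        nodes.add(() -> "Hello world");
        nodes.add(() -> "Goodbye");
        for (SketchNode node : keepMatching(nodes, new ContainsTextFilter("world"))) {
            System.out.println(node.getText());
        }
    }
}
```

Keeping the filter down to a single boolean decision per node means the traversal code never needs to know what is being matched, which is why new criteria can be added as separate small classes.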