

The basic API classes which will be used by most developers when working with
the HTML Parser.

The {@link org.htmlparser.Parser} class is the main high-level class, providing simplified access to the contents of an HTML page. A wide range of methods is available to customize the operation of the Parser and to access specific pieces of the page as {@link org.htmlparser.Node Nodes}.

The {@link org.htmlparser.NodeFactory} interface specifies what a developer must implement for the Parser or Lexer to generate nodes. Three types of nodes are required: {@link org.htmlparser.Text}, {@link org.htmlparser.Remark} and {@link org.htmlparser.Tag Tags}. Tags contain lists of child nodes and {@link org.htmlparser.Attribute attributes}.

The only provided implementation of the NodeFactory interface is the {@link org.htmlparser.PrototypicalNodeFactory}, which operates by holding example nodes and cloning them as needed to satisfy the Parser's requests for nodes. By default, a Lexer is its own NodeFactory, returning new {@link org.htmlparser.nodes.TextNode}, {@link org.htmlparser.nodes.RemarkNode} and undifferentiated {@link org.htmlparser.nodes.TagNode TagNodes} (see the {@link org.htmlparser.nodes nodes} package). When the Parser uses a Lexer, however, it replaces this behaviour with a PrototypicalNodeFactory, which returns a rich set of specific tags (see the {@link org.htmlparser.tags tags} package).
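The "hold example nodes and clone them" idea described above can be sketched in plain Java. This is only an illustration of the prototype pattern, not the library's implementation: SketchTag, SketchLinkTag and SketchFactory are hypothetical stand-ins invented for this example, and the upper-case name lookup is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the org.htmlparser classes.
class PrototypeFactorySketch {

    /** A minimal cloneable tag node. */
    static class SketchTag implements Cloneable {
        String name = "";

        @Override
        public SketchTag clone() {
            try {
                // Object.clone() preserves the runtime class of the prototype.
                return (SketchTag) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // unreachable: the class is Cloneable
            }
        }
    }

    /** A richer tag type registered for one specific tag name. */
    static class SketchLinkTag extends SketchTag { }

    /** Holds one example node per tag name and clones it on request. */
    static class SketchFactory {
        private final Map<String, SketchTag> prototypes = new HashMap<>();
        private final SketchTag generic = new SketchTag();

        void register(String name, SketchTag prototype) {
            prototypes.put(name.toUpperCase(Locale.ROOT), prototype);
        }

        /** Returns a fresh copy of the registered prototype, or of the generic tag. */
        SketchTag createTag(String name) {
            String key = name.toUpperCase(Locale.ROOT);
            SketchTag copy = prototypes.getOrDefault(key, generic).clone();
            copy.name = key;
            return copy;
        }
    }

    public static void main(String[] args) {
        SketchFactory factory = new SketchFactory();
        factory.register("a", new SketchLinkTag());
        System.out.println(factory.createTag("a").getClass().getSimpleName());   // SketchLinkTag
        System.out.println(factory.createTag("div").getClass().getSimpleName()); // SketchTag
    }
}
```

Registering a prototype changes which concrete class is returned for that tag name, while unregistered names fall back to the undifferentiated tag. This mirrors the contrast drawn above between the Lexer's default output and the richer set of tags.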

The {@link org.htmlparser.NodeFilter} interface is used by the filtering code to determine whether a node meets certain criteria. Some generic examples of filters can be found in the {@link org.htmlparser.filters filters} package.
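As a rough illustration of how a per-node criterion can drive filtering, the following self-contained sketch uses only the JDK. SketchNode, SketchFilter and the helper methods are hypothetical names invented for this example; they are loosely modelled on the idea of a filter answering yes or no for each node and are not the library's types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the org.htmlparser classes.
class NodeFilterSketch {

    /** A minimal node: a tag name plus its text content. */
    static class SketchNode {
        final String tagName;
        final String text;

        SketchNode(String tagName, String text) {
            this.tagName = tagName;
            this.text = text;
        }
    }

    /** The filter contract: one yes/no decision per node. */
    interface SketchFilter {
        boolean accept(SketchNode node);
    }

    /** Accepts nodes whose tag name matches, ignoring case. */
    static SketchFilter tagNamed(String name) {
        return node -> node.tagName.equalsIgnoreCase(name);
    }

    /** Accepts nodes that satisfy both filters. */
    static SketchFilter and(SketchFilter left, SketchFilter right) {
        return node -> left.accept(node) && right.accept(node);
    }

    /** Keeps only the nodes the filter accepts, preserving order. */
    static List<SketchNode> select(List<SketchNode> nodes, SketchFilter filter) {
        List<SketchNode> kept = new ArrayList<>();
        for (SketchNode node : nodes) {
            if (filter.accept(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = Arrays.asList(
                new SketchNode("A", "home"),
                new SketchNode("P", "hello"),
                new SketchNode("a", ""));
        SketchFilter nonEmptyLinks = and(tagNamed("a"), node -> !node.text.isEmpty());
        System.out.println(select(nodes, tagNamed("a")).size());  // 2
        System.out.println(select(nodes, nonEmptyLinks).size());  // 1
    }
}
```

Keeping the contract to a single method lets simple filters be written as lambdas and combined into compound criteria without changing the code that walks the nodes.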




