org.htmlparser.scanners.package.html

The scanners package contains classes responsible for the tertiary
identification of tags. The lower level classes in the {@link
org.htmlparser.lexer.Lexer lexer} package convert
byte streams to characters and characters to nodes (via the {@link
org.htmlparser.NodeFactory NodeFactory}). In the case of tags, the
scanners in this package can then either complete the tag or override it,
returning an augmented tag. The existing implementation of the {@link
org.htmlparser.scanners.CompositeTagScanner composite tag
scanner}, for example, gathers the children of composite tags, identifying the
nested structure of HTML documents. The {@link
org.htmlparser.scanners.ScriptScanner script scanner} overrides the nodes
returned by the lexer and creates a tag containing a single string that is the
script code.
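
To make this division of labour concrete, here is a deliberately simplified Java model of it. It does not use htmlparser at all: ToyCompositeScanner, Token, Node, lex and scan are hypothetical names invented for this sketch, and the real Lexer, NodeFactory and CompositeTagScanner are considerably more involved. The lex step stands in for the lexer, turning characters into flat tag and text tokens; the scan step plays the role of a composite tag scanner, keeping a stack of open tags so that each token becomes a child of the innermost open tag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT the htmlparser API.
final class ToyCompositeScanner {

    /** A flat token as a lexer might emit it: kind is "start", "end" or "text". */
    static final class Token {
        final String kind;
        final String value;
        Token(String kind, String value) { this.kind = kind; this.value = value; }
    }

    /** A tree node; text nodes carry their text in name and have no children. */
    static final class Node {
        final String name;
        final boolean isText;
        final List<Node> children = new ArrayList<>();
        Node(String name, boolean isText) { this.name = name; this.isText = isText; }

        /** Renders the tree as name[child,child] so the nesting is easy to check. */
        @Override
        public String toString() {
            if (isText) return "\"" + name + "\"";
            StringBuilder sb = new StringBuilder(name).append('[');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(children.get(i));
            }
            return sb.append(']').toString();
        }
    }

    /** Hypothetical "lexer" step: turns characters into flat tag and text tokens. */
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int close = input.charAt(i) == '<' ? input.indexOf('>', i) : -1;
            if (close > i) {
                String body = input.substring(i + 1, close);
                if (body.startsWith("/")) tokens.add(new Token("end", body.substring(1)));
                else tokens.add(new Token("start", body));
                i = close + 1;
            } else {
                // Plain text runs to the next '<' (an unterminated '<' is kept as text).
                int next = input.indexOf('<', i + 1);
                if (next < 0) next = input.length();
                tokens.add(new Token("text", input.substring(i, next)));
                i = next;
            }
        }
        return tokens;
    }

    /** Composite-style "scanner" step: gathers each start tag's children until its end tag. */
    static Node scan(List<Token> tokens) {
        Node root = new Node("#document", false);
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (Token t : tokens) {
            if (t.kind.equals("start")) {
                Node tag = new Node(t.value, false);
                open.peek().children.add(tag); // attach to the innermost open tag
                open.push(tag);                // later tokens become its children
            } else if (t.kind.equals("end")) {
                if (isOpen(open, t.value)) {
                    // Close the matching tag, implicitly closing unclosed tags inside it.
                    Node popped;
                    do { popped = open.pop(); } while (!popped.name.equals(t.value));
                }
                // A stray end tag with no matching start tag is ignored.
            } else {
                open.peek().children.add(new Node(t.value, true));
            }
        }
        return root;
    }

    /** True if a tag with this name is open; the document root itself never matches. */
    static boolean isOpen(Deque<Node> open, String name) {
        for (Node n : open) {
            if (n != open.peekLast() && n.name.equals(name)) return true;
        }
        return false;
    }
}
```

For example, scanning the tokens of <ul><li>a</li><li>b</li></ul> yields #document[ul[li["a"],li["b"]]]. Closing an outer tag implicitly closes unclosed inner ones, and stray end tags are dropped; these are simplifying choices for the sketch, not a description of htmlparser's own recovery rules.
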

You might need to create your own scanner (a class implementing the {@link org.htmlparser.scanners.Scanner Scanner} interface) if the text you are trying to parse doesn't look like HTML, as is the case for script code, or if the normal processing of tags, which nests them into a tree structure, is inadequate.
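
The script case can be sketched the same way. The hypothetical ToyRawTextScanner below is not htmlparser's Scanner interface or ScriptScanner class; it only illustrates the idea, under the assumption that a start tag such as <script> has been recognised: from that point the scanner stops tag-level lexing and copies characters verbatim up to the matching end tag, so text like a<b inside the script ends up in one string instead of being mistaken for a tag. The names scanRaw, extractRaw and RawResult are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, NOT htmlparser's Scanner or ScriptScanner.
final class ToyRawTextScanner {

    /** Verbatim content of one raw-text element and the index just past its end tag. */
    static final class RawResult {
        final String content;
        final int next;
        RawResult(String content, int next) { this.content = content; this.next = next; }
    }

    /** Case-insensitive indexOf that keeps character positions aligned with the input. */
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    /**
     * Reads characters verbatim from "from" up to the first "</tagName" (any case),
     * without treating '<' or '>' inside as markup. Without an end tag, the rest
     * of the input is taken.
     */
    static RawResult scanRaw(String input, int from, String tagName) {
        int end = indexOfIgnoreCase(input, "</" + tagName, from);
        if (end < 0) return new RawResult(input.substring(from), input.length());
        int gt = input.indexOf('>', end);
        return new RawResult(input.substring(from, end), gt < 0 ? input.length() : gt + 1);
    }

    /** Returns the raw body of every element with the given tag name in the input. */
    static List<String> extractRaw(String html, String tagName) {
        List<String> bodies = new ArrayList<>();
        String open = "<" + tagName;
        int i = 0;
        while (true) {
            int start = indexOfIgnoreCase(html, open, i);
            if (start < 0) break;
            int afterName = start + open.length();
            // Accept "<script>" or "<script attr...>", but not e.g. "<scripted>".
            if (afterName < html.length() && html.charAt(afterName) != '>'
                    && !Character.isWhitespace(html.charAt(afterName))) {
                i = afterName;
                continue;
            }
            int gt = html.indexOf('>', afterName);
            if (gt < 0) break; // unterminated start tag: nothing more to scan
            RawResult r = scanRaw(html, gt + 1, tagName);
            bodies.add(r.content);
            i = r.next;
        }
        return bodies;
    }
}
```

Given <p>x</p><script>if (a<b && c>d) go();</script><p>y</p>, extractRaw(..., "script") returns the single string if (a<b && c>d) go();, untouched by tag parsing. Returning the body as one string loosely mirrors the script scanner described above, which produces a tag holding the script code as a single string.
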