Jericho HTML Parser (jericho-html) release 1.0
A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.
The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL).
You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.
For downloads, support and updates visit the SourceForge.net project page at
http://sourceforge.net/projects/jerichohtml/
For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at
http://jerichohtml.sourceforge.net
Modifying an HTML Document
The typical method for modifying a document is as follows.
See the description of the {@link au.id.jericho.lib.html.OutputDocument} class for sample code.
- Create a {@link au.id.jericho.lib.html.Source} object from the source text
- Find the required segments by calling methods on the Source object and other segments
- Create an {@link au.id.jericho.lib.html.OutputDocument} object from the source text
- Add an {@link au.id.jericho.lib.html.IOutputSegment} to the OutputDocument for each segment of the document that is to be replaced with other text
- Call the {@link au.id.jericho.lib.html.OutputDocument#toString()} method to get the final output
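The steps above follow a "record replacements, then render" pattern, which can be sketched in plain Java. This is only an illustration of the idea and does not use the library's classes: `OutputSketch`, its `replace` method and the nested `Replacement` class are hypothetical stand-ins, and the segment is located with `String.indexOf` rather than by parsing. For the real API, refer to the {@link au.id.jericho.lib.html.OutputDocument} class description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an output document. NOT the Jericho API.
final class OutputSketch {
    // A replacement for the source characters in the range [begin, end).
    private static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    private final String sourceText;
    private final List<Replacement> replacements = new ArrayList<>();

    // Step 3: create the output document from the source text.
    OutputSketch(String sourceText) {
        this.sourceText = sourceText;
    }

    // Step 4: register replacement text for one segment of the source.
    void replace(int begin, int end, String text) {
        if (begin < 0 || end < begin || end > sourceText.length()) {
            throw new IllegalArgumentException("segment out of range");
        }
        replacements.add(new Replacement(begin, end, text));
    }

    // Step 5: copy untouched source text, substituting each registered segment.
    @Override
    public String toString() {
        List<Replacement> sorted = new ArrayList<>(replacements);
        sorted.sort(Comparator.comparingInt(r -> r.begin));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Replacement r : sorted) {
            if (r.begin < pos) {
                throw new IllegalStateException("overlapping replacements");
            }
            out.append(sourceText, pos, r.begin).append(r.text);
            pos = r.end;
        }
        out.append(sourceText, pos, sourceText.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Steps 1 and 2: take the source text and find the segment to replace
        // (a simple string search stands in for the parser here).
        String html = "<p>Hello <b>world</b></p>";
        int begin = html.indexOf("<b>");
        int end = html.indexOf("</b>") + "</b>".length();

        OutputSketch output = new OutputSketch(html);
        output.replace(begin, end, "<i>there</i>");
        System.out.println(output); // prints <p>Hello <i>there</i></p>
    }
}
```

Because the source text is never modified, segment positions found in step 2 stay valid however many replacements are registered; the sketch only resolves them when `toString()` is called.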
Analysing or Extracting Information from an HTML Document
If the document only needs to be analysed rather than modified, only the first two steps listed above are required.
See the description of the {@link au.id.jericho.lib.html.FormFields} class for sample code.