au.id.jericho.lib.html (Jericho HTML Parser 1.5-dev1)
Package au.id.jericho.lib.html
A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.
See: Description
Interface Summary
IOutputSegment
Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text.
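To make the idea concrete, an output segment can be modelled as a span of the source text (a begin and an end position) plus the text that should appear in its place. The sketch below is hypothetical and is not Jericho's interface: the names `OutputSegment`, `StringSegment`, `BlankSegment` and `CharSegment` are invented stand-ins for IOutputSegment, StringOutputSegment, BlankOutputSegment and CharOutputSegment, and the real method names may differ.

```java
/** Hypothetical model of output segments; NOT the Jericho API. */
public class OutputSegmentSketch {

    // A span [begin, end) of the source text together with its replacement text.
    interface OutputSegment {
        int getBegin();
        int getEnd();
        String getText();
    }

    // Shared storage and validation for the span boundaries.
    static abstract class SpanSegment implements OutputSegment {
        private final int begin, end;
        SpanSegment(int begin, int end) {
            if (begin < 0 || end < begin) throw new IllegalArgumentException("invalid span");
            this.begin = begin;
            this.end = end;
        }
        public int getBegin() { return begin; }
        public int getEnd() { return end; }
    }

    // Replacement is an arbitrary CharSequence (cf. StringOutputSegment).
    static final class StringSegment extends SpanSegment {
        private final CharSequence text;
        StringSegment(int begin, int end, CharSequence text) { super(begin, end); this.text = text; }
        public String getText() { return text.toString(); }
    }

    // Replacement is spaces of the same length as the span (cf. BlankOutputSegment).
    static final class BlankSegment extends SpanSegment {
        BlankSegment(int begin, int end) { super(begin, end); }
        public String getText() { return " ".repeat(getEnd() - getBegin()); }
    }

    // Replacement is a single character constant (cf. CharOutputSegment).
    static final class CharSegment extends SpanSegment {
        private final char ch;
        CharSegment(int begin, int end, char ch) { super(begin, end); this.ch = ch; }
        public String getText() { return String.valueOf(ch); }
    }

    public static void main(String[] args) {
        String source = "<b>bold</b>";
        OutputSegment blank = new BlankSegment(0, 3);            // blank out "<b>"
        OutputSegment tag = new StringSegment(0, 3, "<strong>"); // or replace it outright
        System.out.println("[" + blank.getText() + "]");        // prints "[   ]"
        System.out.println(source.substring(tag.getBegin(), tag.getEnd()) + " -> " + tag.getText());
    }
}
```

Keeping the span and the replacement text together is what lets a document later splice each segment's text into place without the segments knowing about one another.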
Class Summary
Attribute
Represents a single attribute name/value segment within a StartTag.
Attributes
Represents the list of Attribute objects present within a particular StartTag.
AttributesOutputSegment
Implements an IOutputSegment whose content is a list of attribute name/value pairs.
BlankOutputSegment
Implements an IOutputSegment whose content is a string of spaces with the same length as the segment.
CharacterEntityReference
Represents an HTML Character Entity Reference.
CharacterReference
Represents either a CharacterEntityReference or NumericCharacterReference.
CharOutputSegment
Implements an IOutputSegment whose content is a character constant.
Element
Represents an HTML element, which encompasses the StartTag, an optional EndTag and all content in between.
EndTag
Represents the end tag of an Element.
FormControl
Represents an HTML form control.
FormControlOutputStyle
*************
FormControlOutputStyle.DisplayValueConfig
*************
must not be null
FormControlType
Represents one of the HTML control types in a form which have the potential to be successful.
FormField
Represents a field in an HTML form, a field being defined as the combination of all form controls having the same name.
FormFields
Represents a collection of FormField objects.
NumericCharacterReference
Represents an HTML Numeric Character Reference.
OutputDocument
Represents a modified version of an original source text.
Segment
Represents a segment of a Source document.
Source
Represents a source HTML document.
StartTag
Represents the start tag of an Element.
StringOutputSegment
Implements an IOutputSegment whose content is a CharSequence.
Tag
Represents either a StartTag or EndTag.
Util
This class contains miscellaneous utility methods not directly associated with the HTML Parser library.
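The two kinds of character reference can be illustrated with a small decoder. This is a hypothetical sketch, not how CharacterReference, CharacterEntityReference or NumericCharacterReference are implemented: the class name `CharRefSketch` and its `decode` method are invented, only five named entities are included, and unrecognised or invalid references are simply left unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical character reference decoder; NOT the Jericho implementation. */
public class CharRefSketch {

    // Tiny subset of the HTML named entities (a real decoder needs the full table).
    private static final Map<String, Integer> ENTITIES = Map.of(
            "amp", 38, "lt", 60, "gt", 62, "quot", 34, "nbsp", 160);

    // Group 1: decimal numeric ref, group 2: hex numeric ref, group 3: entity name.
    private static final Pattern REF =
            Pattern.compile("&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));");

    public static String decode(String text) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = null;
            try {
                if (m.group(1) != null) {
                    codePoint = Integer.parseInt(m.group(1));      // numeric reference, decimal
                } else if (m.group(2) != null) {
                    codePoint = Integer.parseInt(m.group(2), 16);  // numeric reference, hexadecimal
                } else {
                    codePoint = ENTITIES.get(m.group(3));          // entity reference (null if unknown)
                }
            } catch (NumberFormatException e) {
                codePoint = null;                                  // too many digits to fit in an int
            }
            String replacement = (codePoint != null && Character.isValidCodePoint(codePoint))
                    ? new String(Character.toChars(codePoint))
                    : m.group();                                   // unknown or invalid: keep as-is
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Fish &amp; chips &#65;&#x42; &bogus;")); // Fish & chips AB &bogus;
    }
}
```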
Exception Summary
OverlappingOutputSegmentsException
Signals that overlapping output segments have been detected in the OutputDocument.
Package au.id.jericho.lib.html Description
A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.
The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL).
You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.
For downloads, support and updates visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/
For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at http://jerichohtml.sourceforge.net
Modifying an HTML Document
The typical method for modifying a document is as follows. See the description of the OutputDocument class for sample code.
- Create a Source object from the source text
- Find the required segments by calling methods on the Source object and other segments
- Create an OutputDocument object from the source text
- Add an IOutputSegment to the OutputDocument for each segment of the document that is to be replaced with other text
- Call the OutputDocument.toString() method to get the final output
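The steps above can be modelled in plain Java without the library. The class below is a hypothetical, much-simplified stand-in for OutputDocument (it is not Jericho's API): replacements are recorded as begin/end positions plus text rather than IOutputSegment objects, `String.indexOf` stands in for the search methods of Source, and overlapping replacements are rejected with an `IllegalStateException`, loosely mirroring what OverlappingOutputSegmentsException signals in the real library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified model of the OutputDocument workflow; NOT the Jericho API. */
public class OutputDocumentSketch {

    // One replacement: the source span [begin, end) is replaced by text.
    static final class Replacement {
        final int begin, end;
        final String text;
        Replacement(int begin, int end, String text) {
            if (begin < 0 || end < begin) throw new IllegalArgumentException("invalid span");
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    private final String source;
    private final List<Replacement> replacements = new ArrayList<>();

    // Step 3: the output document wraps the original, unmodified source text.
    public OutputDocumentSketch(String source) { this.source = source; }

    // Step 4: register a segment of the source that is to be replaced.
    public void add(int begin, int end, String text) {
        if (end > source.length()) throw new IllegalArgumentException("span outside source");
        replacements.add(new Replacement(begin, end, text));
    }

    // Step 5: build the output, copying unchanged text between the replacements.
    @Override
    public String toString() {
        List<Replacement> sorted = new ArrayList<>(replacements);
        sorted.sort(Comparator.comparingInt((Replacement r) -> r.begin).thenComparingInt(r -> r.end));
        StringBuilder out = new StringBuilder();
        int pos = 0; // end of the last replaced span
        for (Replacement r : sorted) {
            if (r.begin < pos) throw new IllegalStateException("overlapping replacements at " + r.begin);
            out.append(source, pos, r.begin).append(r.text);
            pos = r.end;
        }
        return out.append(source, pos, source.length()).toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";                  // step 1: the source text
        OutputDocumentSketch doc = new OutputDocumentSketch(html);  // step 3
        int open = html.indexOf("<b>");                             // step 2 (plain-text stand-in)
        int close = html.indexOf("</b>");
        doc.add(open, open + 3, "<strong>");                        // step 4
        doc.add(close, close + 4, "</strong>");
        System.out.println(doc);                                    // <p>Hello <strong>world</strong></p>
    }
}
```

In this sketch the replacements are only recorded until `toString()` is called, so the source text is never mutated and every position found in step 2 stays valid however many replacements are added.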
Analysing or Extracting Information from an HTML Document
If the document only needs to be analysed instead of modified, only the first two steps listed above are required. See the description of the FormFields class for sample code.
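For the analysis-only case, the following hypothetical sketch groups the controls of a form by their `name` attribute, in line with the FormField definition of a field as the combination of all form controls having the same name. It is not how FormFields works: the class `FormFieldSketch` and method `groupByName` are invented, it scans `input`, `select` and `textarea` start tags with regular expressions instead of a parser, it only recognises quoted `name` values, and it can be fooled by markup such as `>` inside attribute values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: group form controls by name. Regex-based, NOT the Jericho parser. */
public class FormFieldSketch {

    // Start tags of the common form control elements; group 2 holds the attribute text.
    private static final Pattern CONTROL =
            Pattern.compile("<(input|select|textarea)\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // A name attribute with a double-quoted (group 1) or single-quoted (group 2) value.
    private static final Pattern NAME =
            Pattern.compile("(?:^|\\s)name\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", Pattern.CASE_INSENSITIVE);

    /** Maps each control name to the element types of the controls sharing it, in document order. */
    public static Map<String, List<String>> groupByName(String html) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        Matcher control = CONTROL.matcher(html);
        while (control.find()) {
            Matcher name = NAME.matcher(control.group(2));
            if (!name.find()) continue; // a control without a name is never submitted
            String fieldName = name.group(1) != null ? name.group(1) : name.group(2);
            fields.computeIfAbsent(fieldName, k -> new ArrayList<>())
                  .add(control.group(1).toLowerCase());
        }
        return fields;
    }

    public static void main(String[] args) {
        String form = "<form><input type=\"radio\" name=\"color\" value=\"red\">"
                + "<input type=\"radio\" name=\"color\" value=\"blue\">"
                + "<select name='size'><option>S</option></select>"
                + "<input type=\"submit\"></form>";
        System.out.println(groupByName(form)); // {color=[input, input], size=[select]}
    }
}
```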