Jericho HTML Parser is a simple but powerful Java library for analysing and manipulating parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

au.id.jericho.lib.html (Jericho HTML Parser 1.5-dev1)

Package au.id.jericho.lib.html

A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.


Interface Summary
IOutputSegment Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text.
 

Class Summary
Attribute Represents a single attribute name/value segment within a StartTag.
Attributes Represents the list of Attribute objects present within a particular StartTag.
AttributesOutputSegment Implements an IOutputSegment whose content is a list of attribute name/value pairs.
BlankOutputSegment Implements an IOutputSegment whose content is a string of spaces with the same length as the segment.
CharacterEntityReference Represents an HTML Character Entity Reference.
CharacterReference Represents either a CharacterEntityReference or NumericCharacterReference.
CharOutputSegment Implements an IOutputSegment whose content is a character constant.
Element Represents an HTML element, which encompasses the StartTag, an optional EndTag and all content in between.
EndTag Represents the end tag of an Element.
FormControl Represents an HTML form control.
FormControlOutputStyle *************
FormControlOutputStyle.DisplayValueConfig ************* must not be null
FormControlType Represents one of the HTML control types in a form which have the potential to be successful.
FormField Represents a field in an HTML form, a field being defined as the combination of all form controls having the same name.
FormFields Represents a collection of FormField objects.
NumericCharacterReference Represents an HTML Numeric Character Reference.
OutputDocument Represents a modified version of an original source text.
Segment Represents a segment of a Source document.
Source Represents a source HTML document.
StartTag Represents the start tag of an Element.
StringOutputSegment Implements an IOutputSegment whose content is a CharSequence.
Tag Represents either a StartTag or EndTag.
Util This class contains miscellaneous utility methods not directly associated with the HTML Parser library.
 

Exception Summary
OverlappingOutputSegmentsException Signals that overlapping output segments have been detected in the OutputDocument.
 

Package au.id.jericho.lib.html Description

A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.

The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.

For downloads, support and updates visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/

For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at http://jerichohtml.sourceforge.net

Modifying an HTML Document

The typical method for modifying a document is as follows. See the description of the OutputDocument class for sample code.

  1. Create a Source object from the source text
  2. Find the required segments by calling methods on the Source object and other segments
  3. Create an OutputDocument object from the source text
  4. Add an IOutputSegment to the OutputDocument for each segment of the document that is to be replaced with other text
  5. Call the OutputDocument.toString() method to get the final output
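
The idea behind steps 3 to 5 can be sketched without the library. The example below is only an illustration of the technique and is not the Jericho API: the class OutputDocumentSketch, the nested Replacement type and the apply method are hypothetical names invented here. It assumes each replacement is identified by begin and end character offsets in the source text, copies everything else verbatim, and rejects overlapping replacements, which is the condition that OverlappingOutputSegmentsException is described as signalling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the output-document idea; NOT the Jericho API.
class OutputDocumentSketch {

    // One replacement: source characters [begin, end) are swapped for text.
    static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Copies the source verbatim except where a replacement applies.
    static String apply(String source, List<Replacement> replacements) {
        List<Replacement> sorted = new ArrayList<Replacement>(replacements);
        Collections.sort(sorted, new Comparator<Replacement>() {
            public int compare(Replacement a, Replacement b) {
                return Integer.compare(a.begin, b.begin);
            }
        });
        StringBuilder out = new StringBuilder();
        int pos = 0; // next source index that has not been written yet
        for (Replacement r : sorted) {
            if (r.begin < pos || r.end < r.begin || r.end > source.length()) {
                throw new IllegalArgumentException(
                        "overlapping or out-of-range replacement at index " + r.begin);
            }
            out.append(source, pos, r.begin); // untouched text before the segment
            out.append(r.text);               // replacement text
            pos = r.end;
        }
        out.append(source, pos, source.length()); // untouched tail
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "<p>Hello <b>world</b></p>";
        // Step 2 stand-in: locate the segment to replace by plain string search.
        int begin = source.indexOf("<b>");
        int end = source.indexOf("</b>") + "</b>".length();
        List<Replacement> edits = new ArrayList<Replacement>();
        edits.add(new Replacement(begin, end, "<i>there</i>"));
        System.out.println(apply(source, edits)); // <p>Hello <i>there</i></p>
    }
}
```

Sorting the replacements by their start offset lets the output be built in a single pass, and it makes an overlap easy to detect: a replacement overlaps an earlier one exactly when it begins before the previous one ended.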

Analysing or Extracting Information from an HTML Document

If the document only needs to be analysed instead of modified, only the first two steps listed above are required. See the description of the FormFields class for sample code.
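
As a rough illustration of analysis without modification, the sketch below groups form controls by name, following the definition of a FormField given in the class summary (all form controls having the same name). It does not use or reproduce the Jericho API: the class FormFieldSketch and the method fieldsByName are hypothetical, and the regular expression is a deliberately crude assumption that only handles double-quoted name attributes on input, select and textarea start tags. A real parser such as the one documented here copes with far more of HTML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; it does NOT use or reproduce the Jericho API.
class FormFieldSketch {

    // Matches input, select or textarea start tags that carry a double-quoted name attribute.
    private static final Pattern CONTROL = Pattern.compile(
            "<(input|select|textarea)\\b[^>]*?\\bname\\s*=\\s*\"([^\"]*)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Maps each field name to the element names of the controls that share it,
    // in order of first appearance in the document.
    static Map<String, List<String>> fieldsByName(String html) {
        Map<String, List<String>> fields = new LinkedHashMap<String, List<String>>();
        Matcher m = CONTROL.matcher(html);
        while (m.find()) {
            String name = m.group(2);
            List<String> controls = fields.get(name);
            if (controls == null) {
                controls = new ArrayList<String>();
                fields.put(name, controls);
            }
            controls.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<form>"
                + "<input type=\"radio\" name=\"size\" value=\"S\">"
                + "<input type=\"radio\" name=\"size\" value=\"L\">"
                + "<textarea name=\"notes\"></textarea>"
                + "</form>";
        // The two radio buttons end up in one field, the textarea in another.
        System.out.println(fieldsByName(html));
    }
}
```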






