Jericho HTML Parser is a simple but powerful Java library that allows analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

There is a newer version: 2.3

au.id.jericho.lib.html (Jericho HTML Parser 1.5-dev1)

Package au.id.jericho.lib.html

A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.

Interface Summary


IOutputSegment
Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text.


 






Class Summary


Attribute
Represents a single attribute name/value segment within a StartTag.


Attributes
Represents the list of Attribute objects present within a particular StartTag.


AttributesOutputSegment
Implements an IOutputSegment whose content is a list of attribute name/value pairs.


BlankOutputSegment
Implements an IOutputSegment whose content is a string of spaces with the same length as the segment.


CharacterEntityReference
Represents an HTML Character Entity Reference.


CharacterReference
Represents either a CharacterEntityReference or NumericCharacterReference.


CharOutputSegment
Implements an IOutputSegment whose content is a character constant.


Element
Represents an HTML element, which encompasses the StartTag, an optional EndTag and all content in between.


EndTag
Represents the end tag of an Element.


FormControl
form controls


FormControlOutputStyle
*************


FormControlOutputStyle.DisplayValueConfig
*************
 must not be null


FormControlType
Represents one of the HTML control types in a form which have the potential to be successful.


FormField
Represents a field in an HTML form, a field being defined as the combination of all form controls having the same name.


FormFields
Represents a collection of FormField objects.


NumericCharacterReference
Represents an HTML Numeric Character Reference.


OutputDocument
Represents a modified version of an original source text.


Segment
Represents a segment of a Source document.


Source
Represents a source HTML document.


StartTag
Represents the start tag of an Element.


StringOutputSegment
Implements an IOutputSegment whose content is a CharSequence.


Tag
Represents either a StartTag or EndTag.


Util
This class contains miscellaneous utility methods not directly associated with the HTML Parser library.


 






Exception Summary


OverlappingOutputSegmentsException
Signals that overlapping output segments have been detected in the OutputDocument.


 



Package au.id.jericho.lib.html Description



A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.

The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.

For downloads, support and updates, visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/

For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at http://jerichohtml.sourceforge.net

Modifying an HTML Document

The typical method for modifying a document is as follows. See the description of the OutputDocument class for sample code.

1. Create a Source object from the source text.
2. Find the required segments by calling methods on the Source object and other segments.
3. Create an OutputDocument object from the source text.
4. Add an IOutputSegment to the OutputDocument for each segment of the document that is to be replaced with other text.
5. Call the OutputDocument.toString() method to get the final output.
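
The idea behind these steps can be illustrated without the library itself. The following is a minimal stand-alone sketch, not the Jericho API: OutputSketch, Replacement and render are hypothetical names invented for this illustration, and plain string searching stands in for the parser. It assumes that a replacement is a span of the source text paired with new text, that everything outside the registered spans is copied verbatim, and that overlapping spans are rejected (similar in spirit to OverlappingOutputSegmentsException). For real usage, refer to the sample code in the OutputDocument class description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in classes for illustration only; these are NOT the Jericho API.
final class OutputSketch {

    /** Hypothetical replacement: the source span [begin, end) is swapped for text. */
    static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Copies untouched source text verbatim and substitutes each registered span. */
    static String render(String source, List<Replacement> replacements) {
        List<Replacement> sorted = new ArrayList<>(replacements);
        sorted.sort(Comparator.comparingInt((Replacement r) -> r.begin));
        StringBuilder out = new StringBuilder();
        int pos = 0; // first source offset not yet written to the output
        for (Replacement r : sorted) {
            if (r.begin < pos) {
                // Assumed behaviour: overlapping spans are an error.
                throw new IllegalStateException("overlapping replacements at offset " + r.begin);
            }
            out.append(source, pos, r.begin); // verbatim text before the span
            out.append(r.text);               // replacement text
            pos = r.end;
        }
        out.append(source, pos, source.length()); // verbatim tail
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "<p>Hello <b>world</b></p>";
        // Stand-in for step 2: locate the span to change by plain string search.
        int begin = source.indexOf("<b>");
        int end = source.indexOf("</b>") + "</b>".length();
        List<Replacement> replacements = new ArrayList<>();
        replacements.add(new Replacement(begin, end, "<i>there</i>"));
        System.out.println(render(source, replacements)); // prints <p>Hello <i>there</i></p>
    }
}
```

Sorting the replacements by start offset lets the output be built in a single left-to-right pass, and it makes the overlap check a simple comparison against the current position.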
		
Analysing or Extracting Information from an HTML Document

If the document only needs to be analysed rather than modified, only the first two steps listed above are required. See the description of the FormFields class for sample code.
		
		














  
