Jericho HTML Parser is a simple but powerful Java library that allows analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

Element (Jericho HTML Parser 1.5-dev1)

au.id.jericho.lib.html
Class Element

java.lang.Object
  extended by au.id.jericho.lib.html.Segment
      extended by au.id.jericho.lib.html.Element
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable

public final class Element
extends Segment

Represents an HTML element, which encompasses the StartTag, an optional EndTag and all content in between.

If the start tag has no corresponding end tag:

  • If the end tag is optional, the end of the element occurs at the start of the next tag that implicitly terminates this type of element.
  • If the end tag is forbidden, the element spans only the start tag.
  • If the end tag is required, the source HTML is invalid and the element spans only the start tag. No attempt is made by this library to determine how user agents might interpret invalid HTML.
Note that this behaviour has changed since version 1.0, which treated optional end tags the same as required end tags.
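
The three rules above can be modelled in a few lines. This is only a sketch and not the library's implementation: the ElementSpanSketch class, the EndTagRule enum and the endWithoutEndTag method are hypothetical names, and the caller is assumed to know the relevant offsets already. The fallback used when an optional end tag has no implicitly terminating tag after it is also an assumption, since the text above does not cover that case.

```java
/** Hypothetical model of the missing-end-tag rules; not part of the Jericho API. */
final class ElementSpanSketch {

    /** How the DTD treats the end tag of an element type. */
    enum EndTagRule { OPTIONAL, FORBIDDEN, REQUIRED }

    /**
     * Returns the offset at which an element with no end tag finishes.
     *
     * @param rule           end-tag rule for this element type
     * @param startTagEnd    offset just past the start tag
     * @param nextTerminator offset of the next tag that implicitly terminates
     *                       the element, or -1 if there is none
     */
    static int endWithoutEndTag(EndTagRule rule, int startTagEnd, int nextTerminator) {
        switch (rule) {
            case OPTIONAL:
                // Assumption: with no terminating tag either, span only the start tag.
                return nextTerminator >= 0 ? nextTerminator : startTagEnd;
            case FORBIDDEN: // the element is just its start tag
            case REQUIRED:  // invalid HTML: no recovery is attempted
            default:
                return startTagEnd;
        }
    }

    public static void main(String[] args) {
        // "<p>one<p>two": the first <p> ends at offset 3, the second <p> begins at offset 6.
        System.out.println(endWithoutEndTag(EndTagRule.OPTIONAL, 3, 6));
        // "<br>text": the start tag ends at offset 4 and the end tag is forbidden.
        System.out.println(endWithoutEndTag(EndTagRule.FORBIDDEN, 4, -1));
        // "<div>text": the end tag is required but missing.
        System.out.println(endWithoutEndTag(EndTagRule.REQUIRED, 5, -1));
    }
}
```

In this model only the OPTIONAL case can extend the element beyond its start tag, which mirrors the change in behaviour since version 1.0 noted above.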

An Element is created using the Segment.findAllElements(String name) or StartTag.getElement() method.

See also the XML 1.0 specification for elements.

See Also:
StartTag

Method Summary
 Attributes getAttributes()
          Returns the attributes specified in this element's start tag.
 Segment getContent()
          Returns the segment representing the content of the element.
 java.lang.String getContentText()
          Returns the content text of the element.
 java.lang.String getDebugInfo()
          Returns a string representation of this object useful for debugging purposes.
 EndTag getEndTag()
          Returns the end tag of the element.
 FormControl getFormControl()
          Returns the FormControl defined by this element.
 java.lang.String getName()
          Returns the name of the StartTag of this element.
 StartTag getStartTag()
          Returns the start tag of the element.
static boolean isBlock(java.lang.String name)
          Indicates whether an element with the given name is a block element according to the HTML 4.01 Transitional DTD.
 boolean isEmpty()
          Indicates whether the element is empty.
 boolean isEmptyElementTag()
          Indicates whether the element is an empty element tag.
static boolean isInline(java.lang.String name)
          Indicates whether an element with the given name is an inline element according to the HTML 4.01 Transitional DTD.
 
Methods inherited from class au.id.jericho.lib.html.Segment
charAt, compareTo, encloses, encloses, equals, findAllCharacterReferences, findAllComments, findAllElements, findAllElements, findAllStartTags, findAllStartTags, findAllStartTags, findFormControls, findFormFields, findWords, getBegin, getEnd, getSourceText, getSourceTextNoWhitespace, hashCode, ignoreWhenParsing, isComment, isWhiteSpace, length, parseAttributes, subSequence, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Method Detail

getContentText

public java.lang.String getContentText()
Returns the content text of the element.

Returns:
the content text of the element, or null if the element is empty.

getContent

public Segment getContent()
Returns the segment representing the content of the element.

This segment spans between the end of the start tag and the start of the end tag. If the end tag is not present, the content reaches to the end of the element.

The returned segment is newly created with every call to this method.

Note that before version 1.5 this method returned null if the element was empty, whereas now a zero-length segment is returned.

Returns:
the segment representing the content of the element, guaranteed not null.
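
The span rule for the content segment can be illustrated with plain string offsets. This is a sketch under stated assumptions, not the library's code: ContentSpanSketch and its content method are hypothetical, and the offsets are assumed to come from an earlier parse.

```java
/** Hypothetical illustration of the content span; not part of the Jericho API. */
final class ContentSpanSketch {

    /**
     * Returns the text between the end of the start tag and the start of the
     * end tag, or up to the end of the element when there is no end tag.
     *
     * @param source      the full source text
     * @param startTagEnd offset just past the start tag
     * @param endTagBegin offset of the end tag, or -1 if the element has none
     * @param elementEnd  offset just past the whole element
     */
    static String content(String source, int startTagEnd, int endTagBegin, int elementEnd) {
        int contentEnd = endTagBegin >= 0 ? endTagBegin : elementEnd;
        // An empty element yields a zero-length string here, never null.
        return source.substring(startTagEnd, contentEnd);
    }

    public static void main(String[] args) {
        // End tag present: content sits between <b> and </b>.
        System.out.println(content("<b>bold</b>", 3, 7, 11));
        // No end tag: content reaches to the end of the first <li> element.
        System.out.println(content("<li>item<li>next", 4, -1, 8));
        // Empty element: zero-length content.
        System.out.println(content("<p></p>", 3, 3, 7).length());
    }
}
```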

getStartTag

public StartTag getStartTag()
Returns the start tag of the element.

Returns:
the start tag of the element.

getEndTag

public EndTag getEndTag()
Returns the end tag of the element.

If the element has no end tag this method returns null.

Returns:
the end tag of the element, or null if the element has no end tag.

getName

public java.lang.String getName()
Returns the name of the StartTag of this element.

Returns:
the name of the StartTag of this element.

isEmpty

public boolean isEmpty()
Indicates whether the element is empty.

The representation of an empty element is either a start tag immediately followed by an end tag, or an empty-element tag.

Returns:
true if the element is empty, otherwise false.

isEmptyElementTag

public boolean isEmptyElementTag()
Indicates whether the element is an empty element tag. This is signified by the characters "/>" at the end of the start tag and the absence of an end tag. Note that not every empty element is an empty element tag.

Returns:
true if the element is an empty element tag, otherwise false.
See Also:
isEmpty()
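
The distinction between an empty element and an empty-element tag can be sketched as follows. The EmptinessSketch class and its parameters are hypothetical and only restate the two definitions given for isEmpty() and isEmptyElementTag(); the library's own checks may be implemented differently.

```java
/** Hypothetical restatement of the two emptiness definitions; not part of the Jericho API. */
final class EmptinessSketch {

    /** True when the start tag ends with "/>" and the element has no end tag. */
    static boolean isEmptyElementTag(String startTagText, boolean hasEndTag) {
        return startTagText.endsWith("/>") && !hasEndTag;
    }

    /**
     * True for an empty-element tag, or for a start tag that is
     * immediately followed by its end tag.
     */
    static boolean isEmpty(String startTagText, String content, boolean hasEndTag) {
        return isEmptyElementTag(startTagText, hasEndTag)
                || (hasEndTag && content.isEmpty());
    }

    public static void main(String[] args) {
        // <br/> is both empty and an empty-element tag.
        System.out.println(isEmpty("<br/>", "", false) + " " + isEmptyElementTag("<br/>", false));
        // <p></p> is empty, but it is not an empty-element tag.
        System.out.println(isEmpty("<p>", "", true) + " " + isEmptyElementTag("<p>", true));
        // <p>text</p> is neither.
        System.out.println(isEmpty("<p>", "text", true) + " " + isEmptyElementTag("<p>", true));
    }
}
```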

isBlock

public static boolean isBlock(java.lang.String name)
Indicates whether an element with the given name is a block element according to the HTML 4.01 Transitional DTD.

A brief description of the difference between block and inline elements is given in the HTML 4.01 Specification section 7.5.3.

Returns:
true if an element with the given name is a block element, otherwise false.

isInline

public static boolean isInline(java.lang.String name)
Indicates whether an element with the given name is an inline element according to the HTML 4.01 Transitional DTD.

A brief description of the difference between block and inline elements is given in the HTML 4.01 Specification section 7.5.3.

Returns:
true if an element with the given name is an inline element, otherwise false.
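
A name-based classification of this kind can be sketched with two lookup sets. The element names below are transcribed from the %block; and %inline; entities of the HTML 4.01 Transitional DTD. The BlockInlineSketch class is hypothetical, and both the exact contents of the library's own tables and its handling of letter case are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical lookup-table classifier; not part of the Jericho API. */
final class BlockInlineSketch {

    // Names from the %block; entity of the HTML 4.01 Transitional DTD.
    private static final Set<String> BLOCK = new HashSet<String>(Arrays.asList(
            "p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dir", "menu",
            "pre", "dl", "div", "center", "noscript", "noframes", "blockquote",
            "form", "isindex", "hr", "table", "fieldset", "address"));

    // Names from the %inline; entity of the HTML 4.01 Transitional DTD.
    private static final Set<String> INLINE = new HashSet<String>(Arrays.asList(
            "tt", "i", "b", "u", "s", "strike", "big", "small",
            "em", "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr", "acronym",
            "a", "img", "applet", "object", "font", "basefont", "br", "script", "map",
            "q", "sub", "sup", "span", "bdo", "iframe",
            "input", "select", "textarea", "label", "button"));

    // Assumption: names are compared case-insensitively.
    static boolean isBlock(String name) {
        return name != null && BLOCK.contains(name.toLowerCase(Locale.ROOT));
    }

    static boolean isInline(String name) {
        return name != null && INLINE.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isBlock("div") + " " + isInline("div"));
        System.out.println(isBlock("span") + " " + isInline("span"));
        // Some elements, such as li, belong to neither group in the DTD.
        System.out.println(isBlock("li") + " " + isInline("li"));
    }
}
```

In this sketch the two checks are not complements of each other: a name that appears in neither set yields false from both methods.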

getAttributes

public Attributes getAttributes()
Returns the attributes specified in this element's start tag.

This is equivalent to getStartTag().getAttributes().

Returns:
the attributes specified in this element's start tag.
See Also:
StartTag.getAttributes()

getFormControl

public FormControl getFormControl()
Returns the FormControl defined by this element.

Returns:
the FormControl defined by this element, or null if it is not a control.

getDebugInfo

public java.lang.String getDebugInfo()
Description copied from class: Segment
Returns a string representation of this object useful for debugging purposes.

Overrides:
getDebugInfo in class Segment
Returns:
a string representation of this object useful for debugging purposes.





