Jericho HTML Parser is a simple but powerful Java library that allows analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

There is a newer version: 2.3

NumericCharacterReference (Jericho HTML Parser 1.5-dev1)

au.id.jericho.lib.html
Class NumericCharacterReference

java.lang.Object
  extended by au.id.jericho.lib.html.Segment
      extended by au.id.jericho.lib.html.CharacterReference
          extended by au.id.jericho.lib.html.NumericCharacterReference
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable

public class NumericCharacterReference
extends CharacterReference

Represents an HTML Numeric Character Reference.

Static methods to encode and decode strings and single characters can be found in the CharacterReference superclass.

NumericCharacterReference objects are created by the library's parsing methods, such as CharacterReference.parse (see the example under getCharacterReferenceString()).

See Also:
CharacterReference

Field Summary
 
Fields inherited from class au.id.jericho.lib.html.CharacterReference
ApostropheEncoded, INVALID_CODE_POINT
 
Method Summary
static java.lang.String encode(java.lang.CharSequence unencodedText)
          Encodes the specified text, escaping special characters into numeric character references.
static java.lang.String encodeDecimal(java.lang.CharSequence unencodedText)
          Encodes the specified text, escaping special characters into decimal numeric character references.
static java.lang.String encodeHexadecimal(java.lang.CharSequence unencodedText)
          Encodes the specified text, escaping special characters into hexadecimal numeric character references.
 java.lang.String getCharacterReferenceString()
          Returns the correct encoded form of this numeric character reference.
static java.lang.String getCharacterReferenceString(int codePoint)
          Returns the numeric character reference encoded form of the specified Unicode code point.
 java.lang.String getDebugInfo()
          Returns a string representation of this object useful for debugging purposes.
 boolean isDecimal()
          Indicates whether this numeric character reference is in decimal format.
 boolean isHexadecimal()
          Indicates whether this numeric character reference is in hexadecimal format.
 
Methods inherited from class au.id.jericho.lib.html.CharacterReference
decode, decodeCollapseWhiteSpace, encodeWithWhiteSpaceFormatting, getChar, getCodePoint, getCodePointFromCharacterReferenceString, getDecimalCharacterReferenceString, getDecimalCharacterReferenceString, getHexadecimalCharacterReferenceString, getHexadecimalCharacterReferenceString, getUnicodeText, getUnicodeText, parse, reencode, requiresEncoding
 
Methods inherited from class au.id.jericho.lib.html.Segment
charAt, compareTo, encloses, encloses, equals, findAllCharacterReferences, findAllComments, findAllElements, findAllElements, findAllStartTags, findAllStartTags, findAllStartTags, findFormControls, findFormFields, findWords, getBegin, getEnd, getSourceText, getSourceTextNoWhitespace, hashCode, ignoreWhenParsing, isComment, isWhiteSpace, length, parseAttributes, subSequence, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Method Detail

isDecimal

public boolean isDecimal()
Indicates whether this numeric character reference is in decimal format (e.g. "&#62;").

This flag is set depending on whether the character reference in the source document was in decimal or hexadecimal format.

Returns:
true if this numeric character reference is in decimal format, otherwise false.

isHexadecimal

public boolean isHexadecimal()
Indicates whether this numeric character reference is in hexadecimal format (e.g. "&#x3e;").

This flag is set depending on whether the character reference in the source document was in hexadecimal or decimal format.

Returns:
true if this numeric character reference is in hexadecimal format, otherwise false.
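
The distinction between the two formats can be illustrated without the library. The sketch below is not Jericho code: the class RadixSketch and its methods are hypothetical names, and it assumes only the usual HTML syntax, where a decimal reference starts with "&#" followed by a digit and a hexadecimal reference starts with "&#x" or "&#X".

```java
// Illustrative sketch only; RadixSketch is a hypothetical helper, not part of the Jericho API.
final class RadixSketch {

    /** True if the text starts like a hexadecimal reference such as "&#x3e;". */
    static boolean looksHexadecimal(String ref) {
        return ref.length() > 3
            && ref.startsWith("&#")
            && (ref.charAt(2) == 'x' || ref.charAt(2) == 'X');
    }

    /** True if the text starts like a decimal reference such as "&#62;". */
    static boolean looksDecimal(String ref) {
        return ref.length() > 2
            && ref.startsWith("&#")
            && ref.charAt(2) >= '0' && ref.charAt(2) <= '9';
    }

    public static void main(String[] args) {
        System.out.println(looksDecimal("&#62;"));       // true
        System.out.println(looksHexadecimal("&#x3e;"));  // true
        System.out.println(looksDecimal("&#x3e;"));      // false
    }
}
```

For any text that is a numeric character reference, exactly one of the two checks holds, which mirrors the relationship between isDecimal() and isHexadecimal() described above.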

encode

public static java.lang.String encode(java.lang.CharSequence unencodedText)
Encodes the specified text, escaping special characters into numeric character references.

Each character is encoded only if the requiresEncoding(char) method would return true for that character.

This method encodes all character references in decimal format, and is exactly the same as calling encodeDecimal(CharSequence).

To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.

To encode text using hexadecimal numeric character references only, use the encodeHexadecimal(CharSequence) method instead.

Parameters:
unencodedText - the text to encode.
Returns:
the encoded string.
See Also:
CharacterReference.decode(CharSequence encodedText)

encodeDecimal

public static java.lang.String encodeDecimal(java.lang.CharSequence unencodedText)
Encodes the specified text, escaping special characters into decimal numeric character references.

Each character is encoded only if the requiresEncoding(char) method would return true for that character.

To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.

To encode text using hexadecimal numeric character references only, use the encodeHexadecimal(CharSequence) method instead.

Parameters:
unencodedText - the text to encode.
Returns:
the encoded string.
See Also:
CharacterReference.decode(CharSequence encodedText)

encodeHexadecimal

public static java.lang.String encodeHexadecimal(java.lang.CharSequence unencodedText)
Encodes the specified text, escaping special characters into hexadecimal numeric character references.

Each character is encoded only if the requiresEncoding(char) method would return true for that character.

To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.

To encode text using decimal numeric character references only, use the encodeDecimal(CharSequence) method instead.

Parameters:
unencodedText - the text to encode.
Returns:
the encoded string.
See Also:
CharacterReference.decode(CharSequence encodedText)
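
The encode, encodeDecimal and encodeHexadecimal methods above differ only in the radix of their output. The following is a minimal sketch of that idea, not the library's implementation. NumericEncodeSketch and needsEncoding are hypothetical names, and the rule inside needsEncoding (non-ASCII characters plus &, <, > and ") is an assumption standing in for requiresEncoding(char), whose exact behaviour is not described on this page. The sketch also iterates by code point, which is its own design choice.

```java
// Illustrative sketch only; not the Jericho implementation.
final class NumericEncodeSketch {

    // ASSUMPTION: a stand-in for requiresEncoding(char); the library's real rule may differ.
    static boolean needsEncoding(int codePoint) {
        return codePoint > 0x7F || codePoint == '&' || codePoint == '<'
            || codePoint == '>' || codePoint == '"';
    }

    static String encodeDecimal(CharSequence text) {
        return encode(text, false);
    }

    static String encodeHexadecimal(CharSequence text) {
        return encode(text, true);
    }

    private static String encode(CharSequence text, boolean hex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            i += Character.charCount(cp);  // step over surrogate pairs as one unit
            if (!needsEncoding(cp)) {
                out.appendCodePoint(cp);   // copied through unchanged
            } else if (hex) {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeDecimal("a<b"));      // a&#60;b
        System.out.println(encodeHexadecimal("a<b"));  // a&#x3c;b
    }
}
```

Characters that do not need encoding pass through untouched, so plain ASCII text comes back unchanged in either radix.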

getCharacterReferenceString

public java.lang.String getCharacterReferenceString()
Returns the correct encoded form of this numeric character reference.

The returned string will use the same radix as the original character reference in the source document, i.e. decimal format if isDecimal() is true, and hexadecimal format if isHexadecimal() is true.

Note that the returned string is not necessarily the same as the original source text used to create this object. This library will recognise certain invalid forms of character references, as detailed in the CharacterReference.decode(CharSequence encodedText) method.

To retrieve the original source text, use the toString() method instead.

Example:
CharacterReference.parse("&#62").getCharacterReferenceString() returns "&#62;"

Specified by:
getCharacterReferenceString in class CharacterReference
Returns:
the correct encoded form of this numeric character reference.
See Also:
CharacterReference.getCharacterReferenceString(int codePoint)
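
The difference between the source text and the correct encoded form can be sketched as a small normalisation step. This is not how the library is implemented. CanonicalFormSketch is a hypothetical class, the only invalid form it accepts is the missing semicolon shown in the example above, and the lowercase hexadecimal output and the removal of leading zeros are choices made by this sketch alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the Jericho implementation.
final class CanonicalFormSketch {

    // "&#" + decimal digits, or "&#x"/"&#X" + hex digits, with an optional ';'.
    private static final Pattern REF =
        Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));?");

    /** Returns the normalised reference in its original radix, or null if the text does not match. */
    static String canonical(String sourceText) {
        Matcher m = REF.matcher(sourceText);
        if (!m.matches()) {
            return null;
        }
        try {
            if (m.group(1) != null) {
                return "&#" + Integer.parseInt(m.group(1)) + ";";
            }
            int codePoint = Integer.parseInt(m.group(2), 16);
            return "&#x" + Integer.toHexString(codePoint) + ";";
        } catch (NumberFormatException e) {
            return null;  // the number does not fit in an int
        }
    }

    public static void main(String[] args) {
        System.out.println(canonical("&#62"));   // &#62;
        System.out.println(canonical("&#x3E"));  // &#x3e;
    }
}
```

The radix of the input is preserved in the output, matching the behaviour described for getCharacterReferenceString(), while the unmodified input plays the role of the text returned by toString().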

getCharacterReferenceString

public static java.lang.String getCharacterReferenceString(int codePoint)
Returns the numeric character reference encoded form of the specified Unicode code point.

This method returns the character reference in decimal format, and is exactly the same as calling CharacterReference.getDecimalCharacterReferenceString(int codePoint).

To get either the character entity reference or numeric character reference, use the CharacterReference.getCharacterReferenceString(int codePoint) method instead.

To get the character reference in hexadecimal format, use the CharacterReference.getHexadecimalCharacterReferenceString(int codePoint) method instead.

Examples:
NumericCharacterReference.getCharacterReferenceString(62) returns "&#62;"
NumericCharacterReference.getCharacterReferenceString('>') returns "&#62;"

Returns:
the numeric character reference encoded form of the specified Unicode code point.
See Also:
CharacterReference.getCharacterReferenceString(int codePoint)
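
Both examples above give the same result because Java widens the char '>' to the int 62 when it is passed to a method taking an int code point. The sketch below shows that, plus a code point outside the Basic Multilingual Plane. CodePointReferenceSketch and its methods are hypothetical names, and throwing on an invalid code point is an assumption of this sketch, since the library's handling of that case is not described here.

```java
// Illustrative sketch only; not the Jericho implementation.
final class CodePointReferenceSketch {

    static String decimalReference(int codePoint) {
        check(codePoint);
        return "&#" + codePoint + ";";
    }

    static String hexadecimalReference(int codePoint) {
        check(codePoint);
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    // ASSUMPTION: invalid code points are rejected; the library may behave differently.
    private static void check(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Invalid code point: " + codePoint);
        }
    }

    public static void main(String[] args) {
        System.out.println(decimalReference(62));           // &#62;
        System.out.println(decimalReference('>'));          // &#62; (char widened to int)
        System.out.println(decimalReference(0x1F600));      // &#128512;
        System.out.println(hexadecimalReference(0x1F600));  // &#x1f600;
    }
}
```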

getDebugInfo

public java.lang.String getDebugInfo()
Description copied from class: Segment
Returns a string representation of this object useful for debugging purposes.

Overrides:
getDebugInfo in class Segment
Returns:
a string representation of this object useful for debugging purposes.
