com.adobe.agl.lang.UCharacter Maven / Gradle / Ivy
Show all versions of aem-sdk-api Show documentation
//##header J2SE15
/**
*******************************************************************************
* Copyright (C) 1996-2008, International Business Machines Corporation and *
* others. All Rights Reserved. *
*******************************************************************************
*/
/*
* File: UCharacter.java
* ************************************************************************
*
* ADOBE CONFIDENTIAL
* ___________________
*
* Copyright 2012 Adobe Systems Incorporated
* All Rights Reserved.
*
* NOTICE: All information contained herein is, and remains
* the property of Adobe Systems Incorporated and its suppliers,
* if any. The intellectual and technical concepts contained
* herein are proprietary to Adobe Systems Incorporated and its
* suppliers and are protected by trade secret or copyright law.
* Dissemination of this information or reproduction of this material
* is strictly forbidden unless prior written permission is obtained
* from Adobe Systems Incorporated.
**************************************************************************/
package com.adobe.agl.lang;
import com.adobe.agl.lang.UCharacterEnums.ECharacterCategory;
import com.adobe.agl.lang.UCharacterEnums.ECharacterDirection;
import com.adobe.agl.text.UTF16;
/**
*
* The UCharacter class provides extensions to the
*
* java.lang.Character class. These extensions provide support for
* more Unicode properties and together with the UTF16
* class, provide support for supplementary characters (those with code
* points above U+FFFF).
* Each ICU release supports the latest version of Unicode available at that time.
*
*
* Code points are represented in these API using ints. While it would be
* more convenient in Java to have a separate primitive datatype for them,
* ints suffice in the meantime.
*
*
* To use this class please add the jar file name icu4j.jar to the
* class path, since it contains data files which supply the information used
* by this file.
* E.g. In Windows
* set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
.
* Otherwise, another method would be to copy the files uprops.dat and
* unames.icu from the icu4j source subdirectory
* $ICU4J_SRC/src/com.adobe.agl.impl.data to your class directory
* $ICU4J_CLASS/com.adobe.agl.impl.data.
*
*
* Aside from the additions for UTF-16 support, and the updated Unicode
* properties, the main differences between UCharacter and Character are:
*
* - UCharacter is not designed to be a char wrapper and does not have
* APIs to which involves management of that single char.
* These include:
*
* - char charValue(),
*
- int compareTo(java.lang.Character, java.lang.Character), etc.
*
* - UCharacter does not include Character APIs that are deprecated, nor
* does it include the Java-specific character information, such as
* boolean isJavaIdentifierPart(char ch).
*
- Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric
* values '10' - '35'. UCharacter also does this in digit and
* getNumericValue, to adhere to the java semantics of these
* methods. New methods unicodeDigit, and
* getUnicodeNumericValue do not treat the above code points
* as having numeric values. This is a semantic change from ICU4J 1.3.1.
*
*
* Further detail differences can be determined from the program
*
* com.adobe.agl.dev.test.lang.UCharacterCompare
*
*
* In addition to Java compatibility functions, which calculate derived properties,
* this API provides low-level access to the Unicode Character Database.
*
*
* Unicode assigns each code point (not just assigned character) values for
* many properties.
* Most of them are simple boolean flags, or constants from a small enumerated list.
* For some properties, values are strings or other relatively more complex types.
*
*
* For more information see
* "About the Unicode Character Database" (http://www.unicode.org/ucd/)
* and the ICU User Guide chapter on Properties (http://www.icu-project.org/userguide/properties.html).
*
*
* There are also functions that provide easy migration from C/POSIX functions
* like isblank(). Their use is generally discouraged because the C/POSIX
* standards do not define their semantics beyond the ASCII range, which means
* that different implementations exhibit very different behavior.
* Instead, Unicode properties should be used directly.
*
*
* There are also only a few, broad C/POSIX character classes, and they tend
* to be used for conflicting purposes. For example, the "isalpha()" class
* is sometimes used to determine word boundaries, while a more sophisticated
* approach would at least distinguish initial letters from continuation
* characters (the latter including combining marks).
* (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
* Another example: There is no "istitle()" class for titlecase characters.
*
*
* ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
* ICU implements them according to the Standard Recommendations in
* Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
* (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
*
*
* API access for C/POSIX character classes is as follows:
* - alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
* - lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
* - upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
* - punct: ((1<
*
* The C/POSIX character classes are also available in UnicodeSet patterns,
* using patterns like [:graph:] or \p{graph}.
*
*
* Note: There are several ICU (and Java) whitespace functions.
* Comparison:
* - isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
* most of general categories "Z" (separators) + most whitespace ISO controls
* (including no-break spaces, but excluding IS1..IS4 and ZWSP)
* - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
* - isSpaceChar: just Z (including no-break spaces)
*
*
* This class is not subclassable
*
* @author Syn Wee Quek
* @stable ICU 2.1
* @see com.adobe.agl.lang.UCharacterEnums
*/
public final class UCharacter implements ECharacterCategory, ECharacterDirection
{
/**
* The lowest Unicode code point value.
* @stable ICU 2.1
*/
public static final int MIN_VALUE = UTF16.CODEPOINT_MIN_VALUE;
/**
* The highest Unicode code point value (scalar value) according to the
* Unicode Standard.
* This is a 21-bit value (21 bits, rounded up).
* Up-to-date Unicode implementation of java.lang.Character.MIN_VALUE
* @stable ICU 2.1
*/
public static final int MAX_VALUE = UTF16.CODEPOINT_MAX_VALUE;
public static int getMirror(int ch)
{
// return gBdp.getMirror(ch);
return ch;
}
}