com.adobe.agl.lang.UCharacter Maven / Gradle / Ivy

The Adobe Experience Manager SDK
//##header J2SE15
/**
*******************************************************************************
* Copyright (C) 1996-2008, International Business Machines Corporation and    *
* others. All Rights Reserved.                                                *
*******************************************************************************
*/
/*
 * File: UCharacter.java
 * ************************************************************************
 *
 * ADOBE CONFIDENTIAL
 * ___________________
 *
 *  Copyright 2012 Adobe Systems Incorporated
 *  All Rights Reserved.
 *
 * NOTICE:  All information contained herein is, and remains
 * the property of Adobe Systems Incorporated and its suppliers,
 * if any.  The intellectual and technical concepts contained
 * herein are proprietary to Adobe Systems Incorporated and its
 * suppliers and are protected by trade secret or copyright law.
 * Dissemination of this information or reproduction of this material
 * is strictly forbidden unless prior written permission is obtained
 * from Adobe Systems Incorporated.
 **************************************************************************/
package com.adobe.agl.lang;

import com.adobe.agl.lang.UCharacterEnums.ECharacterCategory;
import com.adobe.agl.lang.UCharacterEnums.ECharacterDirection;
import com.adobe.agl.text.UTF16;

/**
 * 
 * The UCharacter class provides extensions to the 
 * 
 * java.lang.Character class. These extensions provide support for 
 * more Unicode properties and together with the UTF16 
 * class, provide support for supplementary characters (those with code 
 * points above U+FFFF).
 * Each ICU release supports the latest version of Unicode available at that time.
 * 
 * 
 * Code points are represented in this API using ints. While it would be 
 * more convenient in Java to have a separate primitive datatype for them, 
 * ints suffice in the meantime.
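 *
 * A minimal sketch of what "code points as ints" means in practice. It uses
 * only java.lang.Character and String, not UCharacter or UTF16, and the class
 * and method names (CodePointAsIntDemo, utf16Length, codePointAt) are
 * hypothetical names chosen for illustration.
 *
```java
// Hypothetical demo class; it relies on the JDK only, not on UCharacter.
final class CodePointAsIntDemo {

    // Number of UTF-16 code units needed to encode the code point (1 or 2).
    static int utf16Length(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Reads the code point that starts at index, joining a surrogate pair if present.
    static int codePointAt(String s, int index) {
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        int supplementary = 0x1D11E; // MUSICAL SYMBOL G CLEF, above U+FFFF
        String text = new String(Character.toChars(supplementary));
        System.out.println(text.length());                         // 2
        System.out.println(codePointAt(text, 0) == supplementary); // true
        System.out.println(utf16Length('A'));                      // 1
    }
}
```
 *
 * A single char cannot hold 0x1D11E, which is why the value travels as an int.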
 * 
 * 
 * To use this class, add the jar file icu4j.jar to the 
 * class path, since it contains the data files that supply the information used 
 * by this class.
 * E.g. in Windows:
 *     set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
 *
 * Otherwise, another method would be to copy the files uprops.dat and 
 * unames.icu from the icu4j source subdirectory
 * $ICU4J_SRC/src/com.adobe.agl.impl.data to your class directory 
 * $ICU4J_CLASS/com.adobe.agl.impl.data.
 * 
 * 
 * Aside from the additions for UTF-16 support, and the updated Unicode
 * properties, the main differences between UCharacter and Character are:
 * 
 * - UCharacter is not designed to be a char wrapper and does not have 
 *   APIs that involve management of that single char.
 *   These include: 
 *       char charValue(), 
 *       int compareTo(java.lang.Character, java.lang.Character), etc.
 * - UCharacter does not include Character APIs that are deprecated, nor
 *   does it include the Java-specific character information, such as 
 *   boolean isJavaIdentifierPart(char ch).
 * - Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric 
 *   values '10' - '35'. UCharacter also does this in digit and
 *   getNumericValue, to adhere to the Java semantics of these
 *   methods.  New methods unicodeDigit, and
 *   getUnicodeNumericValue do not treat the above code points 
 *   as having numeric values.  This is a semantic change from ICU4J 1.3.1.
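 *
 * The Java semantics referred to in the last item can be observed with
 * java.lang.Character alone. The sketch below is only an illustration of that
 * letter-to-number mapping; it does not call UCharacter, and the class name
 * LetterDigitDemo and its helper are hypothetical.
 *
```java
// Hypothetical demo class showing the java.lang.Character letter mapping.
final class LetterDigitDemo {

    // Value of ch as a digit in the given radix, or -1 if it is not one.
    static int digitValue(char ch, int radix) {
        return Character.digit(ch, radix);
    }

    public static void main(String[] args) {
        System.out.println(digitValue('A', 16));            // 10
        System.out.println(digitValue('z', 36));            // 35
        System.out.println(digitValue('G', 16));            // -1, not a hex digit
        System.out.println(Character.getNumericValue('a')); // 10
        System.out.println(Character.getNumericValue('Z')); // 35
    }
}
```
 *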
 * 
 * 
 * Further detailed differences can be determined using the program 
 * com.adobe.agl.dev.test.lang.UCharacterCompare
 * 
 * 
 * In addition to Java compatibility functions, which calculate derived properties,
 * this API provides low-level access to the Unicode Character Database.
 * 
 * 
 * Unicode assigns each code point (not just assigned characters) values for
 * many properties.
 * Most of them are simple boolean flags, or constants from a small enumerated list.
 * For some properties, values are strings or other relatively more complex types.
 * 
 * 
 * For more information see
 * "About the Unicode Character Database" (http://www.unicode.org/ucd/)
 * and the ICU User Guide chapter on Properties (http://www.icu-project.org/userguide/properties.html).
 * 
 * 
 * There are also functions that provide easy migration from C/POSIX functions
 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * 
 * 
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: There is no "istitle()" class for titlecase characters.
 * 
 * 
 * ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * 
 * 
 * API access for C/POSIX character classes is as follows:
 * - alpha:     isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
 * - lower:     isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
 * - upper:     isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
 * - punct:     ((1<
 * 
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * 
 * 
 * Note: There are several ICU (and Java) whitespace functions.
 * Comparison:
 * - isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
 *       most of general categories "Z" (separators) + most whitespace ISO controls
 *       (including no-break spaces, but excluding IS1..IS4 and ZWSP)
 * - isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
 * - isSpaceChar: just Z (including no-break spaces)
 * 
 * 
 * This class is not subclassable.
 * 
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.adobe.agl.lang.UCharacterEnums
 */

public final class UCharacter implements ECharacterCategory, ECharacterDirection
{
	
    /** 
     * The lowest Unicode code point value.
     * @stable ICU 2.1
     */
    public static final int MIN_VALUE = UTF16.CODEPOINT_MIN_VALUE;

    /**
     * The highest Unicode code point value (scalar value) according to the 
     * Unicode Standard. 
     * This is a 21-bit value (21 bits, rounded up).
     * Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE
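     *
     * A minimal range-check sketch, assuming UTF16.CODEPOINT_MIN_VALUE is 0
     * and UTF16.CODEPOINT_MAX_VALUE is 0x10FFFF (the UTF16 class is not shown
     * here, so both values are assumptions). CodePointRangeDemo and isInRange
     * are hypothetical names.
     *
```java
// Hypothetical demo class; the two constants are assumed, not read from UTF16.
final class CodePointRangeDemo {

    static final int MIN_VALUE = 0x0000;   // assumed UTF16.CODEPOINT_MIN_VALUE
    static final int MAX_VALUE = 0x10FFFF; // assumed UTF16.CODEPOINT_MAX_VALUE

    // True if the int lies inside the Unicode code point range.
    static boolean isInRange(int codePoint) {
        return codePoint >= MIN_VALUE && codePoint <= MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(isInRange(0x10FFFF));                           // true
        System.out.println(isInRange(0x110000));                           // false
        System.out.println(Integer.toBinaryString(MAX_VALUE).length());    // 21
        System.out.println(MAX_VALUE == Character.MAX_CODE_POINT);         // true
        System.out.println((int) Character.MAX_VALUE);                     // 65535
    }
}
```
     *
     * The last line shows why an int constant is needed: the char-typed
     * java.lang.Character.MAX_VALUE stops at U+FFFF.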
     * @stable ICU 2.1
     */
    public static final int MAX_VALUE = UTF16.CODEPOINT_MAX_VALUE; 
	
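    /**
     * Returns the given code point unchanged. The bidi mirroring lookup
     * is commented out in this version, so no mirrored form is computed.
     */
    public static int getMirror(int ch)
    {
        // return gBdp.getMirror(ch);
        return ch;
    }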

}