//##header J2SE15
/**
*******************************************************************************
* Copyright (C) 1996-2008, International Business Machines Corporation and    *
* others. All Rights Reserved.                                                *
*******************************************************************************
*/
/*
 * File: UCharacter.java
 * ************************************************************************
 *
 * ADOBE CONFIDENTIAL
 * ___________________
 *
 *  Copyright 2012 Adobe Systems Incorporated
 *  All Rights Reserved.
 *
 * NOTICE:  All information contained herein is, and remains
 * the property of Adobe Systems Incorporated and its suppliers,
 * if any.  The intellectual and technical concepts contained
 * herein are proprietary to Adobe Systems Incorporated and its
 * suppliers and are protected by trade secret or copyright law.
 * Dissemination of this information or reproduction of this material
 * is strictly forbidden unless prior written permission is obtained
 * from Adobe Systems Incorporated.
 **************************************************************************/
package com.adobe.agl.lang;

import com.adobe.agl.lang.UCharacterEnums.ECharacterCategory;
import com.adobe.agl.lang.UCharacterEnums.ECharacterDirection;
import com.adobe.agl.text.UTF16;

/**
 * The UCharacter class provides extensions to the
 * java.lang.Character class. These extensions provide support for
 * more Unicode properties and, together with the UTF16
 * class, provide support for supplementary characters (those with code
 * points above U+FFFF).
 * Each ICU release supports the latest version of Unicode available at that time.
 * <p>
 * Code points are represented in this API using ints. While it would be
 * more convenient in Java to have a separate primitive data type for them,
 * ints suffice in the meantime.
 * <p>
 * To use this class, add the jar file icu4j.jar to the
 * class path, since it contains the data files that supply the information used
 * by this class.<br>
 * For example, on Windows:<br>
 * <code>set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar</code><br>
 * Alternatively, copy the files uprops.dat and
 * unames.icu from the icu4j source subdirectory
 * <i>$ICU4J_SRC/src/com.adobe.agl.impl.data</i> to your class directory
 * <i>$ICU4J_CLASS/com.adobe.agl.impl.data</i>.
 * <p>
 * Aside from the additions for UTF-16 support and the updated Unicode
 * properties, the main differences between UCharacter and Character are:
 * <ul>
 * <li>UCharacter is not designed to be a char wrapper and does not have
 *     APIs that involve management of that single char.<br>
 *     These include:
 *     <ul>
 *     <li>char charValue(),</li>
 *     <li>int compareTo(java.lang.Character, java.lang.Character), etc.</li>
 *     </ul></li>
 * <li>UCharacter does not include Character APIs that are deprecated, nor
 *     does it include Java-specific character information, such as
 *     boolean isJavaIdentifierPart(char ch).</li>
 * <li>Character maps the characters 'A' - 'Z' and 'a' - 'z' to the numeric
 *     values 10 - 35. UCharacter also does this in digit and
 *     getNumericValue, to adhere to the Java semantics of these
 *     methods. The newer methods unicodeDigit and
 *     getUnicodeNumericValue do not treat these code points
 *     as having numeric values. This is a semantic change from ICU4J 1.3.1.</li>
 * </ul>
 * <p>
 * Further differences in detail can be determined from the program
 * com.adobe.agl.dev.test.lang.UCharacterCompare.
 * <p>
 * In addition to Java compatibility functions, which calculate derived properties,
 * this API provides low-level access to the Unicode Character Database.
 * <p>
 * Unicode assigns each code point (not just assigned characters) values for
 * many properties.
 * Most of them are simple boolean flags, or constants from a small enumerated list.
 * For some properties, values are strings or other relatively more complex types.
 * <p>
 * For more information see
 * "About the Unicode Character Database" (http://www.unicode.org/ucd/)
 * and the ICU User Guide chapter on Properties (http://www.icu-project.org/userguide/properties.html).
 * <p>
 * There are also functions that provide easy migration from C/POSIX functions
 * like isblank(). Their use is generally discouraged because the C/POSIX
 * standards do not define their semantics beyond the ASCII range, which means
 * that different implementations exhibit very different behavior.
 * Instead, Unicode properties should be used directly.
 * <p>
 * There are also only a few, broad C/POSIX character classes, and they tend
 * to be used for conflicting purposes. For example, the "isalpha()" class
 * is sometimes used to determine word boundaries, while a more sophisticated
 * approach would at least distinguish initial letters from continuation
 * characters (the latter including combining marks).
 * (In ICU, BreakIterator is the most sophisticated API for word boundaries.)
 * Another example: there is no "istitle()" class for titlecase characters.
 * <p>
 * ICU 3.4 and later provides API access for all twelve C/POSIX character classes.
 * ICU implements them according to the Standard Recommendations in
 * Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions
 * (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
 * <p>
 * API access for C/POSIX character classes is as follows:
 * <ul>
 * <li>alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)</li>
 * <li>lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)</li>
 * <li>upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)</li>
 * <li>punct: {@code ((1<}</li>
 * </ul>
 * <p>
 * The C/POSIX character classes are also available in UnicodeSet patterns,
 * using patterns like [:graph:] or \p{graph}.
 * <p>
 * Note: There are several ICU (and Java) whitespace functions.
 * Comparison:
 * <ul>
 * <li>isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
 *     most of general category "Z" (separators) plus most whitespace ISO controls
 *     (including no-break spaces, but excluding IS1..IS4 and ZWSP)</li>
 * <li>isWhitespace: Java isWhitespace; Z plus whitespace ISO controls, but excluding no-break spaces</li>
 * <li>isSpaceChar: just Z (including no-break spaces)</li>
 * </ul>
 * <p>
 * This class is final and cannot be subclassed.
 *
 * @author Syn Wee Quek
 * @stable ICU 2.1
 * @see com.adobe.agl.lang.UCharacterEnums
 */
public final class UCharacter implements ECharacterCategory, ECharacterDirection {

    /**
     * The lowest Unicode code point value.
     * @stable ICU 2.1
     */
    public static final int MIN_VALUE = UTF16.CODEPOINT_MIN_VALUE;

    /**
     * The highest Unicode code point value (scalar value) according to the
     * Unicode Standard.
     * This is a 21-bit value (21 bits, rounded up).<br>
     * Up-to-date Unicode counterpart of java.lang.Character.MAX_VALUE, which
     * only covers the 16-bit range up to U+FFFF.
     * @stable ICU 2.1
     */
    public static final int MAX_VALUE = UTF16.CODEPOINT_MAX_VALUE;

    /**
     * Intended to map ch to its bidi mirror-image character.
     * <p>
     * In this build the mirroring lookup (gBdp.getMirror(ch)) is commented
     * out, so ch is always returned unchanged.
     * @param ch the code point to map
     * @return ch, unchanged
     */
    public static int getMirror(int ch) {
        // return gBdp.getMirror(ch);
        return ch;
    }
}
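The class comment states that code points are passed as ints and that code points above U+FFFF are supplementary characters. MAX_VALUE is documented as a 21-bit value. The following is a minimal sketch of what that means in practice. It is written against java.lang.Character and String only, and does not call UCharacter or UTF16, whose full APIs are not part of this file. The class name CodePointDemo and its helper methods are hypothetical and chosen for illustration.

```java
// Hypothetical demo class: code points as ints vs. UTF-16 chars,
// using only java.lang APIs (not UCharacter/UTF16).
class CodePointDemo {
    // Number of bits needed to hold a non-negative value (21 for U+10FFFF).
    static int bitsNeeded(int value) {
        return Integer.SIZE - Integer.numberOfLeadingZeros(value);
    }

    // UTF-16 code units (chars) needed for one code point: 1 or 2.
    static int utf16Length(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Encode a code point as UTF-16; above U+FFFF this yields a surrogate pair.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("char max: U+" + Integer.toHexString(Character.MAX_VALUE)
                + ", code point max: U+" + Integer.toHexString(Character.MAX_CODE_POINT)
                + " (" + bitsNeeded(Character.MAX_CODE_POINT) + " bits)");

        int grin = 0x1F600; // a supplementary code point
        char[] units = encode(grin);
        System.out.printf("U+%X -> %d chars: %04X %04X%n",
                grin, units.length, (int) units[0], (int) units[1]);

        String s = "a" + new String(units) + "b";
        // length() counts chars; codePointCount() counts code points.
        System.out.println(s.length() + " chars, "
                + s.codePointCount(0, s.length()) + " code points");
        // Iterating by code point yields the surrogate pair as a single int.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Running main should report 4 chars but 3 code points for the string built from 'a', U+1F600 and 'b'. This is because the supplementary code point is stored as the surrogate pair D83D DE00.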

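The class comment says that Character, and UCharacter's digit and getNumericValue, map 'A' - 'Z' and 'a' - 'z' to 10 - 35. It also says that unicodeDigit and getUnicodeNumericValue do not. This stub does not include those UCharacter methods. The sketch below therefore shows the Java side with java.lang.Character and contrasts it with a hypothetical helper, decimalDigitOnly. That helper accepts only general category Nd (DECIMAL_DIGIT_NUMBER). It only approximates the documented unicodeDigit behaviour for decimal digits and is not ICU's implementation.

```java
// Hypothetical demo class contrasting Java digit semantics with an
// Nd-only check; neither method is part of UCharacter.
class DigitDemo {
    // Java semantics: Latin letters count as digits 10..35 in a large enough radix.
    static int javaDigit(int cp, int radix) {
        return Character.digit(cp, radix);
    }

    // Hypothetical helper: only general category Nd counts as a digit,
    // so letters such as 'A' return -1.
    static int decimalDigitOnly(int cp) {
        if (Character.getType(cp) != Character.DECIMAL_DIGIT_NUMBER) {
            return -1;
        }
        return Character.digit(cp, 10);
    }

    public static void main(String[] args) {
        System.out.println(javaDigit('A', 36));             // 10
        System.out.println(Character.getNumericValue('z')); // 35
        System.out.println(decimalDigitOnly('A'));          // -1
        System.out.println(decimalDigitOnly(0x0663));       // ARABIC-INDIC DIGIT THREE: 3
    }
}
```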

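The punct entry in the C/POSIX list in the class comment is cut off in this copy. The sketch below is not that original expression. It only illustrates the general-category bit-mask idiom that the surviving ((1< prefix points to. It assumes that "punct" means the seven Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po) and uses java.lang.Character.getType rather than any UCharacter method. PunctMaskDemo and isPunct are hypothetical names.

```java
// Hypothetical demo class; assumes "punct" = the seven Unicode P* categories.
class PunctMaskDemo {
    // Each general-category constant is below 32, so one int holds the set.
    static final int PUNCT_MASK =
            (1 << Character.CONNECTOR_PUNCTUATION)
          | (1 << Character.DASH_PUNCTUATION)
          | (1 << Character.START_PUNCTUATION)
          | (1 << Character.END_PUNCTUATION)
          | (1 << Character.INITIAL_QUOTE_PUNCTUATION)
          | (1 << Character.FINAL_QUOTE_PUNCTUATION)
          | (1 << Character.OTHER_PUNCTUATION);

    static boolean isPunct(int cp) {
        // Turn the category number into a single bit and test it against the mask.
        return ((1 << Character.getType(cp)) & PUNCT_MASK) != 0;
    }

    public static void main(String[] args) {
        int[] samples = {'!', '-', '(', 0x00AB, '+', '$', 'a'};
        for (int cp : samples) {
            System.out.printf("U+%04X punct=%b%n", cp, isPunct(cp));
        }
    }
}
```

In the C/POSIX locale, ispunct() also accepts ASCII symbols such as '+' and '$'. Their Unicode general categories are Sm and Sc, so this mask rejects them. This is one concrete case of the C/POSIX versus Unicode mismatch the class comment warns about.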

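The whitespace comparison in the class comment can be checked for the two Java predicates it lists, isWhitespace and isSpaceChar, using java.lang.Character alone. isUWhiteSpace is an ICU method that is not present in this stub, so it is left out. WhitespaceDemo and compare are hypothetical names for this sketch.

```java
// Hypothetical demo class comparing the two java.lang.Character whitespace predicates.
class WhitespaceDemo {
    // One line per code point comparing isWhitespace and isSpaceChar.
    static String compare(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                cp, Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        // SPACE, TAB, NO-BREAK SPACE, FIGURE SPACE, LINE SEPARATOR
        int[] samples = {0x0020, 0x0009, 0x00A0, 0x2007, 0x2028};
        for (int cp : samples) {
            System.out.println(compare(cp));
        }
    }
}
```

The no-break spaces (U+00A0, U+2007) come out as isWhitespace=false and isSpaceChar=true. TAB is the reverse, since it is an ISO control rather than a Z separator.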
