International Component for Unicode for Java (ICU4J) is a mature, widely used Java library providing Unicode and Globalization support

/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, all code points can be represented with either one
 * or two code units ("surrogates").
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.</p>
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to a <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance an internal
 * position.</p>
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint(), this will be true again.
 * In general, access to code units and code points in the same
 * iteration loop should not be mixed. In UTF-16, if the current position
 * is on a second code unit (low surrogate), then only that code unit
 * is returned, even by nextCodePoint().</p>
 *
 * <p>Usage:</p>
 * <pre>
 *  public void function1(UForwardCharacterIterator it) {
 *      int c;
 *      while ((c = it.next()) != UForwardCharacterIterator.DONE) {
 *          // use c
 *      }
 *  }
 * </pre>
 *
 * @stable ICU 2.4
 */
public interface UForwardCharacterIterator {

    /**
     * Indicator that we have reached the end of the UTF-16 text.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;

    /**
     * Returns the UTF-16 code unit at index, and increments to the next
     * code unit (post-increment semantics). If index is out of
     * range, DONE is returned, and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the limit
     *         of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at index, and increments to the next code
     * point (post-increment semantics). If index does not point to a
     * valid surrogate pair, the behavior is the same as
     * <code>next()</code>. Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in the text, or DONE if the index is at
     *         the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();
}
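
The difference between code unit and code point iteration described in the Javadoc above can be illustrated with a small String-backed sketch. Everything in it is hypothetical and not part of ICU4J: `ForwardIterator` is a local stand-in that mirrors the two methods and the `DONE` constant of `UForwardCharacterIterator` so the example compiles without ICU4J on the classpath, and `StringForwardIterator` and `ForwardIterationDemo` are illustrative names. It relies only on the standard `String.codePointAt` and `Character.charCount` methods.

```java
// Local stand-in for com.ibm.icu.text.UForwardCharacterIterator (assumption:
// same DONE constant and the same two methods), so no ICU4J jar is needed.
interface ForwardIterator {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

// Hypothetical String-backed implementation, for illustration only.
final class StringForwardIterator implements ForwardIterator {
    private final String text;
    private int pos = 0;

    StringForwardIterator(String text) {
        this.text = text;
    }

    // Returns one UTF-16 code unit and advances by one, like text[pos++].
    @Override
    public int next() {
        if (pos >= text.length()) {
            return DONE;
        }
        return text.charAt(pos++);
    }

    // Returns one code point and advances past it. If pos is on an unpaired
    // or low surrogate, codePointAt returns just that code unit, which
    // matches the documented fallback to next().
    @Override
    public int nextCodePoint() {
        if (pos >= text.length()) {
            return DONE;
        }
        int cp = text.codePointAt(pos);
        pos += Character.charCount(cp);
        return cp;
    }
}

final class ForwardIterationDemo {
    // Counts UTF-16 code units by draining the iterator with next().
    static int countCodeUnits(ForwardIterator it) {
        int n = 0;
        while (it.next() != ForwardIterator.DONE) {
            n++;
        }
        return n;
    }

    // Counts code points by draining the iterator with nextCodePoint().
    static int countCodePoints(ForwardIterator it) {
        int n = 0;
        while (it.nextCodePoint() != ForwardIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (one surrogate pair), 'b'
        System.out.println(countCodeUnits(new StringForwardIterator(s)));  // prints 4
        System.out.println(countCodePoints(new StringForwardIterator(s))); // prints 3
    }
}
```

Each loop uses only one of the two access methods, following the advice above not to mix code unit and code point access in the same iteration loop.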