com.ibm.icu.text.UForwardCharacterIterator Maven / Gradle / Ivy
// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html#License
/*
*******************************************************************************
* Copyright (C) 1996-2016, International Business Machines Corporation and *
* others. All Rights Reserved. *
*******************************************************************************
*/
package com.ibm.icu.text;
/**
* Interface that defines an API for forward-only iteration
* on text objects.
* This is a minimal interface for iteration without random access
* or backwards iteration. It is especially useful for wrapping
* streams with converters into an object for collation or
* normalization.
*
* Characters can be accessed in two ways: as code units or as
* code points.
* Unicode code points are 21-bit integers and are the scalar values
* of Unicode characters. ICU uses the type int
for them.
* Unicode code units are the storage units of a given
* Unicode/UCS Transformation Format (a character encoding scheme).
* With UTF-16, all code points can be represented with either one
* or two code units ("surrogates").
* String storage is typically based on code units, while properties
* of characters are typically determined using code point values.
* Some processes may be designed to work with sequences of code units,
* or it may be known that all characters that are important to an
* algorithm can be represented with single code units.
* Other processes will need to use the code point access functions.
*
*
ForwardCharacterIterator provides next() to access
* a code unit and advance an internal position into the text object,
* similar to a return text[position++]
.
* It provides nextCodePoint() to access a code point and advance an internal
* position.
*
*
nextCodePoint() assumes that the current position is that of
* the beginning of a code point, i.e., of its first code unit.
* After nextCodePoint(), this will be true again.
* In general, access to code units and code points in the same
* iteration loop should not be mixed. In UTF-16, if the current position
* is on a second code unit (Low Surrogate), then only that code unit
* is returned even by nextCodePoint().
*
* Usage:
*
* public void function1(UForwardCharacterIterator it) {
* int c;
* while((c=it.next())!=UForwardCharacterIterator.DONE) {
* // use c
* }
* }
*
* @stable ICU 2.4
*
*/
public interface UForwardCharacterIterator {
/**
* Indicator that we have reached the ends of the UTF16 text.
* @stable ICU 2.4
*/
public static final int DONE = -1;
/**
* Returns the UTF16 code unit at index, and increments to the next
* code unit (post-increment semantics). If index is out of
* range, DONE is returned, and the iterator is reset to the limit
* of the text.
* @return the next UTF16 code unit, or DONE if the index is at the limit
* of the text.
* @stable ICU 2.4
*/
public int next();
/**
* Returns the code point at index, and increments to the next code
* point (post-increment semantics). If index does not point to a
* valid surrogate pair, the behavior is the same as
* next()
. Otherwise the iterator is incremented past
* the surrogate pair, and the code point represented by the pair
* is returned.
* @return the next codepoint in text, or DONE if the index is at
* the limit of the text.
* @stable ICU 2.4
*/
public int nextCodePoint();
}