
/**
*******************************************************************************
* Copyright (C) 1996-2011, International Business Machines Corporation and    *
* others. All Rights Reserved.                                                *
*******************************************************************************
*/

package com.ibm.icu.util;

import java.util.Enumeration;
import java.util.NoSuchElementException;

import com.ibm.icu.text.UTF16;
import com.ibm.icu.text.UnicodeSet;

/**
 * {@icuenhanced java.util.StringTokenizer}.{@icu _usage_}
 *
 * The string tokenizer class allows an application to break a string 
 * into tokens by performing code point comparison. 
 * The StringTokenizer methods do not distinguish 
 * among identifiers, numbers, and quoted strings, nor do they recognize 
 * and skip comments.
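 *
 * For example, a quoted string is not kept together. A sketch, assuming
 * the default delimiters (which include the space character) are in effect:
 *
 *     // Assumes default delimiters; quotes get no special treatment.
 *     StringTokenizer st = new StringTokenizer("say \"hello world\"");
 *     while (st.hasMoreTokens()) {
 *         System.out.println(st.nextToken());
 *     }
 *
 * prints the following output:
 *
 *     say
 *     "hello
 *     world"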
 * 
 * The set of delimiters (the code points that separate tokens) may be
 * specified either at creation time or on a per-token basis.
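 *
 * A sketch of a per-token change, assuming that nextToken(String)
 * replaces the delimiter set and that the new set stays in effect for
 * later calls:
 *
 *     StringTokenizer st = new StringTokenizer("1 2,3 4");
 *     System.out.println(st.nextToken());     // default delimiters
 *     System.out.println(st.nextToken(" ,")); // delimiters now " ,"
 *     System.out.println(st.nextToken());     // assumed: still " ,"
 *     System.out.println(st.nextToken());
 *
 * prints the following output:
 *
 *     1
 *     2
 *     3
 *     4
 *
 * whereas with the default delimiters throughout, "2,3" would come back
 * as a single token.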
 * 
 * 
 * An instance of StringTokenizer behaves in one of three ways,
 * depending on whether it was created with the returnDelims
 * and coalesceDelims flags set to true or false:
 *
 * If returnDelims is false, delimiter code points serve only to
 * separate tokens, and coalesceDelims has no effect. A token is a
 * maximal sequence of consecutive code points that are not delimiters.
 *
 * If returnDelims is true, delimiter code points are
 * themselves considered to be tokens. In this case, if coalesceDelims is
 * true, each such token is a maximal sequence of consecutive
 * code points that are delimiters. If coalesceDelims is false,
 * a separate token is returned for each delimiter code point.
 *
 * A token is thus either one
 * delimiter code point, a maximal sequence of consecutive code points that
 * are delimiters, or a maximal sequence of consecutive code
 * points that are not delimiters.
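 *
 * A sketch of returnDelims set to true, assuming the three-argument
 * constructor StringTokenizer(String, String, boolean) leaves
 * coalesceDelims false:
 *
 *     // returnDelims = true; coalesceDelims assumed false
 *     StringTokenizer st = new StringTokenizer("a, b", ", ", true);
 *     while (st.hasMoreTokens()) {
 *         System.out.println("[" + st.nextToken() + "]");
 *     }
 *
 * prints the following output:
 *
 *     [a]
 *     [,]
 *     [ ]
 *     [b]
 *
 * With coalesceDelims also true, the comma and the space would instead
 * come back together as the single token ", ".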
 * 
 * 
 * A StringTokenizer object internally maintains a current
 * position within the string to be tokenized. Some operations, such as
 * nextToken, advance this current position past the code points processed.
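 *
 * A sketch, assuming countTokens() reports the number of tokens remaining
 * after the current position without advancing it:
 *
 *     StringTokenizer st = new StringTokenizer("this is a test");
 *     System.out.println(st.countTokens()); // assumed: position not advanced
 *     System.out.println(st.nextToken());   // advances past "this"
 *     System.out.println(st.countTokens()); // one fewer token remains
 *
 * prints the following output:
 *
 *     4
 *     this
 *     3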
 * 
 * 
 * A token is returned by taking a substring of the string that was used to 
 * create the StringTokenizer object.
 * 
 * 
 * Example of the use of the tokenizer with the default delimiters.
 *
 *     StringTokenizer st = new StringTokenizer("this is a test");
 *     while (st.hasMoreTokens()) {
 *         System.out.println(st.nextToken());
 *     }
 * 
 * 
 * 
 * prints the following output:
 * 
 *     this
 *     is
 *     a
 *     test
 * 
 * 
 * 
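 * Example of the use of the tokenizer with user-specified delimiters,
 * here the space and the supplementary code point U+10000. The unpaired
 * surrogates on either side of U+10000 are not delimiters, so each is
 * returned as a token of its own.
 *
 *     StringTokenizer st = new StringTokenizer(
 *         "this is a test with supplementary characters \ud800\ud800\udc00\udc00",
 *         " \ud800\udc00");
 *     while (st.hasMoreTokens()) {
 *         System.out.println(st.nextToken());
 *     }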
 * 
 * 
 * 
 * prints the following output:
 * 
 *     this
 *     is
 *     a
 *     test
 *     with
 *     supplementary
 *     characters
 *     \ud800
 *     \udc00
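 *
 * Example of the use of the tokenizer with a UnicodeSet of delimiters,
 * here a sketch assuming the StringTokenizer(String, UnicodeSet)
 * constructor and the pattern "[0-9]" (any ASCII digit):
 *
 *     // Assumes the (String, UnicodeSet) constructor; runs of digits
 *     // are skipped because returnDelims defaults to false.
 *     StringTokenizer st = new StringTokenizer("a1b22c",
 *         new UnicodeSet("[0-9]"));
 *     while (st.hasMoreTokens()) {
 *         System.out.println(st.nextToken());
 *     }
 *
 * prints the following output:
 *
 *     a
 *     b
 *     c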
 * 
 * 
 * @author syn wee
 * @stable ICU 2.4
 */
public final class StringTokenizer implements Enumeration