
com.ibm.icu.text.RuleBasedNumberFormat


International Component for Unicode for Java (ICU4J) is a mature, widely used Java library providing Unicode and Globalization support

// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
/*
 *******************************************************************************
 * Copyright (C) 1996-2016, International Business Machines Corporation and
 * others. All Rights Reserved.
 *******************************************************************************
 */

package com.ibm.icu.text;

import java.math.BigInteger;
import java.text.FieldPosition;
import java.text.ParsePosition;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Set;

import com.ibm.icu.impl.ICUData;
import com.ibm.icu.impl.ICUDebug;
import com.ibm.icu.impl.ICUResourceBundle;
import com.ibm.icu.impl.PatternProps;
import com.ibm.icu.lang.UCharacter;
import com.ibm.icu.math.BigDecimal;
import com.ibm.icu.util.ULocale;
import com.ibm.icu.util.ULocale.Category;
import com.ibm.icu.util.UResourceBundle;
import com.ibm.icu.util.UResourceBundleIterator;


/**
 * 

A class that formats numbers according to a set of rules. This number formatter is typically used for spelling out numeric values in words (e.g., 25,376 as "twenty-five thousand three hundred seventy-six" or "vingt-cinq mille trois cents soixante-seize" or "fünfundzwanzigtausenddreihundertsechsundsiebzig"), but can also be used for other complicated formatting tasks, such as formatting a number of seconds as hours, minutes and seconds (e.g., 3,730 as "1:02:10").

* *

The resources contain three predefined formatters for each locale: spellout, which spells out a value in words (123 is "one hundred twenty-three"); ordinal, which appends an ordinal suffix to the end of a numeral (123 is "123rd"); and duration, which shows a duration in seconds as hours, minutes, and seconds (123 is "2:03"). The client can also define more specialized RuleBasedNumberFormats by supplying programmer-defined rule sets.

* *

The behavior of a RuleBasedNumberFormat is specified by a textual description that is either passed to the constructor as a String or loaded from a resource bundle. In its simplest form, the description consists of a semicolon-delimited list of rules. Each rule has a string of output text and a value or range of values it is applicable to. In a typical spellout rule set, the first twenty rules are the words for the numbers from 0 to 19:

* *
zero; one; two; three; four; five; six; seven; eight; nine;
 * ten; eleven; twelve; thirteen; fourteen; fifteen; sixteen; seventeen; eighteen; nineteen;
* *

For larger numbers, we can use the preceding set of rules to format the ones place, and we only have to supply the words for the multiples of 10:

* *
20: twenty[->>];
 * 30: thirty[->>];
 * 40: forty[->>];
 * 50: fifty[->>];
 * 60: sixty[->>];
 * 70: seventy[->>];
 * 80: eighty[->>];
 * 90: ninety[->>];
* *

In these rules, the base value is spelled out explicitly and set off from the rule's output text with a colon. The rules are in a sorted list, and a rule is applicable to all numbers from its own base value to one less than the next rule's base value. The ">>" token is called a substitution and tells the formatter to isolate the number's ones digit, format it using this same set of rules, and place the result at the position of the ">>" token. Text in brackets is omitted if the number being formatted is an even multiple of 10 (the hyphen is a literal hyphen; 24 is "twenty-four," not "twenty four").

* *

For even larger numbers, we can actually look up several parts of the number in the list:

* *
100: << hundred[ >>];
* *

The "<<" represents a new kind of substitution. The << isolates the hundreds digit (and any digits to its left), formats it using this same rule set, and places the result where the "<<" was. Notice also that the meaning of >> has changed: it now refers to both the tens and the ones digits. The meaning of both substitutions depends on the rule's base value. The base value determines the rule's divisor, which is the highest power of 10 that is less than or equal to the base value (the user can change this). To fill in the substitutions, the formatter divides the number being formatted by the divisor. The integral quotient is used to fill in the << substitution, and the remainder is used to fill in the >> substitution. The meaning of the brackets changes similarly: text in brackets is omitted if the value being formatted is an even multiple of the rule's divisor. The rules are applied recursively, so if a substitution is filled in with text that includes another substitution, that substitution is also filled in.
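The quotient and remainder arithmetic described above can be sketched in plain Java. This is only an illustration of the description, not ICU's implementation; the class name QuotientRemainderSketch and its methods are hypothetical.

```java
final class QuotientRemainderSketch {
    // Highest power of 10 that is less than or equal to baseValue.
    // For a base value of 0 the loop never runs and the divisor is 1.
    static long divisor(long baseValue) {
        long d = 1;
        while (d <= baseValue / 10) {
            d *= 10;
        }
        return d;
    }

    // Returns {quotient, remainder}: the values that would fill in
    // the << and >> substitutions of a rule with the given base value.
    static long[] split(long number, long baseValue) {
        long d = divisor(baseValue);
        return new long[] { number / d, number % d };
    }

    // Bracketed text is dropped when the number is an even multiple of the divisor.
    static boolean omitsBracketedText(long number, long baseValue) {
        return number % divisor(baseValue) == 0;
    }

    public static void main(String[] args) {
        long[] parts = split(340, 100);   // rule "100: << hundred[ >>];"
        System.out.println(parts[0] + " hundred, remainder " + parts[1]);
    }
}
```

For 340 under the rule with base value 100, the quotient 3 feeds << and the remainder 40 feeds >>.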

* *

This rule covers values up to 999, at which point we add another rule:

* *
1000: << thousand[ >>];
* *

Again, the meanings of the brackets and substitution tokens shift because the rule's base value is a higher power of 10, changing the rule's divisor. This rule can actually be used all the way up to 999,999. This allows us to finish out the rules as follows:

* *
1,000,000: << million[ >>];
 * 1,000,000,000: << billion[ >>];
 * 1,000,000,000,000: << trillion[ >>];
 * 1,000,000,000,000,000: OUT OF RANGE!;
* *

Commas, periods, and spaces can be used in the base values to improve legibility and are ignored by the rule parser. The last rule in the list is customarily treated as an "overflow rule," applying to everything from its base value on up, and often (as in this example) being used to print out an error message or default representation. Notice also that the size of the major groupings in large numbers is controlled by the spacing of the rules: because in English we group numbers by thousand, the higher rules are separated from each other by a factor of 1,000.
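As a rough illustration of how separators in a base value can be skipped, here is a small sketch. The class BaseValueSketch and its error handling are assumptions made for this example and do not reflect ICU's parser.

```java
final class BaseValueSketch {
    // Parses a base value such as "1,000,000", skipping commas, periods, and spaces.
    static long parseBaseValue(String text) {
        long value = 0;
        boolean sawDigit = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                // multiplyExact/addExact throw ArithmeticException on overflow
                value = Math.addExact(Math.multiplyExact(value, 10L), (long) (c - '0'));
                sawDigit = true;
            } else if (c != ',' && c != '.' && c != ' ') {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        if (!sawDigit) {
            throw new IllegalArgumentException("no digits in: " + text);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseBaseValue("1,000,000"));
    }
}
```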

* *

To see how these rules actually work in practice, consider the following example: Formatting 25,340 with this rule set would work like this:

<< thousand >>                              [the rule whose base value is 1,000 is applicable to 25,340]
twenty->> thousand >>                       [25,340 over 1,000 is 25. The rule for 20 applies.]
twenty-five thousand >>                     [25 mod 10 is 5. The rule for 5 is "five."]
twenty-five thousand << hundred >>          [25,340 mod 1,000 is 340. The rule for 100 applies.]
twenty-five thousand three hundred >>       [340 over 100 is 3. The rule for 3 is "three."]
twenty-five thousand three hundred forty    [340 mod 100 is 40. The rule for 40 applies. Since 40 divides evenly by 10, the hyphen and substitution in the brackets are omitted.]
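The walk-through above can be mirrored by a small recursive method. The sketch below hard-codes the English rules from this description instead of parsing a rule string, so it shows only the recursion, not how RuleBasedNumberFormat is built; SpelloutSketch is a hypothetical name, and rejecting negative input is a choice made for this example.

```java
final class SpelloutSketch {
    private static final String[] SMALL = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static final String[] TENS = {
        "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };
    // Base values (and divisors) of the "<< name[ >>]" rules, largest first.
    private static final long[] BIG = {
        1_000_000_000_000L, 1_000_000_000L, 1_000_000L, 1_000L, 100L
    };
    private static final String[] BIG_NAMES = {
        "trillion", "billion", "million", "thousand", "hundred"
    };

    static String spell(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers need a -x rule");
        }
        if (n >= 1_000_000_000_000_000L) {
            return "OUT OF RANGE!";                      // the overflow rule
        }
        if (n < 20) {
            return SMALL[(int) n];                       // rules 0 through 19
        }
        if (n < 100) {
            String tens = TENS[(int) (n / 10) - 2];      // rules 20, 30, ... 90
            return n % 10 == 0 ? tens : tens + "-" + spell(n % 10);
        }
        for (int i = 0; i < BIG.length; i++) {
            if (n >= BIG[i]) {
                String head = spell(n / BIG[i]) + " " + BIG_NAMES[i];   // << name
                return n % BIG[i] == 0 ? head : head + " " + spell(n % BIG[i]);
            }
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        System.out.println(spell(25_340));
    }
}
```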
* *

The above syntax suffices only to format positive integers. To format negative numbers, we add a special rule:

* *
-x: minus >>;
* *

This is called a negative-number rule, and is identified by "-x" where the base value would be. This rule is used to format all negative numbers. The >> token here means "find the number's absolute value, format it with these rules, and put the result here."

* *

We also add a special rule called a fraction rule for numbers with fractional parts:

* *
x.x: << point >>;
* *

This rule is used for all positive non-integers (negative non-integers pass through the negative-number rule first and then through this rule). Here, the << token refers to the number's integral part, and the >> to the number's fractional part. The fractional part is formatted as a series of single-digit numbers (e.g., 123.456 would be formatted as "one hundred twenty-three point four five six").
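A minimal sketch of the negative-number and fraction rules described above, written against java.math.BigDecimal so the fractional digits are exact. The class FractionRuleSketch is hypothetical, and the integer rules are passed in as a function so the example stays independent of any particular spellout rule set.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.function.LongFunction;

final class FractionRuleSketch {
    private static final String[] DIGITS = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"
    };

    static String format(BigDecimal value, LongFunction<String> integerRules) {
        if (value.signum() < 0) {
            // "-x: minus >>;" formats the absolute value with the same rules
            return "minus " + format(value.negate(), integerRules);
        }
        BigDecimal integral = value.setScale(0, RoundingMode.DOWN);
        BigDecimal fraction = value.subtract(integral).stripTrailingZeros();
        String head = integerRules.apply(integral.longValueExact());
        if (fraction.signum() == 0) {
            return head;                       // no fractional part: the fraction rule is not used
        }
        // "x.x: << point >>;" formats the fractional part one digit at a time
        StringBuilder out = new StringBuilder(head).append(" point");
        String digits = fraction.unscaledValue().toString();
        for (int i = digits.length(); i < fraction.scale(); i++) {
            out.append(" zero");               // leading zeros such as the 0 in 1.05
        }
        for (char c : digits.toCharArray()) {
            out.append(' ').append(DIGITS[c - '0']);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in integer rules that print digits; a real rule set would spell them out.
        System.out.println(format(new BigDecimal("-1.05"), n -> Long.toString(n)));
    }
}
```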

* *

To see how this rule syntax is applied to various languages, examine the resource data.

* *

There is actually much more flexibility built into the rule language than the description above shows. A formatter may own multiple rule sets, which can be selected by the caller, and which can use each other to fill in their substitutions. Substitutions can also be filled in with digits, using a DecimalFormat object. There is syntax that can be used to alter a rule's divisor in various ways. And there is provision for much more flexible fraction handling. A complete description of the rule syntax follows:

* *
* *

The description of a RuleBasedNumberFormat's behavior consists of one or more rule sets. Each rule set consists of a name, a colon, and a list of rules. A rule set name must begin with a % sign. Rule sets with names that begin with a single % sign are public: the caller can specify that they be used to format and parse numbers. Rule sets with names that begin with %% are private: they exist only for the use of other rule sets. If a formatter only has one rule set, the name may be omitted.
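The naming convention above can be expressed as a tiny check. RuleSetNameSketch is a hypothetical helper written for this illustration and is not part of the ICU API.

```java
final class RuleSetNameSketch {
    // A name with a single leading % is public; a leading %% marks a private rule set.
    static boolean isPublic(String name) {
        if (!name.startsWith("%")) {
            throw new IllegalArgumentException("rule set names must begin with %: " + name);
        }
        return !name.startsWith("%%");
    }

    public static void main(String[] args) {
        System.out.println(isPublic("%main"));
        System.out.println(isPublic("%%helper"));
    }
}
```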

* *

The user can also specify a special "rule set" named %%lenient-parse. The body of %%lenient-parse isn't a set of number-formatting rules, but a RuleBasedCollator description which is used to define equivalences for lenient parsing. For more information on the syntax, see RuleBasedCollator. For more information on lenient parsing, see setLenientParse(). Note: symbols that have syntactic meaning in collation rules, such as '&', have no particular meaning when appearing outside of the lenient-parse rule set.

* *

The body of a rule set consists of an ordered, semicolon-delimited list of rules. Internally, every rule has a base value, a divisor, rule text, and zero, one, or two substitutions. These parameters are controlled by the description syntax, which consists of a rule descriptor, a colon, and a rule body.

* *

A rule descriptor can take one of the following forms (text in italics is the name of a token):

bv:  bv specifies the rule's base value. bv is a decimal number expressed using ASCII digits. bv may contain spaces, periods, and commas, which are ignored. The rule's divisor is the highest power of 10 less than or equal to the base value.
bv/rad:  bv specifies the rule's base value. The rule's divisor is the highest power of rad less than or equal to the base value.
bv>:  bv specifies the rule's base value. To calculate the divisor, let the radix be 10, and the exponent be the highest exponent of the radix that yields a result less than or equal to the base value. Every > character after the base value decreases the exponent by 1. If the exponent is positive or 0, the divisor is the radix raised to the power of the exponent; otherwise, the divisor is 1.
bv/rad>:  bv specifies the rule's base value. To calculate the divisor, let the radix be rad, and the exponent be the highest exponent of the radix that yields a result less than or equal to the base value. Every > character after the radix decreases the exponent by 1. If the exponent is positive or 0, the divisor is the radix raised to the power of the exponent; otherwise, the divisor is 1.
-x:  The rule is a negative-number rule.
x.x:  The rule is an improper fraction rule. If the full stop in the middle of the rule name is replaced with the decimal point that is used in the language or DecimalFormatSymbols, then that rule will have precedence when formatting and parsing this rule. For example, some languages use the comma, and can thus be written as x,x instead. For example, you can use "x.x: << point >>;x,x: << comma >>;" to handle the decimal point that matches the language's natural spelling of the punctuation of either the full stop or comma.
0.x:  The rule is a proper fraction rule. If the full stop in the middle of the rule name is replaced with the decimal point that is used in the language or DecimalFormatSymbols, then that rule will have precedence when formatting and parsing this rule. For example, some languages use the comma, and can thus be written as 0,x instead. For example, you can use "0.x: point >>;0,x: comma >>;" to handle the decimal point that matches the language's natural spelling of the punctuation of either the full stop or comma.
x.0:  The rule is a default rule. If the full stop in the middle of the rule name is replaced with the decimal point that is used in the language or DecimalFormatSymbols, then that rule will have precedence when formatting and parsing this rule. For example, some languages use the comma, and can thus be written as x,0 instead. For example, you can use "x.0: << point;x,0: << comma;" to handle the decimal point that matches the language's natural spelling of the punctuation of either the full stop or comma.
Inf:  The rule for infinity.
NaN:  The rule for an IEEE 754 NaN (not a number).
nothing:  If the rule's rule descriptor is left out, the base value is one plus the preceding rule's base value (or zero if this is the first rule in the list) in a normal rule set. In a fraction rule set, the base value is the same as the preceding rule's base value.
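The divisor calculation for the bv, bv/rad, bv> and bv/rad> descriptor forms can be sketched as one function of the base value, the radix, and the number of > characters. This follows the wording above and is not taken from ICU's code; DescriptorDivisorSketch and its parameter names are hypothetical.

```java
final class DescriptorDivisorSketch {
    // shifts is the number of '>' characters that follow the base value or radix.
    static long divisor(long baseValue, long radix, int shifts) {
        if (baseValue < 0 || radix < 2 || shifts < 0) {
            throw new IllegalArgumentException("invalid descriptor values");
        }
        // Highest exponent such that radix^exponent <= baseValue.
        int exponent = 0;
        long power = 1;
        while (power <= baseValue / radix) {
            power *= radix;
            exponent++;
        }
        // Every '>' decreases the exponent by 1; a negative exponent gives divisor 1.
        exponent -= shifts;
        if (exponent < 0) {
            return 1;
        }
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= radix;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(divisor(100, 10, 0));    // descriptor "100:"
        System.out.println(divisor(100, 10, 1));    // descriptor "100>:"
        System.out.println(divisor(3600, 60, 0));   // descriptor "3600/60:"
    }
}
```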
* *

A rule set may be either a regular rule set or a fraction rule set, depending on whether it is used to format a number's integral part (or the whole number) or a number's fractional part. Using a rule set to format a rule's fractional part makes it a fraction rule set.

* *

Which rule is used to format a number is defined according to one of the following algorithms. If the rule set is a regular rule set, do the following:

  • If the rule set includes a default rule (and the number was passed in as a double), use the default rule. (If the number being formatted was passed in as a long, the default rule is ignored.)
  • If the number is negative, use the negative-number rule.
  • If the number has a fractional part and is greater than 1, use the improper fraction rule.
  • If the number has a fractional part and is between 0 and 1, use the proper fraction rule.
  • Binary-search the rule list for the rule with the highest base value less than or equal to the number. If that rule has two substitutions, its base value is not an even multiple of its divisor, and the number is an even multiple of the rule's divisor, use the rule that precedes it in the rule list. Otherwise, use the rule itself.
* *
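The last step of the regular rule set algorithm, the binary search with its roll-back condition, might look like the following sketch. It covers only non-negative integers and ignores the default, negative-number, and fraction rules; the Rule class here is a hypothetical stand-in and not ICU's NFRule. Returning the matched rule when there is no preceding rule is an assumption of this example.

```java
final class RuleSearchSketch {
    static final class Rule {
        final long baseValue;
        final long divisor;
        final int substitutions;
        final String text;

        Rule(long baseValue, long divisor, int substitutions, String text) {
            this.baseValue = baseValue;
            this.divisor = divisor;
            this.substitutions = substitutions;
            this.text = text;
        }
    }

    // rules must be sorted by ascending base value.
    static Rule select(Rule[] rules, long number) {
        int lo = 0;
        int hi = rules.length;
        while (lo < hi) {                       // find the first rule with baseValue > number
            int mid = (lo + hi) >>> 1;
            if (rules[mid].baseValue <= number) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        if (lo == 0) {
            throw new IllegalArgumentException("no rule applies to " + number);
        }
        Rule match = rules[lo - 1];             // highest base value <= number
        boolean rollBack = match.substitutions == 2
                && match.baseValue % match.divisor != 0
                && number % match.divisor == 0;
        // Assumption: with no preceding rule to roll back to, keep the match.
        return (rollBack && lo >= 2) ? rules[lo - 2] : match;
    }

    public static void main(String[] args) {
        Rule[] rules = {
            new Rule(100, 100, 2, "rule A"),
            new Rule(150, 100, 2, "rule B"),    // base value is not a multiple of its divisor
        };
        System.out.println(select(rules, 200).text);
    }
}
```

In the usage above, 200 first matches the rule with base value 150, but because 200 is an even multiple of that rule's divisor, the search rolls back to the preceding rule.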

If the rule set is a fraction rule set, do the following:

  • Ignore negative-number and fraction rules.
  • For each rule in the list, multiply the number being formatted (which will always be between 0 and 1) by the rule's base value. Keep track of the distance between the result and the nearest integer.
  • Use the rule that produced the result closest to zero in the above calculation. In the event of a tie or a direct hit, use the first matching rule encountered. (The idea here is to try each rule's base value as a possible denominator of a fraction. Whichever denominator produces the fraction closest in value to the number being formatted wins.) If the rule following the matching rule has the same base value, use it if the numerator of the fraction is anything other than 1; if the numerator is 1, use the original matching rule. (This is to allow singular and plural forms of the rule text without a lot of extra hassle.)
* *
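The fraction rule set selection can be sketched as a linear scan over candidate denominators. This version uses double arithmetic purely for illustration, which is an assumption of the sketch and may differ from how the library performs the comparison; FractionRuleSearchSketch is a hypothetical name, and the method returns the index of the chosen rule.

```java
final class FractionRuleSearchSketch {
    // baseValues holds each rule's base value in rule-list order; number is in [0, 1).
    static int select(long[] baseValues, double number) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < baseValues.length; i++) {
            double scaled = number * baseValues[i];
            double distance = Math.abs(scaled - Math.rint(scaled));
            if (distance < bestDistance) {      // strict '<' keeps the first rule on a tie
                best = i;
                bestDistance = distance;
            }
        }
        if (best < 0) {
            throw new IllegalArgumentException("empty rule list");
        }
        // If the next rule shares the base value, it is the plural form:
        // use it unless the numerator is exactly 1.
        long numerator = Math.round(number * baseValues[best]);
        if (best + 1 < baseValues.length
                && baseValues[best + 1] == baseValues[best]
                && numerator != 1) {
            best++;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical rule texts: half, halves, third, thirds, quarter, quarters
        long[] baseValues = { 2, 2, 3, 3, 4, 4 };
        System.out.println(select(baseValues, 0.75));
    }
}
```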

A rule's body consists of a string of characters terminated by a semicolon. The rule may include zero, one, or two substitution tokens, and a range of text in brackets. The brackets denote optional text (and may also include one or both substitutions). The exact meanings of the substitution tokens, and under what conditions optional text is omitted, depend on the syntax of the substitution token and the context. The rest of the text in a rule body is literal text that is output when the rule matches the number being formatted.

* *

A substitution token begins and ends with a token character. The token character and the context together specify a mathematical operation to be performed on the number being formatted. An optional substitution descriptor specifies how the value resulting from that operation is used to fill in the substitution. The position of the substitution token in the rule body specifies the location of the resultant text in the original rule text.

* *

The meanings of the substitution token characters are as follows:

>>
    in normal rule: Divide the number by the rule's divisor and format the remainder.
    in negative-number rule: Find the absolute value of the number and format the result.
    in fraction or default rule: Isolate the number's fractional part and format it.
    in rule in fraction rule set: Not allowed.
>>>
    in normal rule: Divide the number by the rule's divisor and format the remainder, but bypass the normal rule-selection process and just use the rule that precedes this one in this rule list.
    in all other rules: Not allowed.
<<
    in normal rule: Divide the number by the rule's divisor and format the quotient.
    in negative-number rule: Not allowed.
    in fraction or default rule: Isolate the number's integral part and format it.
    in rule in fraction rule set: Multiply the number by the rule's base value and format the result.
==
    in all rule sets: Format the number unchanged.
[]
    in normal rule: Omit the optional text if the number is an even multiple of the rule's divisor.
    in negative-number rule: Not allowed.
    in improper-fraction rule: Omit the optional text if the number is between 0 and 1 (same as specifying both an x.x rule and a 0.x rule).
    in default rule: Omit the optional text if the number is an integer (same as specifying both an x.x rule and an x.0 rule).
    in proper-fraction rule: Not allowed.
    in rule in fraction rule set: Omit the optional text if multiplying the number by the rule's base value yields 1.
$(cardinal,plural syntax)$
    in all rule sets: This provides the ability to choose a word based on the number divided by the radix to the power of the exponent of the base value for the specified locale, which is normally equivalent to the << value. This uses the cardinal plural rules from PluralFormat. All strings used in the plural format are treated as the same base value for parsing.
$(ordinal,plural syntax)$
    in all rule sets: This provides the ability to choose a word based on the number divided by the radix to the power of the exponent of the base value for the specified locale, which is normally equivalent to the << value. This uses the ordinal plural rules from PluralFormat. All strings used in the plural format are treated as the same base value for parsing.
* *

The substitution descriptor (i.e., the text between the token characters) may take one of three forms:

a rule set name:  Perform the mathematical operation on the number, and format the result using the named rule set.
a DecimalFormat pattern:  Perform the mathematical operation on the number, and format the result using a DecimalFormat with the specified pattern. The pattern must begin with 0 or #.
nothing:  Perform the mathematical operation on the number, and format the result using the rule set containing the current rule, except:
  • You can't have an empty substitution descriptor with a == substitution.
  • If you omit the substitution descriptor in a >> substitution in a fraction rule, format the result one digit at a time using the rule set containing the current rule.
  • If you omit the substitution descriptor in a << substitution in a rule in a fraction rule set, format the result using the default rule set for this formatter.
* *

Whitespace is ignored between a rule set name and a rule set body, between a rule descriptor and a rule body, or between rules. If a rule body begins with an apostrophe, the apostrophe is ignored, but all text after it becomes significant (this is how you can have a rule's rule text begin with whitespace). There is no escape function: the semicolon is not allowed in rule set names or in rule text, and the colon is not allowed in rule set names. The characters beginning a substitution token are always treated as the beginning of a substitution token.
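The leading-apostrophe convention can be illustrated with a short helper. RuleBodySketch is a hypothetical name, and using Character.isWhitespace to decide what counts as whitespace is an assumption of this sketch.

```java
final class RuleBodySketch {
    // Skips whitespace before the rule body, then drops one leading apostrophe
    // so that everything after it, including spaces, is kept as rule text.
    static String ruleText(String rawBody) {
        int start = 0;
        while (start < rawBody.length() && Character.isWhitespace(rawBody.charAt(start))) {
            start++;
        }
        if (start < rawBody.length() && rawBody.charAt(start) == '\'') {
            return rawBody.substring(start + 1);
        }
        return rawBody.substring(start);
    }

    public static void main(String[] args) {
        System.out.println("[" + ruleText("  ' and >>") + "]");
    }
}
```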

* *

See the resource data and the demo program for annotated examples of real rule sets using these features.

 *
 * @author Richard Gillam
 * @see NumberFormat
 * @see DecimalFormat
 * @see PluralFormat
 * @see PluralRules
 * @stable ICU 2.0
 */
public class RuleBasedNumberFormat extends NumberFormat {

    //-----------------------------------------------------------------------
    // constants
    //-----------------------------------------------------------------------

    // Generated by serialver from JDK 1.4.1_01
    static final long serialVersionUID = -7664252765575395068L;

    /**
     * Selector code that tells the constructor to create a spellout formatter
     * @stable ICU 2.0
     */
    public static final int SPELLOUT = 1;

    /**
     * Selector code that tells the constructor to create an ordinal formatter
     * @stable ICU 2.0
     */
    public static final int ORDINAL = 2;

    /**
     * Selector code that tells the constructor to create a duration formatter
     * @stable ICU 2.0
     */
    public static final int DURATION = 3;

    /**
     * Selector code that tells the constructor to create a numbering system formatter
     * @stable ICU 4.2
     */
    public static final int NUMBERING_SYSTEM = 4;

    //-----------------------------------------------------------------------
    // data members
    //-----------------------------------------------------------------------

    /**
     * The formatter's rule sets.
     */
    private transient NFRuleSet[] ruleSets = null;

    /**
     * The formatter's rule names mapped to rule sets.
     */
    private transient Map<String, NFRuleSet> ruleSetsMap = null;

    /**
     * A pointer to the formatter's default rule set. This is always included
     * in ruleSets.
     */
    private transient NFRuleSet defaultRuleSet = null;

    /**
     * The formatter's locale. This is used to create DecimalFormatSymbols and
     * Collator objects.
     * @serial
     */
    private ULocale locale = null;

    /**
     * The formatter's rounding mode.
     * @serial
     */
    private int roundingMode = BigDecimal.ROUND_UNNECESSARY;

    /**
     * Collator to be used in lenient parsing. This variable is lazy-evaluated:
     * the collator is actually created the first time the client does a parse
     * with lenient-parse mode turned on.
     */
    private transient RbnfLenientScannerProvider scannerProvider = null;

    // flag to mark whether we've previously looked for a scanner and failed
    private transient boolean lookedForScanner;

    /**
     * The DecimalFormatSymbols object that any DecimalFormat objects this
     * formatter uses should use. This variable is lazy-evaluated: it isn't
     * filled in if the rule set never uses a DecimalFormat pattern.
     */
    private transient DecimalFormatSymbols decimalFormatSymbols = null;

    /**
     * The NumberFormat used when lenient parsing numbers. This needs to reflect
     * the locale. This is lazy-evaluated, like decimalFormatSymbols. It is
     * here so it can be shared by different NFSubstitutions.
     */
    private transient DecimalFormat decimalFormat = null;

    /**
     * The rule used when dealing with infinity. This is lazy-evaluated, and derived from decimalFormat.
     * It is here so it can be shared by different NFRuleSets.
     */
    private transient NFRule defaultInfinityRule = null;

    /**
     * The rule used when dealing with IEEE 754 NaN. This is lazy-evaluated, and derived from decimalFormat.
     * It is here so it can be shared by different NFRuleSets.
     */
    private transient NFRule defaultNaNRule = null;

    /**
     * Flag specifying whether lenient parse mode is on or off. Off by default.
     * @serial
     */
    private boolean lenientParse = false;

    /**
     * If the description specifies lenient-parse rules, they're stored here until
     * the collator is created.
     */
    private transient String lenientParseRules;

    /**
     * If the description specifies post-process rules, they're stored here until
     * post-processing is required.
     */
    private transient String postProcessRules;

    /**
     * Post processor lazily constructed from the postProcessRules.
     */
    private transient RBNFPostProcessor postProcessor;

    /**
     * Localizations for rule set names.
     * @serial
     */
    private Map<String, String[]> ruleSetDisplayNames;

    /**
     * The public rule set names.
     * @serial
     */
    private String[] publicRuleSetNames;

    /**
     * Data for handling context-based capitalization
     */
    private boolean capitalizationInfoIsSet = false;
    private boolean capitalizationForListOrMenu = false;
    private boolean capitalizationForStandAlone = false;
    private transient BreakIterator capitalizationBrkIter = null;

    private static final boolean DEBUG = ICUDebug.enabled("rbnf");

    //-----------------------------------------------------------------------
    // constructors
    //-----------------------------------------------------------------------

    /**
     * Creates a RuleBasedNumberFormat that behaves according to the description
     * passed in. The formatter uses the default FORMAT locale.
     * @param description A description of the formatter's desired behavior.
     * See the class documentation for a complete explanation of the description
     * syntax.
     * @see Category#FORMAT
     * @stable ICU 2.0
     */
    public RuleBasedNumberFormat(String description) {
        locale = ULocale.getDefault(Category.FORMAT);
        init(description, null);
    }

    /**
     * Creates a RuleBasedNumberFormat that behaves according to the description
     * passed in. The formatter uses the default FORMAT locale.
     *
     * The localizations data provides information about the public
     * rule sets and their localized display names for different
     * locales. The first element in the list is an array of the names
     * of the public rule sets. The first element in this array is
     * the initial default ruleset. The remaining elements in the
     * list are arrays of localizations of the names of the public
     * rule sets. Each of these is one longer than the initial array,
     * with the first String being the ULocale ID, and the remaining
     * Strings being the localizations of the rule set names, in the
     * same order as the initial array.
     * @param description A description of the formatter's desired behavior.
     * See the class documentation for a complete explanation of the description
     * syntax.
     * @param localizations a list of localizations for the rule set
     * names in the description.
     * @see Category#FORMAT
     * @stable ICU 3.2
     */
    public RuleBasedNumberFormat(String description, String[][] localizations) {
        locale = ULocale.getDefault(Category.FORMAT);
        init(description, localizations);
    }

    /**
     * Creates a RuleBasedNumberFormat that behaves according to the description
     * passed in. The formatter uses the specified locale to determine the
     * characters to use when formatting in numerals, and to define equivalences
     * for lenient parsing.
     * @param description A description of the formatter's desired behavior.
     * See the class documentation for a complete explanation of the description
     * syntax.
     * @param locale A locale, which governs which characters are used for
     * formatting values in numerals, and which characters are equivalent in
     * lenient parsing.
     * @stable ICU 2.0
     */
    public RuleBasedNumberFormat(String description, Locale locale) {
        this(description, ULocale.forLocale(locale));
    }

    /**
     * Creates a RuleBasedNumberFormat that behaves according to the description
     * passed in.
     * The formatter uses the specified locale to determine the
     * characters to use when formatting in numerals, and to define equivalences
     * for lenient parsing.
     * @param description A description of the formatter's desired behavior.
     * See the class documentation for a complete explanation of the description
     * syntax.
     * @param locale A locale, which governs which characters are used for
     * formatting values in numerals, and which characters are equivalent in
     * lenient parsing.
     * @stable ICU 3.2
     */
    public RuleBasedNumberFormat(String description, ULocale locale) {
        this.locale = locale;
        init(description, null);
    }

    /**
     * Creates a RuleBasedNumberFormat that behaves according to the description
     * passed in. The formatter uses the specified locale to determine the
     * characters to use when formatting in numerals, and to define equivalences
     * for lenient parsing.
     *
     * The localizations data provides information about the public
     * rule sets and their localized display names for different
     * locales. The first element in the list is an array of the names
     * of the public rule sets. The first element in this array is
     * the initial default ruleset. The remaining elements in the
     * list are arrays of localizations of the names of the public
     * rule sets. Each of these is one longer than the initial array,
     * with the first String being the ULocale ID, and the remaining
     * Strings being the localizations of the rule set names, in the
     * same order as the initial array.
     * @param description A description of the formatter's desired behavior.
     * See the class documentation for a complete explanation of the description
     * syntax.
     * @param localizations a list of localizations for the rule set names in the description.
     * @param locale A ULocale that governs which characters are used for
     * formatting values in numerals, and determines which characters are equivalent in
     * lenient parsing.
     * @stable ICU 3.2
     */
    public RuleBasedNumberFormat(String description, String[][] localizations, ULocale locale) {
        this.locale = locale;
        init(description, localizations);
    }

    /**
     * Creates a RuleBasedNumberFormat from a predefined description. The selector
     * code chooses among three possible predefined formats: spellout, ordinal,
     * and duration.
     * @param locale The locale for the formatter.
     * @param format A selector code specifying which kind of formatter to create for that
     * locale. There are three legal values: SPELLOUT, which creates a formatter that
     * spells out a value in words in the desired language, ORDINAL, which attaches
     * an ordinal suffix from the desired language to the end of a number (e.g. "123rd"),
     * and DURATION, which formats a duration in seconds as hours, minutes, and seconds.
     * @stable ICU 2.0
     */
    public RuleBasedNumberFormat(Locale locale, int format) {
        this(ULocale.forLocale(locale), format);
    }

    /**
     * Creates a RuleBasedNumberFormat from a predefined description. The selector
     * code chooses among four possible predefined formats: spellout, ordinal,
     * duration, and numbering system.
     * @param locale The locale for the formatter.
     * @param format A selector code specifying which kind of formatter to create for that
     * locale. There are four legal values: SPELLOUT, which creates a formatter that
     * spells out a value in words in the desired language, ORDINAL, which attaches
     * an ordinal suffix from the desired language to the end of a number (e.g. "123rd"),
     * DURATION, which formats a duration in seconds as hours, minutes, and seconds, and
     * NUMBERING_SYSTEM, which is used to invoke rules for alternate numbering
     * systems such as the Hebrew numbering system, or for Roman numerals, etc.
     * @stable ICU 3.2
     */
    public RuleBasedNumberFormat(ULocale locale, int format) {
        this.locale = locale;

        ICUResourceBundle bundle = (ICUResourceBundle)UResourceBundle.
            getBundleInstance(ICUData.ICU_RBNF_BASE_NAME, locale);

        // TODO: determine correct actual/valid locale. Note ambiguity
        // here -- do actual/valid refer to pattern, DecimalFormatSymbols,
        // or Collator?
        ULocale uloc = bundle.getULocale();
        setLocale(uloc, uloc);

        StringBuilder description = new StringBuilder();
        String[][] localizations = null;

        try {
            ICUResourceBundle rules = bundle.getWithFallback("RBNFRules/"+rulenames[format-1]);
            UResourceBundleIterator it = rules.getIterator();
            while (it.hasNext()) {
                description.append(it.nextString());
            }
        }
        catch (MissingResourceException e1) {
        }

        // We use findTopLevel() instead of get() because
        // it's faster when we know that it's usually going to fail.
        UResourceBundle locNamesBundle = bundle.findTopLevel(locnames[format - 1]);
        if (locNamesBundle != null) {
            localizations = new String[locNamesBundle.getSize()][];
            for (int i = 0; i < localizations.length; ++i) {
                localizations[i] = locNamesBundle.get(i).getStringArray();
            }
        }
        // else there are no localized names. It's not that important.

        init(description.toString(), localizations);
    }

    private static final String[] rulenames = {
        "SpelloutRules", "OrdinalRules", "DurationRules", "NumberingSystemRules",
    };
    private static final String[] locnames = {
        "SpelloutLocalizations", "OrdinalLocalizations", "DurationLocalizations",
        "NumberingSystemLocalizations",
    };

    /**
     * Creates a RuleBasedNumberFormat from a predefined description. Uses the
     * default FORMAT locale.
     * @param format A selector code specifying which kind of formatter to create.
     * There are four legal values: SPELLOUT, which creates a formatter that spells
     * out a value in words in the default locale's language, ORDINAL, which attaches
     * an ordinal suffix from the default locale's language to a numeral,
     * DURATION, which formats a duration in seconds as hours, minutes, and seconds,
     * always rounding down, and NUMBERING_SYSTEM, which is used for alternate
     * numbering systems such as Hebrew.
     * @see Category#FORMAT
     * @stable ICU 2.0
     */
    public RuleBasedNumberFormat(int format) {
        this(ULocale.getDefault(Category.FORMAT), format);
    }

    //-----------------------------------------------------------------------
    // boilerplate
    //-----------------------------------------------------------------------

    /**
     * Duplicates this formatter.
     * @return A RuleBasedNumberFormat that is equal to this one.
     * @stable ICU 2.0
     */
    @Override
    public Object clone() {
        return super.clone();
    }

    /**
     * Tests two RuleBasedNumberFormats for equality.
     * @param that The formatter to compare against this one.
     * @return true if the two formatters have identical behavior.
     * @stable ICU 2.0
     */
    @Override
    public boolean equals(Object that) {
        // if the other object isn't a RuleBasedNumberFormat, that's
        // all we need to know
        // Test for capitalization info equality is adequately handled
        // by the NumberFormat test for capitalizationSetting equality;
        // the info here is just derived from that.
        if (!(that instanceof RuleBasedNumberFormat)) {
            return false;
        } else {
            // cast the other object to a RuleBasedNumberFormat
            RuleBasedNumberFormat that2 = (RuleBasedNumberFormat)that;

            // compare their locales and lenient-parse modes
            if (!locale.equals(that2.locale) || lenientParse != that2.lenientParse) {
                return false;
            }

            // if that succeeds, then compare their rule set lists
            if (ruleSets.length != that2.ruleSets.length) {
                return false;
            }
            for (int i = 0; i < ruleSets.length; i++) {
                if (!ruleSets[i].equals(that2.ruleSets[i])) {
                    return false;
                }
            }

            return true;
        }
    }

    /**
     * {@inheritDoc}
     * @stable ICU 2.0
     */
    @Override
    public int hashCode() {
        return super.hashCode();
    }

    /**
     * Generates a textual description of this formatter.
     * @return a String containing a rule set that will produce a RuleBasedNumberFormat
     * with identical behavior to this one. This won't necessarily be identical
     * to the rule set description that was originally passed in, but will produce
     * the same result.
     * @stable ICU 2.0
     */
    @Override
    public String toString() {
        // accumulate the descriptions of all the rule sets in a
        // StringBuilder, then convert it to a String and return it
        StringBuilder result = new StringBuilder();
        for (NFRuleSet ruleSet : ruleSets) {
            result.append(ruleSet.toString());
        }
        return result.toString();
    }

    /**
     * Writes this object to a stream.
     * @param out The stream to write to.
     */
    private void writeObject(java.io.ObjectOutputStream out)
            throws java.io.IOException {
        // we just write the textual description to the stream, so we
        // have an implementation-independent streaming format
        out.writeUTF(this.toString());
        out.writeObject(this.locale);
        out.writeInt(this.roundingMode);
    }

    /**
     * Reads this object in from a stream.
     * @param in The stream to read from.
     */
    private void readObject(java.io.ObjectInputStream in)
            throws java.io.IOException {

        // read the description in from the stream
        String description = in.readUTF();
        ULocale loc;

        try {
            loc = (ULocale) in.readObject();
        } catch (Exception e) {
            loc = ULocale.getDefault(Category.FORMAT);
        }
        try {
            roundingMode = in.readInt();
        } catch (Exception ignored) {
        }

        // build a brand-new RuleBasedNumberFormat from the description,
        // then steal its substructure. This object's substructure and
        // the temporary RuleBasedNumberFormat drop on the floor and
        // get swept up by the garbage collector
        RuleBasedNumberFormat temp = new RuleBasedNumberFormat(description, loc);
        ruleSets = temp.ruleSets;
        ruleSetsMap = temp.ruleSetsMap;
        defaultRuleSet = temp.defaultRuleSet;
        publicRuleSetNames = temp.publicRuleSetNames;
        decimalFormatSymbols = temp.decimalFormatSymbols;
        decimalFormat = temp.decimalFormat;
        locale = temp.locale;
        defaultInfinityRule = temp.defaultInfinityRule;
        defaultNaNRule = temp.defaultNaNRule;
    }

    //-----------------------------------------------------------------------
    // public API functions
    //-----------------------------------------------------------------------

    /**
     * Returns a list of the names of all of this formatter's public rule sets.
     * @return A list of the names of all of this formatter's public rule sets.
     * @stable ICU 2.0
     */
    public String[] getRuleSetNames() {
        return publicRuleSetNames.clone();
    }

    /**
     * Return a list of locales for which there are locale-specific display names
     * for the rule sets in this formatter. If there are no localized display names, return null.
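     *
     * <p>A usage sketch, not taken from ICU data: the rule description and the display
     * names below are made-up placeholders, chosen only to show how the localizations
     * array passed to the constructor feeds this method.
     * <pre>
     * String rules = "%main: 0: zero; 1: one; 2: two;";
     * String[][] localizations = {
     *     { "%main" },           // names of the public rule sets; the first is the default
     *     { "en", "Main" },      // locale ID followed by one display name per rule set
     *     { "fr", "Principal" },
     * };
     * RuleBasedNumberFormat rbnf = new RuleBasedNumberFormat(rules, localizations);
     * ULocale[] locales = rbnf.getRuleSetDisplayNameLocales();          // locales that have display names
     * String[] names = rbnf.getRuleSetDisplayNames(new ULocale("fr")); // display names for French
     * </pre>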
     * @return an array of the ULocales for which there is rule set display name information
     * @stable ICU 3.2
     */
    public ULocale[] getRuleSetDisplayNameLocales() {
        if (ruleSetDisplayNames != null) {
            Set<String> s = ruleSetDisplayNames.keySet();
            String[] locales = s.toArray(new String[s.size()]);
            Arrays.sort(locales, String.CASE_INSENSITIVE_ORDER);
            ULocale[] result = new ULocale[locales.length];
            for (int i = 0; i < locales.length; ++i) {
                result[i] = new ULocale(locales[i]);
            }
            return result;
        }
        return null;
    }

    private String[] getNameListForLocale(ULocale loc) {
        if (loc != null && ruleSetDisplayNames != null) {
            String[] localeNames = { loc.getBaseName(), ULocale.getDefault(Category.DISPLAY).getBaseName() };
            for (String lname : localeNames) {
                while (lname.length() > 0) {
                    String[] names = ruleSetDisplayNames.get(lname);
                    if (names != null) {
                        return names;
                    }
                    lname = ULocale.getFallback(lname);
                }
            }
        }
        return null;
    }

    /**
     * Return the rule set display names for the provided locale. These are in the same order
     * as those returned by getRuleSetNames. The locale is matched against the locales for
     * which there is display name data, using normal fallback rules. If no locale matches,
     * the default display names are returned. (These are the internal rule set names minus
     * the leading '%'.)
     * @param loc the locale whose display names are requested
     * @return an array of the rule set display names
     * @see #getRuleSetNames
     * @stable ICU 3.2
     */
    public String[] getRuleSetDisplayNames(ULocale loc) {
        String[] names = getNameListForLocale(loc);
        if (names != null) {
            return names.clone();
        }
        names = getRuleSetNames();
        for (int i = 0; i < names.length; ++i) {
            names[i] = names[i].substring(1);
        }
        return names;
    }

    /**
     * Return the rule set display names for the current default DISPLAY locale.
     * @return an array of the display names
     * @see #getRuleSetDisplayNames(ULocale)
     * @see Category#DISPLAY
     * @stable ICU 3.2
     */
    public String[] getRuleSetDisplayNames() {
        return getRuleSetDisplayNames(ULocale.getDefault(Category.DISPLAY));
    }

    /**
     * Return the rule set display name for the provided rule set and locale.
     * The locale is matched against the locales for which there is display name data, using
     * normal fallback rules. If no locale matches, the default display name is returned.
     * @return the display name for the rule set
     * @see #getRuleSetDisplayNames
     * @throws IllegalArgumentException if ruleSetName is not a valid rule set name for this format
     * @stable ICU 3.2
     */
    public String getRuleSetDisplayName(String ruleSetName, ULocale loc) {
        String[] rsnames = publicRuleSetNames;
        for (int ix = 0; ix < rsnames.length; ++ix) {
            if (rsnames[ix].equals(ruleSetName)) {
                String[] names = getNameListForLocale(loc);
                if (names != null) {
                    return names[ix];
                }
                return rsnames[ix].substring(1);
            }
        }
        throw new IllegalArgumentException("unrecognized rule set name: " + ruleSetName);
    }

    /**
     * Return the rule set display name for the provided rule set in the current default DISPLAY locale.
     * @return the display name for the rule set
     * @see #getRuleSetDisplayName(String,ULocale)
     * @see Category#DISPLAY
     * @stable ICU 3.2
     */
    public String getRuleSetDisplayName(String ruleSetName) {
        return getRuleSetDisplayName(ruleSetName, ULocale.getDefault(Category.DISPLAY));
    }

    /**
     * Formats the specified number according to the specified rule set.
     * @param number The number to format.
     * @param ruleSet The name of the rule set to format the number with.
     * This must be the name of a valid public rule set for this formatter.
     * @return A textual representation of the number.
     * @stable ICU 2.0
     */
    public String format(double number, String ruleSet) throws IllegalArgumentException {
        if (ruleSet.startsWith("%%")) {
            throw new IllegalArgumentException("Can't use internal rule set");
        }
        return adjustForContext(format(number, findRuleSet(ruleSet)));
    }

    /**
     * Formats the specified number according to the specified rule set.
     * (If the specified rule set specifies a default ["x.0"] rule, this function
     * ignores it. Convert the number to a double first if you need it.) This
     * function preserves all the precision in the long -- it doesn't convert it
     * to a double.
     * @param number The number to format.
     * @param ruleSet The name of the rule set to format the number with.
     * This must be the name of a valid public rule set for this formatter.
     * @return A textual representation of the number.
     * @stable ICU 2.0
     */
    public String format(long number, String ruleSet) throws IllegalArgumentException {
        if (ruleSet.startsWith("%%")) {
            throw new IllegalArgumentException("Can't use internal rule set");
        }
        return adjustForContext(format(number, findRuleSet(ruleSet)));
    }

    /**
     * Formats the specified number using the formatter's default rule set.
     * (The default rule set is the last public rule set defined in the description.)
     * @param number The number to format.
     * @param toAppendTo A StringBuffer that the result should be appended to.
     * @param ignore This function doesn't examine or update the field position.
     * @return toAppendTo
     * @stable ICU 2.0
     */
    @Override
    public StringBuffer format(double number,
                               StringBuffer toAppendTo,
                               FieldPosition ignore) {
        // this is one of the inherited format() methods. Since it doesn't
        // have a way to select the rule set to use, it just uses the
        // default one
        // Note, the BigInteger/BigDecimal methods below currently go through this.
        if (toAppendTo.length() == 0) {
            toAppendTo.append(adjustForContext(format(number, defaultRuleSet)));
        } else {
            // appending to other text, don't capitalize
            toAppendTo.append(format(number, defaultRuleSet));
        }
        return toAppendTo;
    }

    /**
     * Formats the specified number using the formatter's default rule set.
     * (The default rule set is the last public rule set defined in the description.)
     * (If the specified rule set specifies a default ["x.0"] rule, this function
     * ignores it. Convert the number to a double first if you need it.) This
     * function preserves all the precision in the long -- it doesn't convert it
     * to a double.
     * @param number The number to format.
     * @param toAppendTo A StringBuffer that the result should be appended to.
     * @param ignore This function doesn't examine or update the field position.
     * @return toAppendTo
     * @stable ICU 2.0
     */
    @Override
    public StringBuffer format(long number,
                               StringBuffer toAppendTo,
                               FieldPosition ignore) {
        // this is one of the inherited format() methods. Since it doesn't
        // have a way to select the rule set to use, it just uses the
        // default one
        if (toAppendTo.length() == 0) {
            toAppendTo.append(adjustForContext(format(number, defaultRuleSet)));
        } else {
            // appending to other text, don't capitalize
            toAppendTo.append(format(number, defaultRuleSet));
        }
        return toAppendTo;
    }

    /**
     * NEW
     * Implement com.ibm.icu.text.NumberFormat:
     * Format a BigInteger.
     * @stable ICU 2.0
     */
    @Override
    public StringBuffer format(BigInteger number,
                               StringBuffer toAppendTo,
                               FieldPosition pos) {
        return format(new com.ibm.icu.math.BigDecimal(number), toAppendTo, pos);
    }

    /**
     * NEW
     * Implement com.ibm.icu.text.NumberFormat:
     * Format a BigDecimal.
     * @stable ICU 2.0
     */
    @Override
    public StringBuffer format(java.math.BigDecimal number,
                               StringBuffer toAppendTo,
                               FieldPosition pos) {
        return format(new com.ibm.icu.math.BigDecimal(number), toAppendTo, pos);
    }

    private static final com.ibm.icu.math.BigDecimal MAX_VALUE = com.ibm.icu.math.BigDecimal.valueOf(Long.MAX_VALUE);
    private static final com.ibm.icu.math.BigDecimal MIN_VALUE = com.ibm.icu.math.BigDecimal.valueOf(Long.MIN_VALUE);

    /**
     * NEW
     * Implement com.ibm.icu.text.NumberFormat:
     * Format a BigDecimal.
     * @stable ICU 2.0
     */
    @Override
    public StringBuffer format(com.ibm.icu.math.BigDecimal number,
                               StringBuffer toAppendTo,
                               FieldPosition pos) {
        if (MIN_VALUE.compareTo(number) > 0 || MAX_VALUE.compareTo(number) < 0) {
            // We're outside of our normal range that this framework can handle.
            // The DecimalFormat will provide more accurate results.
            return getDecimalFormat().format(number, toAppendTo, pos);
        }
        if (number.scale() == 0) {
            return format(number.longValue(), toAppendTo, pos);
        }
        return format(number.doubleValue(), toAppendTo, pos);
    }

    /**
     * Parses the specified string, beginning at the specified position, according
     * to this formatter's rules. This will match the string against all of the
     * formatter's public rule sets and return the value corresponding to the longest
     * parseable substring. This function's behavior is affected by the lenient
     * parse mode.
     * @param text The string to parse
     * @param parsePosition On entry, contains the position of the first character
     * in "text" to examine. On exit, has been updated to contain the position
     * of the first character in "text" that wasn't consumed by the parse.
     * @return The number that corresponds to the parsed text. This will be an
     * instance of either Long or Double, depending on whether the result has a
     * fractional part.
     * @see #setLenientParseMode
     * @stable ICU 2.0
     */
    @Override
    public Number parse(String text, ParsePosition parsePosition) {

        // parsePosition tells us where to start parsing.
        // We copy the text in the string from here to the end into a new string,
        // and create a new ParsePosition and result variable to use
        // for the duration of the parse operation
        String workingText = text.substring(parsePosition.getIndex());
        ParsePosition workingPos = new ParsePosition(0);
        Number tempResult = null;

        // keep track of the largest number of characters consumed in
        // the various trials, and the result that corresponds to it
        Number result = NFRule.ZERO;
        ParsePosition highWaterMark = new ParsePosition(workingPos.getIndex());

        // iterate over the public rule sets (beginning with the default one)
        // and try parsing the text with each of them. Keep track of which
        // one consumes the most characters: that's the one that determines
        // the result we return
        for (int i = ruleSets.length - 1; i >= 0; i--) {
            // skip private or unparseable rule sets
            if (!ruleSets[i].isPublic() || !ruleSets[i].isParseable()) {
                continue;
            }

            // try parsing the string with the rule set. If it gets past the
            // high-water mark, update the high-water mark and the result
            tempResult = ruleSets[i].parse(workingText, workingPos, Double.MAX_VALUE, 0);
            if (workingPos.getIndex() > highWaterMark.getIndex()) {
                result = tempResult;
                highWaterMark.setIndex(workingPos.getIndex());
            }
            // commented out because this API on ParsePosition doesn't exist in 1.1.x
            // if (workingPos.getErrorIndex() > highWaterMark.getErrorIndex()) {
            //     highWaterMark.setErrorIndex(workingPos.getErrorIndex());
            // }

            // if we manage to use up all the characters in the string,
            // we don't have to try any more rule sets
            if (highWaterMark.getIndex() == workingText.length()) {
                break;
            }

            // otherwise, reset our internal parse position to the
            // beginning and try again with the next rule set
            workingPos.setIndex(0);
        }

        // add the high water mark to our original parse position and
        // return the result
        parsePosition.setIndex(parsePosition.getIndex() + highWaterMark.getIndex());
        // commented out because this API on ParsePosition doesn't exist in
        // 1.1.x
        // if (highWaterMark.getIndex() == 0) {
        //     parsePosition.setErrorIndex(parsePosition.getIndex() + highWaterMark.getErrorIndex());
        // }
        return result;
    }

    /**
     * Turns lenient parse mode on and off.
     *
     * When in lenient parse mode, the formatter uses an RbnfLenientScanner
     * for parsing the text. Lenient parsing is only in effect if a scanner
     * provider is set. If a provider is not set, and this formatter is used for
     * parsing, a default provider (RbnfScannerProviderImpl) will be set if
     * it is available on the classpath. Otherwise this will have no effect.
     *
     * @param enabled If true, turns lenient-parse mode on; if false, turns it off.
     * @see RbnfLenientScanner
     * @see RbnfLenientScannerProvider
     * @stable ICU 2.0
     */
    public void setLenientParseMode(boolean enabled) {
        lenientParse = enabled;
    }

    /**
     * Returns true if lenient-parse mode is turned on. Lenient parsing is off
     * by default.
     * @return true if lenient-parse mode is turned on.
     * @see #setLenientParseMode
     * @stable ICU 2.0
     */
    public boolean lenientParseEnabled() {
        return lenientParse;
    }

    /**
     * Sets the provider for the lenient scanner. If this has not been set,
     * {@link #setLenientParseMode}
     * has no effect. This is necessary to decouple collation from format code.
     * @param scannerProvider the provider
     * @see #setLenientParseMode
     * @see #getLenientScannerProvider
     * @stable ICU 4.4
     */
    public void setLenientScannerProvider(RbnfLenientScannerProvider scannerProvider) {
        this.scannerProvider = scannerProvider;
    }

    /**
     * Returns the lenient scanner provider. If none was set, and lenient parse is
     * enabled, this will attempt to instantiate a default provider, setting it if
     * it was successful. Otherwise this returns null.
     *
     * @see #setLenientScannerProvider
     * @stable ICU 4.4
     */
    public RbnfLenientScannerProvider getLenientScannerProvider() {
        // there's a potential race condition if two threads try to set/get the scanner at
        // the same time, but you get what you get, and you shouldn't be using this from
        // multiple threads anyway.
        if (scannerProvider == null && lenientParse && !lookedForScanner) {
            try {
                lookedForScanner = true;
                Class<?> cls = Class.forName("com.ibm.icu.impl.text.RbnfScannerProviderImpl");
                RbnfLenientScannerProvider provider = (RbnfLenientScannerProvider)cls.newInstance();
                setLenientScannerProvider(provider);
            } catch (Exception e) {
                // any failure, we just ignore and return null
            }
        }

        return scannerProvider;
    }

    /**
     * Override the default rule set to use. If ruleSetName is null, reset
     * to the initial default rule set.
     * @param ruleSetName the name of the rule set, or null to reset the initial default.
     * @throws IllegalArgumentException if ruleSetName is not the name of a public ruleset.
     * @stable ICU 2.0
     */
    public void setDefaultRuleSet(String ruleSetName) {
        if (ruleSetName == null) {
            if (publicRuleSetNames.length > 0) {
                defaultRuleSet = findRuleSet(publicRuleSetNames[0]);
            } else {
                defaultRuleSet = null;
                int n = ruleSets.length;
                while (--n >= 0) {
                    String currentName = ruleSets[n].getName();
                    if (currentName.equals("%spellout-numbering") ||
                        currentName.equals("%digits-ordinal") ||
                        currentName.equals("%duration")) {

                        defaultRuleSet = ruleSets[n];
                        return;
                    }
                }

                n = ruleSets.length;
                while (--n >= 0) {
                    if (ruleSets[n].isPublic()) {
                        defaultRuleSet = ruleSets[n];
                        break;
                    }
                }
            }
        } else if (ruleSetName.startsWith("%%")) {
            throw new IllegalArgumentException("cannot use private rule set: " + ruleSetName);
        } else {
            defaultRuleSet = findRuleSet(ruleSetName);
        }
    }

    /**
     * Return the name of the current default rule set.
     * @return the name of the current default rule set, if it is public, else the empty string.
     * @stable ICU 3.0
     */
    public String getDefaultRuleSetName() {
        if (defaultRuleSet != null && defaultRuleSet.isPublic()) {
            return defaultRuleSet.getName();
        }
        return "";
    }

    /**
     * Sets the decimal format symbols used by this formatter. The formatter uses a copy of the
     * provided symbols.
     *
     * @param newSymbols desired DecimalFormatSymbols
     * @see DecimalFormatSymbols
     * @stable ICU 49
     */
    public void setDecimalFormatSymbols(DecimalFormatSymbols newSymbols) {
        if (newSymbols != null) {
            decimalFormatSymbols = (DecimalFormatSymbols) newSymbols.clone();
            if (decimalFormat != null) {
                decimalFormat.setDecimalFormatSymbols(decimalFormatSymbols);
            }
            if (defaultInfinityRule != null) {
                defaultInfinityRule = null;
                getDefaultInfinityRule(); // Reset with the new DecimalFormatSymbols
            }
            if (defaultNaNRule != null) {
                defaultNaNRule = null;
                getDefaultNaNRule(); // Reset with the new DecimalFormatSymbols
            }

            // Apply the new decimalFormatSymbols by reparsing the rulesets
            for (NFRuleSet ruleSet : ruleSets) {
                ruleSet.setDecimalFormatSymbols(decimalFormatSymbols);
            }
        }
    }

    /**
     * {@icu} Set a particular DisplayContext value in the formatter,
     * such as CAPITALIZATION_FOR_STANDALONE. Note: For getContext, see
     * NumberFormat.
     *
     * @param context The DisplayContext value to set.
     * @stable ICU 53
     */
    // Here we override the NumberFormat implementation in order to
    // lazily initialize relevant items
    @Override
    public void setContext(DisplayContext context) {
        super.setContext(context);
        if (!capitalizationInfoIsSet &&
              (context==DisplayContext.CAPITALIZATION_FOR_UI_LIST_OR_MENU || context==DisplayContext.CAPITALIZATION_FOR_STANDALONE)) {
            initCapitalizationContextInfo(locale);
            capitalizationInfoIsSet = true;
        }
        if (capitalizationBrkIter == null && (context==DisplayContext.CAPITALIZATION_FOR_BEGINNING_OF_SENTENCE ||
              (context==DisplayContext.CAPITALIZATION_FOR_UI_LIST_OR_MENU && capitalizationForListOrMenu) ||
              (context==DisplayContext.CAPITALIZATION_FOR_STANDALONE && capitalizationForStandAlone) )) {
            capitalizationBrkIter = BreakIterator.getSentenceInstance(locale);
        }
    }

    /**
     * Returns the rounding mode.
     *
     * @return A rounding mode, between BigDecimal.ROUND_UP and
     * BigDecimal.ROUND_UNNECESSARY.
     * @see #setRoundingMode
     * @see java.math.BigDecimal
     * @stable ICU 56
     */
    @Override
    public int getRoundingMode() {
        return roundingMode;
    }

    /**
     * Sets the rounding mode. This has no effect unless the rounding increment is greater
     * than zero.
     *
     * @param roundingMode A rounding mode, between BigDecimal.ROUND_UP and
     * BigDecimal.ROUND_UNNECESSARY.
     * @exception IllegalArgumentException if roundingMode is unrecognized.
     * @see #getRoundingMode
     * @see java.math.BigDecimal
     * @stable ICU 56
     */
    @Override
    public void setRoundingMode(int roundingMode) {
        if (roundingMode < BigDecimal.ROUND_UP || roundingMode > BigDecimal.ROUND_UNNECESSARY) {
            throw new IllegalArgumentException("Invalid rounding mode: " + roundingMode);
        }

        this.roundingMode = roundingMode;
    }

    //-----------------------------------------------------------------------
    // package-internal API
    //-----------------------------------------------------------------------

    /**
     * Returns a reference to the formatter's default rule set. The default
     * rule set is the last public rule set in the description, or the one
     * most recently set by setDefaultRuleSet.
     * @return The formatter's default rule set.
     */
    NFRuleSet getDefaultRuleSet() {
        return defaultRuleSet;
    }

    /**
     * Returns the scanner to use for lenient parsing. The scanner is
     * provided by the provider.
     * @return The scanner to use for lenient parsing, or null if lenient parsing
     * is turned off or no provider is available.
     */
    RbnfLenientScanner getLenientScanner() {
        if (lenientParse) {
            RbnfLenientScannerProvider provider = getLenientScannerProvider();
            if (provider != null) {
                return provider.get(locale, lenientParseRules);
            }
        }
        return null;
    }

    /**
     * Returns the DecimalFormatSymbols object that should be used by all DecimalFormat
     * instances owned by this formatter. This object is lazily created: this function
     * creates it the first time it's called.
     * @return The DecimalFormatSymbols object that should be used by all DecimalFormat
     * instances owned by this formatter.
     */
    DecimalFormatSymbols getDecimalFormatSymbols() {
        // lazy-evaluate the DecimalFormatSymbols object. This object
        // is shared by all DecimalFormat instances belonging to this
        // formatter
        if (decimalFormatSymbols == null) {
            decimalFormatSymbols = new DecimalFormatSymbols(locale);
        }
        return decimalFormatSymbols;
    }

    DecimalFormat getDecimalFormat() {
        if (decimalFormat == null) {
            // Don't use NumberFormat.getInstance, which can cause a recursive call
            String pattern = getPattern(locale, NUMBERSTYLE);
            decimalFormat = new DecimalFormat(pattern, getDecimalFormatSymbols());
        }
        return decimalFormat;
    }

    PluralFormat createPluralFormat(PluralRules.PluralType pluralType, String pattern) {
        return new PluralFormat(locale, pluralType, pattern, getDecimalFormat());
    }

    /**
     * Returns the default rule for infinity. This object is lazily created: this function
     * creates it the first time it's called.
     */
    NFRule getDefaultInfinityRule() {
        if (defaultInfinityRule == null) {
            defaultInfinityRule = new NFRule(this, "Inf: " + getDecimalFormatSymbols().getInfinity());
        }
        return defaultInfinityRule;
    }

    /**
     * Returns the default rule for NaN. This object is lazily created: this function
     * creates it the first time it's called.
     */
    NFRule getDefaultNaNRule() {
        if (defaultNaNRule == null) {
            defaultNaNRule = new NFRule(this, "NaN: " + getDecimalFormatSymbols().getNaN());
        }
        return defaultNaNRule;
    }

    //-----------------------------------------------------------------------
    // construction implementation
    //-----------------------------------------------------------------------

    /**
     * This extracts the special information from the rule sets before the
     * main parsing starts. Extra whitespace must have already been removed
     * from the description. If found, the special information is removed from the
     * description and returned, otherwise the description is unchanged and null
     * is returned. Note: the trailing semicolon at the end of the special
     * rules is stripped.
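     *
     * <p>A worked example with a made-up description (the class name is hypothetical and
     * only illustrates the shape of the text involved):
     * <pre>
     * description before: %%post-process:com.example.MyProcessor;%main:0: zero;
     * specialName:        %%post-process:
     * returned text:      com.example.MyProcessor
     * description after:  %main:0: zero;
     * </pre>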
     * @param description the rbnf description with extra whitespace removed
     * @param specialName the name of the special rule text to extract
     * @return the special rule text, or null if the rule was not found
     */
    private String extractSpecial(StringBuilder description, String specialName) {
        String result = null;
        int lp = description.indexOf(specialName);
        if (lp != -1) {
            // we've got to make sure we're not in the middle of a rule
            // (where specialName would actually get treated as
            // rule text)
            if (lp == 0 || description.charAt(lp - 1) == ';') {
                // locate the beginning and end of the actual special
                // rules (there may be whitespace between the name and
                // the first token in the description)
                int lpEnd = description.indexOf(";%", lp);

                if (lpEnd == -1) {
                    lpEnd = description.length() - 1; // later we add 1 back to get the '%'
                }
                int lpStart = lp + specialName.length();
                while (lpStart < lpEnd &&
                       PatternProps.isWhiteSpace(description.charAt(lpStart))) {
                    ++lpStart;
                }

                // copy out the special rules
                result = description.substring(lpStart, lpEnd);

                // remove the special rule from the description
                description.delete(lp, lpEnd+1); // delete the semicolon but not the '%'
            }
        }
        return result;
    }

    /**
     * This function parses the description and uses it to build all of the
     * internal data structures that the formatter uses to do formatting.
     * @param description The description of the formatter's desired behavior.
     * This is either passed in by the caller or loaded out of a resource
     * by one of the constructors, and is in the description format specified
     * in the class docs.
     * @param localizations The localizations of the rule set names, or null if
     * there are none.
     */
    private void init(String description, String[][] localizations) {
        initLocalizations(localizations);

        // start by stripping the trailing whitespace from all the rules
        // (this is all the whitespace following each semicolon in the
        // description).
        // This allows us to look for rule-set boundaries
        // by searching for ";%" without having to worry about whitespace
        // between the ; and the %
        StringBuilder descBuf = stripWhitespace(description);

        // check to see if there's a set of lenient-parse rules. If there
        // is, pull them out into our temporary holding place for them,
        // and delete them from the description before the real description-
        // parsing code sees them

        lenientParseRules = extractSpecial(descBuf, "%%lenient-parse:");
        postProcessRules = extractSpecial(descBuf, "%%post-process:");

        // pre-flight parsing the description and count the number of
        // rule sets (";%" marks the end of one rule set and the beginning
        // of the next)
        int numRuleSets = 1;
        int p = 0;
        while ((p = descBuf.indexOf(";%", p)) != -1) {
            ++numRuleSets;
            p += 2; // Skip the length of ";%"
        }

        // our rule list is an array of the appropriate size
        ruleSets = new NFRuleSet[numRuleSets];
        ruleSetsMap = new HashMap<>(numRuleSets * 2 + 1);
        defaultRuleSet = null;

        // Used to count the number of public rule sets
        // Public rule sets have names that begin with % instead of %%.
        int publicRuleSetCount = 0;

        // divide up the descriptions into individual rule-set descriptions
        // and store them in a temporary array. At each step, we also
        // new up a rule set, but all this does is initialize its name
        // and remove it from its description.
        // We can't actually parse
        // the rest of the descriptions and finish initializing everything
        // because we have to know the names and locations of all the rule
        // sets before we can actually set everything up
        String[] ruleSetDescriptions = new String[numRuleSets];

        int curRuleSet = 0;
        int start = 0;

        while (curRuleSet < ruleSets.length) {
            p = descBuf.indexOf(";%", start);
            if (p < 0) {
                p = descBuf.length() - 1;
            }
            ruleSetDescriptions[curRuleSet] = descBuf.substring(start, p + 1);
            NFRuleSet ruleSet = new NFRuleSet(this, ruleSetDescriptions, curRuleSet);
            ruleSets[curRuleSet] = ruleSet;
            String currentName = ruleSet.getName();
            ruleSetsMap.put(currentName, ruleSet);
            if (!currentName.startsWith("%%")) {
                ++publicRuleSetCount;
                if (defaultRuleSet == null
                        && currentName.equals("%spellout-numbering")
                        || currentName.equals("%digits-ordinal")
                        || currentName.equals("%duration"))
                {
                    defaultRuleSet = ruleSet;
                }
            }
            ++curRuleSet;
            start = p + 1;
        }

        // now we can take note of the formatter's default rule set, which
        // is the last public rule set in the description (it's the last
        // rather than the first so that a user can create a new formatter
        // from an existing formatter and change its default behavior just
        // by appending more rule sets to the end)

        // {dlf} Initialization of a fraction rule set requires the default rule
        // set to be known. For purposes of initialization, this is always the
        // last public rule set, no matter what the localization data says.
        // Set the default ruleset to the last public ruleset, unless one of the predefined
        // ruleset names %spellout-numbering, %digits-ordinal, or %duration is found

        if (defaultRuleSet == null) {
            for (int i = ruleSets.length - 1; i >= 0; --i) {
                if (!ruleSets[i].getName().startsWith("%%")) {
                    defaultRuleSet = ruleSets[i];
                    break;
                }
            }
        }
        if (defaultRuleSet == null) {
            defaultRuleSet = ruleSets[ruleSets.length - 1];
        }

        // finally, we can go back through the temporary descriptions
        // list and finish setting up the substructure
        for (int i = 0; i < ruleSets.length; i++) {
            ruleSets[i].parseRules(ruleSetDescriptions[i]);
        }

        // Now that the rules are initialized, the 'real' default rule
        // set can be adjusted by the localization data.

        // prepare an array of the proper size and copy the names into it
        String[] publicRuleSetTemp = new String[publicRuleSetCount];
        publicRuleSetCount = 0;
        for (int i = ruleSets.length - 1; i >= 0; i--) {
            if (!ruleSets[i].getName().startsWith("%%")) {
                publicRuleSetTemp[publicRuleSetCount++] = ruleSets[i].getName();
            }
        }

        if (publicRuleSetNames != null) {
            // confirm the names, if any aren't in the rules, that's an error
            // it is ok if the rules contain public rule sets that are not in this list
            loop: for (int i = 0; i < publicRuleSetNames.length; ++i) {
                String name = publicRuleSetNames[i];
                for (int j = 0; j < publicRuleSetTemp.length; ++j) {
                    if (name.equals(publicRuleSetTemp[j])) {
                        continue loop;
                    }
                }
                throw new IllegalArgumentException("did not find public rule set: " + name);
            }

            defaultRuleSet = findRuleSet(publicRuleSetNames[0]); // might be different
        } else {
            publicRuleSetNames = publicRuleSetTemp;
        }
    }

    /**
     * Take the localizations array and create a Map from the locale strings to
     * the localization arrays.
     */
    private void initLocalizations(String[][] localizations) {
        if (localizations != null) {
            publicRuleSetNames = localizations[0].clone();

            Map<String, String[]> m = new HashMap<>();
            for (int i = 1; i < localizations.length; ++i) {
                String[] data = localizations[i];
                String loc = data[0];
                String[] names = new String[data.length - 1];
                if (names.length != publicRuleSetNames.length) {
                    throw new IllegalArgumentException("public name length: " + publicRuleSetNames.length +
                                                       " != localized names[" + i + "] length: " + names.length);
                }
                System.arraycopy(data, 1, names, 0, names.length);
                m.put(loc, names);
            }

            if (!m.isEmpty()) {
                ruleSetDisplayNames = m;
            }
        }
    }

    /**
     * Set capitalizationForListOrMenu, capitalizationForStandAlone
     */
    private void initCapitalizationContextInfo(ULocale theLocale) {
        ICUResourceBundle rb = (ICUResourceBundle) UResourceBundle.getBundleInstance(ICUData.ICU_BASE_NAME, theLocale);
        try {
            ICUResourceBundle rdb = rb.getWithFallback("contextTransforms/number-spellout");
            int[] intVector = rdb.getIntVector();
            if (intVector.length >= 2) {
                capitalizationForListOrMenu = (intVector[0] != 0);
                capitalizationForStandAlone = (intVector[1] != 0);
            }
        } catch (MissingResourceException e) {
            // use default
        }
    }

    /**
     * This function is used by init() to strip whitespace between rules (i.e.,
     * after semicolons).
     * @param description The formatter description
     * @return The description with all the whitespace that follows semicolons
     * taken out.
     */
    private StringBuilder stripWhitespace(String description) {
        // since we don't have a method that deletes characters (why?!!)
        // create a new StringBuilder to copy the text into
        StringBuilder result = new StringBuilder();
        int descriptionLength = description.length();

        // iterate through the characters...
        int start = 0;
        while (start < descriptionLength) {
            // seek to the first non-whitespace character...
            while (start < descriptionLength
                   && PatternProps.isWhiteSpace(description.charAt(start))) {
                ++start;
            }

            // if the first non-whitespace character is a semicolon, skip it and continue
            if (start < descriptionLength && description.charAt(start) == ';') {
                start += 1;
                continue;
            }

            // locate the next semicolon in the text and copy the text from
            // our current position up to that semicolon into the result
            int p = description.indexOf(';', start);
            if (p == -1) {
                // or if we don't find a semicolon, just copy the rest of
                // the string into the result
                result.append(description.substring(start));
                break;
            } else if (p < descriptionLength) {
                result.append(description.substring(start, p + 1));
                start = p + 1;
            } else {
                // when we get here, we've run off the end of the string, and
                // we terminate the loop (we continue until *start* is -1 rather
                // than until *p* is -1, because otherwise we'd miss the last
                // rule in the description)
                break;
            }
        }
        return result;
    }

    //-----------------------------------------------------------------------
    // formatting implementation
    //-----------------------------------------------------------------------

    /**
     * Bottleneck through which all the public format() methods
     * that take a double pass. By the time we get here, we know
     * which rule set we're using to do the formatting.
     * @param number The number to format
     * @param ruleSet The rule set to use to format the number
     * @return The text that resulted from formatting the number
     */
    private String format(double number, NFRuleSet ruleSet) {
        // all API format() routines that take a double vector through
        // here.
        // Create an empty string buffer where the result will
        // be built, and pass it to the rule set (along with an insertion
        // position of 0 and the number being formatted) for formatting
        StringBuilder result = new StringBuilder();
        if (getRoundingMode() != BigDecimal.ROUND_UNNECESSARY && !Double.isNaN(number) && !Double.isInfinite(number)) {
            // We convert to a string because BigDecimal insists on excessive precision.
            number = new BigDecimal(Double.toString(number)).setScale(getMaximumFractionDigits(), roundingMode).doubleValue();
        }
        ruleSet.format(number, result, 0, 0);
        postProcess(result, ruleSet);
        return result.toString();
    }

    /**
     * Bottleneck through which all the public format() methods
     * that take a long pass. By the time we get here, we know
     * which rule set we're using to do the formatting.
     * @param number The number to format
     * @param ruleSet The rule set to use to format the number
     * @return The text that resulted from formatting the number
     */
    private String format(long number, NFRuleSet ruleSet) {
        // all API format() routines that take a long vector through
        // here.  We have these two identical functions-- one taking a
        // double and one taking a long-- so as not to lose the couple of
        // digits of precision that long has but double doesn't (both types
        // are 8 bytes long, but double has to borrow some of the mantissa
        // bits to hold the exponent).
        // Create an empty string buffer where the result will
        // be built, and pass it to the rule set (along with an insertion
        // position of 0 and the number being formatted) for formatting
        StringBuilder result = new StringBuilder();
        if (number == Long.MIN_VALUE) {
            // We can't handle this value right now. Provide an accurate default value.
            result.append(getDecimalFormat().format(Long.MIN_VALUE));
        } else {
            ruleSet.format(number, result, 0, 0);
        }
        postProcess(result, ruleSet);
        return result.toString();
    }

    /**
     * Post-process the rules if we have a post-processor.
     */
    private void postProcess(StringBuilder result, NFRuleSet ruleSet) {
        if (postProcessRules != null) {
            if (postProcessor == null) {
                // the post-processor class name is everything up to the first semicolon
                int ix = postProcessRules.indexOf(";");
                if (ix == -1) {
                    ix = postProcessRules.length();
                }
                String ppClassName = postProcessRules.substring(0, ix).trim();
                try {
                    Class<?> cls = Class.forName(ppClassName);
                    postProcessor = (RBNFPostProcessor) cls.newInstance();
                    postProcessor.init(this, postProcessRules);
                } catch (Exception e) {
                    // if debug, print it out
                    if (DEBUG) {
                        System.out.println("could not locate " + ppClassName + ", error " +
                                           e.getClass().getName() + ", " + e.getMessage());
                    }
                    postProcessor = null;
                    postProcessRules = null; // don't try again
                    return;
                }
            }

            postProcessor.process(result, ruleSet);
        }
    }

    /**
     * Adjust capitalization of formatted result for display context
     */
    private String adjustForContext(String result) {
        DisplayContext capitalization = getContext(DisplayContext.Type.CAPITALIZATION);
        if (capitalization != DisplayContext.CAPITALIZATION_NONE && result != null && result.length() > 0
            && UCharacter.isLowerCase(result.codePointAt(0))) {
            if (capitalization == DisplayContext.CAPITALIZATION_FOR_BEGINNING_OF_SENTENCE
                || (capitalization == DisplayContext.CAPITALIZATION_FOR_UI_LIST_OR_MENU && capitalizationForListOrMenu)
                || (capitalization == DisplayContext.CAPITALIZATION_FOR_STANDALONE && capitalizationForStandAlone)) {
                if (capitalizationBrkIter == null) {
                    // should only happen when deserializing, etc.
                    capitalizationBrkIter = BreakIterator.getSentenceInstance(locale);
                }
                return UCharacter.toTitleCase(locale, result, capitalizationBrkIter,
                                UCharacter.TITLECASE_NO_LOWERCASE | UCharacter.TITLECASE_NO_BREAK_ADJUSTMENT);
            }
        }
        return result;
    }

    /**
     * Returns the named rule set.  Throws an IllegalArgumentException
     * if this formatter doesn't have a rule set with that name.
     * @param name The name of the desired rule set
     * @return The rule set with that name
     */
    NFRuleSet findRuleSet(String name) throws IllegalArgumentException {
        NFRuleSet result = ruleSetsMap.get(name);
        if (result == null) {
            throw new IllegalArgumentException("No rule set named " + name);
        }
        return result;
    }
}
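
The whitespace-stripping pass used by init() can be tried in isolation. The following is only a minimal standalone sketch, not part of ICU4J: the class name `StripWhitespaceSketch` is hypothetical, and it assumes `Character.isWhitespace` as a stand-in for ICU's `PatternProps.isWhiteSpace`, which may classify some characters differently. It shows that whitespace after each semicolon and empty rules (`;;`) are dropped, while whitespace inside a rule is kept.

```java
public class StripWhitespaceSketch {

    // Standalone approximation of the private stripWhitespace() helper.
    // Assumption: Character.isWhitespace stands in for PatternProps.isWhiteSpace.
    static String stripWhitespace(String description) {
        StringBuilder result = new StringBuilder();
        int length = description.length();
        int start = 0;
        while (start < length) {
            // skip whitespace at the start of the next rule
            while (start < length && Character.isWhitespace(description.charAt(start))) {
                ++start;
            }
            // an immediately following semicolon is an empty rule: drop it
            if (start < length && description.charAt(start) == ';') {
                ++start;
                continue;
            }
            // copy everything up to and including the next semicolon
            int p = description.indexOf(';', start);
            if (p == -1) {
                // no more semicolons: copy the remainder and stop
                result.append(description, start, length);
                break;
            }
            result.append(description, start, p + 1);
            start = p + 1;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("%main: 0: zero;   1: one;; \n 2: two"));
        // prints "%main: 0: zero;1: one;2: two"
    }
}
```

Returning a String rather than a StringBuilder is a simplification made for this sketch; the method in the class above returns the builder so that init() can keep working on it.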




