All Downloads are FREE. Search and download functionalities are using the official Maven repository.

com.ibm.icu.text.Bidi Maven / Gradle / Ivy

There is a newer version: 4.15.102
Show newest version
// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html#License
/*
*******************************************************************************
*   Copyright (C) 2001-2016, International Business Machines
*   Corporation and others.  All Rights Reserved.
*******************************************************************************
*/

/* FOOD FOR THOUGHT: currently the reordering modes are a mixture of
 * algorithm for direct BiDi, algorithm for inverse Bidi and the bizarre
 * concept of RUNS_ONLY which is a double operation.
 * It could be advantageous to divide this into 3 concepts:
 * a) Operation: direct / inverse / RUNS_ONLY
 * b) Direct algorithm: default / NUMBERS_SPECIAL / GROUP_NUMBERS_WITH_L
 * c) Inverse algorithm: default / INVERSE_LIKE_DIRECT / NUMBERS_SPECIAL
 * This would allow combinations not possible today like RUNS_ONLY with
 * NUMBERS_SPECIAL.
 * Also allow to set INSERT_MARKS for the direct step of RUNS_ONLY and
 * REMOVE_CONTROLS for the inverse step.
 * Not all combinations would be supported, and probably not all do make sense.
 * This would need to document which ones are supported and what are the
 * fallbacks for unsupported combinations.
 */

//TODO: make sample program do something simple but real and complete

package com.ibm.icu.text;

import java.awt.font.NumericShaper;
import java.awt.font.TextAttribute;
import java.lang.reflect.Array;
import java.text.AttributedCharacterIterator;
import java.util.Arrays;

import com.ibm.icu.impl.UBiDiProps;
import com.ibm.icu.lang.UCharacter;
import com.ibm.icu.lang.UCharacterDirection;
import com.ibm.icu.lang.UProperty;

/**
 *
 * 

Bidi algorithm for ICU

* * This is an implementation of the Unicode Bidirectional Algorithm. The * algorithm is defined in the Unicode Standard Annex #9. *

* * Note: Libraries that perform a bidirectional algorithm and reorder strings * accordingly are sometimes called "Storage Layout Engines". ICU's Bidi and * shaping (ArabicShaping) classes can be used at the core of such "Storage * Layout Engines". * *

General remarks about the API:

* * The "limit" of a sequence of characters is the position just after * their last character, i.e., one more than that position. *

* * Some of the API methods provide access to "runs". Such a * "run" is defined as a sequence of characters that are at the same * embedding level after performing the Bidi algorithm. * *

Basic concept: paragraph

* A piece of text can be divided into several paragraphs by characters * with the Bidi class Block Separator. For handling of * paragraphs, see: *
    *
  • {@link #countParagraphs} *
  • {@link #getParaLevel} *
  • {@link #getParagraph} *
  • {@link #getParagraphByIndex} *
* *

Basic concept: text direction

* The direction of a piece of text may be: *
    *
  • {@link #LTR} *
  • {@link #RTL} *
  • {@link #MIXED} *
  • {@link #NEUTRAL} *
* *

Basic concept: levels

* * Levels in this API represent embedding levels according to the Unicode * Bidirectional Algorithm. * Their low-order bit (even/odd value) indicates the visual direction.

* * Levels can be abstract values when used for the * paraLevel and embeddingLevels * arguments of setPara(); there: *

    *
  • the high-order bit of an embeddingLevels[] * value indicates whether the using application is * specifying the level of a character to override whatever the * Bidi implementation would resolve it to.
  • *
  • paraLevel can be set to the * pseudo-level values LEVEL_DEFAULT_LTR * and LEVEL_DEFAULT_RTL.
  • *
* *

The related constants are not real, valid level values. * DEFAULT_XXX can be used to specify * a default for the paragraph level for * when the setPara() method * shall determine it but there is no * strongly typed character in the input.

* * Note that the value for LEVEL_DEFAULT_LTR is even * and the one for LEVEL_DEFAULT_RTL is odd, * just like with normal LTR and RTL level values - * these special values are designed that way. Also, the implementation * assumes that MAX_EXPLICIT_LEVEL is odd. * *

Note: The numeric values of the related constants will not change: * They are tied to the use of 7-bit byte values (plus the override bit) * and of the byte data type in this API. * *

See Also: *

    *
  • {@link #LEVEL_DEFAULT_LTR} *
  • {@link #LEVEL_DEFAULT_RTL} *
  • {@link #LEVEL_OVERRIDE} *
  • {@link #MAX_EXPLICIT_LEVEL} *
  • {@link #setPara} *
* *

Basic concept: Reordering Mode

* Reordering mode values indicate which variant of the Bidi algorithm to * use. * * See Also: *
    *
  • {@link #setReorderingMode} *
  • {@link #REORDER_DEFAULT} *
  • {@link #REORDER_NUMBERS_SPECIAL} *
  • {@link #REORDER_GROUP_NUMBERS_WITH_R} *
  • {@link #REORDER_RUNS_ONLY} *
  • {@link #REORDER_INVERSE_NUMBERS_AS_L} *
  • {@link #REORDER_INVERSE_LIKE_DIRECT} *
  • {@link #REORDER_INVERSE_FOR_NUMBERS_SPECIAL} *
* *

Basic concept: Reordering Options

* Reordering options can be applied during Bidi text transformations. * * See Also: *
    *
  • {@link #setReorderingOptions} *
  • {@link #OPTION_DEFAULT} *
  • {@link #OPTION_INSERT_MARKS} *
  • {@link #OPTION_REMOVE_CONTROLS} *
  • {@link #OPTION_STREAMING} *
* *

Sample code for the ICU Bidi API

* *
Rendering a paragraph with the ICU Bidi API
* * This is (hypothetical) sample code that illustrates how the ICU Bidi API * could be used to render a paragraph of text. Rendering code depends highly on * the graphics system, therefore this sample code must make a lot of * assumptions, which may or may not match any existing graphics system's * properties. * *

* The basic assumptions are: * *

    *
  • Rendering is done from left to right on a horizontal line.
  • *
  • A run of single-style, unidirectional text can be rendered at once. *
  • *
  • Such a run of text is passed to the graphics system with characters * (code units) in logical order.
  • *
  • The line-breaking algorithm is very complicated and Locale-dependent - * and therefore its implementation omitted from this sample code.
  • *
* *
 *
 *  package com.ibm.icu.dev.test.bidi;
 *
 *  import com.ibm.icu.text.Bidi;
 *  import com.ibm.icu.text.BidiRun;
 *
 *  public class Sample {
 *
 *      static final int styleNormal = 0;
 *      static final int styleSelected = 1;
 *      static final int styleBold = 2;
 *      static final int styleItalics = 4;
 *      static final int styleSuper=8;
 *      static final int styleSub = 16;
 *
 *      static class StyleRun {
 *          int limit;
 *          int style;
 *
 *          public StyleRun(int limit, int style) {
 *              this.limit = limit;
 *              this.style = style;
 *          }
 *      }
 *
 *      static class Bounds {
 *          int start;
 *          int limit;
 *
 *          public Bounds(int start, int limit) {
 *              this.start = start;
 *              this.limit = limit;
 *          }
 *      }
 *
 *      static int getTextWidth(String text, int start, int limit,
 *                              StyleRun[] styleRuns, int styleRunCount) {
 *          // simplistic way to compute the width
 *          return limit - start;
 *      }
 *
 *      // set limit and StyleRun limit for a line
 *      // from text[start] and from styleRuns[styleRunStart]
 *      // using Bidi.getLogicalRun(...)
 *      // returns line width
 *      static int getLineBreak(String text, Bounds line, Bidi para,
 *                              StyleRun styleRuns[], Bounds styleRun) {
 *          // dummy return
 *          return 0;
 *      }
 *
 *      // render runs on a line sequentially, always from left to right
 *
 *      // prepare rendering a new line
 *      static void startLine(byte textDirection, int lineWidth) {
 *          System.out.println();
 *      }
 *
 *      // render a run of text and advance to the right by the run width
 *      // the text[start..limit-1] is always in logical order
 *      static void renderRun(String text, int start, int limit,
 *                            byte textDirection, int style) {
 *      }
 *
 *      // We could compute a cross-product
 *      // from the style runs with the directional runs
 *      // and then reorder it.
 *      // Instead, here we iterate over each run type
 *      // and render the intersections -
 *      // with shortcuts in simple (and common) cases.
 *      // renderParagraph() is the main function.
 *
 *      // render a directional run with
 *      // (possibly) multiple style runs intersecting with it
 *      static void renderDirectionalRun(String text, int start, int limit,
 *                                       byte direction, StyleRun styleRuns[],
 *                                       int styleRunCount) {
 *          int i;
 *
 *          // iterate over style runs
 *          if (direction == Bidi.LTR) {
 *              int styleLimit;
 *              for (i = 0; i < styleRunCount; ++i) {
 *                  styleLimit = styleRuns[i].limit;
 *                  if (start < styleLimit) {
 *                      if (styleLimit > limit) {
 *                          styleLimit = limit;
 *                      }
 *                      renderRun(text, start, styleLimit,
 *                                direction, styleRuns[i].style);
 *                      if (styleLimit == limit) {
 *                          break;
 *                      }
 *                      start = styleLimit;
 *                  }
 *              }
 *          } else {
 *              int styleStart;
 *
 *              for (i = styleRunCount-1; i >= 0; --i) {
 *                  if (i > 0) {
 *                      styleStart = styleRuns[i-1].limit;
 *                  } else {
 *                      styleStart = 0;
 *                  }
 *                  if (limit >= styleStart) {
 *                      if (styleStart < start) {
 *                          styleStart = start;
 *                      }
 *                      renderRun(text, styleStart, limit, direction,
 *                                styleRuns[i].style);
 *                      if (styleStart == start) {
 *                          break;
 *                      }
 *                      limit = styleStart;
 *                  }
 *              }
 *          }
 *      }
 *
 *      // the line object represents text[start..limit-1]
 *      static void renderLine(Bidi line, String text, int start, int limit,
 *                             StyleRun styleRuns[], int styleRunCount) {
 *          byte direction = line.getDirection();
 *          if (direction != Bidi.MIXED) {
 *              // unidirectional
 *              if (styleRunCount <= 1) {
 *                  renderRun(text, start, limit, direction, styleRuns[0].style);
 *              } else {
 *                  renderDirectionalRun(text, start, limit, direction,
 *                                       styleRuns, styleRunCount);
 *              }
 *          } else {
 *              // mixed-directional
 *              int count, i;
 *              BidiRun run;
 *
 *              try {
 *                  count = line.countRuns();
 *              } catch (IllegalStateException e) {
 *                  e.printStackTrace();
 *                  return;
 *              }
 *              if (styleRunCount <= 1) {
 *                  int style = styleRuns[0].style;
 *
 *                  // iterate over directional runs
 *                  for (i = 0; i < count; ++i) {
 *                      run = line.getVisualRun(i);
 *                      renderRun(text, run.getStart(), run.getLimit(),
 *                                run.getDirection(), style);
 *                  }
 *              } else {
 *                  // iterate over both directional and style runs
 *                  for (i = 0; i < count; ++i) {
 *                      run = line.getVisualRun(i);
 *                      renderDirectionalRun(text, run.getStart(),
 *                                           run.getLimit(), run.getDirection(),
 *                                           styleRuns, styleRunCount);
 *                  }
 *              }
 *          }
 *      }
 *
 *      static void renderParagraph(String text, byte textDirection,
 *                                  StyleRun styleRuns[], int styleRunCount,
 *                                  int lineWidth) {
 *          int length = text.length();
 *          Bidi para = new Bidi();
 *          try {
 *              para.setPara(text,
 *                           textDirection != 0 ? Bidi.LEVEL_DEFAULT_RTL
 *                                              : Bidi.LEVEL_DEFAULT_LTR,
 *                           null);
 *          } catch (Exception e) {
 *              e.printStackTrace();
 *              return;
 *          }
 *          byte paraLevel = (byte)(1 & para.getParaLevel());
 *          StyleRun styleRun = new StyleRun(length, styleNormal);
 *
 *          if (styleRuns == null || styleRunCount <= 0) {
 *              styleRuns = new StyleRun[1];
 *              styleRunCount = 1;
 *              styleRuns[0] = styleRun;
 *          }
 *          // assume styleRuns[styleRunCount-1].limit>=length
 *
 *          int width = getTextWidth(text, 0, length, styleRuns, styleRunCount);
 *          if (width <= lineWidth) {
 *              // everything fits onto one line
 *
 *              // prepare rendering a new line from either left or right
 *              startLine(paraLevel, width);
 *
 *              renderLine(para, text, 0, length, styleRuns, styleRunCount);
 *          } else {
 *              // we need to render several lines
 *              Bidi line = new Bidi(length, 0);
 *              int start = 0, limit;
 *              int styleRunStart = 0, styleRunLimit;
 *
 *              for (;;) {
 *                  limit = length;
 *                  styleRunLimit = styleRunCount;
 *                  width = getLineBreak(text, new Bounds(start, limit),
 *                                       para, styleRuns,
 *                                       new Bounds(styleRunStart, styleRunLimit));
 *                  try {
 *                      line = para.setLine(start, limit);
 *                  } catch (Exception e) {
 *                      e.printStackTrace();
 *                      return;
 *                  }
 *                  // prepare rendering a new line
 *                  // from either left or right
 *                  startLine(paraLevel, width);
 *
 *                  if (styleRunStart > 0) {
 *                      int newRunCount = styleRuns.length - styleRunStart;
 *                      StyleRun[] newRuns = new StyleRun[newRunCount];
 *                      System.arraycopy(styleRuns, styleRunStart, newRuns, 0,
 *                                       newRunCount);
 *                      renderLine(line, text, start, limit, newRuns,
 *                                 styleRunLimit - styleRunStart);
 *                  } else {
 *                      renderLine(line, text, start, limit, styleRuns,
 *                                 styleRunLimit - styleRunStart);
 *                  }
 *                  if (limit == length) {
 *                      break;
 *                  }
 *                  start = limit;
 *                  styleRunStart = styleRunLimit - 1;
 *                  if (start >= styleRuns[styleRunStart].limit) {
 *                      ++styleRunStart;
 *                  }
 *              }
 *          }
 *      }
 *
 *      public static void main(String[] args)
 *      {
 *          renderParagraph("Some Latin text...", Bidi.LTR, null, 0, 80);
 *          renderParagraph("Some Hebrew text...", Bidi.RTL, null, 0, 60);
 *      }
 *  }
 *
 * 
* * @author Simon Montagu, Matitiahu Allouche (ported from C code written by Markus W. Scherer) * @stable ICU 3.8 */ /* * General implementation notes: * * Throughout the implementation, there are comments like (W2) that refer to * rules of the BiDi algorithm, in this example to the second rule of the * resolution of weak types. * * For handling surrogate pairs, where two UChar's form one "abstract" (or UTF-32) * character according to UTF-16, the second UChar gets the directional property of * the entire character assigned, while the first one gets a BN, a boundary * neutral, type, which is ignored by most of the algorithm according to * rule (X9) and the implementation suggestions of the BiDi algorithm. * * Later, adjustWSLevels() will set the level for each BN to that of the * following character (UChar), which results in surrogate pairs getting the * same level on each of their surrogates. * * In a UTF-8 implementation, the same thing could be done: the last byte of * a multi-byte sequence would get the "real" property, while all previous * bytes of that sequence would get BN. * * It is not possible to assign all those parts of a character the same real * property because this would fail in the resolution of weak types with rules * that look at immediately surrounding types. * * As a related topic, this implementation does not remove Boundary Neutral * types from the input, but ignores them wherever this is relevant. * For example, the loop for the resolution of the weak types reads * types until it finds a non-BN. * Also, explicit embedding codes are neither changed into BN nor removed. * They are only treated the same way real BNs are. * As stated before, adjustWSLevels() takes care of them at the end. * For the purpose of conformance, the levels of all these codes * do not matter. * * Note that this implementation modifies the dirProps * after the initial setup, when applying X5c (replace FSI by LRI or RLI), * X6, N0 (replace paired brackets by L or R). * * In this implementation, the resolution of weak types (W1 to W6), * neutrals (N1 and N2), and the assignment of the resolved level (In) * are all done in one single loop, in resolveImplicitLevels(). * Changes of dirProp values are done on the fly, without writing * them back to the dirProps array. * * * This implementation contains code that allows to bypass steps of the * algorithm that are not needed on the specific paragraph * in order to speed up the most common cases considerably, * like text that is entirely LTR, or RTL text without numbers. * * Most of this is done by setting a bit for each directional property * in a flags variable and later checking for whether there are * any LTR characters or any RTL characters, or both, whether * there are any explicit embedding codes, etc. * * If the (Xn) steps are performed, then the flags are re-evaluated, * because they will then not contain the embedding codes any more * and will be adjusted for override codes, so that subsequently * more bypassing may be possible than what the initial flags suggested. * * If the text is not mixed-directional, then the * algorithm steps for the weak type resolution are not performed, * and all levels are set to the paragraph level. * * If there are no explicit embedding codes, then the (Xn) steps * are not performed. * * If embedding levels are supplied as a parameter, then all * explicit embedding codes are ignored, and the (Xn) steps * are not performed. * * White Space types could get the level of the run they belong to, * and are checked with a test of (flags&MASK_EMBEDDING) to * consider if the paragraph direction should be considered in * the flags variable. * * If there are no White Space types in the paragraph, then * (L1) is not necessary in adjustWSLevels(). */ public class Bidi { static class Point { int pos; /* position in text */ int flag; /* flag for LRM/RLM, before/after */ } static class InsertPoints { int size; int confirmed; Point[] points = new Point[0]; } static class Opening { int position; /* position of opening bracket */ int match; /* matching char or -position of closing bracket */ int contextPos; /* position of last strong char found before opening */ short flags; /* bits for L or R/AL found within the pair */ byte contextDir; /* L or R according to last strong char before opening */ } static class IsoRun { int contextPos; /* position of char determining context */ short start; /* index of first opening entry for this run */ short limit; /* index after last opening entry for this run */ byte level; /* level of this run */ byte lastStrong; /* bidi class of last strong char found in this run */ byte lastBase; /* bidi class of last base char found in this run */ byte contextDir; /* L or R to use as context for following openings */ } static class BracketData { Opening[] openings = new Opening[SIMPLE_OPENINGS_COUNT]; int isoRunLast; /* index of last used entry */ /* array of nested isolated sequence entries; can never excess UBIDI_MAX_EXPLICIT_LEVEL + 1 for index 0, + 1 for before the first isolated sequence */ IsoRun[] isoRuns = new IsoRun[MAX_EXPLICIT_LEVEL+2]; boolean isNumbersSpecial; /*reordering mode for NUMBERS_SPECIAL */ } static class Isolate { int startON; int start1; short stateImp; short state; } /** Paragraph level setting

* * Constant indicating that the base direction depends on the first strong * directional character in the text according to the Unicode Bidirectional * Algorithm. If no strong directional character is present, * then set the paragraph level to 0 (left-to-right).

* * If this value is used in conjunction with reordering modes * REORDER_INVERSE_LIKE_DIRECT or * REORDER_INVERSE_FOR_NUMBERS_SPECIAL, the text to reorder * is assumed to be visual LTR, and the text after reordering is required * to be the corresponding logical string with appropriate contextual * direction. The direction of the result string will be RTL if either * the rightmost or leftmost strong character of the source text is RTL * or Arabic Letter, the direction will be LTR otherwise.

* * If reordering option OPTION_INSERT_MARKS is set, an RLM may * be added at the beginning of the result string to ensure round trip * (that the result string, when reordered back to visual, will produce * the original source text). * @see #REORDER_INVERSE_LIKE_DIRECT * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL * @stable ICU 3.8 */ public static final byte LEVEL_DEFAULT_LTR = (byte)0x7e; /** Paragraph level setting

* * Constant indicating that the base direction depends on the first strong * directional character in the text according to the Unicode Bidirectional * Algorithm. If no strong directional character is present, * then set the paragraph level to 1 (right-to-left).

* * If this value is used in conjunction with reordering modes * REORDER_INVERSE_LIKE_DIRECT or * REORDER_INVERSE_FOR_NUMBERS_SPECIAL, the text to reorder * is assumed to be visual LTR, and the text after reordering is required * to be the corresponding logical string with appropriate contextual * direction. The direction of the result string will be RTL if either * the rightmost or leftmost strong character of the source text is RTL * or Arabic Letter, or if the text contains no strong character; * the direction will be LTR otherwise.

* * If reordering option OPTION_INSERT_MARKS is set, an RLM may * be added at the beginning of the result string to ensure round trip * (that the result string, when reordered back to visual, will produce * the original source text). * @see #REORDER_INVERSE_LIKE_DIRECT * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL * @stable ICU 3.8 */ public static final byte LEVEL_DEFAULT_RTL = (byte)0x7f; /** * Maximum explicit embedding level. * Same as the max_depth value in the * Unicode Bidirectional Algorithm. * (The maximum resolved level can be up to MAX_EXPLICIT_LEVEL+1). * @stable ICU 3.8 */ public static final byte MAX_EXPLICIT_LEVEL = 125; /** * Bit flag for level input. * Overrides directional properties. * @stable ICU 3.8 */ public static final byte LEVEL_OVERRIDE = (byte)0x80; /** * Special value which can be returned by the mapping methods when a * logical index has no corresponding visual index or vice-versa. This may * happen for the logical-to-visual mapping of a Bidi control when option * OPTION_REMOVE_CONTROLS is * specified. This can also happen for the visual-to-logical mapping of a * Bidi mark (LRM or RLM) inserted by option * OPTION_INSERT_MARKS. * @see #getVisualIndex * @see #getVisualMap * @see #getLogicalIndex * @see #getLogicalMap * @see #OPTION_INSERT_MARKS * @see #OPTION_REMOVE_CONTROLS * @stable ICU 3.8 */ public static final int MAP_NOWHERE = -1; /** * Left-to-right text. *

    *
  • As return value for getDirection(), it means * that the source string contains no right-to-left characters, or * that the source string is empty and the paragraph level is even. *
  • As return value for getBaseDirection(), it * means that the first strong character of the source string has * a left-to-right direction. *
* @stable ICU 3.8 */ public static final byte LTR = 0; /** * Right-to-left text. *
    *
  • As return value for getDirection(), it means * that the source string contains no left-to-right characters, or * that the source string is empty and the paragraph level is odd. *
  • As return value for getBaseDirection(), it * means that the first strong character of the source string has * a right-to-left direction. *
* @stable ICU 3.8 */ public static final byte RTL = 1; /** * Mixed-directional text. *

As return value for getDirection(), it means * that the source string contains both left-to-right and * right-to-left characters. * @stable ICU 3.8 */ public static final byte MIXED = 2; /** * No strongly directional text. *

As return value for getBaseDirection(), it means * that the source string is missing or empty, or contains neither * left-to-right nor right-to-left characters. * @stable ICU 4.6 */ public static final byte NEUTRAL = 3; /** * option bit for writeReordered(): * keep combining characters after their base characters in RTL runs * * @see #writeReordered * @stable ICU 3.8 */ public static final short KEEP_BASE_COMBINING = 1; /** * option bit for writeReordered(): * replace characters with the "mirrored" property in RTL runs * by their mirror-image mappings * * @see #writeReordered * @stable ICU 3.8 */ public static final short DO_MIRRORING = 2; /** * option bit for writeReordered(): * surround the run with LRMs if necessary; * this is part of the approximate "inverse Bidi" algorithm * *

This option does not imply corresponding adjustment of the index * mappings. * * @see #setInverse * @see #writeReordered * @stable ICU 3.8 */ public static final short INSERT_LRM_FOR_NUMERIC = 4; /** * option bit for writeReordered(): * remove Bidi control characters * (this does not affect INSERT_LRM_FOR_NUMERIC) * *

This option does not imply corresponding adjustment of the index * mappings. * * @see #writeReordered * @see #INSERT_LRM_FOR_NUMERIC * @stable ICU 3.8 */ public static final short REMOVE_BIDI_CONTROLS = 8; /** * option bit for writeReordered(): * write the output in reverse order * *

This has the same effect as calling writeReordered() * first without this option, and then calling * writeReverse() without mirroring. * Doing this in the same step is faster and avoids a temporary buffer. * An example for using this option is output to a character terminal that * is designed for RTL scripts and stores text in reverse order. * * @see #writeReordered * @stable ICU 3.8 */ public static final short OUTPUT_REVERSE = 16; /** Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode. * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_DEFAULT = 0; /** Reordering mode: Logical to Visual algorithm which handles numbers in * a way which mimicks the behavior of Windows XP. * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_NUMBERS_SPECIAL = 1; /** Reordering mode: Logical to Visual algorithm grouping numbers with * adjacent R characters (reversible algorithm). * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_GROUP_NUMBERS_WITH_R = 2; /** Reordering mode: Reorder runs only to transform a Logical LTR string * to the logical RTL string with the same display, or vice-versa.
* If this mode is set together with option * OPTION_INSERT_MARKS, some Bidi controls in the source * text may be removed and other controls may be added to produce the * minimum combination which has the required display. * @see #OPTION_INSERT_MARKS * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_RUNS_ONLY = 3; /** Reordering mode: Visual to Logical algorithm which handles numbers * like L (same algorithm as selected by setInverse(true). * @see #setInverse * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_INVERSE_NUMBERS_AS_L = 4; /** Reordering mode: Visual to Logical algorithm equivalent to the regular * Logical to Visual algorithm. * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_INVERSE_LIKE_DIRECT = 5; /** Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the * REORDER_NUMBERS_SPECIAL Bidi algorithm. * @see #setReorderingMode * @stable ICU 3.8 */ public static final short REORDER_INVERSE_FOR_NUMBERS_SPECIAL = 6; /* Number of values for reordering mode. */ static final short REORDER_COUNT = 7; /* Reordering mode values must be ordered so that all the regular logical to * visual modes come first, and all inverse Bidi modes come last. */ static final short REORDER_LAST_LOGICAL_TO_VISUAL = REORDER_NUMBERS_SPECIAL; /** * Option value for setReorderingOptions: * disable all the options which can be set with this method * @see #setReorderingOptions * @stable ICU 3.8 */ public static final int OPTION_DEFAULT = 0; /** * Option bit for setReorderingOptions: * insert Bidi marks (LRM or RLM) when needed to ensure correct result of * a reordering to a Logical order * *

This option must be set or reset before calling * setPara. * *

This option is significant only with reordering modes which generate * a result with Logical order, specifically. *

    *
  • REORDER_RUNS_ONLY
  • *
  • REORDER_INVERSE_NUMBERS_AS_L
  • *
  • REORDER_INVERSE_LIKE_DIRECT
  • *
  • REORDER_INVERSE_FOR_NUMBERS_SPECIAL
  • *
* *

If this option is set in conjunction with reordering mode * REORDER_INVERSE_NUMBERS_AS_L or with calling * setInverse(true), it implies option * INSERT_LRM_FOR_NUMERIC in calls to method * writeReordered(). * *

For other reordering modes, a minimum number of LRM or RLM characters * will be added to the source text after reordering it so as to ensure * round trip, i.e. when applying the inverse reordering mode on the * resulting logical text with removal of Bidi marks * (option OPTION_REMOVE_CONTROLS set before calling * setPara() or option * REMOVE_BIDI_CONTROLS in * writeReordered), the result will be identical to the * source text in the first transformation. * *

This option will be ignored if specified together with option * OPTION_REMOVE_CONTROLS. It inhibits option * REMOVE_BIDI_CONTROLS in calls to method * writeReordered() and it implies option * INSERT_LRM_FOR_NUMERIC in calls to method * writeReordered() if the reordering mode is * REORDER_INVERSE_NUMBERS_AS_L. * * @see #setReorderingMode * @see #setReorderingOptions * @see #INSERT_LRM_FOR_NUMERIC * @see #REMOVE_BIDI_CONTROLS * @see #OPTION_REMOVE_CONTROLS * @see #REORDER_RUNS_ONLY * @see #REORDER_INVERSE_NUMBERS_AS_L * @see #REORDER_INVERSE_LIKE_DIRECT * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL * @stable ICU 3.8 */ public static final int OPTION_INSERT_MARKS = 1; /** * Option bit for setReorderingOptions: * remove Bidi control characters * *

This option must be set or reset before calling * setPara. * *

This option nullifies option * OPTION_INSERT_MARKS. It inhibits option * INSERT_LRM_FOR_NUMERIC in calls to method * writeReordered() and it implies option * REMOVE_BIDI_CONTROLS in calls to that method. * * @see #setReorderingMode * @see #setReorderingOptions * @see #OPTION_INSERT_MARKS * @see #INSERT_LRM_FOR_NUMERIC * @see #REMOVE_BIDI_CONTROLS * @stable ICU 3.8 */ public static final int OPTION_REMOVE_CONTROLS = 2; /** * Option bit for setReorderingOptions: * process the output as part of a stream to be continued * *

This option must be set or reset before calling * setPara. * *

This option specifies that the caller is interested in processing * large text object in parts. The results of the successive calls are * expected to be concatenated by the caller. Only the call for the last * part will have this option bit off. * *

When this option bit is on, setPara() may process * less than the full source text in order to truncate the text at a * meaningful boundary. The caller should call * getProcessedLength() immediately after calling * setPara() in order to determine how much of the source * text has been processed. Source text beyond that length should be * resubmitted in following calls to setPara. The * processed length may be less than the length of the source text if a * character preceding the last character of the source text constitutes a * reasonable boundary (like a block separator) for text to be continued.
* If the last character of the source text constitutes a reasonable * boundary, the whole text will be processed at once.
* If nowhere in the source text there exists * such a reasonable boundary, the processed length will be zero.
* The caller should check for such an occurrence and do one of the following: *

  • submit a larger amount of text with a better chance to include * a reasonable boundary.
  • *
  • resubmit the same text after turning off option * OPTION_STREAMING.
* In all cases, this option should be turned off before processing the last * part of the text. * *

When the OPTION_STREAMING option is used, it is * recommended to call orderParagraphsLTR(true) before calling * setPara() so that later paragraphs may be concatenated to * previous paragraphs on the right. * * @see #setReorderingMode * @see #setReorderingOptions * @see #getProcessedLength * @stable ICU 3.8 */ public static final int OPTION_STREAMING = 4; /* * Comparing the description of the Bidi algorithm with this implementation * is easier with the same names for the Bidi types in the code as there. * See UCharacterDirection */ static final byte L = UCharacterDirection.LEFT_TO_RIGHT; /* 0 */ static final byte R = UCharacterDirection.RIGHT_TO_LEFT; /* 1 */ static final byte EN = UCharacterDirection.EUROPEAN_NUMBER; /* 2 */ static final byte ES = UCharacterDirection.EUROPEAN_NUMBER_SEPARATOR; /* 3 */ static final byte ET = UCharacterDirection.EUROPEAN_NUMBER_TERMINATOR; /* 4 */ static final byte AN = UCharacterDirection.ARABIC_NUMBER; /* 5 */ static final byte CS = UCharacterDirection.COMMON_NUMBER_SEPARATOR; /* 6 */ static final byte B = UCharacterDirection.BLOCK_SEPARATOR; /* 7 */ static final byte S = UCharacterDirection.SEGMENT_SEPARATOR; /* 8 */ static final byte WS = UCharacterDirection.WHITE_SPACE_NEUTRAL; /* 9 */ static final byte ON = UCharacterDirection.OTHER_NEUTRAL; /* 10 */ static final byte LRE = UCharacterDirection.LEFT_TO_RIGHT_EMBEDDING; /* 11 */ static final byte LRO = UCharacterDirection.LEFT_TO_RIGHT_OVERRIDE; /* 12 */ static final byte AL = UCharacterDirection.RIGHT_TO_LEFT_ARABIC; /* 13 */ static final byte RLE = UCharacterDirection.RIGHT_TO_LEFT_EMBEDDING; /* 14 */ static final byte RLO = UCharacterDirection.RIGHT_TO_LEFT_OVERRIDE; /* 15 */ static final byte PDF = UCharacterDirection.POP_DIRECTIONAL_FORMAT; /* 16 */ static final byte NSM = UCharacterDirection.DIR_NON_SPACING_MARK; /* 17 */ static final byte BN = UCharacterDirection.BOUNDARY_NEUTRAL; /* 18 */ static final byte FSI = UCharacterDirection.FIRST_STRONG_ISOLATE; /* 19 */ static final byte LRI = UCharacterDirection.LEFT_TO_RIGHT_ISOLATE; /* 20 */ static final byte RLI = UCharacterDirection.RIGHT_TO_LEFT_ISOLATE; /* 21 */ static final byte PDI = UCharacterDirection.POP_DIRECTIONAL_ISOLATE; /* 22 */ static final byte ENL = PDI + 1; /* EN after W7 */ /* 23 */ static final byte ENR = ENL + 1; /* EN not subject to W7 */ /* 24 */ /** * Value returned by BidiClassifier when there is no need to * override the standard Bidi class for a given code point. * *

This constant is deprecated; use UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)+1 instead. * * @see BidiClassifier * @deprecated ICU 58 The numeric value may change over time, see ICU ticket #12420. */ @Deprecated public static final int CLASS_DEFAULT = UCharacterDirection.CHAR_DIRECTION_COUNT; /* number of paras entries allocated initially */ static final int SIMPLE_PARAS_COUNT = 10; /* number of isolate run entries for paired brackets allocated initially */ static final int SIMPLE_OPENINGS_COUNT = 20; private static final char CR = '\r'; private static final char LF = '\n'; static final int LRM_BEFORE = 1; static final int LRM_AFTER = 2; static final int RLM_BEFORE = 4; static final int RLM_AFTER = 8; /* flags for Opening.flags */ static final byte FOUND_L = (byte)DirPropFlag(L); static final byte FOUND_R = (byte)DirPropFlag(R); /* * The following bit is used for the directional isolate status. * Stack entries corresponding to isolate sequences are greater than ISOLATE. */ static final int ISOLATE = 0x0100; /* * reference to parent paragraph object (reference to self if this object is * a paragraph object); set to null in a newly opened object; set to a * real value after a successful execution of setPara or setLine */ Bidi paraBidi; final UBiDiProps bdp; /* character array representing the current text */ char[] text; /* length of the current text */ int originalLength; /* if the option OPTION_STREAMING is set, this is the length of * text actually processed by setPara, which may be shorter * than the original length. Otherwise, it is identical to the original * length. */ int length; /* if option OPTION_REMOVE_CONTROLS is set, and/or Bidi * marks are allowed to be inserted in one of the reordering modes, the * length of the result string may be different from the processed length. */ int resultLength; /* indicators for whether memory may be allocated after construction */ boolean mayAllocateText; boolean mayAllocateRuns; /* arrays with one value per text-character */ byte[] dirPropsMemory = new byte[1]; byte[] levelsMemory = new byte[1]; byte[] dirProps; byte[] levels; /* are we performing an approximation of the "inverse Bidi" algorithm? */ boolean isInverse; /* are we using the basic algorithm or its variation? */ int reorderingMode; /* bitmask for reordering options */ int reorderingOptions; /* must block separators receive level 0? */ boolean orderParagraphsLTR; /* the paragraph level */ byte paraLevel; /* original paraLevel when contextual */ /* must be one of DEFAULT_xxx or 0 if not contextual */ byte defaultParaLevel; /* context data */ String prologue; String epilogue; /* the following is set in setPara, used in processPropertySeq */ ImpTabPair impTabPair; /* reference to levels state table pair */ /* the overall paragraph or line directionality*/ byte direction; /* flags is a bit set for which directional properties are in the text */ int flags; /* lastArabicPos is index to the last AL in the text, -1 if none */ int lastArabicPos; /* characters after trailingWSStart are WS and are */ /* implicitly at the paraLevel (rule (L1)) - levels may not reflect that */ int trailingWSStart; /* fields for paragraph handling, set in getDirProps() */ int paraCount; int[] paras_limit = new int[SIMPLE_PARAS_COUNT]; byte[] paras_level = new byte[SIMPLE_PARAS_COUNT]; /* fields for line reordering */ int runCount; /* ==-1: runs not set up yet */ BidiRun[] runsMemory = new BidiRun[0]; BidiRun[] runs; /* for non-mixed text, we only need a tiny array of runs (no allocation) */ BidiRun[] simpleRuns = {new BidiRun()}; /* fields for managing isolate sequences */ Isolate[] isolates; /* maximum or current nesting depth of isolate sequences */ /* Within resolveExplicitLevels() and checkExplicitLevels(), this is the maximal nesting encountered. Within resolveImplicitLevels(), this is the index of the current isolates stack entry. */ int isolateCount; /* mapping of runs in logical order to visual order */ int[] logicalToVisualRunsMap; /* flag to indicate that the map has been updated */ boolean isGoodLogicalToVisualRunsMap; /* customized class provider */ BidiClassifier customClassifier = null; /* for inverse Bidi with insertion of directional marks */ InsertPoints insertPoints = new InsertPoints(); /* for option OPTION_REMOVE_CONTROLS */ int controlCount; /* * Sometimes, bit values are more appropriate * to deal with directionality properties. * Abbreviations in these method names refer to names * used in the Bidi algorithm. */ static int DirPropFlag(byte dir) { return (1 << dir); } boolean testDirPropFlagAt(int flag, int index) { return ((DirPropFlag(dirProps[index]) & flag) != 0); } static final int DirPropFlagMultiRuns = DirPropFlag((byte)31); /* to avoid some conditional statements, use tiny constant arrays */ static final int DirPropFlagLR[] = { DirPropFlag(L), DirPropFlag(R) }; static final int DirPropFlagE[] = { DirPropFlag(LRE), DirPropFlag(RLE) }; static final int DirPropFlagO[] = { DirPropFlag(LRO), DirPropFlag(RLO) }; static final int DirPropFlagLR(byte level) { return DirPropFlagLR[level & 1]; } static final int DirPropFlagE(byte level) { return DirPropFlagE[level & 1]; } static final int DirPropFlagO(byte level) { return DirPropFlagO[level & 1]; } static final byte DirFromStrong(byte strong) { return strong == L ? L : R; } static final byte NoOverride(byte level) { return (byte)(level & ~LEVEL_OVERRIDE); } /* are there any characters that are LTR or RTL? */ static final int MASK_LTR = DirPropFlag(L)|DirPropFlag(EN)|DirPropFlag(ENL)|DirPropFlag(ENR)|DirPropFlag(AN)|DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(LRI); static final int MASK_RTL = DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(RLI); static final int MASK_R_AL = DirPropFlag(R)|DirPropFlag(AL); static final int MASK_STRONG_EN_AN = DirPropFlag(L)|DirPropFlag(R)|DirPropFlag(AL)|DirPropFlag(EN)|DirPropFlag(AN); /* explicit embedding codes */ static final int MASK_EXPLICIT = DirPropFlag(LRE)|DirPropFlag(LRO)|DirPropFlag(RLE)|DirPropFlag(RLO)|DirPropFlag(PDF); static final int MASK_BN_EXPLICIT = DirPropFlag(BN)|MASK_EXPLICIT; /* explicit isolate codes */ static final int MASK_ISO = DirPropFlag(LRI)|DirPropFlag(RLI)|DirPropFlag(FSI)|DirPropFlag(PDI); /* paragraph and segment separators */ static final int MASK_B_S = DirPropFlag(B)|DirPropFlag(S); /* all types that are counted as White Space or Neutral in some steps */ static final int MASK_WS = MASK_B_S|DirPropFlag(WS)|MASK_BN_EXPLICIT|MASK_ISO; /* types that are neutrals or could becomes neutrals in (Wn) */ static final int MASK_POSSIBLE_N = DirPropFlag(ON)|DirPropFlag(CS)|DirPropFlag(ES)|DirPropFlag(ET)|MASK_WS; /* * These types may be changed to "e", * the embedding type (L or R) of the run, * in the Bidi algorithm (N2) */ static final int MASK_EMBEDDING = DirPropFlag(NSM)|MASK_POSSIBLE_N; /* * the dirProp's L and R are defined to 0 and 1 values in UCharacterDirection.java */ static byte GetLRFromLevel(byte level) { return (byte)(level & 1); } static boolean IsDefaultLevel(byte level) { return ((level & LEVEL_DEFAULT_LTR) == LEVEL_DEFAULT_LTR); } static boolean IsBidiControlChar(int c) { /* check for range 0x200c to 0x200f (ZWNJ, ZWJ, LRM, RLM) or 0x202a to 0x202e (LRE, RLE, PDF, LRO, RLO) */ return (((c & 0xfffffffc) == 0x200c) || ((c >= 0x202a) && (c <= 0x202e)) || ((c >= 0x2066) && (c <= 0x2069))); } void verifyValidPara() { if (!(this == this.paraBidi)) { throw new IllegalStateException(); } } void verifyValidParaOrLine() { Bidi para = this.paraBidi; /* verify Para */ if (this == para) { return; } /* verify Line */ if ((para == null) || (para != para.paraBidi)) { throw new IllegalStateException(); } } void verifyRange(int index, int start, int limit) { if (index < start || index >= limit) { throw new IllegalArgumentException("Value " + index + " is out of range " + start + " to " + limit); } } /** * Allocate a Bidi object. * Such an object is initially empty. It is assigned * the Bidi properties of a piece of text containing one or more paragraphs * by setPara() * or the Bidi properties of a line within a paragraph by * setLine().

* This object can be reused.

* setPara() and setLine() will allocate * additional memory for internal structures as necessary. * * @stable ICU 3.8 */ public Bidi() { this(0, 0); } /** * Allocate a Bidi object with preallocated memory * for internal structures. * This method provides a Bidi object like the default constructor * but it also preallocates memory for internal structures * according to the sizings supplied by the caller.

* The preallocation can be limited to some of the internal memory * by setting some values to 0 here. That means that if, e.g., * maxRunCount cannot be reasonably predetermined and should not * be set to maxLength (the only failproof value) to avoid * wasting memory, then maxRunCount could be set to 0 here * and the internal structures that are associated with it will be allocated * on demand, just like with the default constructor. * * @param maxLength is the maximum text or line length that internal memory * will be preallocated for. An attempt to associate this object with a * longer text will fail, unless this value is 0, which leaves the allocation * up to the implementation. * * @param maxRunCount is the maximum anticipated number of same-level runs * that internal memory will be preallocated for. An attempt to access * visual runs on an object that was not preallocated for as many runs * as the text was actually resolved to will fail, * unless this value is 0, which leaves the allocation up to the implementation.

* The number of runs depends on the actual text and maybe anywhere between * 1 and maxLength. It is typically small. * * @throws IllegalArgumentException if maxLength or maxRunCount is less than 0 * @stable ICU 3.8 */ public Bidi(int maxLength, int maxRunCount) { /* check the argument values */ if (maxLength < 0 || maxRunCount < 0) { throw new IllegalArgumentException(); } /* reset the object, all reference variables null, all flags false, all sizes 0. In fact, we don't need to do anything, since class members are initialized as zero when an instance is created. */ /* mayAllocateText = false; mayAllocateRuns = false; orderParagraphsLTR = false; paraCount = 0; runCount = 0; trailingWSStart = 0; flags = 0; paraLevel = 0; defaultParaLevel = 0; direction = 0; */ /* get Bidi properties */ bdp = UBiDiProps.INSTANCE; /* allocate memory for arrays as requested */ if (maxLength > 0) { getInitialDirPropsMemory(maxLength); getInitialLevelsMemory(maxLength); } else { mayAllocateText = true; } if (maxRunCount > 0) { // if maxRunCount == 1, use simpleRuns[] if (maxRunCount > 1) { getInitialRunsMemory(maxRunCount); } } else { mayAllocateRuns = true; } } /* * We are allowed to allocate memory if object==null or * mayAllocate==true for each array that we need. * * Assume sizeNeeded>0. * If object != null, then assume size > 0. */ private Object getMemory(String label, Object array, Class arrayClass, boolean mayAllocate, int sizeNeeded) { int len = Array.getLength(array); /* we have at least enough memory and must not allocate */ if (sizeNeeded == len) { return array; } if (!mayAllocate) { /* we must not allocate */ if (sizeNeeded <= len) { return array; } throw new OutOfMemoryError("Failed to allocate memory for " + label); } /* we may try to grow or shrink */ /* FOOD FOR THOUGHT: when shrinking it should be possible to avoid the allocation altogether and rely on this.length */ try { return Array.newInstance(arrayClass, sizeNeeded); } catch (Exception e) { throw new OutOfMemoryError("Failed to allocate memory for " + label); } } /* helper methods for each allocated array */ private void getDirPropsMemory(boolean mayAllocate, int len) { Object array = getMemory("DirProps", dirPropsMemory, Byte.TYPE, mayAllocate, len); dirPropsMemory = (byte[]) array; } void getDirPropsMemory(int len) { getDirPropsMemory(mayAllocateText, len); } private void getLevelsMemory(boolean mayAllocate, int len) { Object array = getMemory("Levels", levelsMemory, Byte.TYPE, mayAllocate, len); levelsMemory = (byte[]) array; } void getLevelsMemory(int len) { getLevelsMemory(mayAllocateText, len); } private void getRunsMemory(boolean mayAllocate, int len) { Object array = getMemory("Runs", runsMemory, BidiRun.class, mayAllocate, len); runsMemory = (BidiRun[]) array; } void getRunsMemory(int len) { getRunsMemory(mayAllocateRuns, len); } /* additional methods used by constructor - always allow allocation */ private void getInitialDirPropsMemory(int len) { getDirPropsMemory(true, len); } private void getInitialLevelsMemory(int len) { getLevelsMemory(true, len); } private void getInitialRunsMemory(int len) { getRunsMemory(true, len); } /** * Modify the operation of the Bidi algorithm such that it * approximates an "inverse Bidi" algorithm. This method * must be called before setPara(). * *

The normal operation of the Bidi algorithm as described * in the Unicode Technical Report is to take text stored in logical * (keyboard, typing) order and to determine the reordering of it for visual * rendering. * Some legacy systems store text in visual order, and for operations * with standard, Unicode-based algorithms, the text needs to be transformed * to logical order. This is effectively the inverse algorithm of the * described Bidi algorithm. Note that there is no standard algorithm for * this "inverse Bidi" and that the current implementation provides only an * approximation of "inverse Bidi". * *

With isInversed set to true, * this method changes the behavior of some of the subsequent methods * in a way that they can be used for the inverse Bidi algorithm. * Specifically, runs of text with numeric characters will be treated in a * special way and may need to be surrounded with LRM characters when they are * written in reordered sequence. * *

Output runs should be retrieved using getVisualRun(). * Since the actual input for "inverse Bidi" is visually ordered text and * getVisualRun() gets the reordered runs, these are actually * the runs of the logically ordered output. * *

Calling this method with argument isInverse set to * true is equivalent to calling setReorderingMode * with argument reorderingMode * set to REORDER_INVERSE_NUMBERS_AS_L.
* Calling this method with argument isInverse set to * false is equivalent to calling setReorderingMode * with argument reorderingMode * set to REORDER_DEFAULT. * * @param isInverse specifies "forward" or "inverse" Bidi operation. * * @see #setPara * @see #writeReordered * @see #setReorderingMode * @see #REORDER_INVERSE_NUMBERS_AS_L * @see #REORDER_DEFAULT * @stable ICU 3.8 */ public void setInverse(boolean isInverse) { this.isInverse = (isInverse); this.reorderingMode = isInverse ? REORDER_INVERSE_NUMBERS_AS_L : REORDER_DEFAULT; } /** * Is this Bidi object set to perform the inverse Bidi * algorithm? *

Note: calling this method after setting the reordering mode with * setReorderingMode will return true if the * reordering mode was set to * REORDER_INVERSE_NUMBERS_AS_L, false * for all other values. * * @return true if the Bidi object is set to * perform the inverse Bidi algorithm by handling numbers as L. * * @see #setInverse * @see #setReorderingMode * @see #REORDER_INVERSE_NUMBERS_AS_L * @stable ICU 3.8 */ public boolean isInverse() { return isInverse; } /** * Modify the operation of the Bidi algorithm such that it implements some * variant to the basic Bidi algorithm or approximates an "inverse Bidi" * algorithm, depending on different values of the "reordering mode". * This method must be called before setPara(), and stays in * effect until called again with a different argument. * *

The normal operation of the Bidi algorithm as described in the Unicode * Standard Annex #9 is to take text stored in logical (keyboard, typing) * order and to determine how to reorder it for visual rendering. * *

With the reordering mode set to a value other than * REORDER_DEFAULT, this method changes the behavior of some of * the subsequent methods in a way such that they implement an inverse Bidi * algorithm or some other algorithm variants. * *

Some legacy systems store text in visual order, and for operations * with standard, Unicode-based algorithms, the text needs to be transformed * into logical order. This is effectively the inverse algorithm of the * described Bidi algorithm. Note that there is no standard algorithm for * this "inverse Bidi", so a number of variants are implemented here. * *

In other cases, it may be desirable to emulate some variant of the * Logical to Visual algorithm (e.g. one used in MS Windows), or perform a * Logical to Logical transformation. * *

    *
  • When the Reordering Mode is set to * REORDER_DEFAULT, * the standard Bidi Logical to Visual algorithm is applied.
  • * *
  • When the reordering mode is set to * REORDER_NUMBERS_SPECIAL, * the algorithm used to perform Bidi transformations when calling * setPara should approximate the algorithm used in Microsoft * Windows XP rather than strictly conform to the Unicode Bidi algorithm. *
    * The differences between the basic algorithm and the algorithm addressed * by this option are as follows: *
      *
    • Within text at an even embedding level, the sequence "123AB" * (where AB represent R or AL letters) is transformed to "123BA" by the * Unicode algorithm and to "BA123" by the Windows algorithm.
    • * *
    • Arabic-Indic numbers (AN) are handled by the Windows algorithm just * like regular numbers (EN).
    • *
  • * *
  • When the reordering mode is set to * REORDER_GROUP_NUMBERS_WITH_R, * numbers located between LTR text and RTL text are associated with the RTL * text. For instance, an LTR paragraph with content "abc 123 DEF" (where * upper case letters represent RTL characters) will be transformed to * "abc FED 123" (and not "abc 123 FED"), "DEF 123 abc" will be transformed * to "123 FED abc" and "123 FED abc" will be transformed to "DEF 123 abc". * This makes the algorithm reversible and makes it useful when round trip * (from visual to logical and back to visual) must be achieved without * adding LRM characters. However, this is a variation from the standard * Unicode Bidi algorithm.
    * The source text should not contain Bidi control characters other than LRM * or RLM.
  • * *
  • When the reordering mode is set to * REORDER_RUNS_ONLY, * a "Logical to Logical" transformation must be performed: *
      *
    • If the default text level of the source text (argument * paraLevel in setPara) is even, the source text * will be handled as LTR logical text and will be transformed to the RTL * logical text which has the same LTR visual display.
    • *
    • If the default level of the source text is odd, the source text * will be handled as RTL logical text and will be transformed to the * LTR logical text which has the same LTR visual display.
    • *
    * This mode may be needed when logical text which is basically Arabic or * Hebrew, with possible included numbers or phrases in English, has to be * displayed as if it had an even embedding level (this can happen if the * displaying application treats all text as if it was basically LTR). *
    * This mode may also be needed in the reverse case, when logical text which * is basically English, with possible included phrases in Arabic or Hebrew, * has to be displayed as if it had an odd embedding level. *
    * Both cases could be handled by adding LRE or RLE at the head of the * text, if the display subsystem supports these formatting controls. If it * does not, the problem may be handled by transforming the source text in * this mode before displaying it, so that it will be displayed properly. *
    * The source text should not contain Bidi control characters other than LRM * or RLM.
  • * *
  • When the reordering mode is set to * REORDER_INVERSE_NUMBERS_AS_L, an "inverse Bidi" * algorithm is applied. * Runs of text with numeric characters will be treated like LTR letters and * may need to be surrounded with LRM characters when they are written in * reordered sequence (the option INSERT_LRM_FOR_NUMERIC can * be used with method writeReordered to this end. This mode * is equivalent to calling setInverse() with * argument isInverse set to true.
  • * *
  • When the reordering mode is set to * REORDER_INVERSE_LIKE_DIRECT, the "direct" Logical to * Visual Bidi algorithm is used as an approximation of an "inverse Bidi" * algorithm. This mode is similar to mode * REORDER_INVERSE_NUMBERS_AS_L but is closer to the * regular Bidi algorithm. *
    * For example, an LTR paragraph with the content "FED 123 456 CBA" (where * upper case represents RTL characters) will be transformed to * "ABC 456 123 DEF", as opposed to "DEF 123 456 ABC" * with mode REORDER_INVERSE_NUMBERS_AS_L.
    * When used in conjunction with option * OPTION_INSERT_MARKS, this mode generally * adds Bidi marks to the output significantly more sparingly than mode * REORDER_INVERSE_NUMBERS_AS_L.
    with option * INSERT_LRM_FOR_NUMERIC in calls to * writeReordered.
  • * *
  • When the reordering mode is set to * REORDER_INVERSE_FOR_NUMBERS_SPECIAL, the Logical to Visual * Bidi algorithm used in Windows XP is used as an approximation of an "inverse * Bidi" algorithm. *
    * For example, an LTR paragraph with the content "abc FED123" (where * upper case represents RTL characters) will be transformed to * "abc 123DEF.
  • *
* *

In all the reordering modes specifying an "inverse Bidi" algorithm * (i.e. those with a name starting with REORDER_INVERSE), * output runs should be retrieved using getVisualRun(), and * the output text with writeReordered(). The caller should * keep in mind that in "inverse Bidi" modes the input is actually visually * ordered text and reordered output returned by getVisualRun() * or writeReordered() are actually runs or character string * of logically ordered output.
* For all the "inverse Bidi" modes, the source text should not contain * Bidi control characters other than LRM or RLM. * *

Note that option OUTPUT_REVERSE of * writeReordered has no useful meaning and should not be used * in conjunction with any value of the reordering mode specifying "inverse * Bidi" or with value REORDER_RUNS_ONLY. * * @param reorderingMode specifies the required variant of the Bidi * algorithm. * * @see #setInverse * @see #setPara * @see #writeReordered * @see #INSERT_LRM_FOR_NUMERIC * @see #OUTPUT_REVERSE * @see #REORDER_DEFAULT * @see #REORDER_NUMBERS_SPECIAL * @see #REORDER_GROUP_NUMBERS_WITH_R * @see #REORDER_RUNS_ONLY * @see #REORDER_INVERSE_NUMBERS_AS_L * @see #REORDER_INVERSE_LIKE_DIRECT * @see #REORDER_INVERSE_FOR_NUMBERS_SPECIAL * @stable ICU 3.8 */ public void setReorderingMode(int reorderingMode) { if ((reorderingMode < REORDER_DEFAULT) || (reorderingMode >= REORDER_COUNT)) return; /* don't accept a wrong value */ this.reorderingMode = reorderingMode; this.isInverse = reorderingMode == REORDER_INVERSE_NUMBERS_AS_L; } /** * What is the requested reordering mode for a given Bidi object? * * @return the current reordering mode of the Bidi object * * @see #setReorderingMode * @stable ICU 3.8 */ public int getReorderingMode() { return this.reorderingMode; } /** * Specify which of the reordering options should be applied during Bidi * transformations. * * @param options A combination of zero or more of the following * reordering options: * OPTION_DEFAULT, OPTION_INSERT_MARKS, * OPTION_REMOVE_CONTROLS, OPTION_STREAMING. * * @see #getReorderingOptions * @see #OPTION_DEFAULT * @see #OPTION_INSERT_MARKS * @see #OPTION_REMOVE_CONTROLS * @see #OPTION_STREAMING * @stable ICU 3.8 */ public void setReorderingOptions(int options) { if ((options & OPTION_REMOVE_CONTROLS) != 0) { this.reorderingOptions = options & ~OPTION_INSERT_MARKS; } else { this.reorderingOptions = options; } } /** * What are the reordering options applied to a given Bidi object? * * @return the current reordering options of the Bidi object * * @see #setReorderingOptions * @stable ICU 3.8 */ public int getReorderingOptions() { return this.reorderingOptions; } /** * Get the base direction of the text provided according to the Unicode * Bidirectional Algorithm. The base direction is derived from the first * character in the string with bidirectional character type L, R, or AL. * If the first such character has type L, LTR is returned. If the first * such character has type R or AL, RTL is returned. If the string does * not contain any character of these types, then NEUTRAL is returned. * This is a lightweight function for use when only the base direction is * needed and no further bidi processing of the text is needed. * @param paragraph the text whose paragraph level direction is needed. * @return LTR, RTL, NEUTRAL * @see #LTR * @see #RTL * @see #NEUTRAL * @stable ICU 4.6 */ public static byte getBaseDirection(CharSequence paragraph) { if (paragraph == null || paragraph.length() == 0) { return NEUTRAL; } int length = paragraph.length(); int c;// codepoint byte direction; for (int i = 0; i < length; ) { // U16_NEXT(paragraph, i, length, c) for C++ c = UCharacter.codePointAt(paragraph, i); direction = UCharacter.getDirectionality(c); if (direction == UCharacterDirection.LEFT_TO_RIGHT) { return LTR; } else if (direction == UCharacterDirection.RIGHT_TO_LEFT || direction == UCharacterDirection.RIGHT_TO_LEFT_ARABIC) { return RTL; } i = UCharacter.offsetByCodePoints(paragraph, i, 1);// set i to the head index of next codepoint } return NEUTRAL; } /* perform (P2)..(P3) ------------------------------------------------------- */ /** * Returns the directionality of the first strong character * after the last B in prologue, if any. * Requires prologue!=null. */ private byte firstL_R_AL() { byte result = ON; for (int i = 0; i < prologue.length(); ) { int uchar = prologue.codePointAt(i); i += Character.charCount(uchar); byte dirProp = (byte)getCustomizedClass(uchar); if (result == ON) { if (dirProp == L || dirProp == R || dirProp == AL) { result = dirProp; } } else { if (dirProp == B) { result = ON; } } } return result; } /* * Check that there are enough entries in the arrays paras_limit and paras_level */ private void checkParaCount() { int[] saveLimits; byte[] saveLevels; int count = paraCount; if (count <= paras_level.length) return; int oldLength = paras_level.length; saveLimits = paras_limit; saveLevels = paras_level; try { paras_limit = new int[count * 2]; paras_level = new byte[count * 2]; } catch (Exception e) { throw new OutOfMemoryError("Failed to allocate memory for paras"); } System.arraycopy(saveLimits, 0, paras_limit, 0, oldLength); System.arraycopy(saveLevels, 0, paras_level, 0, oldLength); } /* * Get the directional properties for the text, calculate the flags bit-set, and * determine the paragraph level if necessary (in paras_level[i]). * FSI initiators are also resolved and their dirProp replaced with LRI or RLI. * When encountering an FSI, it is initially replaced with an LRI, which is the * default. Only if a strong R or AL is found within its scope will the LRI be * replaced by an RLI. */ static final int NOT_SEEKING_STRONG = 0; /* 0: not contextual paraLevel, not after FSI */ static final int SEEKING_STRONG_FOR_PARA = 1; /* 1: looking for first strong char in para */ static final int SEEKING_STRONG_FOR_FSI = 2; /* 2: looking for first strong after FSI */ static final int LOOKING_FOR_PDI = 3; /* 3: found strong after FSI, looking for PDI */ private void getDirProps() { int i = 0, i0, i1; flags = 0; /* collect all directionalities in the text */ int uchar; byte dirProp; byte defaultParaLevel = 0; /* initialize to avoid compiler warnings */ boolean isDefaultLevel = IsDefaultLevel(paraLevel); /* for inverse Bidi, the default para level is set to RTL if there is a strong R or AL character at either end of the text */ boolean isDefaultLevelInverse=isDefaultLevel && (reorderingMode == REORDER_INVERSE_LIKE_DIRECT || reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL); lastArabicPos = -1; int controlCount = 0; boolean removeBidiControls = (reorderingOptions & OPTION_REMOVE_CONTROLS) != 0; byte state; byte lastStrong = ON; /* for default level & inverse Bidi */ /* The following stacks are used to manage isolate sequences. Those sequences may be nested, but obviously never more deeply than the maximum explicit embedding level. lastStack is the index of the last used entry in the stack. A value of -1 means that there is no open isolate sequence. lastStack is reset to -1 on paragraph boundaries. */ /* The following stack contains the position of the initiator of each open isolate sequence */ int[] isolateStartStack= new int[MAX_EXPLICIT_LEVEL+1]; /* The following stack contains the last known state before encountering the initiator of an isolate sequence */ byte[] previousStateStack = new byte[MAX_EXPLICIT_LEVEL+1]; int stackLast=-1; if ((reorderingOptions & OPTION_STREAMING) != 0) length = 0; defaultParaLevel = (byte)(paraLevel & 1); if (isDefaultLevel) { paras_level[0] = defaultParaLevel; lastStrong = defaultParaLevel; if (prologue != null && /* there is a prologue */ (dirProp = firstL_R_AL()) != ON) { /* with a strong character */ if (dirProp == L) paras_level[0] = 0; /* set the default para level */ else paras_level[0] = 1; /* set the default para level */ state = NOT_SEEKING_STRONG; } else { state = SEEKING_STRONG_FOR_PARA; } } else { paras_level[0] = paraLevel; state = NOT_SEEKING_STRONG; } /* count paragraphs and determine the paragraph level (P2..P3) */ /* * see comment on constant fields: * the LEVEL_DEFAULT_XXX values are designed so that * their low-order bit alone yields the intended default */ for (i = 0; i < originalLength; /* i is incremented in the loop */) { i0 = i; /* index of first code unit */ uchar = UTF16.charAt(text, 0, originalLength, i); i += UTF16.getCharCount(uchar); i1 = i - 1; /* index of last code unit, gets the directional property */ dirProp = (byte)getCustomizedClass(uchar); flags |= DirPropFlag(dirProp); dirProps[i1] = dirProp; if (i1 > i0) { /* set previous code units' properties to BN */ flags |= DirPropFlag(BN); do { dirProps[--i1] = BN; } while (i1 > i0); } if (removeBidiControls && IsBidiControlChar(uchar)) { controlCount++; } if (dirProp == L) { if (state == SEEKING_STRONG_FOR_PARA) { paras_level[paraCount - 1] = 0; state = NOT_SEEKING_STRONG; } else if (state == SEEKING_STRONG_FOR_FSI) { if (stackLast <= MAX_EXPLICIT_LEVEL) { /* no need for next statement, already set by default */ /* dirProps[isolateStartStack[stackLast]] = LRI; */ flags |= DirPropFlag(LRI); } state = LOOKING_FOR_PDI; } lastStrong = L; continue; } if (dirProp == R || dirProp == AL) { if (state == SEEKING_STRONG_FOR_PARA) { paras_level[paraCount - 1] = 1; state = NOT_SEEKING_STRONG; } else if (state == SEEKING_STRONG_FOR_FSI) { if (stackLast <= MAX_EXPLICIT_LEVEL) { dirProps[isolateStartStack[stackLast]] = RLI; flags |= DirPropFlag(RLI); } state = LOOKING_FOR_PDI; } lastStrong = R; if (dirProp == AL) lastArabicPos = i - 1; continue; } if (dirProp >= FSI && dirProp <= RLI) { /* FSI, LRI or RLI */ stackLast++; if (stackLast <= MAX_EXPLICIT_LEVEL) { isolateStartStack[stackLast] = i - 1; previousStateStack[stackLast] = state; } if (dirProp == FSI) { dirProps[i-1] = LRI; /* default if no strong char */ state = SEEKING_STRONG_FOR_FSI; } else state = LOOKING_FOR_PDI; continue; } if (dirProp == PDI) { if (state == SEEKING_STRONG_FOR_FSI) { if (stackLast <= MAX_EXPLICIT_LEVEL) { /* no need for next statement, already set by default */ /* dirProps[isolateStartStack[stackLast]] = LRI; */ flags |= DirPropFlag(LRI); } } if (stackLast >= 0) { if (stackLast <= MAX_EXPLICIT_LEVEL) state = previousStateStack[stackLast]; stackLast--; } continue; } if (dirProp == B) { if (i < originalLength && uchar == CR && text[i] == LF) /* do nothing on the CR */ continue; paras_limit[paraCount - 1] = i; if (isDefaultLevelInverse && lastStrong == R) paras_level[paraCount - 1] = 1; if ((reorderingOptions & OPTION_STREAMING) != 0) { /* When streaming, we only process whole paragraphs thus some updates are only done on paragraph boundaries */ length = i; /* i is index to next character */ this.controlCount = controlCount; } if (i < originalLength) { /* B not last char in text */ paraCount++; checkParaCount(); /* check that there is enough memory for a new para entry */ if (isDefaultLevel) { paras_level[paraCount - 1] = defaultParaLevel; state = SEEKING_STRONG_FOR_PARA; lastStrong = defaultParaLevel; } else { paras_level[paraCount - 1] = paraLevel; state = NOT_SEEKING_STRONG; } stackLast = -1; } continue; } } /* +Ignore still open isolate sequences with overflow */ if (stackLast > MAX_EXPLICIT_LEVEL) { stackLast = MAX_EXPLICIT_LEVEL; state=SEEKING_STRONG_FOR_FSI; /* to be on the safe side */ } /* Resolve direction of still unresolved open FSI sequences */ while (stackLast >= 0) { if (state == SEEKING_STRONG_FOR_FSI) { /* no need for next statement, already set by default */ /* dirProps[isolateStartStack[stackLast]] = LRI; */ flags |= DirPropFlag(LRI); break; } state = previousStateStack[stackLast]; stackLast--; } /* When streaming, ignore text after the last paragraph separator */ if ((reorderingOptions & OPTION_STREAMING) != 0) { if (length < originalLength) paraCount--; } else { paras_limit[paraCount - 1] = originalLength; this.controlCount = controlCount; } /* For inverse bidi, default para direction is RTL if there is a strong R or AL at either end of the paragraph */ if (isDefaultLevelInverse && lastStrong == R) { paras_level[paraCount - 1] = 1; } if (isDefaultLevel) { paraLevel = paras_level[0]; } /* The following is needed to resolve the text direction for default level paragraphs containing no strong character */ for (i = 0; i < paraCount; i++) flags |= DirPropFlagLR(paras_level[i]); if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) { flags |= DirPropFlag(L); } } /* determine the paragraph level at position index */ byte GetParaLevelAt(int pindex) { if (defaultParaLevel == 0 || pindex < paras_limit[0]) return paraLevel; int i; for (i = 1; i < paraCount; i++) if (pindex < paras_limit[i]) break; if (i >= paraCount) i = paraCount - 1; return paras_level[i]; } /* Functions for handling paired brackets ----------------------------------- */ /* In the isoRuns array, the first entry is used for text outside of any isolate sequence. Higher entries are used for each more deeply nested isolate sequence. isoRunLast is the index of the last used entry. The openings array is used to note the data of opening brackets not yet matched by a closing bracket, or matched but still susceptible to change level. Each isoRun entry contains the index of the first and one-after-last openings entries for pending opening brackets it contains. The next openings entry to use is the one-after-last of the most deeply nested isoRun entry. isoRun entries also contain their current embedding level and the last encountered strong character, since these will be needed to resolve the level of paired brackets. */ private void bracketInit(BracketData bd) { bd.isoRunLast = 0; bd.isoRuns[0] = new IsoRun(); bd.isoRuns[0].start = 0; bd.isoRuns[0].limit = 0; bd.isoRuns[0].level = GetParaLevelAt(0); bd.isoRuns[0].lastStrong = bd.isoRuns[0].lastBase = bd.isoRuns[0].contextDir = (byte)(GetParaLevelAt(0) & 1); bd.isoRuns[0].contextPos = 0; bd.openings = new Opening[SIMPLE_OPENINGS_COUNT]; bd.isNumbersSpecial = reorderingMode == REORDER_NUMBERS_SPECIAL || reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL; } /* paragraph boundary */ private void bracketProcessB(BracketData bd, byte level) { bd.isoRunLast = 0; bd.isoRuns[0].limit = 0; bd.isoRuns[0].level = level; bd.isoRuns[0].lastStrong = bd.isoRuns[0].lastBase = bd.isoRuns[0].contextDir = (byte)(level & 1); bd.isoRuns[0].contextPos = 0; } /* LRE, LRO, RLE, RLO, PDF */ private void bracketProcessBoundary(BracketData bd, int lastCcPos, byte contextLevel, byte embeddingLevel) { IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast]; if ((DirPropFlag(dirProps[lastCcPos]) & MASK_ISO) != 0) /* after an isolate */ return; if (NoOverride(embeddingLevel) > NoOverride(contextLevel)) /* not a PDF */ contextLevel = embeddingLevel; pLastIsoRun.limit = pLastIsoRun.start; pLastIsoRun.level = embeddingLevel; pLastIsoRun.lastStrong = pLastIsoRun.lastBase = pLastIsoRun.contextDir = (byte)(contextLevel & 1); pLastIsoRun.contextPos = lastCcPos; } /* LRI or RLI */ private void bracketProcessLRI_RLI(BracketData bd, byte level) { IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast]; short lastLimit; pLastIsoRun.lastBase = ON; lastLimit = pLastIsoRun.limit; bd.isoRunLast++; pLastIsoRun = bd.isoRuns[bd.isoRunLast]; if (pLastIsoRun == null) pLastIsoRun = bd.isoRuns[bd.isoRunLast] = new IsoRun(); pLastIsoRun.start = pLastIsoRun.limit = lastLimit; pLastIsoRun.level = level; pLastIsoRun.lastStrong = pLastIsoRun.lastBase = pLastIsoRun.contextDir = (byte)(level & 1); pLastIsoRun.contextPos = 0; } /* PDI */ private void bracketProcessPDI(BracketData bd) { IsoRun pLastIsoRun; bd.isoRunLast--; pLastIsoRun = bd.isoRuns[bd.isoRunLast]; pLastIsoRun.lastBase = ON; } /* newly found opening bracket: create an openings entry */ private void bracketAddOpening(BracketData bd, char match, int position) { IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast]; Opening pOpening; if (pLastIsoRun.limit >= bd.openings.length) { /* no available new entry */ Opening[] saveOpenings = bd.openings; int count; try { count = bd.openings.length; bd.openings = new Opening[count * 2]; } catch (Exception e) { throw new OutOfMemoryError("Failed to allocate memory for openings"); } System.arraycopy(saveOpenings, 0, bd.openings, 0, count); } pOpening = bd.openings[pLastIsoRun.limit]; if (pOpening == null) pOpening = bd.openings[pLastIsoRun.limit]= new Opening(); pOpening.position = position; pOpening.match = match; pOpening.contextDir = pLastIsoRun.contextDir; pOpening.contextPos = pLastIsoRun.contextPos; pOpening.flags = 0; pLastIsoRun.limit++; } /* change N0c1 to N0c2 when a preceding bracket is assigned the embedding level */ private void fixN0c(BracketData bd, int openingIndex, int newPropPosition, byte newProp) { /* This function calls itself recursively */ IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast]; Opening qOpening; int k, openingPosition, closingPosition; for (k = openingIndex+1; k < pLastIsoRun.limit; k++) { qOpening = bd.openings[k]; if (qOpening.match >= 0) /* not an N0c match */ continue; if (newPropPosition < qOpening.contextPos) break; if (newPropPosition >= qOpening.position) continue; if (newProp == qOpening.contextDir) break; openingPosition = qOpening.position; dirProps[openingPosition] = newProp; closingPosition = -(qOpening.match); dirProps[closingPosition] = newProp; qOpening.match = 0; /* prevent further changes */ fixN0c(bd, k, openingPosition, newProp); fixN0c(bd, k, closingPosition, newProp); } } /* process closing bracket; return L or R if N0b or N0c, ON if N0d */ private byte bracketProcessClosing(BracketData bd, int openIdx, int position) { IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast]; Opening pOpening, qOpening; byte direction; boolean stable; byte newProp; pOpening = bd.openings[openIdx]; direction = (byte)(pLastIsoRun.level & 1); stable = true; /* assume stable until proved otherwise */ /* The stable flag is set when brackets are paired and their level is resolved and cannot be changed by what will be found later in the source string. An unstable match can occur only when applying N0c, where the resolved level depends on the preceding context, and this context may be affected by text occurring later. Example: RTL paragraph containing: abc[(latin) HEBREW] When the closing parenthesis is encountered, it appears that N0c1 must be applied since 'abc' sets an opposite direction context and both parentheses receive level 2. However, when the closing square bracket is processed, N0b applies because of 'HEBREW' being included within the brackets, thus the square brackets are treated like R and receive level 1. However, this changes the preceding context of the opening parenthesis, and it now appears that N0c2 must be applied to the parentheses rather than N0c1. */ if ((direction == 0 && (pOpening.flags & FOUND_L) > 0) || (direction == 1 && (pOpening.flags & FOUND_R) > 0)) { /* N0b */ newProp = direction; } else if ((pOpening.flags & (FOUND_L | FOUND_R)) != 0) { /* N0c */ /* it is stable if there is no preceding text or in conditions too complicated and not worth checking */ stable = (openIdx == pLastIsoRun.start); if (direction != pOpening.contextDir) newProp = pOpening.contextDir; /* N0c1 */ else newProp = direction; /* N0c2 */ } else { /* forget this and any brackets nested within this pair */ pLastIsoRun.limit = (short)openIdx; return ON; /* N0d */ } dirProps[pOpening.position] = newProp; dirProps[position] = newProp; /* Update nested N0c pairs that may be affected */ fixN0c(bd, openIdx, pOpening.position, newProp); if (stable) { pLastIsoRun.limit = (short)openIdx; /* forget any brackets nested within this pair */ /* remove lower located synonyms if any */ while (pLastIsoRun.limit > pLastIsoRun.start && bd.openings[pLastIsoRun.limit - 1].position == pOpening.position) pLastIsoRun.limit--; } else { int k; pOpening.match = -position; /* neutralize lower located synonyms if any */ k = openIdx - 1; while (k >= pLastIsoRun.start && bd.openings[k].position == pOpening.position) bd.openings[k--].match = 0; /* neutralize any unmatched opening between the current pair; this will also neutralize higher located synonyms if any */ for (k = openIdx + 1; k < pLastIsoRun.limit; k++) { qOpening =bd.openings[k]; if (qOpening.position >= position) break; if (qOpening.match > 0) qOpening.match = 0; } } return newProp; } /* handle strong characters, digits and candidates for closing brackets */ private void bracketProcessChar(BracketData bd, int position) { IsoRun pLastIsoRun = bd.isoRuns[bd.isoRunLast]; byte dirProp, newProp; byte level; dirProp = dirProps[position]; if (dirProp == ON) { char c, match; int idx; /* First see if it is a matching closing bracket. Hopefully, this is more efficient than checking if it is a closing bracket at all */ c = text[position]; for (idx = pLastIsoRun.limit - 1; idx >= pLastIsoRun.start; idx--) { if (bd.openings[idx].match != c) continue; /* We have a match */ newProp = bracketProcessClosing(bd, idx, position); if(newProp == ON) { /* N0d */ c = 0; /* prevent handling as an opening */ break; } pLastIsoRun.lastBase = ON; pLastIsoRun.contextDir = newProp; pLastIsoRun.contextPos = position; level = levels[position]; if ((level & LEVEL_OVERRIDE) != 0) { /* X4, X5 */ short flag; int i; newProp = (byte)(level & 1); pLastIsoRun.lastStrong = newProp; flag = (short)DirPropFlag(newProp); for (i = pLastIsoRun.start; i < idx; i++) bd.openings[i].flags |= flag; /* matching brackets are not overridden by LRO/RLO */ levels[position] &= ~LEVEL_OVERRIDE; } /* matching brackets are not overridden by LRO/RLO */ levels[bd.openings[idx].position] &= ~LEVEL_OVERRIDE; return; } /* We get here only if the ON character is not a matching closing bracket or it is a case of N0d */ /* Now see if it is an opening bracket */ if (c != 0) match = (char)UCharacter.getBidiPairedBracket(c); /* get the matching char */ else match = 0; if (match != c && /* has a matching char */ UCharacter.getIntPropertyValue(c, UProperty.BIDI_PAIRED_BRACKET_TYPE) == /* opening bracket */ UCharacter.BidiPairedBracketType.OPEN) { /* special case: process synonyms create an opening entry for each synonym */ if (match == 0x232A) { /* RIGHT-POINTING ANGLE BRACKET */ bracketAddOpening(bd, (char)0x3009, position); } else if (match == 0x3009) { /* RIGHT ANGLE BRACKET */ bracketAddOpening(bd, (char)0x232A, position); } bracketAddOpening(bd, match, position); } } level = levels[position]; if ((level & LEVEL_OVERRIDE) != 0) { /* X4, X5 */ newProp = (byte)(level & 1); if (dirProp != S && dirProp != WS && dirProp != ON) dirProps[position] = newProp; pLastIsoRun.lastBase = newProp; pLastIsoRun.lastStrong = newProp; pLastIsoRun.contextDir = newProp; pLastIsoRun.contextPos = position; } else if (dirProp <= R || dirProp == AL) { newProp = DirFromStrong(dirProp); pLastIsoRun.lastBase = dirProp; pLastIsoRun.lastStrong = dirProp; pLastIsoRun.contextDir = newProp; pLastIsoRun.contextPos = position; } else if(dirProp == EN) { pLastIsoRun.lastBase = EN; if (pLastIsoRun.lastStrong == L) { newProp = L; /* W7 */ if (!bd.isNumbersSpecial) dirProps[position] = ENL; pLastIsoRun.contextDir = L; pLastIsoRun.contextPos = position; } else { newProp = R; /* N0 */ if (pLastIsoRun.lastStrong == AL) dirProps[position] = AN; /* W2 */ else dirProps[position] = ENR; pLastIsoRun.contextDir = R; pLastIsoRun.contextPos = position; } } else if (dirProp == AN) { newProp = R; /* N0 */ pLastIsoRun.lastBase = AN; pLastIsoRun.contextDir = R; pLastIsoRun.contextPos = position; } else if (dirProp == NSM) { /* if the last real char was ON, change NSM to ON so that it will stay ON even if the last real char is a bracket which may be changed to L or R */ newProp = pLastIsoRun.lastBase; if (newProp == ON) dirProps[position] = newProp; } else { newProp = dirProp; pLastIsoRun.lastBase = dirProp; } if (newProp <= R || newProp == AL) { int i; short flag = (short)DirPropFlag(DirFromStrong(newProp)); for (i = pLastIsoRun.start; i < pLastIsoRun.limit; i++) if (position > bd.openings[i].position) bd.openings[i].flags |= flag; } } /* perform (X1)..(X9) ------------------------------------------------------- */ /* determine if the text is mixed-directional or single-directional */ private byte directionFromFlags() { /* if the text contains AN and neutrals, then some neutrals may become RTL */ if (!((flags & MASK_RTL) != 0 || ((flags & DirPropFlag(AN)) != 0 && (flags & MASK_POSSIBLE_N) != 0))) { return LTR; } else if ((flags & MASK_LTR) == 0) { return RTL; } else { return MIXED; } } /* * Resolve the explicit levels as specified by explicit embedding codes. * Recalculate the flags to have them reflect the real properties * after taking the explicit embeddings into account. * * The BiDi algorithm is designed to result in the same behavior whether embedding * levels are externally specified (from "styled text", supposedly the preferred * method) or set by explicit embedding codes (LRx, RLx, PDF, FSI, PDI) in the plain text. * That is why (X9) instructs to remove all not-isolate explicit codes (and BN). * However, in a real implementation, the removal of these codes and their index * positions in the plain text is undesirable since it would result in * reallocated, reindexed text. * Instead, this implementation leaves the codes in there and just ignores them * in the subsequent processing. * In order to get the same reordering behavior, positions with a BN or a not-isolate * explicit embedding code just get the same level assigned as the last "real" * character. * * Some implementations, not this one, then overwrite some of these * directionality properties at "real" same-level-run boundaries by * L or R codes so that the resolution of weak types can be performed on the * entire paragraph at once instead of having to parse it once more and * perform that resolution on same-level-runs. * This limits the scope of the implicit rules in effectively * the same way as the run limits. * * Instead, this implementation does not modify these codes, except for * paired brackets whose properties (ON) may be replaced by L or R. * On one hand, the paragraph has to be scanned for same-level-runs, but * on the other hand, this saves another loop to reset these codes, * or saves making and modifying a copy of dirProps[]. * * * Note that (Pn) and (Xn) changed significantly from version 4 of the BiDi algorithm. * * * Handling the stack of explicit levels (Xn): * * With the BiDi stack of explicit levels, as pushed with each * LRE, RLE, LRO, RLO, LRI, RLI and FSI and popped with each PDF and PDI, * the explicit level must never exceed MAX_EXPLICIT_LEVEL. * * In order to have a correct push-pop semantics even in the case of overflows, * overflow counters and a valid isolate counter are used as described in UAX#9 * section 3.3.2 "Explicit Levels and Directions". * * This implementation assumes that MAX_EXPLICIT_LEVEL is odd. * * Returns the direction * */ private byte resolveExplicitLevels() { int i = 0; byte dirProp; byte level = GetParaLevelAt(0); byte dirct; isolateCount = 0; /* determine if the text is mixed-directional or single-directional */ dirct = directionFromFlags(); /* we may not need to resolve any explicit levels */ if (dirct != MIXED) { /* not mixed directionality: levels don't matter - trailingWSStart will be 0 */ return dirct; } if (reorderingMode > REORDER_LAST_LOGICAL_TO_VISUAL) { /* inverse BiDi: mixed, but all characters are at the same embedding level */ /* set all levels to the paragraph level */ int paraIndex, start, limit; for (paraIndex = 0; paraIndex < paraCount; paraIndex++) { if (paraIndex == 0) start = 0; else start = paras_limit[paraIndex - 1]; limit = paras_limit[paraIndex]; level = paras_level[paraIndex]; for (i = start; i < limit; i++) levels[i] =level; } return dirct; /* no bracket matching for inverse BiDi */ } if ((flags & (MASK_EXPLICIT | MASK_ISO)) == 0) { /* no embeddings, set all levels to the paragraph level */ /* we still have to perform bracket matching */ int paraIndex, start, limit; BracketData bracketData = new BracketData(); bracketInit(bracketData); for (paraIndex = 0; paraIndex < paraCount; paraIndex++) { if (paraIndex == 0) start = 0; else start = paras_limit[paraIndex-1]; limit = paras_limit[paraIndex]; level = paras_level[paraIndex]; for (i = start; i < limit; i++) { levels[i] = level; dirProp = dirProps[i]; if (dirProp == BN) continue; if (dirProp == B) { if ((i + 1) < length) { if (text[i] == CR && text[i + 1] == LF) continue; /* skip CR when followed by LF */ bracketProcessB(bracketData, level); } continue; } bracketProcessChar(bracketData, i); } } return dirct; } /* continue to perform (Xn) */ /* (X1) level is set for all codes, embeddingLevel keeps track of the push/pop operations */ /* both variables may carry the LEVEL_OVERRIDE flag to indicate the override status */ byte embeddingLevel = level, newLevel; byte previousLevel = level; /* previous level for regular (not CC) characters */ int lastCcPos = 0; /* index of last effective LRx,RLx, PDx */ /* The following stack remembers the embedding level and the ISOLATE flag of level runs. stackLast points to its current entry. */ short[] stack = new short[MAX_EXPLICIT_LEVEL + 2]; /* we never push anything >= MAX_EXPLICIT_LEVEL but we need one more entry as base */ int stackLast = 0; int overflowIsolateCount = 0; int overflowEmbeddingCount = 0; int validIsolateCount = 0; BracketData bracketData = new BracketData(); bracketInit(bracketData); stack[0] = level; /* initialize base entry to para level, no override, no isolate */ /* recalculate the flags */ flags = 0; for (i = 0; i < length; i++) { dirProp = dirProps[i]; switch (dirProp) { case LRE: case RLE: case LRO: case RLO: /* (X2, X3, X4, X5) */ flags |= DirPropFlag(BN); levels[i] = previousLevel; if (dirProp == LRE || dirProp == LRO) /* least greater even level */ newLevel = (byte)((embeddingLevel+2) & ~(LEVEL_OVERRIDE | 1)); else /* least greater odd level */ newLevel = (byte)((NoOverride(embeddingLevel) + 1) | 1); if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0 && overflowEmbeddingCount == 0) { lastCcPos = i; embeddingLevel = newLevel; if (dirProp == LRO || dirProp == RLO) embeddingLevel |= LEVEL_OVERRIDE; stackLast++; stack[stackLast] = embeddingLevel; /* we don't need to set LEVEL_OVERRIDE off for LRE and RLE since this has already been done for newLevel which is the source for embeddingLevel. */ } else { if (overflowIsolateCount == 0) overflowEmbeddingCount++; } break; case PDF: /* (X7) */ flags |= DirPropFlag(BN); levels[i] = previousLevel; /* handle all the overflow cases first */ if (overflowIsolateCount > 0) { break; } if (overflowEmbeddingCount > 0) { overflowEmbeddingCount--; break; } if (stackLast > 0 && stack[stackLast] < ISOLATE) { /* not an isolate entry */ lastCcPos = i; stackLast--; embeddingLevel = (byte)stack[stackLast]; } break; case LRI: case RLI: flags |= DirPropFlag(ON) | DirPropFlagLR(embeddingLevel); levels[i] = NoOverride(embeddingLevel); if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) { bracketProcessBoundary(bracketData, lastCcPos, previousLevel, embeddingLevel); flags |= DirPropFlagMultiRuns; } previousLevel = embeddingLevel; /* (X5a, X5b) */ if (dirProp == LRI) /* least greater even level */ newLevel=(byte)((embeddingLevel+2)&~(LEVEL_OVERRIDE|1)); else /* least greater odd level */ newLevel=(byte)((NoOverride(embeddingLevel)+1)|1); if (newLevel <= MAX_EXPLICIT_LEVEL && overflowIsolateCount == 0 && overflowEmbeddingCount == 0) { flags |= DirPropFlag(dirProp); lastCcPos = i; validIsolateCount++; if (validIsolateCount > isolateCount) isolateCount = validIsolateCount; embeddingLevel = newLevel; /* we can increment stackLast without checking because newLevel will exceed UBIDI_MAX_EXPLICIT_LEVEL before stackLast overflows */ stackLast++; stack[stackLast] = (short)(embeddingLevel + ISOLATE); bracketProcessLRI_RLI(bracketData, embeddingLevel); } else { /* make it WS so that it is handled by adjustWSLevels() */ dirProps[i] = WS; overflowIsolateCount++; } break; case PDI: if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) { bracketProcessBoundary(bracketData, lastCcPos, previousLevel, embeddingLevel); flags |= DirPropFlagMultiRuns; } /* (X6a) */ if (overflowIsolateCount > 0) { overflowIsolateCount--; /* make it WS so that it is handled by adjustWSLevels() */ dirProps[i] = WS; } else if (validIsolateCount > 0) { flags |= DirPropFlag(PDI); lastCcPos = i; overflowEmbeddingCount = 0; while (stack[stackLast] < ISOLATE) /* pop embedding entries */ stackLast--; /* until the last isolate entry */ stackLast--; /* pop also the last isolate entry */ validIsolateCount--; bracketProcessPDI(bracketData); } else /* make it WS so that it is handled by adjustWSLevels() */ dirProps[i] = WS; embeddingLevel = (byte)(stack[stackLast] & ~ISOLATE); flags |= DirPropFlag(ON) | DirPropFlagLR(embeddingLevel); previousLevel = embeddingLevel; levels[i] = NoOverride(embeddingLevel); break; case B: flags |= DirPropFlag(B); levels[i] = GetParaLevelAt(i); if ((i + 1) < length) { if (text[i] == CR && text[i + 1] == LF) break; /* skip CR when followed by LF */ overflowEmbeddingCount = overflowIsolateCount = 0; validIsolateCount = 0; stackLast = 0; previousLevel = embeddingLevel = GetParaLevelAt(i + 1); stack[0] = embeddingLevel; /* initialize base entry to para level, no override, no isolate */ bracketProcessB(bracketData, embeddingLevel); } break; case BN: /* BN, LRE, RLE, and PDF are supposed to be removed (X9) */ /* they will get their levels set correctly in adjustWSLevels() */ levels[i] = previousLevel; flags |= DirPropFlag(BN); break; default: /* all other types are normal characters and get the "real" level */ if (NoOverride(embeddingLevel) != NoOverride(previousLevel)) { bracketProcessBoundary(bracketData, lastCcPos, previousLevel, embeddingLevel); flags |= DirPropFlagMultiRuns; if ((embeddingLevel & LEVEL_OVERRIDE) != 0) flags |= DirPropFlagO(embeddingLevel); else flags |= DirPropFlagE(embeddingLevel); } previousLevel = embeddingLevel; levels[i] = embeddingLevel; bracketProcessChar(bracketData, i); /* the dirProp may have been changed in bracketProcessChar() */ flags |= DirPropFlag(dirProps[i]); break; } } if ((flags & MASK_EMBEDDING) != 0) { flags |= DirPropFlagLR(paraLevel); } if (orderParagraphsLTR && (flags & DirPropFlag(B)) != 0) { flags |= DirPropFlag(L); } /* again, determine if the text is mixed-directional or single-directional */ dirct = directionFromFlags(); return dirct; } /** * Use a pre-specified embedding levels array: * *

Adjust the directional properties for overrides (->LEVEL_OVERRIDE), * ignore all explicit codes (X9), * and check all the preset levels. * *

Recalculate the flags to have them reflect the real properties * after taking the explicit embeddings into account. */ private byte checkExplicitLevels() { int isolateCount = 0; this.flags = 0; /* collect all directionalities in the text */ this.isolateCount = 0; int currentParaIndex = 0; int currentParaLimit = paras_limit[0]; byte currentParaLevel = paraLevel; for (int i = 0; i < length; ++i) { byte level = levels[i]; byte dirProp = dirProps[i]; if (dirProp == LRI || dirProp == RLI) { isolateCount++; if (isolateCount > this.isolateCount) this.isolateCount = isolateCount; } else if (dirProp == PDI) isolateCount--; else if (dirProp == B) isolateCount = 0; // optimized version of byte currentParaLevel = GetParaLevelAt(i); if (defaultParaLevel != 0 && i == currentParaLimit && (currentParaIndex + 1) < paraCount) { currentParaLevel = paras_level[++currentParaIndex]; currentParaLimit = paras_limit[currentParaIndex]; } int overrideFlag = level & LEVEL_OVERRIDE; level &= ~LEVEL_OVERRIDE; if (level < currentParaLevel || MAX_EXPLICIT_LEVEL < level) { if (level == 0) { if (dirProp == B) { // Paragraph separators are ok with explicit level 0. // Prevents reordering of paragraphs. } else { // Treat explicit level 0 as a wildcard for the paragraph level. // Avoid making the caller guess what the paragraph level would be. level = currentParaLevel; levels[i] = (byte)(level | overrideFlag); } } else { // 1 <= level < currentParaLevel or MAX_EXPLICIT_LEVEL < level throw new IllegalArgumentException("level " + level + " out of bounds at " + i); } } if (overrideFlag != 0) { /* keep the override flag in levels[i] but adjust the flags */ flags |= DirPropFlagO(level); } else { /* set the flags */ flags |= DirPropFlagE(level) | DirPropFlag(dirProp); } } if ((flags & MASK_EMBEDDING) != 0) flags |= DirPropFlagLR(paraLevel); /* determine if the text is mixed-directional or single-directional */ return directionFromFlags(); } /*********************************************************************/ /* The Properties state machine table */ /*********************************************************************/ /* */ /* All table cells are 8 bits: */ /* bits 0..4: next state */ /* bits 5..7: action to perform (if > 0) */ /* */ /* Cells may be of format "n" where n represents the next state */ /* (except for the rightmost column). */ /* Cells may also be of format "_(x,y)" where x represents an action */ /* to perform and y represents the next state. */ /* */ /*********************************************************************/ /* Definitions and type for properties state tables */ /*********************************************************************/ private static final int IMPTABPROPS_COLUMNS = 16; private static final int IMPTABPROPS_RES = IMPTABPROPS_COLUMNS - 1; private static short GetStateProps(short cell) { return (short)(cell & 0x1f); } private static short GetActionProps(short cell) { return (short)(cell >> 5); } private static final short groupProp[] = /* dirProp regrouped */ { /* L R EN ES ET AN CS B S WS ON LRE LRO AL RLE RLO PDF NSM BN FSI LRI RLI PDI ENL ENR */ 0, 1, 2, 7, 8, 3, 9, 6, 5, 4, 4, 10, 10, 12, 10, 10, 10, 11, 10, 4, 4, 4, 4, 13, 14 }; private static final short _L = 0; private static final short _R = 1; private static final short _EN = 2; private static final short _AN = 3; private static final short _ON = 4; private static final short _S = 5; private static final short _B = 6; /* reduced dirProp */ /*********************************************************************/ /* */ /* PROPERTIES STATE TABLE */ /* */ /* In table impTabProps, */ /* - the ON column regroups ON and WS, FSI, RLI, LRI and PDI */ /* - the BN column regroups BN, LRE, RLE, LRO, RLO, PDF */ /* - the Res column is the reduced property assigned to a run */ /* */ /* Action 1: process current run1, init new run1 */ /* 2: init new run2 */ /* 3: process run1, process run2, init new run1 */ /* 4: process run1, set run1=run2, init new run2 */ /* */ /* Notes: */ /* 1) This table is used in resolveImplicitLevels(). */ /* 2) This table triggers actions when there is a change in the Bidi*/ /* property of incoming characters (action 1). */ /* 3) Most such property sequences are processed immediately (in */ /* fact, passed to processPropertySeq(). */ /* 4) However, numbers are assembled as one sequence. This means */ /* that undefined situations (like CS following digits, until */ /* it is known if the next char will be a digit) are held until */ /* following chars define them. */ /* Example: digits followed by CS, then comes another CS or ON; */ /* the digits will be processed, then the CS assigned */ /* as the start of an ON sequence (action 3). */ /* 5) There are cases where more than one sequence must be */ /* processed, for instance digits followed by CS followed by L: */ /* the digits must be processed as one sequence, and the CS */ /* must be processed as an ON sequence, all this before starting */ /* assembling chars for the opening L sequence. */ /* */ /* */ private static final short impTabProps[][] = { /* L, R, EN, AN, ON, S, B, ES, ET, CS, BN, NSM, AL, ENL, ENR, Res */ /* 0 Init */ { 1, 2, 4, 5, 7, 15, 17, 7, 9, 7, 0, 7, 3, 18, 21, _ON }, /* 1 L */ { 1, 32+2, 32+4, 32+5, 32+7, 32+15, 32+17, 32+7, 32+9, 32+7, 1, 1, 32+3, 32+18, 32+21, _L }, /* 2 R */ { 32+1, 2, 32+4, 32+5, 32+7, 32+15, 32+17, 32+7, 32+9, 32+7, 2, 2, 32+3, 32+18, 32+21, _R }, /* 3 AL */ { 32+1, 32+2, 32+6, 32+6, 32+8, 32+16, 32+17, 32+8, 32+8, 32+8, 3, 3, 3, 32+18, 32+21, _R }, /* 4 EN */ { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17, 64+10, 11, 64+10, 4, 4, 32+3, 18, 21, _EN }, /* 5 AN */ { 32+1, 32+2, 32+4, 5, 32+7, 32+15, 32+17, 32+7, 32+9, 64+12, 5, 5, 32+3, 32+18, 32+21, _AN }, /* 6 AL:EN/AN */ { 32+1, 32+2, 6, 6, 32+8, 32+16, 32+17, 32+8, 32+8, 64+13, 6, 6, 32+3, 18, 21, _AN }, /* 7 ON */ { 32+1, 32+2, 32+4, 32+5, 7, 32+15, 32+17, 7, 64+14, 7, 7, 7, 32+3, 32+18, 32+21, _ON }, /* 8 AL:ON */ { 32+1, 32+2, 32+6, 32+6, 8, 32+16, 32+17, 8, 8, 8, 8, 8, 32+3, 32+18, 32+21, _ON }, /* 9 ET */ { 32+1, 32+2, 4, 32+5, 7, 32+15, 32+17, 7, 9, 7, 9, 9, 32+3, 18, 21, _ON }, /*10 EN+ES/CS */ { 96+1, 96+2, 4, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 10, 128+7, 96+3, 18, 21, _EN }, /*11 EN+ET */ { 32+1, 32+2, 4, 32+5, 32+7, 32+15, 32+17, 32+7, 11, 32+7, 11, 11, 32+3, 18, 21, _EN }, /*12 AN+CS */ { 96+1, 96+2, 96+4, 5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 12, 128+7, 96+3, 96+18, 96+21, _AN }, /*13 AL:EN/AN+CS */ { 96+1, 96+2, 6, 6, 128+8, 96+16, 96+17, 128+8, 128+8, 128+8, 13, 128+8, 96+3, 18, 21, _AN }, /*14 ON+ET */ { 32+1, 32+2, 128+4, 32+5, 7, 32+15, 32+17, 7, 14, 7, 14, 14, 32+3,128+18,128+21, _ON }, /*15 S */ { 32+1, 32+2, 32+4, 32+5, 32+7, 15, 32+17, 32+7, 32+9, 32+7, 15, 32+7, 32+3, 32+18, 32+21, _S }, /*16 AL:S */ { 32+1, 32+2, 32+6, 32+6, 32+8, 16, 32+17, 32+8, 32+8, 32+8, 16, 32+8, 32+3, 32+18, 32+21, _S }, /*17 B */ { 32+1, 32+2, 32+4, 32+5, 32+7, 32+15, 17, 32+7, 32+9, 32+7, 17, 32+7, 32+3, 32+18, 32+21, _B }, /*18 ENL */ { 32+1, 32+2, 18, 32+5, 32+7, 32+15, 32+17, 64+19, 20, 64+19, 18, 18, 32+3, 18, 21, _L }, /*19 ENL+ES/CS */ { 96+1, 96+2, 18, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 19, 128+7, 96+3, 18, 21, _L }, /*20 ENL+ET */ { 32+1, 32+2, 18, 32+5, 32+7, 32+15, 32+17, 32+7, 20, 32+7, 20, 20, 32+3, 18, 21, _L }, /*21 ENR */ { 32+1, 32+2, 21, 32+5, 32+7, 32+15, 32+17, 64+22, 23, 64+22, 21, 21, 32+3, 18, 21, _AN }, /*22 ENR+ES/CS */ { 96+1, 96+2, 21, 96+5, 128+7, 96+15, 96+17, 128+7,128+14, 128+7, 22, 128+7, 96+3, 18, 21, _AN }, /*23 ENR+ET */ { 32+1, 32+2, 21, 32+5, 32+7, 32+15, 32+17, 32+7, 23, 32+7, 23, 23, 32+3, 18, 21, _AN } }; /*********************************************************************/ /* The levels state machine tables */ /*********************************************************************/ /* */ /* All table cells are 8 bits: */ /* bits 0..3: next state */ /* bits 4..7: action to perform (if > 0) */ /* */ /* Cells may be of format "n" where n represents the next state */ /* (except for the rightmost column). */ /* Cells may also be of format "_(x,y)" where x represents an action */ /* to perform and y represents the next state. */ /* */ /* This format limits each table to 16 states each and to 15 actions.*/ /* */ /*********************************************************************/ /* Definitions and type for levels state tables */ /*********************************************************************/ private static final int IMPTABLEVELS_COLUMNS = _B + 2; private static final int IMPTABLEVELS_RES = IMPTABLEVELS_COLUMNS - 1; private static short GetState(byte cell) { return (short)(cell & 0x0f); } private static short GetAction(byte cell) { return (short)(cell >> 4); } private static class ImpTabPair { byte[][][] imptab; short[][] impact; ImpTabPair(byte[][] table1, byte[][] table2, short[] act1, short[] act2) { imptab = new byte[][][] {table1, table2}; impact = new short[][] {act1, act2}; } } /*********************************************************************/ /* */ /* LEVELS STATE TABLES */ /* */ /* In all levels state tables, */ /* - state 0 is the initial state */ /* - the Res column is the increment to add to the text level */ /* for this property sequence. */ /* */ /* The impact arrays for each table of a pair map the local action */ /* numbers of the table to the total list of actions. For instance, */ /* action 2 in a given table corresponds to the action number which */ /* appears in entry [2] of the impact array for that table. */ /* The first entry of all impact arrays must be 0. */ /* */ /* Action 1: init conditional sequence */ /* 2: prepend conditional sequence to current sequence */ /* 3: set ON sequence to new level - 1 */ /* 4: init EN/AN/ON sequence */ /* 5: fix EN/AN/ON sequence followed by R */ /* 6: set previous level sequence to level 2 */ /* */ /* Notes: */ /* 1) These tables are used in processPropertySeq(). The input */ /* is property sequences as determined by resolveImplicitLevels. */ /* 2) Most such property sequences are processed immediately */ /* (levels are assigned). */ /* 3) However, some sequences cannot be assigned a final level till */ /* one or more following sequences are received. For instance, */ /* ON following an R sequence within an even-level paragraph. */ /* If the following sequence is R, the ON sequence will be */ /* assigned basic run level+1, and so will the R sequence. */ /* 4) S is generally handled like ON, since its level will be fixed */ /* to paragraph level in adjustWSLevels(). */ /* */ private static final byte impTabL_DEFAULT[][] = /* Even paragraph level */ /* In this table, conditional sequences receive the lower possible level until proven otherwise. */ { /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 0, 1, 0, 2, 0, 0, 0, 0 }, /* 1 : R */ { 0, 1, 3, 3, 0x14, 0x14, 0, 1 }, /* 2 : AN */ { 0, 1, 0, 2, 0x15, 0x15, 0, 2 }, /* 3 : R+EN/AN */ { 0, 1, 3, 3, 0x14, 0x14, 0, 2 }, /* 4 : R+ON */ { 0, 0x21, 0x33, 0x33, 4, 4, 0, 0 }, /* 5 : AN+ON */ { 0, 0x21, 0, 0x32, 5, 5, 0, 0 } }; private static final byte impTabR_DEFAULT[][] = /* Odd paragraph level */ /* In this table, conditional sequences receive the lower possible level until proven otherwise. */ { /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 1, 0, 2, 2, 0, 0, 0, 0 }, /* 1 : L */ { 1, 0, 1, 3, 0x14, 0x14, 0, 1 }, /* 2 : EN/AN */ { 1, 0, 2, 2, 0, 0, 0, 1 }, /* 3 : L+AN */ { 1, 0, 1, 3, 5, 5, 0, 1 }, /* 4 : L+ON */ { 0x21, 0, 0x21, 3, 4, 4, 0, 0 }, /* 5 : L+AN+ON */ { 1, 0, 1, 3, 5, 5, 0, 0 } }; private static final short[] impAct0 = {0,1,2,3,4}; private static final ImpTabPair impTab_DEFAULT = new ImpTabPair( impTabL_DEFAULT, impTabR_DEFAULT, impAct0, impAct0); private static final byte impTabL_NUMBERS_SPECIAL[][] = { /* Even paragraph level */ /* In this table, conditional sequences receive the lower possible level until proven otherwise. */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 0, 2, 0x11, 0x11, 0, 0, 0, 0 }, /* 1 : L+EN/AN */ { 0, 0x42, 1, 1, 0, 0, 0, 0 }, /* 2 : R */ { 0, 2, 4, 4, 0x13, 0x13, 0, 1 }, /* 3 : R+ON */ { 0, 0x22, 0x34, 0x34, 3, 3, 0, 0 }, /* 4 : R+EN/AN */ { 0, 2, 4, 4, 0x13, 0x13, 0, 2 } }; private static final ImpTabPair impTab_NUMBERS_SPECIAL = new ImpTabPair( impTabL_NUMBERS_SPECIAL, impTabR_DEFAULT, impAct0, impAct0); private static final byte impTabL_GROUP_NUMBERS_WITH_R[][] = { /* In this table, EN/AN+ON sequences receive levels as if associated with R until proven that there is L or sor/eor on both sides. AN is handled like EN. */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 init */ { 0, 3, 0x11, 0x11, 0, 0, 0, 0 }, /* 1 EN/AN */ { 0x20, 3, 1, 1, 2, 0x20, 0x20, 2 }, /* 2 EN/AN+ON */ { 0x20, 3, 1, 1, 2, 0x20, 0x20, 1 }, /* 3 R */ { 0, 3, 5, 5, 0x14, 0, 0, 1 }, /* 4 R+ON */ { 0x20, 3, 5, 5, 4, 0x20, 0x20, 1 }, /* 5 R+EN/AN */ { 0, 3, 5, 5, 0x14, 0, 0, 2 } }; private static final byte impTabR_GROUP_NUMBERS_WITH_R[][] = { /* In this table, EN/AN+ON sequences receive levels as if associated with R until proven that there is L on both sides. AN is handled like EN. */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 init */ { 2, 0, 1, 1, 0, 0, 0, 0 }, /* 1 EN/AN */ { 2, 0, 1, 1, 0, 0, 0, 1 }, /* 2 L */ { 2, 0, 0x14, 0x14, 0x13, 0, 0, 1 }, /* 3 L+ON */ { 0x22, 0, 4, 4, 3, 0, 0, 0 }, /* 4 L+EN/AN */ { 0x22, 0, 4, 4, 3, 0, 0, 1 } }; private static final ImpTabPair impTab_GROUP_NUMBERS_WITH_R = new ImpTabPair(impTabL_GROUP_NUMBERS_WITH_R, impTabR_GROUP_NUMBERS_WITH_R, impAct0, impAct0); private static final byte impTabL_INVERSE_NUMBERS_AS_L[][] = { /* This table is identical to the Default LTR table except that EN and AN are handled like L. */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 0, 1, 0, 0, 0, 0, 0, 0 }, /* 1 : R */ { 0, 1, 0, 0, 0x14, 0x14, 0, 1 }, /* 2 : AN */ { 0, 1, 0, 0, 0x15, 0x15, 0, 2 }, /* 3 : R+EN/AN */ { 0, 1, 0, 0, 0x14, 0x14, 0, 2 }, /* 4 : R+ON */ { 0x20, 1, 0x20, 0x20, 4, 4, 0x20, 1 }, /* 5 : AN+ON */ { 0x20, 1, 0x20, 0x20, 5, 5, 0x20, 1 } }; private static final byte impTabR_INVERSE_NUMBERS_AS_L[][] = { /* This table is identical to the Default RTL table except that EN and AN are handled like L. */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 1, 0, 1, 1, 0, 0, 0, 0 }, /* 1 : L */ { 1, 0, 1, 1, 0x14, 0x14, 0, 1 }, /* 2 : EN/AN */ { 1, 0, 1, 1, 0, 0, 0, 1 }, /* 3 : L+AN */ { 1, 0, 1, 1, 5, 5, 0, 1 }, /* 4 : L+ON */ { 0x21, 0, 0x21, 0x21, 4, 4, 0, 0 }, /* 5 : L+AN+ON */ { 1, 0, 1, 1, 5, 5, 0, 0 } }; private static final ImpTabPair impTab_INVERSE_NUMBERS_AS_L = new ImpTabPair (impTabL_INVERSE_NUMBERS_AS_L, impTabR_INVERSE_NUMBERS_AS_L, impAct0, impAct0); private static final byte impTabR_INVERSE_LIKE_DIRECT[][] = { /* Odd paragraph level */ /* In this table, conditional sequences receive the lower possible level until proven otherwise. */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 1, 0, 2, 2, 0, 0, 0, 0 }, /* 1 : L */ { 1, 0, 1, 2, 0x13, 0x13, 0, 1 }, /* 2 : EN/AN */ { 1, 0, 2, 2, 0, 0, 0, 1 }, /* 3 : L+ON */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 0 }, /* 4 : L+ON+AN */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 3 }, /* 5 : L+AN+ON */ { 0x21, 0x30, 6, 4, 5, 5, 0x30, 2 }, /* 6 : L+ON+EN */ { 0x21, 0x30, 6, 4, 3, 3, 0x30, 1 } }; private static final short[] impAct1 = {0,1,13,14}; private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT = new ImpTabPair( impTabL_DEFAULT, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1); private static final byte impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = { /* The case handled in this table is (visually): R EN L */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 0, 0x63, 0, 1, 0, 0, 0, 0 }, /* 1 : L+AN */ { 0, 0x63, 0, 1, 0x12, 0x30, 0, 4 }, /* 2 : L+AN+ON */ { 0x20, 0x63, 0x20, 1, 2, 0x30, 0x20, 3 }, /* 3 : R */ { 0, 0x63, 0x55, 0x56, 0x14, 0x30, 0, 3 }, /* 4 : R+ON */ { 0x30, 0x43, 0x55, 0x56, 4, 0x30, 0x30, 3 }, /* 5 : R+EN */ { 0x30, 0x43, 5, 0x56, 0x14, 0x30, 0x30, 4 }, /* 6 : R+AN */ { 0x30, 0x43, 0x55, 6, 0x14, 0x30, 0x30, 4 } }; private static final byte impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS[][] = { /* The cases handled in this table are (visually): R EN L R L AN L */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 0x13, 0, 1, 1, 0, 0, 0, 0 }, /* 1 : R+EN/AN */ { 0x23, 0, 1, 1, 2, 0x40, 0, 1 }, /* 2 : R+EN/AN+ON */ { 0x23, 0, 1, 1, 2, 0x40, 0, 0 }, /* 3 : L */ { 3, 0, 3, 0x36, 0x14, 0x40, 0, 1 }, /* 4 : L+ON */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 0 }, /* 5 : L+ON+EN */ { 0x53, 0x40, 5, 0x36, 4, 0x40, 0x40, 1 }, /* 6 : L+AN */ { 0x53, 0x40, 6, 6, 4, 0x40, 0x40, 3 } }; private static final short[] impAct2 = {0,1,2,5,6,7,8}; private static final short[] impAct3 = {0,1,9,10,11,12}; private static final ImpTabPair impTab_INVERSE_LIKE_DIRECT_WITH_MARKS = new ImpTabPair(impTabL_INVERSE_LIKE_DIRECT_WITH_MARKS, impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct2, impAct3); private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL = new ImpTabPair( impTabL_NUMBERS_SPECIAL, impTabR_INVERSE_LIKE_DIRECT, impAct0, impAct1); private static final byte impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS[][] = { /* The case handled in this table is (visually): R EN L */ /* L, R, EN, AN, ON, S, B, Res */ /* 0 : init */ { 0, 0x62, 1, 1, 0, 0, 0, 0 }, /* 1 : L+EN/AN */ { 0, 0x62, 1, 1, 0, 0x30, 0, 4 }, /* 2 : R */ { 0, 0x62, 0x54, 0x54, 0x13, 0x30, 0, 3 }, /* 3 : R+ON */ { 0x30, 0x42, 0x54, 0x54, 3, 0x30, 0x30, 3 }, /* 4 : R+EN/AN */ { 0x30, 0x42, 4, 4, 0x13, 0x30, 0x30, 4 } }; private static final ImpTabPair impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS = new ImpTabPair(impTabL_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS, impTabR_INVERSE_LIKE_DIRECT_WITH_MARKS, impAct2, impAct3); private static class LevState { byte[][] impTab; /* level table pointer */ short[] impAct; /* action map array */ int startON; /* start of ON sequence */ int startL2EN; /* start of level 2 sequence */ int lastStrongRTL; /* index of last found R or AL */ int runStart; /* start position of the run */ short state; /* current state */ byte runLevel; /* run level before implicit solving */ } /*------------------------------------------------------------------------*/ static final int FIRSTALLOC = 10; /* * param pos: position where to insert * param flag: one of LRM_BEFORE, LRM_AFTER, RLM_BEFORE, RLM_AFTER */ private void addPoint(int pos, int flag) { Point point = new Point(); int len = insertPoints.points.length; if (len == 0) { insertPoints.points = new Point[FIRSTALLOC]; len = FIRSTALLOC; } if (insertPoints.size >= len) { /* no room for new point */ Point[] savePoints = insertPoints.points; insertPoints.points = new Point[len * 2]; System.arraycopy(savePoints, 0, insertPoints.points, 0, len); } point.pos = pos; point.flag = flag; insertPoints.points[insertPoints.size] = point; insertPoints.size++; } private void setLevelsOutsideIsolates(int start, int limit, byte level) { byte dirProp; int isolateCount = 0, k; for (k = start; k < limit; k++) { dirProp = dirProps[k]; if (dirProp == PDI) isolateCount--; if (isolateCount == 0) levels[k] = level; if (dirProp == LRI || dirProp == RLI) isolateCount++; } } /* perform rules (Wn), (Nn), and (In) on a run of the text ------------------ */ /* * This implementation of the (Wn) rules applies all rules in one pass. * In order to do so, it needs a look-ahead of typically 1 character * (except for W5: sequences of ET) and keeps track of changes * in a rule Wp that affect a later Wq (p= 0) { addPoint(levState.startL2EN, LRM_BEFORE); } levState.startL2EN = -1; /* not within previous if since could also be -2 */ /* check if we had any relevant EN/AN after R/AL */ if ((insertPoints.points.length == 0) || (insertPoints.size <= insertPoints.confirmed)) { /* nothing, just clean up */ levState.lastStrongRTL = -1; /* check if we have a pending conditional segment */ level = impTab[oldStateSeq][IMPTABLEVELS_RES]; if ((level & 1) != 0 && levState.startON > 0) { /* after ON */ start = levState.startON; /* reset to basic run level */ } if (_prop == _S) { /* add LRM before S */ addPoint(start0, LRM_BEFORE); insertPoints.confirmed = insertPoints.size; } break; } /* reset previous RTL cont to level for LTR text */ for (k = levState.lastStrongRTL + 1; k < start0; k++) { /* reset odd level, leave runLevel+2 as is */ levels[k] = (byte)((levels[k] - 2) & ~1); } /* mark insert points as confirmed */ insertPoints.confirmed = insertPoints.size; levState.lastStrongRTL = -1; if (_prop == _S) { /* add LRM before S */ addPoint(start0, LRM_BEFORE); insertPoints.confirmed = insertPoints.size; } break; case 6: /* R/AL after possible relevant EN/AN */ /* just clean up */ if (insertPoints.points.length > 0) /* remove all non confirmed insert points */ insertPoints.size = insertPoints.confirmed; levState.startON = -1; levState.startL2EN = -1; levState.lastStrongRTL = limit - 1; break; case 7: /* EN/AN after R/AL + possible cont */ /* check for real AN */ if ((_prop == _AN) && (dirProps[start0] == AN) && (reorderingMode != REORDER_INVERSE_FOR_NUMBERS_SPECIAL)) { /* real AN */ if (levState.startL2EN == -1) { /* if no relevant EN already found */ /* just note the rightmost digit as a strong RTL */ levState.lastStrongRTL = limit - 1; break; } if (levState.startL2EN >= 0) { /* after EN, no AN */ addPoint(levState.startL2EN, LRM_BEFORE); levState.startL2EN = -2; } /* note AN */ addPoint(start0, LRM_BEFORE); break; } /* if first EN/AN after R/AL */ if (levState.startL2EN == -1) { levState.startL2EN = start0; } break; case 8: /* note location of latest R/AL */ levState.lastStrongRTL = limit - 1; levState.startON = -1; break; case 9: /* L after R+ON/EN/AN */ /* include possible adjacent number on the left */ for (k = start0-1; k >= 0 && ((levels[k] & 1) == 0); k--) { } if (k >= 0) { addPoint(k, RLM_BEFORE); /* add RLM before */ insertPoints.confirmed = insertPoints.size; /* confirm it */ } levState.startON = start0; break; case 10: /* AN after L */ /* AN numbers between L text on both sides may be trouble. */ /* tentatively bracket with LRMs; will be confirmed if followed by L */ addPoint(start0, LRM_BEFORE); /* add LRM before */ addPoint(start0, LRM_AFTER); /* add LRM after */ break; case 11: /* R after L+ON/EN/AN */ /* false alert, infirm LRMs around previous AN */ insertPoints.size=insertPoints.confirmed; if (_prop == _S) { /* add RLM before S */ addPoint(start0, RLM_BEFORE); insertPoints.confirmed = insertPoints.size; } break; case 12: /* L after L+ON/AN */ level = (byte)(levState.runLevel + addLevel); for (k=levState.startON; k < start0; k++) { if (levels[k] < level) { levels[k] = level; } } insertPoints.confirmed = insertPoints.size; /* confirm inserts */ levState.startON = start0; break; case 13: /* L after L+ON+EN/AN/ON */ level = levState.runLevel; for (k = start0-1; k >= levState.startON; k--) { if (levels[k] == level+3) { while (levels[k] == level+3) { levels[k--] -= 2; } while (levels[k] == level) { k--; } } if (levels[k] == level+2) { levels[k] = level; continue; } levels[k] = (byte)(level+1); } break; case 14: /* R after L+ON+EN/AN/ON */ level = (byte)(levState.runLevel+1); for (k = start0-1; k >= levState.startON; k--) { if (levels[k] > level) { levels[k] -= 2; } } break; default: /* we should never get here */ throw new IllegalStateException("Internal ICU error in processPropertySeq"); } } if ((addLevel) != 0 || (start < start0)) { level = (byte)(levState.runLevel + addLevel); if (start >= levState.runStart) { for (k = start; k < limit; k++) { levels[k] = level; } } else { setLevelsOutsideIsolates(start, limit, level); } } } /** * Returns the directionality of the last strong character at the end of the prologue, if any. * Requires prologue!=null. */ private byte lastL_R_AL() { for (int i = prologue.length(); i > 0; ) { int uchar = prologue.codePointBefore(i); i -= Character.charCount(uchar); byte dirProp = (byte)getCustomizedClass(uchar); if (dirProp == L) { return _L; } if (dirProp == R || dirProp == AL) { return _R; } if(dirProp == B) { return _ON; } } return _ON; } /** * Returns the directionality of the first strong character, or digit, in the epilogue, if any. * Requires epilogue!=null. */ private byte firstL_R_AL_EN_AN() { for (int i = 0; i < epilogue.length(); ) { int uchar = epilogue.codePointAt(i); i += Character.charCount(uchar); byte dirProp = (byte)getCustomizedClass(uchar); if (dirProp == L) { return _L; } if (dirProp == R || dirProp == AL) { return _R; } if (dirProp == EN) { return _EN; } if (dirProp == AN) { return _AN; } } return _ON; } private void resolveImplicitLevels(int start, int limit, short sor, short eor) { byte dirProp; LevState levState = new LevState(); int i, start1, start2; short oldStateImp, stateImp, actionImp; short gprop, resProp, cell; boolean inverseRTL; short nextStrongProp = R; int nextStrongPos = -1; /* check for RTL inverse Bidi mode */ /* FOOD FOR THOUGHT: in case of RTL inverse Bidi, it would make sense to * loop on the text characters from end to start. * This would need a different properties state table (at least different * actions) and different levels state tables (maybe very similar to the * LTR corresponding ones. */ inverseRTL=((start0) && (reorderingMode == REORDER_INVERSE_LIKE_DIRECT || reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL)); /* initialize for property and levels state table */ levState.startL2EN = -1; /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */ levState.lastStrongRTL = -1; /* used for INVERSE_LIKE_DIRECT_WITH_MARKS */ levState.runStart = start; levState.runLevel = levels[start]; levState.impTab = impTabPair.imptab[levState.runLevel & 1]; levState.impAct = impTabPair.impact[levState.runLevel & 1]; if (start == 0 && prologue != null) { byte lastStrong = lastL_R_AL(); if (lastStrong != _ON) { sor = lastStrong; } } /* The isolates[] entries contain enough information to resume the bidi algorithm in the same state as it was when it was interrupted by an isolate sequence. */ if (dirProps[start] == PDI) { levState.startON = isolates[isolateCount].startON; start1 = isolates[isolateCount].start1; stateImp = isolates[isolateCount].stateImp; levState.state = isolates[isolateCount].state; isolateCount--; } else { levState.startON = -1; start1 = start; if (dirProps[start] == NSM) stateImp = (short)(1 + sor); else stateImp = 0; levState.state = 0; processPropertySeq(levState, sor, start, start); } start2 = start; /* to make the Java compiler happy */ for (i = start; i <= limit; i++) { if (i >= limit) { int k; for (k = limit - 1; k > start && (DirPropFlag(dirProps[k]) & MASK_BN_EXPLICIT) != 0; k--); dirProp = dirProps[k]; if (dirProp == LRI || dirProp == RLI) break; /* no forced closing for sequence ending with LRI/RLI */ gprop = eor; } else { byte prop, prop1; prop = dirProps[i]; if (prop == B) isolateCount = -1; /* current isolates stack entry == none */ if (inverseRTL) { if (prop == AL) { /* AL before EN does not make it AN */ prop = R; } else if (prop == EN) { if (nextStrongPos <= i) { /* look for next strong char (L/R/AL) */ int j; nextStrongProp = R; /* set default */ nextStrongPos = limit; for (j = i+1; j < limit; j++) { prop1 = dirProps[j]; if (prop1 == L || prop1 == R || prop1 == AL) { nextStrongProp = prop1; nextStrongPos = j; break; } } } if (nextStrongProp == AL) { prop = AN; } } } gprop = groupProp[prop]; } oldStateImp = stateImp; cell = impTabProps[oldStateImp][gprop]; stateImp = GetStateProps(cell); /* isolate the new state */ actionImp = GetActionProps(cell); /* isolate the action */ if ((i == limit) && (actionImp == 0)) { /* there is an unprocessed sequence if its property == eor */ actionImp = 1; /* process the last sequence */ } if (actionImp != 0) { resProp = impTabProps[oldStateImp][IMPTABPROPS_RES]; switch (actionImp) { case 1: /* process current seq1, init new seq1 */ processPropertySeq(levState, resProp, start1, i); start1 = i; break; case 2: /* init new seq2 */ start2 = i; break; case 3: /* process seq1, process seq2, init new seq1 */ processPropertySeq(levState, resProp, start1, start2); processPropertySeq(levState, _ON, start2, i); start1 = i; break; case 4: /* process seq1, set seq1=seq2, init new seq2 */ processPropertySeq(levState, resProp, start1, start2); start1 = start2; start2 = i; break; default: /* we should never get here */ throw new IllegalStateException("Internal ICU error in resolveImplicitLevels"); } } } /* flush possible pending sequence, e.g. ON */ if (limit == length && epilogue != null) { byte firstStrong = firstL_R_AL_EN_AN(); if (firstStrong != _ON) { eor = firstStrong; } } /* look for the last char not a BN or LRE/RLE/LRO/RLO/PDF */ for (i = limit - 1; i > start && (DirPropFlag(dirProps[i]) & MASK_BN_EXPLICIT) != 0; i--); dirProp = dirProps[i]; if ((dirProp == LRI || dirProp == RLI) && limit < length) { isolateCount++; if (isolates[isolateCount] == null) isolates[isolateCount] = new Isolate(); isolates[isolateCount].stateImp = stateImp; isolates[isolateCount].state = levState.state; isolates[isolateCount].start1 = start1; isolates[isolateCount].startON = levState.startON; } else processPropertySeq(levState, eor, limit, limit); } /* perform (L1) and (X9) ---------------------------------------------------- */ /* * Reset the embedding levels for some non-graphic characters (L1). * This method also sets appropriate levels for BN, and * explicit embedding types that are supposed to have been removed * from the paragraph in (X9). */ private void adjustWSLevels() { int i; if ((flags & MASK_WS) != 0) { int flag; i = trailingWSStart; while (i > 0) { /* reset a sequence of WS/BN before eop and B/S to the paragraph paraLevel */ while (i > 0 && ((flag = DirPropFlag(dirProps[--i])) & MASK_WS) != 0) { if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) { levels[i] = 0; } else { levels[i] = GetParaLevelAt(i); } } /* reset BN to the next character's paraLevel until B/S, which restarts above loop */ /* here, i+1 is guaranteed to be 0) { flag = DirPropFlag(dirProps[--i]); if ((flag & MASK_BN_EXPLICIT) != 0) { levels[i] = levels[i + 1]; } else if (orderParagraphsLTR && (flag & DirPropFlag(B)) != 0) { levels[i] = 0; break; } else if ((flag & MASK_B_S) != 0){ levels[i] = GetParaLevelAt(i); break; } } } } } /** * Set the context before a call to setPara().

* * setPara() computes the left-right directionality for a given piece * of text which is supplied as one of its arguments. Sometimes this piece * of text (the "main text") should be considered in context, because text * appearing before ("prologue") and/or after ("epilogue") the main text * may affect the result of this computation.

* * This function specifies the prologue and/or the epilogue for the next * call to setPara(). If successive calls to setPara() * all need specification of a context, setContext() must be called * before each call to setPara(). In other words, a context is not * "remembered" after the following successful call to setPara().

* * If a call to setPara() specifies DEFAULT_LTR or * DEFAULT_RTL as paraLevel and is preceded by a call to * setContext() which specifies a prologue, the paragraph level will * be computed taking in consideration the text in the prologue.

* * When setPara() is called without a previous call to * setContext, the main text is handled as if preceded and followed * by strong directional characters at the current paragraph level. * Calling setContext() with specification of a prologue will change * this behavior by handling the main text as if preceded by the last * strong character appearing in the prologue, if any. * Calling setContext() with specification of an epilogue will change * the behavior of setPara() by handling the main text as if followed * by the first strong character or digit appearing in the epilogue, if any.

* * Note 1: if setContext is called repeatedly without * calling setPara, the earlier calls have no effect, * only the last call will be remembered for the next call to * setPara.

* * Note 2: calling setContext(null, null) * cancels any previous setting of non-empty prologue or epilogue. * The next call to setPara() will process no * prologue or epilogue.

* * Note 3: users must be aware that even after setting the context * before a call to setPara() to perform e.g. a logical to visual * transformation, the resulting string may not be identical to what it * would have been if all the text, including prologue and epilogue, had * been processed together.
* Example (upper case letters represent RTL characters):
*   prologue = "abc DE"
*   epilogue = none
*   main text = "FGH xyz"
*   paraLevel = LTR
*   display without prologue = "HGF xyz" * ("HGF" is adjacent to "xyz")
*   display with prologue = "abc HGFED xyz" * ("HGF" is not adjacent to "xyz")
* * @param prologue is the text which precedes the text that * will be specified in a coming call to setPara(). * If there is no prologue to consider, * this parameter can be null. * * @param epilogue is the text which follows the text that * will be specified in a coming call to setPara(). * If there is no epilogue to consider, * this parameter can be null. * * @see #setPara * @stable ICU 4.8 */ public void setContext(String prologue, String epilogue) { this.prologue = prologue != null && prologue.length() > 0 ? prologue : null; this.epilogue = epilogue != null && epilogue.length() > 0 ? epilogue : null; } private void setParaSuccess() { prologue = null; /* forget the last context */ epilogue = null; paraBidi = this; /* mark successful setPara */ } int Bidi_Min(int x, int y) { return x < y ? x : y; } int Bidi_Abs(int x) { return x >= 0 ? x : -x; } void setParaRunsOnly(char[] parmText, byte parmParaLevel) { int[] visualMap; String visualText; int saveLength, saveTrailingWSStart; byte[] saveLevels; byte saveDirection; int i, j, visualStart, logicalStart, oldRunCount, runLength, addedRuns, insertRemove, start, limit, step, indexOddBit, logicalPos, index, index1; int saveOptions; reorderingMode = REORDER_DEFAULT; int parmLength = parmText.length; if (parmLength == 0) { setPara(parmText, parmParaLevel, null); reorderingMode = REORDER_RUNS_ONLY; return; } /* obtain memory for mapping table and visual text */ saveOptions = reorderingOptions; if ((saveOptions & OPTION_INSERT_MARKS) > 0) { reorderingOptions &= ~OPTION_INSERT_MARKS; reorderingOptions |= OPTION_REMOVE_CONTROLS; } parmParaLevel &= 1; /* accept only 0 or 1 */ setPara(parmText, parmParaLevel, null); /* we cannot access directly levels since it is not yet set if * direction is not MIXED */ saveLevels = new byte[this.length]; System.arraycopy(getLevels(), 0, saveLevels, 0, this.length); saveTrailingWSStart = trailingWSStart; /* FOOD FOR THOUGHT: instead of writing the visual text, we could use * the visual map and the dirProps array to drive the second call * to setPara (but must make provision for possible removal of * Bidi controls. Alternatively, only use the dirProps array via * customized classifier callback. */ visualText = writeReordered(DO_MIRRORING); visualMap = getVisualMap(); this.reorderingOptions = saveOptions; saveLength = this.length; saveDirection=this.direction; this.reorderingMode = REORDER_INVERSE_LIKE_DIRECT; parmParaLevel ^= 1; setPara(visualText, parmParaLevel, null); BidiLine.getRuns(this); /* check if some runs must be split, count how many splits */ addedRuns = 0; oldRunCount = this.runCount; visualStart = 0; for (i = 0; i < oldRunCount; i++, visualStart += runLength) { runLength = runs[i].limit - visualStart; if (runLength < 2) { continue; } logicalStart = runs[i].start; for (j = logicalStart+1; j < logicalStart+runLength; j++) { index = visualMap[j]; index1 = visualMap[j-1]; if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) { addedRuns++; } } } if (addedRuns > 0) { getRunsMemory(oldRunCount + addedRuns); if (runCount == 1) { /* because we switch from UBiDi.simpleRuns to UBiDi.runs */ runsMemory[0] = runs[0]; } else { System.arraycopy(runs, 0, runsMemory, 0, runCount); } runs = runsMemory; runCount += addedRuns; for (i = oldRunCount; i < runCount; i++) { if (runs[i] == null) { runs[i] = new BidiRun(0, 0, (byte)0); } } } /* split runs which are not consecutive in source text */ int newI; for (i = oldRunCount-1; i >= 0; i--) { newI = i + addedRuns; runLength = i==0 ? runs[0].limit : runs[i].limit - runs[i-1].limit; logicalStart = runs[i].start; indexOddBit = runs[i].level & 1; if (runLength < 2) { if (addedRuns > 0) { runs[newI].copyFrom(runs[i]); } logicalPos = visualMap[logicalStart]; runs[newI].start = logicalPos; runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit); continue; } if (indexOddBit > 0) { start = logicalStart; limit = logicalStart + runLength - 1; step = 1; } else { start = logicalStart + runLength - 1; limit = logicalStart; step = -1; } for (j = start; j != limit; j += step) { index = visualMap[j]; index1 = visualMap[j+step]; if ((Bidi_Abs(index-index1)!=1) || (saveLevels[index]!=saveLevels[index1])) { logicalPos = Bidi_Min(visualMap[start], index); runs[newI].start = logicalPos; runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit); runs[newI].limit = runs[i].limit; runs[i].limit -= Bidi_Abs(j - start) + 1; insertRemove = runs[i].insertRemove & (LRM_AFTER|RLM_AFTER); runs[newI].insertRemove = insertRemove; runs[i].insertRemove &= ~insertRemove; start = j + step; addedRuns--; newI--; } } if (addedRuns > 0) { runs[newI].copyFrom(runs[i]); } logicalPos = Bidi_Min(visualMap[start], visualMap[limit]); runs[newI].start = logicalPos; runs[newI].level = (byte)(saveLevels[logicalPos] ^ indexOddBit); } // cleanup1: /* restore initial paraLevel */ this.paraLevel ^= 1; // cleanup2: /* restore real text */ this.text = parmText; this.length = saveLength; this.originalLength = parmLength; this.direction=saveDirection; this.levels = saveLevels; this.trailingWSStart = saveTrailingWSStart; if (runCount > 1) { this.direction = MIXED; } // cleanup3: this.reorderingMode = REORDER_RUNS_ONLY; } /** * Perform the Unicode Bidi algorithm. It is defined in the * Unicode Standard Annex #9. * *

This method takes a piece of plain text containing one or more paragraphs, * with or without externally specified embedding levels from styled * text and computes the left-right-directionality of each character.

* *

If the entire text is all of the same directionality, then * the method may not perform all the steps described by the algorithm, * i.e., some levels may not be the same as if all steps were performed. * This is not relevant for unidirectional text.
* For example, in pure LTR text with numbers the numbers would get * a resolved level of 2 higher than the surrounding text according to * the algorithm. This implementation may set all resolved levels to * the same value in such a case.

* *

The text can be composed of multiple paragraphs. Occurrence of a block * separator in the text terminates a paragraph, and whatever comes next starts * a new paragraph. The exception to this rule is when a Carriage Return (CR) * is followed by a Line Feed (LF). Both CR and LF are block separators, but * in that case, the pair of characters is considered as terminating the * preceding paragraph, and a new paragraph will be started by a character * coming after the LF. * *

Although the text is passed here as a String, it is * stored internally as an array of characters. Therefore the * documentation will refer to indexes of the characters in the text. * * @param text contains the text that the Bidi algorithm will be performed * on. This text can be retrieved with getText() or * getTextAsString.
* * @param paraLevel specifies the default level for the text; * it is typically 0 (LTR) or 1 (RTL). * If the method shall determine the paragraph level from the text, * then paraLevel can be set to * either LEVEL_DEFAULT_LTR * or LEVEL_DEFAULT_RTL; if the text contains multiple * paragraphs, the paragraph level shall be determined separately for * each paragraph; if a paragraph does not include any strongly typed * character, then the desired default is used (0 for LTR or 1 for RTL). * Any other value between 0 and MAX_EXPLICIT_LEVEL * is also valid, with odd levels indicating RTL. * * @param embeddingLevels (in) may be used to preset the embedding and override levels, * ignoring characters like LRE and PDF in the text. * A level overrides the directional property of its corresponding * (same index) character if the level has the * LEVEL_OVERRIDE bit set.

* Aside from that bit, it must be * paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL, * except that level 0 is always allowed. * Level 0 for a paragraph separator prevents reordering of paragraphs; * this only works reliably if LEVEL_OVERRIDE * is also set for paragraph separators. * Level 0 for other characters is treated as a wildcard * and is lifted up to the resolved level of the surrounding paragraph.

* Caution: A reference to this array, not a copy * of the levels, will be stored in the Bidi object; * the embeddingLevels * should not be modified to avoid unexpected results on subsequent * Bidi operations. However, the setPara() and * setLine() methods may modify some or all of the * levels.

* Note: the embeddingLevels array must * have one entry for each character in text. * * @throws IllegalArgumentException if the values in embeddingLevels are * not within the allowed range * * @see #LEVEL_DEFAULT_LTR * @see #LEVEL_DEFAULT_RTL * @see #LEVEL_OVERRIDE * @see #MAX_EXPLICIT_LEVEL * @stable ICU 3.8 */ public void setPara(String text, byte paraLevel, byte[] embeddingLevels) { if (text == null) { setPara(new char[0], paraLevel, embeddingLevels); } else { setPara(text.toCharArray(), paraLevel, embeddingLevels); } } /** * Perform the Unicode Bidi algorithm. It is defined in the * Unicode Standard Annex #9. * *

This method takes a piece of plain text containing one or more paragraphs, * with or without externally specified embedding levels from styled * text and computes the left-right-directionality of each character.

* *

If the entire text is all of the same directionality, then * the method may not perform all the steps described by the algorithm, * i.e., some levels may not be the same as if all steps were performed. * This is not relevant for unidirectional text.
* For example, in pure LTR text with numbers the numbers would get * a resolved level of 2 higher than the surrounding text according to * the algorithm. This implementation may set all resolved levels to * the same value in such a case.

* *

The text can be composed of multiple paragraphs. Occurrence of a block * separator in the text terminates a paragraph, and whatever comes next starts * a new paragraph. The exception to this rule is when a Carriage Return (CR) * is followed by a Line Feed (LF). Both CR and LF are block separators, but * in that case, the pair of characters is considered as terminating the * preceding paragraph, and a new paragraph will be started by a character * coming after the LF. * *

The text is stored internally as an array of characters. Therefore the * documentation will refer to indexes of the characters in the text. * * @param chars contains the text that the Bidi algorithm will be performed * on. This text can be retrieved with getText() or * getTextAsString.
* * @param paraLevel specifies the default level for the text; * it is typically 0 (LTR) or 1 (RTL). * If the method shall determine the paragraph level from the text, * then paraLevel can be set to * either LEVEL_DEFAULT_LTR * or LEVEL_DEFAULT_RTL; if the text contains multiple * paragraphs, the paragraph level shall be determined separately for * each paragraph; if a paragraph does not include any strongly typed * character, then the desired default is used (0 for LTR or 1 for RTL). * Any other value between 0 and MAX_EXPLICIT_LEVEL * is also valid, with odd levels indicating RTL. * * @param embeddingLevels (in) may be used to preset the embedding and * override levels, ignoring characters like LRE and PDF in the text. * A level overrides the directional property of its corresponding * (same index) character if the level has the * LEVEL_OVERRIDE bit set.

* Aside from that bit, it must be * paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL, * except that level 0 is always allowed. * Level 0 for a paragraph separator prevents reordering of paragraphs; * this only works reliably if LEVEL_OVERRIDE * is also set for paragraph separators. * Level 0 for other characters is treated as a wildcard * and is lifted up to the resolved level of the surrounding paragraph.

* Caution: A reference to this array, not a copy * of the levels, will be stored in the Bidi object; * the embeddingLevels * should not be modified to avoid unexpected results on subsequent * Bidi operations. However, the setPara() and * setLine() methods may modify some or all of the * levels.

* Note: the embeddingLevels array must * have one entry for each character in text. * * @throws IllegalArgumentException if the values in embeddingLevels are * not within the allowed range * * @see #LEVEL_DEFAULT_LTR * @see #LEVEL_DEFAULT_RTL * @see #LEVEL_OVERRIDE * @see #MAX_EXPLICIT_LEVEL * @stable ICU 3.8 */ public void setPara(char[] chars, byte paraLevel, byte[] embeddingLevels) { /* check the argument values */ if (paraLevel < LEVEL_DEFAULT_LTR) { verifyRange(paraLevel, 0, MAX_EXPLICIT_LEVEL + 1); } if (chars == null) { chars = new char[0]; } /* special treatment for RUNS_ONLY mode */ if (reorderingMode == REORDER_RUNS_ONLY) { setParaRunsOnly(chars, paraLevel); return; } /* initialize the Bidi object */ this.paraBidi = null; /* mark unfinished setPara */ this.text = chars; this.length = this.originalLength = this.resultLength = text.length; this.paraLevel = paraLevel; this.direction = (byte)(paraLevel & 1); this.paraCount = 1; /* Allocate zero-length arrays instead of setting to null here; then * checks for null in various places can be eliminated. */ dirProps = new byte[0]; levels = new byte[0]; runs = new BidiRun[0]; isGoodLogicalToVisualRunsMap = false; insertPoints.size = 0; /* clean up from last call */ insertPoints.confirmed = 0; /* clean up from last call */ /* * Save the original paraLevel if contextual; otherwise, set to 0. */ defaultParaLevel = IsDefaultLevel(paraLevel) ? paraLevel : 0; if (length == 0) { /* * For an empty paragraph, create a Bidi object with the paraLevel and * the flags and the direction set but without allocating zero-length arrays. * There is nothing more to do. */ if (IsDefaultLevel(paraLevel)) { this.paraLevel &= 1; defaultParaLevel = 0; } flags = DirPropFlagLR(paraLevel); runCount = 0; paraCount = 0; setParaSuccess(); return; } runCount = -1; /* * Get the directional properties, * the flags bit-set, and * determine the paragraph level if necessary. */ getDirPropsMemory(length); dirProps = dirPropsMemory; getDirProps(); /* the processed length may have changed if OPTION_STREAMING is set */ trailingWSStart = length; /* the levels[] will reflect the WS run */ /* are explicit levels specified? */ if (embeddingLevels == null) { /* no: determine explicit levels according to the (Xn) rules */ getLevelsMemory(length); levels = levelsMemory; direction = resolveExplicitLevels(); } else { /* set BN for all explicit codes, check that all levels are 0 or paraLevel..MAX_EXPLICIT_LEVEL */ levels = embeddingLevels; direction = checkExplicitLevels(); } /* allocate isolate memory */ if (isolateCount > 0) { if (isolates == null || isolates.length < isolateCount) isolates = new Isolate[isolateCount + 3]; /* keep some reserve */ } isolateCount = -1; /* current isolates stack entry == none */ /* * The steps after (X9) in the Bidi algorithm are performed only if * the paragraph text has mixed directionality! */ switch (direction) { case LTR: /* all levels are implicitly at paraLevel (important for getLevels()) */ trailingWSStart = 0; break; case RTL: /* all levels are implicitly at paraLevel (important for getLevels()) */ trailingWSStart = 0; break; default: /* * Choose the right implicit state table */ switch(reorderingMode) { case REORDER_DEFAULT: this.impTabPair = impTab_DEFAULT; break; case REORDER_NUMBERS_SPECIAL: this.impTabPair = impTab_NUMBERS_SPECIAL; break; case REORDER_GROUP_NUMBERS_WITH_R: this.impTabPair = impTab_GROUP_NUMBERS_WITH_R; break; case REORDER_RUNS_ONLY: /* we should never get here */ throw new InternalError("Internal ICU error in setPara"); /* break; */ case REORDER_INVERSE_NUMBERS_AS_L: this.impTabPair = impTab_INVERSE_NUMBERS_AS_L; break; case REORDER_INVERSE_LIKE_DIRECT: if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) { this.impTabPair = impTab_INVERSE_LIKE_DIRECT_WITH_MARKS; } else { this.impTabPair = impTab_INVERSE_LIKE_DIRECT; } break; case REORDER_INVERSE_FOR_NUMBERS_SPECIAL: if ((reorderingOptions & OPTION_INSERT_MARKS) != 0) { this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL_WITH_MARKS; } else { this.impTabPair = impTab_INVERSE_FOR_NUMBERS_SPECIAL; } break; } /* * If there are no external levels specified and there * are no significant explicit level codes in the text, * then we can treat the entire paragraph as one run. * Otherwise, we need to perform the following rules on runs of * the text with the same embedding levels. (X10) * "Significant" explicit level codes are ones that actually * affect non-BN characters. * Examples for "insignificant" ones are empty embeddings * LRE-PDF, LRE-RLE-PDF-PDF, etc. */ if (embeddingLevels == null && paraCount <= 1 && (flags & DirPropFlagMultiRuns) == 0) { resolveImplicitLevels(0, length, GetLRFromLevel(GetParaLevelAt(0)), GetLRFromLevel(GetParaLevelAt(length - 1))); } else { /* sor, eor: start and end types of same-level-run */ int start, limit = 0; byte level, nextLevel; short sor, eor; /* determine the first sor and set eor to it because of the loop body (sor=eor there) */ level = GetParaLevelAt(0); nextLevel = levels[0]; if (level < nextLevel) { eor = GetLRFromLevel(nextLevel); } else { eor = GetLRFromLevel(level); } do { /* determine start and limit of the run (end points just behind the run) */ /* the values for this run's start are the same as for the previous run's end */ start = limit; level = nextLevel; if ((start > 0) && (dirProps[start - 1] == B)) { /* except if this is a new paragraph, then set sor = para level */ sor = GetLRFromLevel(GetParaLevelAt(start)); } else { sor = eor; } /* search for the limit of this run */ while ((++limit < length) && ((levels[limit] == level) || ((DirPropFlag(dirProps[limit]) & MASK_BN_EXPLICIT) != 0))) {} /* get the correct level of the next run */ if (limit < length) { nextLevel = levels[limit]; } else { nextLevel = GetParaLevelAt(length - 1); } /* determine eor from max(level, nextLevel); sor is last run's eor */ if (NoOverride(level) < NoOverride(nextLevel)) { eor = GetLRFromLevel(nextLevel); } else { eor = GetLRFromLevel(level); } /* if the run consists of overridden directional types, then there are no implicit types to be resolved */ if ((level & LEVEL_OVERRIDE) == 0) { resolveImplicitLevels(start, limit, sor, eor); } else { /* remove the LEVEL_OVERRIDE flags */ do { levels[start++] &= ~LEVEL_OVERRIDE; } while (start < limit); } } while (limit < length); } /* reset the embedding levels for some non-graphic characters (L1), (X9) */ adjustWSLevels(); break; } /* add RLM for inverse Bidi with contextual orientation resolving * to RTL which would not round-trip otherwise */ if ((defaultParaLevel > 0) && ((reorderingOptions & OPTION_INSERT_MARKS) != 0) && ((reorderingMode == REORDER_INVERSE_LIKE_DIRECT) || (reorderingMode == REORDER_INVERSE_FOR_NUMBERS_SPECIAL))) { int start, last; byte level; byte dirProp; for (int i = 0; i < paraCount; i++) { last = paras_limit[i] - 1; level = paras_level[i]; if (level == 0) continue; /* LTR paragraph */ start = i == 0 ? 0 : paras_limit[i - 1]; for (int j = last; j >= start; j--) { dirProp = dirProps[j]; if (dirProp == L) { if (j < last) { while (dirProps[last] == B) { last--; } } addPoint(last, RLM_BEFORE); break; } if ((DirPropFlag(dirProp) & MASK_R_AL) != 0) { break; } } } } if ((reorderingOptions & OPTION_REMOVE_CONTROLS) != 0) { resultLength -= controlCount; } else { resultLength += insertPoints.size; } setParaSuccess(); } /** * Perform the Unicode Bidi algorithm on a given paragraph, as defined in the * Unicode Standard Annex #9, * version 13, * also described in The Unicode Standard, Version 4.0 .

* * This method takes a paragraph of text and computes the * left-right-directionality of each character. The text should not * contain any Unicode block separators.

* * The RUN_DIRECTION attribute in the text, if present, determines the base * direction (left-to-right or right-to-left). If not present, the base * direction is computed using the Unicode Bidirectional Algorithm, * defaulting to left-to-right if there are no strong directional characters * in the text. This attribute, if present, must be applied to all the text * in the paragraph.

* * The BIDI_EMBEDDING attribute in the text, if present, represents * embedding level information. * Negative values indicate overrides at the absolute value of the level. * Positive values indicate embeddings. (See {@link #MAX_EXPLICIT_LEVEL}.) * Where values are zero or not defined, the base * embedding level as determined by the base direction is assumed.

* * The NUMERIC_SHAPING attribute in the text, if present, converts European * digits to other decimal digits before running the bidi algorithm. This * attribute, if present, must be applied to all the text in the paragraph. * * If the entire text is all of the same directionality, then * the method may not perform all the steps described by the algorithm, * i.e., some levels may not be the same as if all steps were performed. * This is not relevant for unidirectional text.
* For example, in pure LTR text with numbers the numbers would get * a resolved level of 2 higher than the surrounding text according to * the algorithm. This implementation may set all resolved levels to * the same value in such a case.

* * @param paragraph a paragraph of text with optional character and * paragraph attribute information * @stable ICU 3.8 */ public void setPara(AttributedCharacterIterator paragraph) { byte paraLvl; Boolean runDirection = (Boolean) paragraph.getAttribute(TextAttribute.RUN_DIRECTION); if (runDirection == null) { paraLvl = LEVEL_DEFAULT_LTR; } else { paraLvl = (runDirection.equals(TextAttribute.RUN_DIRECTION_LTR)) ? LTR : RTL; } byte[] lvls = null; int len = paragraph.getEndIndex() - paragraph.getBeginIndex(); byte[] embeddingLevels = new byte[len]; char[] txt = new char[len]; int i = 0; char ch = paragraph.first(); while (ch != AttributedCharacterIterator.DONE) { txt[i] = ch; Integer embedding = (Integer) paragraph.getAttribute(TextAttribute.BIDI_EMBEDDING); if (embedding != null) { byte level = embedding.byteValue(); if (level == 0) { /* no-op */ } else if (level < 0) { lvls = embeddingLevels; embeddingLevels[i] = (byte)((0 - level) | LEVEL_OVERRIDE); } else { lvls = embeddingLevels; embeddingLevels[i] = level; } } ch = paragraph.next(); ++i; } NumericShaper shaper = (NumericShaper) paragraph.getAttribute(TextAttribute.NUMERIC_SHAPING); if (shaper != null) { shaper.shape(txt, 0, len); } setPara(txt, paraLvl, lvls); } /** * Specify whether block separators must be allocated level zero, * so that successive paragraphs will progress from left to right. * This method must be called before setPara(). * Paragraph separators (B) may appear in the text. Setting them to level zero * means that all paragraph separators (including one possibly appearing * in the last text position) are kept in the reordered text after the text * that they follow in the source text. * When this feature is not enabled, a paragraph separator at the last * position of the text before reordering will go to the first position * of the reordered text when the paragraph level is odd. * * @param ordarParaLTR specifies whether paragraph separators (B) must * receive level 0, so that successive paragraphs progress from left to right. * * @see #setPara * @stable ICU 3.8 */ public void orderParagraphsLTR(boolean ordarParaLTR) { orderParagraphsLTR = ordarParaLTR; } /** * Is this Bidi object set to allocate level 0 to block * separators so that successive paragraphs progress from left to right? * * @return true if the Bidi object is set to * allocate level 0 to block separators. * * @stable ICU 3.8 */ public boolean isOrderParagraphsLTR() { return orderParagraphsLTR; } /** * Get the directionality of the text. * * @return a value of LTR, RTL or MIXED * that indicates if the entire text * represented by this object is unidirectional, * and which direction, or if it is mixed-directional. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #LTR * @see #RTL * @see #MIXED * @stable ICU 3.8 */ public byte getDirection() { verifyValidParaOrLine(); return direction; } /** * Get the text. * * @return A String containing the text that the * Bidi object was created for. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #setPara * @see #setLine * @stable ICU 3.8 */ public String getTextAsString() { verifyValidParaOrLine(); return new String(text); } /** * Get the text. * * @return A char array containing the text that the * Bidi object was created for. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #setPara * @see #setLine * @stable ICU 3.8 */ public char[] getText() { verifyValidParaOrLine(); return text; } /** * Get the length of the text. * * @return The length of the text that the Bidi object was * created for. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @stable ICU 3.8 */ public int getLength() { verifyValidParaOrLine(); return originalLength; } /** * Get the length of the source text processed by the last call to * setPara(). This length may be different from the length of * the source text if option OPTION_STREAMING has been * set. *
* Note that whenever the length of the text affects the execution or the * result of a method, it is the processed length which must be considered, * except for setPara (which receives unprocessed source text) * and getLength (which returns the original length of the * source text).
* In particular, the processed length is the one to consider in the * following cases: *

    *
  • maximum value of the limit argument of * setLine
  • *
  • maximum value of the charIndex argument of * getParagraph
  • *
  • maximum value of the charIndex argument of * getLevelAt
  • *
  • number of elements in the array returned by getLevels *
  • *
  • maximum value of the logicalStart argument of * getLogicalRun
  • *
  • maximum value of the logicalIndex argument of * getVisualIndex
  • *
  • number of elements returned by getLogicalMap
  • *
  • length of text processed by writeReordered
  • *
* * @return The length of the part of the source text processed by * the last call to setPara. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #setPara * @see #OPTION_STREAMING * @stable ICU 3.8 */ public int getProcessedLength() { verifyValidParaOrLine(); return length; } /** * Get the length of the reordered text resulting from the last call to * setPara(). This length may be different from the length * of the source text if option OPTION_INSERT_MARKS * or option OPTION_REMOVE_CONTROLS has been set. *
* This resulting length is the one to consider in the following cases: *
    *
  • maximum value of the visualIndex argument of * getLogicalIndex
  • *
  • number of elements returned by getVisualMap
  • *
* Note that this length stays identical to the source text length if * Bidi marks are inserted or removed using option bits of * writeReordered, or if option * REORDER_INVERSE_NUMBERS_AS_L has been set. * * @return The length of the reordered text resulting from * the last call to setPara. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #setPara * @see #OPTION_INSERT_MARKS * @see #OPTION_REMOVE_CONTROLS * @see #REORDER_INVERSE_NUMBERS_AS_L * @stable ICU 3.8 */ public int getResultLength() { verifyValidParaOrLine(); return resultLength; } /* paragraphs API methods ------------------------------------------------- */ /** * Get the paragraph level of the text. * * @return The paragraph level. If there are multiple paragraphs, their * level may vary if the required paraLevel is LEVEL_DEFAULT_LTR or * LEVEL_DEFAULT_RTL. In that case, the level of the first paragraph * is returned. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #LEVEL_DEFAULT_LTR * @see #LEVEL_DEFAULT_RTL * @see #getParagraph * @see #getParagraphByIndex * @stable ICU 3.8 */ public byte getParaLevel() { verifyValidParaOrLine(); return paraLevel; } /** * Get the number of paragraphs. * * @return The number of paragraphs. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @stable ICU 3.8 */ public int countParagraphs() { verifyValidParaOrLine(); return paraCount; } /** * Get a paragraph, given the index of this paragraph. * * This method returns information about a paragraph.

* * @param paraIndex is the number of the paragraph, in the * range [0..countParagraphs()-1]. * * @return a BidiRun object with the details of the paragraph:
* start will receive the index of the first character * of the paragraph in the text.
* limit will receive the limit of the paragraph.
* embeddingLevel will receive the level of the paragraph. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if paraIndex is not in the range * [0..countParagraphs()-1] * * @see com.ibm.icu.text.BidiRun * @stable ICU 3.8 */ public BidiRun getParagraphByIndex(int paraIndex) { verifyValidParaOrLine(); verifyRange(paraIndex, 0, paraCount); Bidi bidi = paraBidi; /* get Para object if Line object */ int paraStart; if (paraIndex == 0) { paraStart = 0; } else { paraStart = bidi.paras_limit[paraIndex - 1]; } BidiRun bidiRun = new BidiRun(); bidiRun.start = paraStart; bidiRun.limit = bidi.paras_limit[paraIndex]; bidiRun.level = GetParaLevelAt(paraStart); return bidiRun; } /** * Get a paragraph, given a position within the text. * This method returns information about a paragraph.
* Note: if the paragraph index is known, it is more efficient to * retrieve the paragraph information using getParagraphByIndex().

* * @param charIndex is the index of a character within the text, in the * range [0..getProcessedLength()-1]. * * @return a BidiRun object with the details of the paragraph:
* start will receive the index of the first character * of the paragraph in the text.
* limit will receive the limit of the paragraph.
* embeddingLevel will receive the level of the paragraph. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if charIndex is not within the legal range * * @see com.ibm.icu.text.BidiRun * @see #getParagraphByIndex * @see #getProcessedLength * @stable ICU 3.8 */ public BidiRun getParagraph(int charIndex) { verifyValidParaOrLine(); Bidi bidi = paraBidi; /* get Para object if Line object */ verifyRange(charIndex, 0, bidi.length); int paraIndex; for (paraIndex = 0; charIndex >= bidi.paras_limit[paraIndex]; paraIndex++) { } return getParagraphByIndex(paraIndex); } /** * Get the index of a paragraph, given a position within the text.

* * @param charIndex is the index of a character within the text, in the * range [0..getProcessedLength()-1]. * * @return The index of the paragraph containing the specified position, * starting from 0. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if charIndex is not within the legal range * * @see com.ibm.icu.text.BidiRun * @see #getProcessedLength * @stable ICU 3.8 */ public int getParagraphIndex(int charIndex) { verifyValidParaOrLine(); Bidi bidi = paraBidi; /* get Para object if Line object */ verifyRange(charIndex, 0, bidi.length); int paraIndex; for (paraIndex = 0; charIndex >= bidi.paras_limit[paraIndex]; paraIndex++) { } return paraIndex; } /** * Set a custom Bidi classifier used by the UBA implementation for Bidi * class determination. * * @param classifier A new custom classifier. This can be null. * * @see #getCustomClassifier * @stable ICU 3.8 */ public void setCustomClassifier(BidiClassifier classifier) { this.customClassifier = classifier; } /** * Gets the current custom class classifier used for Bidi class * determination. * * @return An instance of class BidiClassifier * * @see #setCustomClassifier * @stable ICU 3.8 */ public BidiClassifier getCustomClassifier() { return this.customClassifier; } /** * Retrieves the Bidi class for a given code point. *

If a BidiClassifier is defined and returns a value * other than UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)+1, * that value is used; otherwise the default class determination mechanism is invoked. * * @param c The code point to get a Bidi class for. * * @return The Bidi class for the character c that is in effect * for this Bidi instance. * * @see BidiClassifier * @stable ICU 3.8 */ public int getCustomizedClass(int c) { int dir; if (customClassifier == null || (dir = customClassifier.classify(c)) == Bidi.CLASS_DEFAULT) { dir = bdp.getClass(c); } if (dir >= UCharacterDirection.CHAR_DIRECTION_COUNT) dir = ON; return dir; } /** * setLine() returns a Bidi object to * contain the reordering information, especially the resolved levels, * for all the characters in a line of text. This line of text is * specified by referring to a Bidi object representing * this information for a piece of text containing one or more paragraphs, * and by specifying a range of indexes in this text.

* In the new line object, the indexes will range from 0 to limit-start-1.

* * This is used after calling setPara() * for a piece of text, and after line-breaking on that text. * It is not necessary if each paragraph is treated as a single line.

* * After line-breaking, rules (L1) and (L2) for the treatment of * trailing WS and for reordering are performed on * a Bidi object that represents a line.

* * Important: the line Bidi object may * reference data within the global text Bidi object. * You should not alter the content of the global text object until * you are finished using the line object. * * @param start is the line's first index into the text. * * @param limit is just behind the line's last index into the text * (its last index +1). * * @return a Bidi object that will now represent a line of the text. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara * @throws IllegalArgumentException if start and limit are not in the range * 0<=start<limit<=getProcessedLength(), * or if the specified line crosses a paragraph boundary * * @see #setPara * @see #getProcessedLength * @stable ICU 3.8 */ public Bidi setLine(int start, int limit) { verifyValidPara(); verifyRange(start, 0, limit); verifyRange(limit, 0, length+1); if (getParagraphIndex(start) != getParagraphIndex(limit - 1)) { /* the line crosses a paragraph boundary */ throw new IllegalArgumentException(); } return BidiLine.setLine(this, start, limit); } /** * Get the level for one character. * * @param charIndex the index of a character. * * @return The level for the character at charIndex. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if charIndex is not in the range * 0<=charIndex<getProcessedLength() * * @see #getProcessedLength * @stable ICU 3.8 */ public byte getLevelAt(int charIndex) { verifyValidParaOrLine(); verifyRange(charIndex, 0, length); return BidiLine.getLevelAt(this, charIndex); } /** * Get an array of levels for each character.

* * Note that this method may allocate memory under some * circumstances, unlike getLevelAt(). * * @return The levels array for the text, * or null if an error occurs. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @stable ICU 3.8 */ public byte[] getLevels() { verifyValidParaOrLine(); if (length <= 0) { return new byte[0]; } return BidiLine.getLevels(this); } /** * Get a logical run. * This method returns information about a run and is used * to retrieve runs in logical order.

* This is especially useful for line-breaking on a paragraph. * * @param logicalPosition is a logical position within the source text. * * @return a BidiRun object filled with start containing * the first character of the run, limit containing * the limit of the run, and embeddingLevel containing * the level of the run. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if logicalPosition is not in the range * 0<=logicalPosition<getProcessedLength() * * @see com.ibm.icu.text.BidiRun * @see com.ibm.icu.text.BidiRun#getStart() * @see com.ibm.icu.text.BidiRun#getLimit() * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel() * * @stable ICU 3.8 */ public BidiRun getLogicalRun(int logicalPosition) { verifyValidParaOrLine(); verifyRange(logicalPosition, 0, length); return BidiLine.getLogicalRun(this, logicalPosition); } /** * Get the number of runs. * This method may invoke the actual reordering on the * Bidi object, after setPara() * may have resolved only the levels of the text. Therefore, * countRuns() may have to allocate memory, * and may throw an exception if it fails to do so. * * @return The number of runs. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @stable ICU 3.8 */ public int countRuns() { verifyValidParaOrLine(); BidiLine.getRuns(this); return runCount; } /** * * Get a BidiRun object according to its index. BidiRun methods * may be used to retrieve the run's logical start, length and level, * which can be even for an LTR run or odd for an RTL run. * In an RTL run, the character at the logical start is * visually on the right of the displayed run. * The length is the number of characters in the run.

* countRuns() is normally called * before the runs are retrieved. * *

* Example: *

     *  Bidi bidi = new Bidi();
     *  String text = "abc 123 DEFG xyz";
     *  bidi.setPara(text, Bidi.RTL, null);
     *  int i, count=bidi.countRuns(), logicalStart, visualIndex=0, length;
     *  BidiRun run;
     *  for (i = 0; i < count; ++i) {
     *      run = bidi.getVisualRun(i);
     *      logicalStart = run.getStart();
     *      length = run.getLength();
     *      if (Bidi.LTR == run.getEmbeddingLevel()) {
     *          do { // LTR
     *              show_char(text.charAt(logicalStart++), visualIndex++);
     *          } while (--length > 0);
     *      } else {
     *          logicalStart += length;  // logicalLimit
     *          do { // RTL
     *              show_char(text.charAt(--logicalStart), visualIndex++);
     *          } while (--length > 0);
     *      }
     *  }
     * 
*

* Note that in right-to-left runs, code like this places * second surrogates before first ones (which is generally a bad idea) * and combining characters before base characters. *

* Use of {@link #writeReordered}, optionally with the * {@link #KEEP_BASE_COMBINING} option, can be considered in * order to avoid these issues. * * @param runIndex is the number of the run in visual order, in the * range [0..countRuns()-1]. * * @return a BidiRun object containing the details of the run. The * directionality of the run is * LTR==0 or RTL==1, * never MIXED. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if runIndex is not in * the range 0<=runIndex<countRuns() * * @see #countRuns() * @see com.ibm.icu.text.BidiRun * @see com.ibm.icu.text.BidiRun#getStart() * @see com.ibm.icu.text.BidiRun#getLength() * @see com.ibm.icu.text.BidiRun#getEmbeddingLevel() * @stable ICU 3.8 */ public BidiRun getVisualRun(int runIndex) { verifyValidParaOrLine(); BidiLine.getRuns(this); verifyRange(runIndex, 0, runCount); return BidiLine.getVisualRun(this, runIndex); } /** * Get the visual position from a logical text position. * If such a mapping is used many times on the same * Bidi object, then calling * getLogicalMap() is more efficient. *

* The value returned may be MAP_NOWHERE if there is no * visual position because the corresponding text character is a Bidi * control removed from output by the option * OPTION_REMOVE_CONTROLS. *

* When the visual output is altered by using options of * writeReordered() such as INSERT_LRM_FOR_NUMERIC, * KEEP_BASE_COMBINING, OUTPUT_REVERSE, * REMOVE_BIDI_CONTROLS, the visual position returned may not * be correct. It is advised to use, when possible, reordering options * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}. *

* Note that in right-to-left runs, this mapping places * second surrogates before first ones (which is generally a bad idea) * and combining characters before base characters. * Use of {@link #writeReordered}, optionally with the * {@link #KEEP_BASE_COMBINING} option can be considered instead * of using the mapping, in order to avoid these issues. * * @param logicalIndex is the index of a character in the text. * * @return The visual position of this character. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if logicalIndex is not in * the range 0<=logicalIndex<getProcessedLength() * * @see #getLogicalMap * @see #getLogicalIndex * @see #getProcessedLength * @see #MAP_NOWHERE * @see #OPTION_REMOVE_CONTROLS * @see #writeReordered * @stable ICU 3.8 */ public int getVisualIndex(int logicalIndex) { verifyValidParaOrLine(); verifyRange(logicalIndex, 0, length); return BidiLine.getVisualIndex(this, logicalIndex); } /** * Get the logical text position from a visual position. * If such a mapping is used many times on the same * Bidi object, then calling * getVisualMap() is more efficient. *

* The value returned may be MAP_NOWHERE if there is no * logical position because the corresponding text character is a Bidi * mark inserted in the output by option * OPTION_INSERT_MARKS. *

* This is the inverse method to getVisualIndex(). *

* When the visual output is altered by using options of * writeReordered() such as INSERT_LRM_FOR_NUMERIC, * KEEP_BASE_COMBINING, OUTPUT_REVERSE, * REMOVE_BIDI_CONTROLS, the logical position returned may not * be correct. It is advised to use, when possible, reordering options * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}. * * @param visualIndex is the visual position of a character. * * @return The index of this character in the text. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if visualIndex is not in * the range 0<=visualIndex<getResultLength() * * @see #getVisualMap * @see #getVisualIndex * @see #getResultLength * @see #MAP_NOWHERE * @see #OPTION_INSERT_MARKS * @see #writeReordered * @stable ICU 3.8 */ public int getLogicalIndex(int visualIndex) { verifyValidParaOrLine(); verifyRange(visualIndex, 0, resultLength); /* we can do the trivial cases without the runs array */ if (insertPoints.size == 0 && controlCount == 0) { if (direction == LTR) { return visualIndex; } else if (direction == RTL) { return length - visualIndex - 1; } } BidiLine.getRuns(this); return BidiLine.getLogicalIndex(this, visualIndex); } /** * Get a logical-to-visual index map (array) for the characters in the * Bidi (paragraph or line) object. *

* Some values in the map may be MAP_NOWHERE if the * corresponding text characters are Bidi controls removed from the visual * output by the option OPTION_REMOVE_CONTROLS. *

* When the visual output is altered by using options of * writeReordered() such as INSERT_LRM_FOR_NUMERIC, * KEEP_BASE_COMBINING, OUTPUT_REVERSE, * REMOVE_BIDI_CONTROLS, the visual positions returned may not * be correct. It is advised to use, when possible, reordering options * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}. *

* Note that in right-to-left runs, this mapping places * second surrogates before first ones (which is generally a bad idea) * and combining characters before base characters. * Use of {@link #writeReordered}, optionally with the * {@link #KEEP_BASE_COMBINING} option can be considered instead * of using the mapping, in order to avoid these issues. * * @return an array of getProcessedLength() * indexes which will reflect the reordering of the characters.

* The index map will result in * indexMap[logicalIndex]==visualIndex, where * indexMap represents the returned array. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #getVisualMap * @see #getVisualIndex * @see #getProcessedLength * @see #MAP_NOWHERE * @see #OPTION_REMOVE_CONTROLS * @see #writeReordered * @stable ICU 3.8 */ public int[] getLogicalMap() { /* countRuns() checks successful call to setPara/setLine */ countRuns(); if (length <= 0) { return new int[0]; } return BidiLine.getLogicalMap(this); } /** * Get a visual-to-logical index map (array) for the characters in the * Bidi (paragraph or line) object. *

* Some values in the map may be MAP_NOWHERE if the * corresponding text characters are Bidi marks inserted in the visual * output by the option OPTION_INSERT_MARKS. *

* When the visual output is altered by using options of * writeReordered() such as INSERT_LRM_FOR_NUMERIC, * KEEP_BASE_COMBINING, OUTPUT_REVERSE, * REMOVE_BIDI_CONTROLS, the logical positions returned may not * be correct. It is advised to use, when possible, reordering options * such as {@link #OPTION_INSERT_MARKS} and {@link #OPTION_REMOVE_CONTROLS}. * * @return an array of getResultLength() * indexes which will reflect the reordering of the characters.

* The index map will result in * indexMap[visualIndex]==logicalIndex, where * indexMap represents the returned array. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #getLogicalMap * @see #getLogicalIndex * @see #getResultLength * @see #MAP_NOWHERE * @see #OPTION_INSERT_MARKS * @see #writeReordered * @stable ICU 3.8 */ public int[] getVisualMap() { /* countRuns() checks successful call to setPara/setLine */ countRuns(); if (resultLength <= 0) { return new int[0]; } return BidiLine.getVisualMap(this); } /** * This is a convenience method that does not use a Bidi object. * It is intended to be used for when an application has determined the levels * of objects (character sequences) and just needs to have them reordered (L2). * This is equivalent to using getLogicalMap() on a * Bidi object. * * @param levels is an array of levels that have been determined by * the application. * * @return an array of levels.length * indexes which will reflect the reordering of the characters.

* The index map will result in * indexMap[logicalIndex]==visualIndex, where * indexMap represents the returned array. * * @stable ICU 3.8 */ public static int[] reorderLogical(byte[] levels) { return BidiLine.reorderLogical(levels); } /** * This is a convenience method that does not use a Bidi object. * It is intended to be used for when an application has determined the levels * of objects (character sequences) and just needs to have them reordered (L2). * This is equivalent to using getVisualMap() on a * Bidi object. * * @param levels is an array of levels that have been determined by * the application. * * @return an array of levels.length * indexes which will reflect the reordering of the characters.

* The index map will result in * indexMap[visualIndex]==logicalIndex, where * indexMap represents the returned array. * * @stable ICU 3.8 */ public static int[] reorderVisual(byte[] levels) { return BidiLine.reorderVisual(levels); } /** * Invert an index map. * The index mapping of the argument map is inverted and returned as * an array of indexes that we will call the inverse map. * * @param srcMap is an array whose elements define the original mapping * from a source array to a destination array. * Some elements of the source array may have no mapping in the * destination array. In that case, their value will be * the special value MAP_NOWHERE. * All elements must be >=0 or equal to MAP_NOWHERE. * Some elements in the source map may have a value greater than the * srcMap.length if the destination array has more elements than the * source array. * There must be no duplicate indexes (two or more elements with the * same value except MAP_NOWHERE). * * @return an array representing the inverse map. * This array has a number of elements equal to 1 + the highest * value in srcMap. * For elements of the result array which have no matching elements * in the source array, the corresponding elements in the inverse * map will receive a value equal to MAP_NOWHERE. * If element with index i in srcMap has a value k different * from MAP_NOWHERE, this means that element i of * the source array maps to element k in the destination array. * The inverse map will have value i in its k-th element. * For all elements of the destination array which do not map to * an element in the source array, the corresponding element in the * inverse map will have a value equal to MAP_NOWHERE. * * @see #MAP_NOWHERE * @stable ICU 3.8 */ public static int[] invertMap(int[] srcMap) { if (srcMap == null) { return null; } else { return BidiLine.invertMap(srcMap); } } /* * Fields and methods for compatibility with java.text.bidi (Sun implementation) */ /** * Constant indicating base direction is left-to-right. * @stable ICU 3.8 */ public static final int DIRECTION_LEFT_TO_RIGHT = LTR; /** * Constant indicating base direction is right-to-left. * @stable ICU 3.8 */ public static final int DIRECTION_RIGHT_TO_LEFT = RTL; /** * Constant indicating that the base direction depends on the first strong * directional character in the text according to the Unicode Bidirectional * Algorithm. If no strong directional character is present, the base * direction is left-to-right. * @stable ICU 3.8 */ public static final int DIRECTION_DEFAULT_LEFT_TO_RIGHT = LEVEL_DEFAULT_LTR; /** * Constant indicating that the base direction depends on the first strong * directional character in the text according to the Unicode Bidirectional * Algorithm. If no strong directional character is present, the base * direction is right-to-left. * @stable ICU 3.8 */ public static final int DIRECTION_DEFAULT_RIGHT_TO_LEFT = LEVEL_DEFAULT_RTL; /** * Create Bidi from the given paragraph of text and base direction. * * @param paragraph a paragraph of text * @param flags a collection of flags that control the algorithm. The * algorithm understands the flags DIRECTION_LEFT_TO_RIGHT, * DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and * DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved. * @see #DIRECTION_LEFT_TO_RIGHT * @see #DIRECTION_RIGHT_TO_LEFT * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT * @stable ICU 3.8 */ public Bidi(String paragraph, int flags) { this(paragraph.toCharArray(), 0, null, 0, paragraph.length(), flags); } /** * Create Bidi from the given paragraph of text.

* * The RUN_DIRECTION attribute in the text, if present, determines the base * direction (left-to-right or right-to-left). If not present, the base * direction is computed using the Unicode Bidirectional Algorithm, * defaulting to left-to-right if there are no strong directional characters * in the text. This attribute, if present, must be applied to all the text * in the paragraph.

* * The BIDI_EMBEDDING attribute in the text, if present, represents * embedding level information. * Negative values indicate overrides at the absolute value of the level. * Positive values indicate embeddings. (See {@link #MAX_EXPLICIT_LEVEL}.) * Where values are zero or not defined, the base * embedding level as determined by the base direction is assumed.

* * The NUMERIC_SHAPING attribute in the text, if present, converts European * digits to other decimal digits before running the bidi algorithm. This * attribute, if present, must be applied to all the text in the paragraph.

* * Note: this constructor calls setPara() internally. * * @param paragraph a paragraph of text with optional character and * paragraph attribute information * @stable ICU 3.8 */ public Bidi(AttributedCharacterIterator paragraph) { this(); setPara(paragraph); } /** * Create Bidi from the given text, embedding, and direction information. * *

The embeddings array may be null. If present, the values represent * embedding level information. * Negative values indicate overrides at the absolute value of the level. * Positive values indicate embeddings. (See {@link #MAX_EXPLICIT_LEVEL}.) * Where values are zero, the base embedding level * as determined by the base direction is assumed, * except for paragraph separators which remain at 0 to prevent reordering of paragraphs.

* *

Note: This constructor calls setPara() internally, * after converting the java.text.Bidi-style embeddings with negative overrides * into ICU-style embeddings with bit fields for {@link #LEVEL_OVERRIDE} and the level. * * @param text an array containing the paragraph of text to process. * @param textStart the index into the text array of the start of the * paragraph. * @param embeddings an array containing embedding values for each character * in the paragraph. This can be null, in which case it is assumed * that there is no external embedding information. * @param embStart the index into the embedding array of the start of the * paragraph. * @param paragraphLength the length of the paragraph in the text and * embeddings arrays. * @param flags a collection of flags that control the algorithm. The * algorithm understands the flags DIRECTION_LEFT_TO_RIGHT, * DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and * DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved. * * @throws IllegalArgumentException if the values in embeddings are * not within the allowed range * * @see #DIRECTION_LEFT_TO_RIGHT * @see #DIRECTION_RIGHT_TO_LEFT * @see #DIRECTION_DEFAULT_LEFT_TO_RIGHT * @see #DIRECTION_DEFAULT_RIGHT_TO_LEFT * @stable ICU 3.8 */ public Bidi(char[] text, int textStart, byte[] embeddings, int embStart, int paragraphLength, int flags) { this(); byte paraLvl; switch (flags) { case DIRECTION_LEFT_TO_RIGHT: default: paraLvl = LTR; break; case DIRECTION_RIGHT_TO_LEFT: paraLvl = RTL; break; case DIRECTION_DEFAULT_LEFT_TO_RIGHT: paraLvl = LEVEL_DEFAULT_LTR; break; case DIRECTION_DEFAULT_RIGHT_TO_LEFT: paraLvl = LEVEL_DEFAULT_RTL; break; } byte[] paraEmbeddings; if (embeddings == null) { paraEmbeddings = null; } else { // Convert from java.text.Bidi embeddings to ICU setPara() levels: // Copy to the start of a new array and convert java.text negative overrides // to ICU bit-field-and-mask overrides. // A copy of the embeddings is always required because // setPara() may modify its embeddings. paraEmbeddings = new byte[paragraphLength]; byte lev; for (int i = 0; i < paragraphLength; i++) { lev = embeddings[i + embStart]; if (lev < 0) { lev = (byte)((- lev) | LEVEL_OVERRIDE); } // setPara() lifts level 0 up to the resolved paragraph level. paraEmbeddings[i] = lev; } } if (textStart == 0 && paragraphLength == text.length) { setPara(text, paraLvl, paraEmbeddings); } else { char[] paraText = new char[paragraphLength]; System.arraycopy(text, textStart, paraText, 0, paragraphLength); setPara(paraText, paraLvl, paraEmbeddings); } } /** * Create a Bidi object representing the bidi information on a line of text * within the paragraph represented by the current Bidi. This call is not * required if the entire paragraph fits on one line. * * @param lineStart the offset from the start of the paragraph to the start * of the line. * @param lineLimit the offset from the start of the paragraph to the limit * of the line. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara * @throws IllegalArgumentException if lineStart and lineLimit are not in the range * 0<=lineStart<lineLimit<=getProcessedLength(), * or if the specified line crosses a paragraph boundary * @stable ICU 3.8 */ public Bidi createLineBidi(int lineStart, int lineLimit) { return setLine(lineStart, lineLimit); } /** * Return true if the line is not left-to-right or right-to-left. This means * it either has mixed runs of left-to-right and right-to-left text, or the * base direction differs from the direction of the only run of text. * * @return true if the line is not left-to-right or right-to-left. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara * @stable ICU 3.8 */ public boolean isMixed() { return (!isLeftToRight() && !isRightToLeft()); } /** * Return true if the line is all left-to-right text and the base direction * is left-to-right. * * @return true if the line is all left-to-right text and the base direction * is left-to-right. * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara * @stable ICU 3.8 */ public boolean isLeftToRight() { return (getDirection() == LTR && (paraLevel & 1) == 0); } /** * Return true if the line is all right-to-left text, and the base direction * is right-to-left * * @return true if the line is all right-to-left text, and the base * direction is right-to-left * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara * @stable ICU 3.8 */ public boolean isRightToLeft() { return (getDirection() == RTL && (paraLevel & 1) == 1); } /** * Return true if the base direction is left-to-right * * @return true if the base direction is left-to-right * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @stable ICU 3.8 */ public boolean baseIsLeftToRight() { return (getParaLevel() == LTR); } /** * Return the base level (0 if left-to-right, 1 if right-to-left). * * @return the base level * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @stable ICU 3.8 */ public int getBaseLevel() { return getParaLevel(); } /** * Return the number of level runs. * * @return the number of level runs * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @stable ICU 3.8 */ public int getRunCount() { return countRuns(); } /** * Compute the logical to visual run mapping */ void getLogicalToVisualRunsMap() { if (isGoodLogicalToVisualRunsMap) { return; } int count = countRuns(); if ((logicalToVisualRunsMap == null) || (logicalToVisualRunsMap.length < count)) { logicalToVisualRunsMap = new int[count]; } int i; long[] keys = new long[count]; for (i = 0; i < count; i++) { keys[i] = ((long)(runs[i].start)<<32) + i; } Arrays.sort(keys); for (i = 0; i < count; i++) { logicalToVisualRunsMap[i] = (int)(keys[i] & 0x00000000FFFFFFFF); } isGoodLogicalToVisualRunsMap = true; } /** * Return the level of the nth logical run in this line. * * @param run the index of the run, between 0 and countRuns()-1 * * @return the level of the run * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if run is not in * the range 0<=run<countRuns() * @stable ICU 3.8 */ public int getRunLevel(int run) { verifyValidParaOrLine(); BidiLine.getRuns(this); verifyRange(run, 0, runCount); getLogicalToVisualRunsMap(); return runs[logicalToVisualRunsMap[run]].level; } /** * Return the index of the character at the start of the nth logical run in * this line, as an offset from the start of the line. * * @param run the index of the run, between 0 and countRuns() * * @return the start of the run * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if run is not in * the range 0<=run<countRuns() * @stable ICU 3.8 */ public int getRunStart(int run) { verifyValidParaOrLine(); BidiLine.getRuns(this); verifyRange(run, 0, runCount); getLogicalToVisualRunsMap(); return runs[logicalToVisualRunsMap[run]].start; } /** * Return the index of the character past the end of the nth logical run in * this line, as an offset from the start of the line. For example, this * will return the length of the line for the last run on the line. * * @param run the index of the run, between 0 and countRuns() * * @return the limit of the run * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * @throws IllegalArgumentException if run is not in * the range 0<=run<countRuns() * @stable ICU 3.8 */ public int getRunLimit(int run) { verifyValidParaOrLine(); BidiLine.getRuns(this); verifyRange(run, 0, runCount); getLogicalToVisualRunsMap(); int idx = logicalToVisualRunsMap[run]; int len = idx == 0 ? runs[idx].limit : runs[idx].limit - runs[idx-1].limit; return runs[idx].start + len; } /** * Return true if the specified text requires bidi analysis. If this returns * false, the text will display left-to-right. Clients can then avoid * constructing a Bidi object. Text in the Arabic Presentation Forms area of * Unicode is presumed to already be shaped and ordered for display, and so * will not cause this method to return true. * * @param text the text containing the characters to test * @param start the start of the range of characters to test * @param limit the limit of the range of characters to test * * @return true if the range of characters requires bidi analysis * * @stable ICU 3.8 */ public static boolean requiresBidi(char[] text, int start, int limit) { final int RTLMask = (1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT | 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC | 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING | 1 << UCharacter.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE | 1 << UCharacter.DIRECTIONALITY_ARABIC_NUMBER); for (int i = start; i < limit; ++i) { if (((1 << UCharacter.getDirection(text[i])) & RTLMask) != 0) { return true; } } return false; } /** * Reorder the objects in the array into visual order based on their levels. * This is a utility method to use when you have a collection of objects * representing runs of text in logical order, each run containing text at a * single level. The elements at index from * objectStart up to objectStart + count in the * objects array will be reordered into visual order assuming * each run of text has the level indicated by the corresponding element in * the levels array (at index - objectStart + levelStart). * * @param levels an array representing the bidi level of each object * @param levelStart the start position in the levels array * @param objects the array of objects to be reordered into visual order * @param objectStart the start position in the objects array * @param count the number of objects to reorder * @stable ICU 3.8 */ public static void reorderVisually(byte[] levels, int levelStart, Object[] objects, int objectStart, int count) { byte[] reorderLevels = new byte[count]; System.arraycopy(levels, levelStart, reorderLevels, 0, count); int[] indexMap = reorderVisual(reorderLevels); Object[] temp = new Object[count]; System.arraycopy(objects, objectStart, temp, 0, count); for (int i = 0; i < count; ++i) { objects[objectStart + i] = temp[indexMap[i]]; } } /** * Take a Bidi object containing the reordering * information for a piece of text (one or more paragraphs) set by * setPara() or for a line of text set by setLine() * and return a string containing the reordered text. * *

The text may have been aliased (only a reference was stored * without copying the contents), thus it must not have been modified * since the setPara() call. * * This method preserves the integrity of characters with multiple * code units and (optionally) combining characters. * Characters in RTL runs can be replaced by mirror-image characters * in the returned string. Note that "real" mirroring has to be done in a * rendering engine by glyph selection and that for many "mirrored" * characters there are no Unicode characters as mirror-image equivalents. * There are also options to insert or remove Bidi control * characters; see the descriptions of the return value and the * options parameter, and of the option bit flags. * * @param options A bit set of options for the reordering that control * how the reordered text is written. * The options include mirroring the characters on a code * point basis and inserting LRM characters, which is used * especially for transforming visually stored text * to logically stored text (although this is still an * imperfect implementation of an "inverse Bidi" algorithm * because it uses the "forward Bidi" algorithm at its core). * The available options are: * DO_MIRRORING, * INSERT_LRM_FOR_NUMERIC, * KEEP_BASE_COMBINING, * OUTPUT_REVERSE, * REMOVE_BIDI_CONTROLS, * STREAMING * * @return The reordered text. * If the INSERT_LRM_FOR_NUMERIC option is set, then * the length of the returned string could be as large as * getLength()+2*countRuns().
* If the REMOVE_BIDI_CONTROLS option is set, then the * length of the returned string may be less than * getLength().
* If none of these options is set, then the length of the returned * string will be exactly getProcessedLength(). * * @throws IllegalStateException if this call is not preceded by a successful * call to setPara or setLine * * @see #DO_MIRRORING * @see #INSERT_LRM_FOR_NUMERIC * @see #KEEP_BASE_COMBINING * @see #OUTPUT_REVERSE * @see #REMOVE_BIDI_CONTROLS * @see #OPTION_STREAMING * @see #getProcessedLength * @stable ICU 3.8 */ public String writeReordered(int options) { verifyValidParaOrLine(); if (length == 0) { /* nothing to do */ return ""; } return BidiWriter.writeReordered(this, options); } /** * Reverse a Right-To-Left run of Unicode text. * * This method preserves the integrity of characters with multiple * code units and (optionally) combining characters. * Characters can be replaced by mirror-image characters * in the destination buffer. Note that "real" mirroring has * to be done in a rendering engine by glyph selection * and that for many "mirrored" characters there are no * Unicode characters as mirror-image equivalents. * There are also options to insert or remove Bidi control * characters. * * This method is the implementation for reversing RTL runs as part * of writeReordered(). For detailed descriptions * of the parameters, see there. * Since no Bidi controls are inserted here, the output string length * will never exceed src.length(). * * @see #writeReordered * * @param src The RTL run text. * * @param options A bit set of options for the reordering that control * how the reordered text is written. * See the options parameter in writeReordered(). * * @return The reordered text. * If the REMOVE_BIDI_CONTROLS option * is set, then the length of the returned string may be less than * src.length(). If this option is not set, * then the length of the returned string will be exactly * src.length(). * * @throws IllegalArgumentException if src is null. * @stable ICU 3.8 */ public static String writeReverse(String src, int options) { /* error checking */ if (src == null) { throw new IllegalArgumentException(); } if (src.length() > 0) { return BidiWriter.writeReverse(src, options); } else { /* nothing to do */ return ""; } } }





© 2015 - 2025 Weber Informatics LLC | Privacy Policy