javacc-7.0.4.www.doc.CharStream.html Maven / Gradle / Ivy





JavaCC [tm]: CharStream Classes MiniTutorial

This document describes the methods of the CharStream classes in some
detail. Note that some of these details may not apply to the
CharStream interface (used when the USER_CHAR_STREAM option is set).


JavaCC generates one of 4 different char stream classes, depending on
the combination of the UNICODE_INPUT and JAVA_UNICODE_ESCAPE options.



   ASCII_CharStream

       Generated when neither UNICODE_INPUT nor
       JAVA_UNICODE_ESCAPE is set.



       This class treats the input as a stream of 1-byte (ISO Latin-1)
       characters. It simply reads a byte and returns it to the lexical
       analyzer as a 16-bit quantity, so every character returned by
       this class is in the range '\u0000'-'\u00ff'. This also means
       the class can be used to parse binary files.
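Returning a byte as a 16-bit quantity is a one-line widening in Java, but Java's byte type is signed, so a naive cast can produce values above 0xff. The following minimal sketch is not the generated class; the names Latin1ReadSketch, widen and decodeAll are hypothetical. It masks each byte with 0xff so that every result stays in the 0x00-0xff range described above.

```java
/**
 * Hypothetical sketch (not JavaCC-generated code) of how a 1-byte
 * char stream can widen each input byte to a 16-bit char.
 */
public class Latin1ReadSketch {

    /** Widens one byte to a char in the range 0x00-0xff. */
    static char widen(byte b) {
        // A Java byte is signed: (char) b would sign-extend 0xE9 to 0xFFE9.
        // Masking with 0xff keeps only the low 8 bits.
        return (char) (b & 0xff);
    }

    /** Decodes every byte of the input independently, one char per byte. */
    static String decodeAll(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(widen(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = {0x41, (byte) 0xE9, (byte) 0xFF, 0x00};
        for (char c : decodeAll(input).toCharArray()) {
            System.out.printf("%04x%n", (int) c);  // 0041, 00e9, 00ff, 0000
        }
        // Without the mask the same byte would come out as ffe9.
        System.out.printf("%04x%n", (int) (char) (byte) 0xE9);
    }
}
```

When bytes come straight from InputStream.read(), which already returns an unsigned value from 0 to 255 (or -1 at end of stream), the mask is unnecessary; it matters only for values taken from a byte[].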


  
   ASCII_UCodeESC_CharStream

       Generated when the option JAVA_UNICODE_ESCAPE is set
       and the UNICODE_INPUT option is not set.



       This class treats the input as a stream of 1-byte characters.
       However, the special escape sequence


             ("\\\\")* "\\" ("u")+ - (an odd number of backslashes followed
       by one or more 'u's.)


       is treated as a tag indicating that the 4 bytes following the
       tag are hexadecimal digits forming a 4-digit hex number, whose
       value is taken as the value of the character at the position
       indicated by the first backslash. Note that this value can be
       anything in the range 0x0-0xffff.
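The escape rule can be sketched as a small decoder over a string in which each char stands for one input byte. This is a hypothetical illustration (UnicodeEscapeSketch, decode and hexValue are not JavaCC names). It follows the Java language rule (JLS §3.3) that, in an odd run of backslashes, only the last one starts the escape and the others remain literal backslashes. Throwing IllegalArgumentException on a malformed escape is an assumption of the sketch, not necessarily what the generated class does.

```java
/**
 * Hypothetical sketch of Unicode-escape processing over 1-byte characters.
 * Each char of the input string stands for one input byte.
 * Not JavaCC-generated code.
 */
public class UnicodeEscapeSketch {

    static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw new IllegalArgumentException("invalid hex digit: " + ch);
    }

    static String decode(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            // Measure the run of contiguous backslashes.
            int start = i;
            while (i < in.length() && in.charAt(i) == '\\') {
                i++;
            }
            int count = i - start;
            boolean isEscape = count % 2 == 1 && i < in.length() && in.charAt(i) == 'u';
            // Pairs of backslashes stay literal; an escape consumes only the last one.
            int literal = isEscape ? count - 1 : count;
            for (int k = 0; k < literal; k++) {
                out.append('\\');
            }
            if (!isEscape) {
                continue;
            }
            while (i < in.length() && in.charAt(i) == 'u') {
                i++;  // one or more 'u's are allowed
            }
            if (i + 4 > in.length()) {
                throw new IllegalArgumentException("incomplete escape at index " + start);
            }
            int value = 0;
            for (int k = 0; k < 4; k++) {
                value = (value << 4) | hexValue(in.charAt(i + k));
            }
            out.append((char) value);  // any value in 0x0-0xffff
            i += 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u0041"));    // A
        System.out.println(decode("\\uuu0042"));  // B
        System.out.println(decode("\\\\u0041"));  // even run of backslashes: unchanged
    }
}
```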


  
   UCode_CharStream

       Generated when the option UNICODE_INPUT is set and
       the option JAVA_UNICODE_ESCAPE is not set.



       This class treats the input as a stream of 2-byte characters. It
       reads 2 bytes, b1 and b2, and returns them as a single character
       using the expression  b1 << 8 | b2 , that is, assuming big-endian
       byte order. In particular, characters in the range 0x00-0xff must
       be stored as 2 bytes whose first (higher-order) byte is 0.
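The expression b1 << 8 | b2 needs one extra detail in Java: both bytes must be masked with 0xff before combining, or sign extension corrupts the result. Below is a minimal sketch with hypothetical names (BigEndianPairSketch, combine, decode). It rejects an odd number of bytes with IllegalArgumentException, standing in for the IOException the generated class raises when it reaches EOF in the middle of a character.

```java
/** Hypothetical sketch of decoding 2-byte big-endian characters. */
public class BigEndianPairSketch {

    /** Computes b1 << 8 | b2, masking each byte to avoid sign extension. */
    static char combine(byte b1, byte b2) {
        return (char) (((b1 & 0xff) << 8) | (b2 & 0xff));
    }

    static String decode(byte[] bytes) {
        if (bytes.length % 2 != 0) {
            // A trailing lone byte cannot form a complete character.
            throw new IllegalArgumentException("odd number of bytes: " + bytes.length);
        }
        StringBuilder sb = new StringBuilder(bytes.length / 2);
        for (int i = 0; i < bytes.length; i += 2) {
            sb.append(combine(bytes[i], bytes[i + 1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' stored with a zero high byte, then U+03A9 (Greek capital omega).
        byte[] input = {0x00, 0x41, 0x03, (byte) 0xA9};
        String s = decode(input);
        System.out.println(s.length());                                          // 2
        System.out.printf("%04x %04x%n", (int) s.charAt(0), (int) s.charAt(1));  // 0041 03a9
    }
}
```

For well-formed input this yields the same chars as new String(bytes, StandardCharsets.UTF_16BE), since UTF-16BE stores each 16-bit code unit in exactly this byte order.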


  
   UCode_UCodeESC_CharStream

       Generated when both the options UNICODE_INPUT and
       JAVA_UNICODE_ESCAPE are set.


       This class treats the input as a stream of 2-byte characters (just
       like the UCode_CharStream class), and the special escape sequence


       ("\\\\")* "\\" ("u")+ - (an odd number of backslashes followed
       by one or more 'u's.)


       is treated as a tag indicating that the four 2-byte characters
       following the tag are hexadecimal digits forming a 4-digit hex
       number, whose value is taken as the value of the character at the
       position indicated by the first backslash. Note that this value
       can be any value in the range 0x0-0xffff. Also note that
       the backslash(es) and u(s) are all assumed to be given as 2-byte
       characters (with the higher-order byte being 0).
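Because the backslash and 'u' markers must themselves be 2-byte characters, escape detection has to look at the combined 16-bit value rather than at single bytes. This hypothetical sketch (TwoByteMarkerSketch and its methods are illustrative names) encodes an escape sequence as 2-byte big-endian characters and shows that the byte pair 0x01 0x5C, which is U+015C, is not a backslash even though its low byte equals the 1-byte backslash 0x5C.

```java
import java.util.Arrays;

/** Hypothetical sketch: escape markers in a 2-byte (big-endian) stream. */
public class TwoByteMarkerSketch {

    /** Combines a big-endian byte pair into one 16-bit char. */
    static char combine(byte hi, byte lo) {
        return (char) (((hi & 0xff) << 8) | (lo & 0xff));
    }

    /** A backslash marker must be exactly the pair 0x00 0x5C. */
    static boolean isBackslash(byte hi, byte lo) {
        return combine(hi, lo) == '\\';
    }

    /** Encodes text as 2-byte big-endian chars, e.g. to build test input. */
    static byte[] encode(String text) {
        byte[] out = new byte[text.length() * 2];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out[2 * i] = (byte) (c >>> 8);  // higher-order byte first
            out[2 * i + 1] = (byte) c;      // then the lower-order byte
        }
        return out;
    }

    public static void main(String[] args) {
        // The 6-character escape sequence becomes 12 bytes, each high byte 0.
        byte[] escape = encode("\\u0041");
        System.out.println(escape.length);  // 12
        System.out.println(Arrays.toString(escape));
        System.out.println(isBackslash((byte) 0x00, (byte) 0x5C));  // true
        System.out.println(isBackslash((byte) 0x01, (byte) 0x5C));  // false: this is U+015C
    }
}
```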


  



Note: None of the above classes can read characters in mixed mode,
       i.e., with some characters given as 1-byte characters and others
       as 2-byte characters. To read such input, set the USER_CHAR_STREAM
       option to true and define your own char stream.




(Throughout the following, XXXCharStream stands for any of the
 4 classes described above.)

Constructors


  
  public XXXCharStream(java.io.InputStream dstream,
                                        int startline, int startcolumn)



  Constructs a CharStream object from an input stream and the starting line
  and column numbers. It also creates buffers with an initial size of 4K to hold
  the characters and the line and column numbers of each of those characters.


  

   public XXXCharStream(java.io.InputStream dstream,
                                        int startline, int startcolumn, int buffersize)



  Constructs a CharStream object from an input stream and the starting line
  and column numbers, but creates the buffers for the characters and their
  line and column numbers with an initial size of buffersize.


  So when you can estimate the maximum size of any token that can occur,
  you can use that size to tune the buffer sizes. Note, however, that
  these sizes are only initial sizes; the buffers are expanded as
  needed (in 2K steps).
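The growth policy described above can be sketched with three parallel arrays that grow together. This is a simplified illustration with hypothetical names (GrowableBufferSketch, add, capacity, GROW_STEP); unlike the generated class, it uses a plain linear buffer rather than a circular one, and it grows only when completely full.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of parallel char/line/column buffers that start at a
 * caller-chosen size and grow in fixed 2K (2048-entry) steps.
 */
public class GrowableBufferSketch {
    static final int GROW_STEP = 2048;

    char[] buffer;
    int[] bufline;    // line number of each buffered char
    int[] bufcolumn;  // column number of each buffered char
    int size = 0;

    GrowableBufferSketch(int buffersize) {
        buffer = new char[buffersize];
        bufline = new int[buffersize];
        bufcolumn = new int[buffersize];
    }

    void add(char c, int line, int column) {
        if (size == buffer.length) {
            // Grow all three arrays together so their indexes stay aligned.
            int newSize = buffer.length + GROW_STEP;
            buffer = Arrays.copyOf(buffer, newSize);
            bufline = Arrays.copyOf(bufline, newSize);
            bufcolumn = Arrays.copyOf(bufcolumn, newSize);
        }
        buffer[size] = c;
        bufline[size] = line;
        bufcolumn[size] = column;
        size++;
    }

    int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        GrowableBufferSketch b = new GrowableBufferSketch(4096);  // the 4K default
        for (int i = 0; i < 5000; i++) {
            b.add('x', 1, i + 1);
        }
        System.out.println(b.capacity());  // 6144: one 2K step past 4096
    }
}
```

Passing a larger buffersize up front simply avoids the copies that each 2K step performs.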

 


Methods

All the following methods will be static or non-static depending on
whether the STATIC option is true or false at generation time. Only
those methods that users can call in their lexical actions (through the
input_stream variable of the lexical analyzer) are documented
here. The rest of the (public) methods are tightly coupled with the
implementation of the lexical analyzer and thus should not be
used in lexical actions. In the future, when we adopt version 1.1 of the Java [tm] programming language, we will
streamline this by making that part of the interface an inner class of
the lexical analyzer.



   public final char readChar() throws java.io.IOException


       This method returns the next "character" in the input according
       to the rules of the CharStream class described above. It
       throws java.io.IOException if it reaches EOF while
       "constructing" the character. It also updates the line and
       column numbers and buffers the character, along with its line
       and column numbers, for any backtracking that may be required later.
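The bookkeeping readChar performs can be sketched as follows. This is a hypothetical simplification (LineColumnSketch and positionsOf are illustrative names): it reads from a String, treats '\n' as the only line terminator, ignores tabs, and throws java.io.EOFException (a subclass of IOException) when input runs out. Starting the column one below startcolumn makes the first character land on startcolumn.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified sketch of readChar's bookkeeping: return the next
 * char, advance the line/column counters, and remember each char's position.
 * Assumes '\n' is the only line terminator and ignores tabs.
 */
public class LineColumnSketch {
    private final String input;
    private int next = 0;
    private int line;
    private int column;
    private boolean prevCharIsLF = false;
    final List<int[]> positions = new ArrayList<>();  // {line, column} per char read

    LineColumnSketch(String input, int startline, int startcolumn) {
        this.input = input;
        this.line = startline;
        this.column = startcolumn - 1;  // so the first char lands on startcolumn
    }

    char readChar() throws IOException {
        if (next >= input.length()) {
            throw new EOFException("EOF while reading a character");
        }
        char c = input.charAt(next++);
        column++;
        if (prevCharIsLF) {
            // The previous char ended a line, so this one starts the next line.
            prevCharIsLF = false;
            line++;
            column = 1;
        }
        if (c == '\n') {
            prevCharIsLF = true;
        }
        positions.add(new int[] {line, column});
        return c;
    }

    /** Reads all of the text and returns the {line, column} of each char. */
    static List<int[]> positionsOf(String text, int startline, int startcolumn) {
        LineColumnSketch s = new LineColumnSketch(text, startline, startcolumn);
        try {
            for (int k = 0; k < text.length(); k++) {
                s.readChar();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.positions;
    }

    public static void main(String[] args) {
        for (int[] p : positionsOf("ab\ncd", 1, 1)) {
            System.out.println(p[0] + ":" + p[1]);  // 1:1, 1:2, 1:3, 2:1, 2:2
        }
    }
}
```

Note that the newline itself is recorded at the end of the line it terminates; only the character after it moves to the next line.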

  
    public final int getBeginLine() 


       This method returns the line number for the beginning of the
       current match.

 
   public final int getBeginColumn() 


       This method returns the column number for the beginning of the
       current match.

  
    public final int getEndLine() 


       This method returns the line number for the end of the
       current match.


  
    public final int getEndColumn() 


       This method returns the column number for the end of the
       current match.
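If every buffered character carries its own line and column, the four getters reduce to array lookups at the two ends of the match. In this hypothetical sketch, MatchPositionSketch is an illustrative name, and tokenBegin and bufpos are assumed index names for the first character of the current match and the last character read; the end positions are those of the last character, inclusive.

```java
/**
 * Hypothetical sketch of the four position getters: each buffered char has
 * its own line and column, and the current match runs from index tokenBegin
 * to index bufpos (inclusive).
 */
public class MatchPositionSketch {
    final int[] bufline;
    final int[] bufcolumn;
    int tokenBegin;  // index of the first char of the current match
    int bufpos;      // index of the last char read

    MatchPositionSketch(int[] bufline, int[] bufcolumn) {
        this.bufline = bufline;
        this.bufcolumn = bufcolumn;
    }

    int getBeginLine()   { return bufline[tokenBegin]; }
    int getBeginColumn() { return bufcolumn[tokenBegin]; }
    int getEndLine()     { return bufline[bufpos]; }
    int getEndColumn()   { return bufcolumn[bufpos]; }

    public static void main(String[] args) {
        // Positions of the chars of "ab\ncd" read from line 1, column 1.
        MatchPositionSketch s = new MatchPositionSketch(
                new int[] {1, 1, 1, 2, 2},
                new int[] {1, 2, 3, 1, 2});
        s.tokenBegin = 1;  // the match starts at 'b'
        s.bufpos = 3;      // and ends at 'c'
        System.out.println(s.getBeginLine() + ":" + s.getBeginColumn());  // 1:2
        System.out.println(s.getEndLine() + ":" + s.getEndColumn());      // 2:1
    }
}
```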

  
    public final void backup(int amount) 


       This method puts amount characters back into the
       stream. Note that amount counts characters as constructed
       by readChar, so a Unicode escape sequence counts as a single
       character. Since the buffers used are circular buffers, this
       method cannot check for illegal amount values; it just wraps
       around. So it is the user's responsibility to make sure that at
       least that many characters have really been read before calling it.
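The danger of an unchecked backup is easiest to see with a deliberately tiny circular buffer. In this hypothetical sketch (CircularBackupSketch is an illustrative name, and this buffer never grows), six characters are read through a 4-slot buffer; backing up 2 is legal, but backing up 5 silently wraps and re-delivers the wrong character. Signalling end of input with IllegalStateException is a simplification of the sketch.

```java
/**
 * Hypothetical sketch of backup() on a fixed-size circular buffer.
 * Input comes from a String; the buffer never grows.
 */
public class CircularBackupSketch {
    private final String source;
    private int next = 0;       // next index to read from source
    private final char[] buffer;
    private int bufpos = -1;    // index of the most recently returned char
    private int inBuf = 0;      // chars put back and not yet re-read

    CircularBackupSketch(String source, int bufsize) {
        this.source = source;
        this.buffer = new char[bufsize];
    }

    char readChar() {
        if (inBuf == 0 && next >= source.length()) {
            throw new IllegalStateException("end of input");
        }
        if (++bufpos == buffer.length) {
            bufpos = 0;  // wrap around
        }
        if (inBuf > 0) {
            inBuf--;
            return buffer[bufpos];  // re-deliver a char that was backed up
        }
        buffer[bufpos] = source.charAt(next++);
        return buffer[bufpos];
    }

    void backup(int amount) {
        // No validity check is possible: the index simply wraps around.
        inBuf += amount;
        bufpos -= amount;
        if (bufpos < 0) {
            bufpos += buffer.length;
        }
    }

    public static void main(String[] args) {
        CircularBackupSketch s = new CircularBackupSketch("abcdef", 4);
        for (int i = 0; i < 6; i++) {
            s.readChar();  // the buffer now holds e f c d
        }
        s.backup(2);
        System.out.println("" + s.readChar() + s.readChar());  // ef: a legal backup
        s.backup(5);                                           // only 4 chars fit
        System.out.println(s.readChar());                      // f, not the expected b
    }
}
```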

  
   public final String GetImage()


       Returns the image of the current match. As far as the XXXCharStream
       is concerned, all characters read since the last call to the private
       method BeginToken are considered part of the current match.
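With a circular buffer, the current match may wrap past the end of the array, so building the image takes either one slice or two. A hypothetical sketch (CircularImageSketch and image are illustrative names), assuming tokenBegin is the index of the first character of the current match and bufpos the index of the last character read:

```java
/**
 * Hypothetical sketch of building the current match's image from a
 * circular buffer.
 */
public class CircularImageSketch {

    static String image(char[] buffer, int tokenBegin, int bufpos) {
        if (bufpos >= tokenBegin) {
            // The match is contiguous in the buffer.
            return new String(buffer, tokenBegin, bufpos - tokenBegin + 1);
        }
        // The match wrapped past the end of the buffer: join both pieces.
        return new String(buffer, tokenBegin, buffer.length - tokenBegin)
                + new String(buffer, 0, bufpos + 1);
    }

    public static void main(String[] args) {
        char[] buffer = {'e', 'f', 'c', 'd'};     // "cdef" stored after wrapping
        System.out.println(image(buffer, 2, 1));  // cdef
        System.out.println(image(buffer, 0, 1));  // ef
    }
}
```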

  
   public void ReInit(java.io.InputStream dstream,
                                        int startline, int startcolumn)



       This method reinitializes the XXXCharStream classes with a (possibly
       new) input stream and starting line and column numbers.

  
   public void ReInit(java.io.InputStream dstream,
                 int startline, int startcolumn, int buffersize)



       This method reinitializes the XXXCharStream classes with a (possibly
       new) input stream and starting line and column numbers, and extends
       the buffers to buffersize. If buffersize is smaller than the current
       buffer sizes, the buffers remain unchanged.

   
    public void adjustBeginLineColumn(int newLine, int newCol)


       This method adjusts the line and column number of the beginning of
       the current match to newLine and newCol and
       also adjusts the line and column numbers for all the characters
       in the lookahead buffer.
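One plausible way to implement this adjustment is sketched below. It is a simplified, hypothetical version over linear arrays (AdjustPositionSketch and adjust are illustrative names): characters on the first line of the match keep their relative column offsets, while characters on later lines keep their columns and are shifted by whole lines. The method described above also covers the characters in the lookahead buffer, which this sketch models only as the index range begin..end.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified sketch of adjustBeginLineColumn over linear
 * line/column arrays for the chars from index begin to end (inclusive).
 */
public class AdjustPositionSketch {

    static void adjust(int[] lines, int[] columns, int begin, int end,
                       int newLine, int newCol) {
        int firstLine = lines[begin];
        int lineShift = newLine - firstLine;
        int colShift = newCol - columns[begin];
        for (int i = begin; i <= end; i++) {
            if (lines[i] == firstLine) {
                columns[i] += colShift;  // same line as the match start
            }
            lines[i] += lineShift;       // every char moves by whole lines
        }
    }

    public static void main(String[] args) {
        // Positions of "ab\ncd" as read from line 1, column 1.
        int[] lines = {1, 1, 1, 2, 2};
        int[] columns = {1, 2, 3, 1, 2};
        adjust(lines, columns, 0, 4, 10, 5);
        System.out.println(Arrays.toString(lines));    // [10, 10, 10, 11, 11]
        System.out.println(Arrays.toString(columns));  // [5, 6, 7, 1, 2]
    }
}
```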