Text Driver for Scriptella.
It allows querying a text file based on regular expressions. The text driver
can also be used as a lightweight replacement for Velocity to produce
simple output with properties substitution.
The text driver does not depend on additional libraries and is generally faster than the CSV or Velocity drivers.

Note: The driver doesn't use SQL syntax.

General information

    
        Driver class:
        scriptella.driver.text.Driver
    
    
        URL:
        Text file URL. URIs are resolved relative to a script file directory.
            If url has no value, the content is read from or printed to the console (System.out).
    
    
        Runtime dependencies:
        None
    

Driver Specific Properties

    
        Name
        Description
        Required
    
    
        encoding
        Specifies charset encoding of Text files.
        No, the system default encoding is used.
    
    
        eol
        End-Of-Line suffix. Only valid for <script> elements.
        No, the default value is \n.
    
    
        trim
        Value of true specifies that the leading and trailing
            whitespaces in text file lines should be omitted.
        No, the default value is true.
    
    
        flush
        Value of true specifies that the outputted content should be flushed immediately when
            the <script> element completes.
        No, the default value is false.
    
    
        skip_lines
        The number of lines to skip before reading starts.
        No, the default value is 0 (no lines are skipped).
    
    
        null_string
        Specifies the string token that represents the Java null literal.
            
                When querying a text file, a regex group equal to null_string is returned as Java null.

                When outputting content, if null_string is specified, all missing variables and variables
                with a null value are replaced with null_string.

            
                Specify an empty string (null_string=) to automatically convert between nulls in memory and
                empty strings in files.
                For example, with the query regex \d*,\d*,\d* the input line 1,,5 is parsed into a set of
                3 variables with the values {"1", null, "5"},
                as opposed to the default behaviour {"1","","5"}.
            
        
        No, by default strings are preserved: empty strings are not converted to nulls, and null variable
            references are not expanded in the output, e.g. ${nullvalue} stays as written.
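The effect of null_string on parsing can be illustrated with plain java.util.regex. This is only a sketch: the class NullStringSketch and its parse method are hypothetical and not part of Scriptella, and the regex is written with explicit capture groups, (\d*),(\d*),(\d*), on the assumption that each variable corresponds to one group.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullStringSketch {
    /**
     * Returns the captured groups of the first match, or null when the line does not match.
     * A group equal to nullString becomes Java null; pass nullString == null to keep
     * every group as matched (the default behaviour described above).
     */
    static String[] parse(String regex, String line, String nullString) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] values = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            String group = m.group(i);
            values[i - 1] = (nullString != null && nullString.equals(group)) ? null : group;
        }
        return values;
    }

    public static void main(String[] args) {
        String regex = "(\\d*),(\\d*),(\\d*)";
        // null_string= (empty string): the empty group is converted to null
        System.out.println(Arrays.toString(parse(regex, "1,,5", "")));   // [1, null, 5]
        // null_string not set: the empty group stays an empty string
        System.out.println(Arrays.toString(parse(regex, "1,,5", null))); // [1, , 5]
    }
}
```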
        
    

Query Syntax
The text driver supports regular expression syntax for querying text files.
The file is read line-by-line from the location specified by the URL connection property and each line is matched
against the regex pattern.

    If a line or a part of it matches the pattern, the match produces a virtual row in a result set.
    The column names in a virtual result set correspond to the matched regex group names.
    For example, the query foo(.*) matches the line foobar, and the produced
    result set row contains two columns (groups): 0 = foobar and 1 = bar. These columns
    can be referenced in child script or query elements by a numeric name or by a string name columnN.
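The mapping from regex groups to columns can be sketched with plain java.util.regex. The names RegexRowSketch and toRow are hypothetical and not part of the Scriptella API; the sketch only mirrors the behaviour described above, assuming case-insensitive matching of any part of the line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRowSketch {
    /** Returns the groups of the first match as a column map, or null when nothing matches. */
    static Map<String, String> toRow(String regex, String line) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(line);
        if (!m.find()) {
            return null; // no match, so no virtual row is produced
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i <= m.groupCount(); i++) {
            row.put(String.valueOf(i), m.group(i)); // numeric column name
            row.put("column" + i, m.group(i));      // string alias columnN
        }
        return row;
    }

    public static void main(String[] args) {
        // prints {0=foobar, column0=foobar, 1=bar, column1=bar}
        System.out.println(toRow("foo(.*)", "foobar"));
    }
}
```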


It is also possible to specify more than one regular expression to match file content.
    Specify each regular expression on a separate line to combine them with an OR condition.

The text driver uses the java.util.regex implementation for pattern matching. See the java.util.regex.Pattern
    Javadoc for the supported syntax.

Additional notes:

    Regular expression matching is case-insensitive.
    An empty query selects all lines from the input file.
    The 0 (zero) column in the produced result set contains the matched text, which is the whole line when the pattern spans it.
    Leading and trailing whitespace in the query element and in input file lines is trimmed by default.
    Use ^ and $ boundary matchers to match the whole line.



Example:

    <query>
  ^ERROR: (.*)
  WARNING: (.*Failed.*)
  ([\d]+) errors?
</query>
    

This query consists of 3 regular expressions:

    selects lines starting with the ERROR: prefix
    selects WARNING lines containing the Failed substring
    selects lines containing a number of errors, e.g. "Found 5 errors".

The query selects any line satisfying one of these 3 regular expressions.
Suppose input file has the following content:

Log file started...
INFO: INIT
WARNING: CPU is slow
WARNING: Failed to increase heap size
ERROR: Process interrupted
Operation completed with 1 error.


As the result of query execution the following set of rows is produced:

    
        0
        1
    
    
        WARNING: Failed to increase heap size
        Failed to increase heap size
    
    
        ERROR: Process interrupted
        Process interrupted
    
    
        1 error
        1
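The rows above can be reproduced with a short java.util.regex sketch. The class MultiPatternQuerySketch is hypothetical and is not the driver's implementation; it assumes that the patterns are tried in the order written, that the first pattern matching a line produces the row, and that matching is case-insensitive and may cover only a part of the line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiPatternQuerySketch {
    /** Matches each line against the patterns in order; the first matching pattern yields a row. */
    static List<String[]> query(List<String> regexes, List<String> lines) {
        List<Pattern> patterns = new ArrayList<>();
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex.trim(), Pattern.CASE_INSENSITIVE));
        }
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            for (Pattern p : patterns) {
                Matcher m = p.matcher(line.trim());
                if (m.find()) {
                    String[] row = new String[m.groupCount() + 1];
                    for (int i = 0; i < row.length; i++) {
                        row[i] = m.group(i); // column i holds regex group i
                    }
                    rows.add(row);
                    break; // OR condition: one row per line is enough
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> regexes = Arrays.asList(
                "^ERROR: (.*)", "WARNING: (.*Failed.*)", "([\\d]+) errors?");
        List<String> lines = Arrays.asList(
                "Log file started...",
                "INFO: INIT",
                "WARNING: CPU is slow",
                "WARNING: Failed to increase heap size",
                "ERROR: Process interrupted",
                "Operation completed with 1 error.");
        for (String[] row : query(regexes, lines)) {
            System.out.println(row[0] + " | " + row[1]);
        }
        // WARNING: Failed to increase heap size | Failed to increase heap size
        // ERROR: Process interrupted | Process interrupted
        // 1 error | 1
    }
}
```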
    


Script Syntax
The <script> element content is read line by line. For each line,
properties are expanded and the output is sent to the file specified by the url connection attribute.
Additional notes:

    Lines in the outputted file are separated by an EOL string specified by the eol connection property.
    
    Leading and trailing whitespace in the output file lines is trimmed by default.
    No escaping is performed when properties are expanded. Use String.replace or other escaping techniques to
        produce CSV-like output.
    
    If a script is executed multiple times (e.g. inside a parent query) the output is appended to the file
        content.
    



Example:

    <script>
    Inserted a record with ID=$id. Table=${table}
</script>
    

For id=1 and table=system this script produces the following output:

    Inserted a record with ID=1. Table=system
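The line-by-line processing described in this section can be approximated in a few lines of Java. ScriptOutputSketch, expand and render are hypothetical names, not Scriptella classes; the sketch assumes that blank lines are skipped, that every emitted line is trimmed and followed by the eol string, and that references to unknown variables are left as written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptOutputSketch {
    // Matches ${name} (group 1) or $name (group 2).
    private static final Pattern PROPERTY = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    /** Expands ${name} and $name references; unknown or null variables are left untouched. */
    static String expand(String line, Map<String, ?> vars) {
        Matcher m = PROPERTY.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            Object value = vars.get(name);
            String replacement = value != null ? value.toString() : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Trims each script line, expands its properties and appends the eol suffix. */
    static String render(String script, Map<String, ?> vars, String eol) {
        StringBuilder out = new StringBuilder();
        for (String line : script.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines produce no output
            }
            out.append(expand(trimmed, vars)).append(eol);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("id", 1);
        vars.put("table", "system");
        String script = "\n    Inserted a record with ID=$id. Table=${table}\n";
        // prints: Inserted a record with ID=1. Table=system
        System.out.print(render(script, vars, "\n"));
    }
}
```

Calling render repeatedly and appending its results corresponds to a script that is executed several times inside a parent query.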
    


Properties substitution
In text script and query elements, the ${property} or $property syntax is used for properties/variables substitution.
NOTE:
By default, NULL variables and expressions are preserved. Use the null_string connection property to specify
a string token for nulls.
For example, setting null_string to an empty string in the connection properties section enables parsing
empty strings as nulls:
<connection driver="text" url="report.csv">
    null_string=
</connection>
The Scriptella properties substitution engine cannot distinguish a null value from an unused variable or an
unrelated use of the $var syntax.
Such blocks are therefore preserved until the user explicitly specifies a value for null_string.
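The difference between preserving and replacing null references can be shown with a small substitution routine. NullPreservationSketch and its expand method are hypothetical and only imitate the documented behaviour; they assume that a missing variable and a variable holding null are treated the same way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullPreservationSketch {
    // Matches ${name} (group 1) or $name (group 2).
    private static final Pattern PROPERTY = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    /**
     * Expands property references in a line. For a null or missing variable the
     * reference is preserved when nullString is null and replaced by nullString otherwise.
     */
    static String expand(String line, Map<String, ?> vars, String nullString) {
        Matcher m = PROPERTY.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            Object value = vars.get(name);
            String replacement;
            if (value != null) {
                replacement = value.toString();
            } else if (nullString != null) {
                replacement = nullString; // null_string was set explicitly
            } else {
                replacement = m.group();  // default: keep the block as written
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("id", 7);
        vars.put("summary", null); // variable with a null value; status is missing entirely
        String line = "$id;${summary};$status";
        System.out.println(expand(line, vars, null));   // 7;${summary};$status
        System.out.println(expand(line, vars, ""));     // 7;;
        System.out.println(expand(line, vars, "NULL")); // 7;NULL;NULL
    }
}
```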
Examples
<connection id="in" driver="text" url="data.csv">
</connection>
<connection id="out" driver="text" url="report.csv">
</connection>

<script connection-id="out">
    ID;Priority;Summary;Status
</script>

<query connection-id="in">
    <script connection-id="out">
        $rownum;$column0;$column1;$column2
    </script>
</query>



Copies rows from the data.csv file to report.csv and adds an ID column.
The resulting file is semicolon-separated.
