Text Driver for Scriptella.
It allows querying a text file using regular expressions. The text driver
can also be used as a lightweight replacement for Velocity to produce
simple output with property substitution.
The Text driver does not depend on additional libraries and is generally faster than the CSV or Velocity drivers.
Note: The driver doesn't use SQL syntax.
General information
Driver class: scriptella.driver.text.Driver
URL: Text file URL. URIs are resolved relative to the script file directory.
If url has no value, content is read from or printed to the console (System.out).
Runtime dependencies: None
Driver Specific Properties
encoding
  Description: Specifies the charset encoding of text files.
  Required: No, the system default encoding is used.
eol
  Description: End-of-line suffix. Only valid for <script> elements.
  Required: No, the default value is \n.
trim
  Description: A value of true specifies that leading and trailing whitespace in text file lines should be omitted.
  Required: No, the default value is true.
flush
  Description: A value of true specifies that the output content should be flushed immediately when the <script> element completes.
  Required: No, the default value is false.
skip_lines
  Description: The number of lines to skip before reading starts.
  Required: No, the default value is 0 (no lines are skipped).
null_string
  Description: Specifies the string token that represents the Java null literal.
  When querying a text file, a regex group equal to null_string is returned as Java null.
  When outputting content, if null_string is specified, all missing variables and variables with a null value are substituted with null_string.
  Specify an empty string (null_string=) to automatically convert between nulls in memory and empty strings in files.
  For example, with the query regex \d*,\d*,\d* the input line 1,,5 is parsed into a set of 3 variables with the values {"1", null, "5"}, as opposed to the default behaviour {"1","","5"}.
  Required: No, by default strings are preserved, i.e. empty strings are not converted to nulls and null variable references are not expanded in the output, e.g. ${nullvalue}.
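The query-side effect of null_string can be sketched in plain Java. This is only an illustration of the conversion described above, not the driver's implementation: the class and method names are hypothetical, and the pattern is written with explicit capturing groups, (\d*),(\d*),(\d*), so that each field becomes its own variable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Scriptella.
class NullStringQuerySketch {
    /**
     * Returns capturing groups 1..n of the first match in the line.
     * A group equal to nullString is replaced with a Java null.
     * Pass nullString == null to model the default (no conversion).
     */
    static List<String> parse(String regex, String line, String nullString) {
        Matcher m = Pattern.compile(regex).matcher(line);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                String g = m.group(i);
                // equals(null) is false, so nothing is converted by default
                groups.add(g != null && g.equals(nullString) ? null : g);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String regex = "(\\d*),(\\d*),(\\d*)";
        System.out.println(parse(regex, "1,,5", ""));   // null_string set to an empty string
        System.out.println(parse(regex, "1,,5", null)); // null_string not set
    }
}
```

With null_string set to an empty string the middle value comes back as null; without it the empty string is kept.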
Query Syntax
The Text driver uses regular expression syntax to query text files.
The file is read line-by-line from the location specified by the URL connection property, and each line is matched
against the regex pattern.
If a line or a part of it matches the pattern, the match produces a virtual row in the result set.
The column names in the virtual result set correspond to the matched regex groups.
For example, the query foo(.*) matches the line foobar, and the produced
result set row contains two columns (groups): 0-foobar, 1-bar. These columns
can be referenced in child script or query elements by a numeric name or by a string name columnN (for example, column1).
It is also possible to specify more than one regular expression to match file content.
Specify each regular expression on a separate line; the expressions are combined using an OR condition.
The Text driver uses the java.util.regex
implementation for pattern matching. See the java.util.regex.Pattern
Javadoc for the supported syntax.
Additional notes:
- Regular expression matching is case-insensitive.
- An empty query selects all lines from the input file.
- The 0 (zero) column in the produced result set contains the matched text, which is the whole line only when the pattern matches the whole line.
- Leading and trailing whitespace in the query element and in input file lines is trimmed by default.
- Use the ^ and $ boundary matchers to match the whole line.
Example:
<query>
^ERROR: (.*)
WARNING: (.*Failed.*)
([\d]+) errors?
</query>
This query consists of 3 regular expressions:
- the first selects lines starting with the ERROR: prefix
- the second selects WARNING lines containing the Failed substring
- the third selects lines containing a number of errors, e.g. "Found 5 errors".
The query selects any line satisfying one of these 3 regular expressions.
Suppose the input file has the following content:
Log file started...
INFO: INIT
WARNING: CPU is slow
WARNING: Failed to increase heap size
ERROR: Process interrupted
Operation completed with 1 error.
As a result of query execution, the following set of rows is produced:
0 | 1
WARNING: Failed to increase heap size | Failed to increase heap size
ERROR: Process interrupted | Process interrupted
1 error | 1
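The matching behaviour behind this example can be approximated with java.util.regex directly. The sketch below is not the driver's code: the class name is hypothetical, and it assumes that patterns are tried in order and that a line yields at most one row (the first pattern that finds a match wins).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of Scriptella.
class TextQuerySketch {
    /** Each row holds group 0 (the matched text) followed by the capturing groups. */
    static List<List<String>> query(List<String> regexes, List<String> lines) {
        List<Pattern> patterns = new ArrayList<>();
        for (String r : regexes) {
            // matching is documented as case-insensitive
            patterns.add(Pattern.compile(r.trim(), Pattern.CASE_INSENSITIVE));
        }
        List<List<String>> rows = new ArrayList<>();
        for (String rawLine : lines) {
            String line = rawLine.trim(); // trim=true is the documented default
            for (Pattern p : patterns) {
                Matcher m = p.matcher(line);
                if (m.find()) { // a match on part of the line is enough
                    List<String> row = new ArrayList<>();
                    for (int i = 0; i <= m.groupCount(); i++) {
                        row.add(m.group(i));
                    }
                    rows.add(row);
                    break; // assumption: at most one row per line
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> regexes = Arrays.asList(
                "^ERROR: (.*)", "WARNING: (.*Failed.*)", "([\\d]+) errors?");
        List<String> lines = Arrays.asList(
                "Log file started...",
                "INFO: INIT",
                "WARNING: CPU is slow",
                "WARNING: Failed to increase heap size",
                "ERROR: Process interrupted",
                "Operation completed with 1 error.");
        for (List<String> row : query(regexes, lines)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Run against the sample log, the sketch yields the same three rows as the table above. Note that the third row's column 0 is "1 error" rather than the full line, because the pattern has no ^ and $ anchors.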
Script Syntax
The <script> element content is read line-by-line. For each line,
properties are expanded and the output is sent to the file specified by the url connection attribute.
Additional notes:
- Lines in the output file are separated by the EOL string specified by the eol connection property.
- Leading and trailing whitespace in the output file lines is trimmed by default.
- No escaping is performed when properties are expanded. Use String.replace or other escaping techniques to
achieve output similar to CSV etc.
- If a script is executed multiple times (e.g. inside a parent query), the output is appended to the file content.
Example:
<script>
Inserted a record with ID=$id. Table=${table}
</script>
For id=1 and table=system this script produces the following output:
Inserted a record with ID=1. Table=system
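Because no escaping is performed during expansion, values that contain the separator or quotes have to be escaped before they reach the script. The following is a minimal sketch of one such technique based on String.replace, using common CSV quoting conventions. The class and method names are hypothetical, and mapping null to an empty string is an assumption made for this example only.

```java
// Hypothetical helper, not part of Scriptella.
class CsvEscapeSketch {
    /**
     * Wraps the value in double quotes when it contains the separator,
     * a double quote or a line break, doubling any embedded quotes.
     */
    static String escape(String value, char separator) {
        if (value == null) {
            return ""; // assumption: nulls are written as empty fields
        }
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain", ';'));
        System.out.println(escape("a;b", ';'));
        System.out.println(escape("say \"hi\"", ';'));
    }
}
```

A value prepared this way can then be exposed as a variable and referenced from the script element like any other property.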
Properties substitution
In text script and query elements, the ${property} or $property syntax is used for property/variable substitution.
NOTE:
By default NULL variables and expressions are preserved; use the null_string connection property to specify
a string token for nulls.
For example, setting null_string to an empty string in the connection properties section enables parsing
empty strings as nulls:
<connection driver="text" url="report.csv">
null_string=
</connection>
The Scriptella property substitution engine cannot distinguish a null value from an unused variable or an incidental use of the
$var syntax;
therefore we've chosen to preserve these blocks until the user explicitly specifies the value of null_string.
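The preservation rule can be illustrated with a small stand-alone sketch. It is not Scriptella's substitution engine: the class name is hypothetical, and only simple ${name} and $name references are handled here, whereas the real engine may accept richer expressions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of Scriptella.
class SubstitutionSketch {
    // Matches ${name} (group 1) or $name (group 2).
    private static final Pattern REF = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    /**
     * Expands variable references in a line. A missing or null variable is
     * replaced with nullString when it is set; otherwise the reference is
     * left in the output unchanged.
     */
    static String expand(String line, Map<String, Object> vars, String nullString) {
        Matcher m = REF.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            Object value = vars.get(name);
            String replacement;
            if (value != null) {
                replacement = value.toString();
            } else if (nullString != null) {
                replacement = nullString;
            } else {
                replacement = m.group(); // preserve the original reference
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("id", 1);
        String line = "Inserted a record with ID=$id. Table=${table}";
        System.out.println(expand(line, vars, null)); // ${table} is preserved
        System.out.println(expand(line, vars, ""));   // ${table} becomes an empty string
    }
}
```

With null_string unset, the unknown ${table} reference survives in the output; with null_string set to an empty string, it is replaced by nothing.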
Examples
<connection id="in" driver="text" url="data.csv">
</connection>
<connection id="out" driver="text" url="report.csv">
</connection>
<script connection-id="out">
ID;Priority;Summary;Status
</script>
<query connection-id="in">
<script connection-id="out">
$rownum;$column0;$column1;$column2
</script>
</query>
Copies rows from the data.csv file to report.csv; additionally, an ID column is added.
The resulting file is semicolon-separated.