patrodyne-etl-TransformIO 1.0.0, source code: TransformIO.xsd
The Batch class is instrumented with Java XML Binding (JAXB) annotations.
TransformIO binds the Batch configuration to XML files, normally identified by
their *.tio suffix. When a user creates a batch configuration in
TransformIO and saves it, the batch is marshalled by JAXB and output to a TIO
file. When TransformIO opens a file it is unmarshalled by JAXB into a Batch
object graph for runtime processing.
]]>
A source identifies the location, encoding and record layout of an input stream.
The location is a Uniform Resource Locator (URL); thus, the input stream can be
a local file (file://), a remote file (ftp://), a page (http://) or any stream
for which there is a protocol handler available.
The encoding specifies how the byte stream is converted into characters. Common
examples are UTF-8, Cp1252 or ASCII.
The record layout specifies how the input stream is parsed into records and
fields.
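The location and encoding of a source can be illustrated with a minimal Java
sketch. It is not TransformIO's own code: the class SourceReaderSketch and its
methods are hypothetical names. It only shows how a URL's protocol handler
supplies bytes and how a named encoding turns them into characters.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of TransformIO.
final class SourceReaderSketch {

    /** Opens the location through its protocol handler and decodes the bytes. */
    static String readAll(URL location, String encoding) {
        Charset charset = Charset.forName(encoding);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(location.openStream(), charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Writes text to a temporary file, then reads it back through a file:// URL. */
    static String roundTrip(String text, String encoding) {
        try {
            Path file = Files.createTempFile("source", ".txt");
            try {
                Files.write(file, text.getBytes(Charset.forName(encoding)));
                return readAll(file.toUri().toURL(), encoding);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("Mary Smith|123-456-7890\n", "UTF-8"));
    }
}
```

Reading with a different encoding than the one used to write the file would
decode any non-ASCII characters incorrectly, which is why the encoding is part
of the source definition.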
]]>
A target identifies the location, encoding and record layout of an output stream.
The location is a Uniform Resource Locator (URL); thus, the output stream can be
a local file (file://), a remote file (ftp://) or any stream for which there is
a protocol handler available.
The encoding specifies how the characters are converted into bytes. Common
examples are UTF-8, Cp1252 or ASCII.
The record layout specifies how the output stream is formatted into records and
fields.
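The effect of the target encoding can be seen in a small Java sketch. The
class TargetEncodingSketch is a hypothetical name, and the record text is made
up for illustration. The same characters occupy a different number of bytes
depending on the encoding chosen.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of TransformIO.
final class TargetEncodingSketch {

    /** Converts the characters of a record into bytes using the named encoding. */
    static byte[] encode(String record, String encoding) {
        return record.getBytes(Charset.forName(encoding));
    }

    public static void main(String[] args) {
        String record = "Zo\u00eb|1234567890\n"; // 15 characters, one of them non-ASCII
        System.out.println(encode(record, "UTF-8").length);      // 16: the e-diaeresis takes two bytes
        System.out.println(encode(record, "ISO-8859-1").length); // 15: one byte per character
    }
}
```

When the encoding cannot represent a character, as with ASCII here, Java's
String.getBytes substitutes a replacement byte ('?'), so the choice of
encoding can silently alter the output.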
]]>
Scripting for the Java Platform (JSR 223) is a framework for embedding script engines in Java applications.
Several engines are available for languages such as:
- BeanShell - Executes Java syntax and extends it with common scripting conveniences.
- Groovy - Builds upon the strengths of Java but has additional power features inspired by Python, Ruby and Smalltalk.
- Java - Compiles a Java source then invokes its main method.
- JavaScript - A multi-paradigm language, supporting object-oriented, imperative, and functional programming styles.
- Jython - A Java implementation of Python, a general-purpose, high-level programming language that emphasizes code readability.
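As a rough illustration of the JSR 223 API (javax.script), the sketch below
looks up an engine by name, binds one variable and evaluates a script. The
class ScriptEngineSketch and the variable name used in main are hypothetical.
Which engines are found depends on what is installed on the classpath, so the
lookup may well return nothing.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper, not part of TransformIO.
final class ScriptEngineSketch {

    /**
     * Evaluates a script with one bound variable.
     * Returns empty when no engine with that name is installed
     * (or when the script itself evaluates to null).
     */
    static Optional<Object> evalIfAvailable(String engineName, String script,
                                            String varName, Object varValue) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return Optional.empty();
        }
        engine.put(varName, varValue); // expose a Java object to the script
        try {
            return Optional.ofNullable(engine.eval(script));
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // List whatever engines this runtime can discover.
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            System.out.println(factory.getEngineName() + " (" + factory.getLanguageName() + ")");
        }
        // "phone" is an illustrative variable name.
        System.out.println(evalIfAvailable("groovy", "phone.replace('-', '')", "phone", "123-456-7890"));
    }
}
```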
]]>
A record is data divided into fields, in fixed number and sequence and identified by name.
Each field is given a name, a get regular expression and a set replacement
string. A regular expression, or regex, provides a concise and flexible means
to match (parse) substrings of text.
With the two expressions, fields can be parsed and formatted at the same time. For example,
given a record set of telephone numbers, the fields can be parsed and the phone number can be
formatted to remove the dashes.
Data
Mary Smith|123-456-7890
John Brown|098-765-4321
Source
<record>
<field name="Name" get="(.*)\|" set="$1"/>
<field name="Phone" get="(...)-(...)-(....)\n" set="$1$2$3"/>
</record>
The Name field gets and groups all characters before the first pipe '|'. Then,
it sets the Name to be the first group without the pipe.
The Phone field gets the three groups of data, found in between the dashes,
and prior to the end-of-line character. Then, it sets the Phone field to be all
the groups as one compressed value, and omits the EOL.
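The get/set mechanics described above can be approximated with java.util.regex.
This is a sketch under assumptions, not TransformIO's implementation: the class
FieldExtractSketch is hypothetical, and it assumes each get pattern is matched
at the current position of the stream and that the matched text is consumed
before the next field is tried.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TransformIO.
final class FieldExtractSketch {

    // { name, get, set } for each field of the source record shown above.
    static final String[][] FIELDS = {
        {"Name",  "(.*)\\|",               "$1"},
        {"Phone", "(...)-(...)-(....)\\n", "$1$2$3"},
    };

    /** Parses the whole input into records, one field after another. */
    static List<Map<String, String>> parseAll(String data) {
        List<Map<String, String>> records = new ArrayList<>();
        int pos = 0;
        while (pos < data.length()) {
            int start = pos;
            Map<String, String> record = new LinkedHashMap<>();
            for (String[] field : FIELDS) {
                Matcher m = Pattern.compile(field[1]).matcher(data.substring(pos));
                if (!m.lookingAt()) { // assumption: get must match at the current position
                    throw new IllegalArgumentException(
                        "Field " + field[0] + " does not match at offset " + pos);
                }
                StringBuffer value = new StringBuffer();
                m.appendReplacement(value, field[2]); // expands $1, $2, ... from the match
                record.put(field[0], value.toString());
                pos += m.end(); // consume the matched text
            }
            if (pos == start) {
                throw new IllegalStateException("Record consumed no input");
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "Mary Smith|123-456-7890\nJohn Brown|098-765-4321\n";
        System.out.println(parseAll(data));
    }
}
```

With the sample data, the Name values keep their text without the pipe and the
Phone values become 1234567890 and 0987654321.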
This is a typical example of extracting data from a source stream. Once the fields are
assigned they become available for transformation or inclusion in the target for loading.
The target record can change the field order or fields can be omitted. For example:
Target
<record>
<field name="Phone" get="(.*)" set="$1|"/>
<field name="Name" get="(.*)" set="$1\n"/>
</record>
In this target, the field order has flipped. Both fields get all the data from
their assigned values. The Phone field appends a pipe to delimit it from the
next field and the Name field appends a new-line character to end the record.
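The target side can be sketched in the same way. The class TargetFormatSketch
is hypothetical, and two assumptions are made: each get pattern is applied once
to the field's assigned value, and the \n escape in a set string has already
been translated to a real new-line character (Java's own replacement syntax
would otherwise treat \n as a literal letter n).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TransformIO.
final class TargetFormatSketch {

    // { name, get, set } for each field of the target record shown above.
    static final String[][] FIELDS = {
        {"Phone", "(.*)", "$1|"},
        {"Name",  "(.*)", "$1\n"}, // assumption: the \n escape is already a real new-line
    };

    /** Formats one record by applying each field's get/set pair to its value. */
    static String formatRecord(Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        for (String[] field : FIELDS) {
            String value = values.get(field[0]);
            // replaceFirst, not replaceAll: (.*) would also match the empty
            // string at the end of the value and append the delimiter twice.
            out.append(Pattern.compile(field[1]).matcher(value).replaceFirst(field[2]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("Name", "Mary Smith");
        values.put("Phone", "1234567890");
        System.out.print(formatRecord(values)); // 1234567890|Mary Smith
    }
}
```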
]]>
A field represents a single value within a record. It has a name, a get
expression and a set expression. The get expression is a regular expression,
or regex. The set expression is a replacement string. The replacement string
may contain references to captured groups (subsequences) from the get pattern.
Note: Field data is treated as text. There is no type conversion
because conversion is prone to errors. For example, converting a field to
an integer when the field contains the letter O instead of
the intended number 0 is an error that needs to be trapped
and handled. We prefer to keep field parsing and type conversion as two
separate concerns. When type conversion is needed, it can be handled in
the transformation script where error handling can be better implemented.
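A transformation script might handle the conversion along these lines. The
sketch is in Java, and the class FieldConversionSketch and its method toInt are
hypothetical names. A field that fails to parse is reported as absent instead
of aborting the run.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of TransformIO.
final class FieldConversionSketch {

    /** Converts a text field to an int, or returns empty if it is not a number. */
    static OptionalInt toInt(String field) {
        try {
            return OptionalInt.of(Integer.parseInt(field.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // the caller decides how to handle bad data
        }
    }

    public static void main(String[] args) {
        System.out.println(toInt("100")); // OptionalInt[100]
        System.out.println(toInt("1O0")); // OptionalInt.empty (letter O, not zero)
    }
}
```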
]]>
A locator type represents a Uniform Resource Locator.
It can be specified as a single attribute or in parts:
- protocol - Communication standard.
- username - Authentication account.
- password - Authentication credential.
- host - Server address name.
- port - Channel number.
- path - Resource location.
- query - Criteria parameters.
- anchor - Resource offset.
To specify the locator as a single value use the url attribute.
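How the parts combine into a single locator can be sketched with java.net.URI.
The class LocatorSketch and the example host and paths are made up for
illustration, and the mapping of username and password onto the URI user-info
component is an assumption about how the parts would be joined.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of TransformIO.
final class LocatorSketch {

    /** Builds a URI from locator parts; null parts and a port of -1 are omitted. */
    static URI fromParts(String protocol, String username, String password,
                         String host, int port, String path,
                         String query, String anchor) {
        String userInfo = null;
        if (username != null) {
            userInfo = (password == null) ? username : username + ":" + password;
        }
        try {
            return new URI(protocol, userInfo, host, port, path, query, anchor);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromParts("http", null, null, "example.com", 8080,
                                     "/report", "year=2012", "top"));
        // http://example.com:8080/report?year=2012#top
        System.out.println(fromParts("ftp", "user", "secret", "example.com", -1,
                                     "/data/in.txt", null, null));
    }
}
```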
]]>