xml.schema.JSaParSchema.xsd Maven / Gradle / Ivy
Go to download
Show more of this group Show more artifacts with this name
Show all versions of jsapar Show documentation
Show all versions of jsapar Show documentation
JSaPar is a Java library providing a schema based parser and composer of almost collected sorts of delimited and fixed
width files.
The newest version!
The locale specifies how numbers and dates should be parsed. The default is to use the en_US locale.
A sequence of characters that separates each line. Default is the system default line
separator characters. To specify control characters, use either xml syntax e.g. for LF and for CR or
java notation such as '\n' for LF or '\r' for CR.
When parsing, if one of either '\r\n' or '\n' is specified as line separator, both both will be regarded as
line separator anyway. When composing, the specified sequence will be used.
Denotes how many times this type of line occurs. A '*'
character denotes that it occurs infinitely amount of times
until the end of input buffer is reached.
Specifies the type of the line. If there are more than one type of line or if linecondition cells are used, then
this field is mandatory. Otherwise this field can be supplied
as information. Line-type can be regarded as the class of the
line if there are many different type of lines. The line types
can be denoted by either the occurs attribute at each schema
line or by the value of a linecondition control cell.
Default is false. true = Ignore this line when parsing. This means that the line may
exist in the input but it will not generate any event. false = Parse and generate event.
Default is false. true = Ignore this line while composing. This means that the whole line will be ignored
as if it did not exist. false = Compose the line to the output.
Specifies a valid range for the cell value.
The locale of this cell. It affects how decimal separators etc. are formatted/parsed. Overrides the locale
setting for the whole schema.
A line condition is a condition that needs to be fulfilled in order for the parser to use this line type. You can add line condition on one or multiple cells and collected line conditions needs to be fulfilled for a line type to be used.
The name of the cell. All Cell objects created with this SchemaCell will have this name. When writing output,
the name of the Cell object have to match this name if the cell is to be written.
Default is false. true = Ignore this cell when reading. Proceed to next cell. This means that the cell have to
exist in the input file but it will not be stored in the resulting Document. false = Read and store this cell.
Default is false. true = Ignore this cell when writing. This means that the value of the cell will be ignored
and instead a value as if the cell was empty will be written. false = Write the cell value to the output.
States if this cell have to contain a value or if it is optional. If this attribute is false, an empty value is accepted as
long as the parsing mechanism can handle it for the given format. If this attribute is true, a parsing error
is reported back. Default is false.
The default value of this cell. This value is used if the cell does not conatain any value. The value should be
formatted according to the rules of the schema cell itself.
The type of the cell. Default is "string".
Have to be one of the following:
string
date
local_date
local_time
local_date_time
zoned_date_time
instant
decimal
integer
float
boolean
character
Note that the number formats are parsed according to the specified locale.
* If the type is string then the pattern should contain a regular expression to which the value is validated against. This only works while parsing.
* If the type is any of the numerical types, then the pattern should be described according to the java.text.DecimalFormat (http://java.sun.com/javase/6/docs/api/java/text/DecimalFormat.html).
* If the type is date, then the pattern should be described according to java.text.SimpleDateFormat (https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/text/SimpleDateFormat.html).
* If the type is one of the new java.time types such as local_date, local_date_time, zoned_date_time or instant, then the pattern should be described according to java.time.format.DateTimeFormatter (https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/format/DateTimeFormatter.html).
* If the type is boolean, the pattern should contain the true and false values separated with a ; character. Example: pattern="Y;N" will imply that Y represents true and N to represents false. Comparison while parsing is not case-sensitive. Multiple true or false values can be specified, separated with the | character but the first value is always the one used while composing. Example: pattern="Y|YES;N|NO"
* If the type is enum, the pattern should contain the fully qualified class name of the enum class to use.
The fully qualified enum class to use.
If true, the character upper or lower case is not considered while parsing. Default is false
The number of decimals to imply.
The enum constant name. Needs to match exactly the enum value name of the enum class. Always case sensitive.
The textual representation of the enum. In case multiple text values are matched to the same enum constant, the first one is used while composing.
Specifices the pad character to use to pad lines that are not reaching the minimum length. Also used as default pad character for cells of this line. Default is space character (ASCII 20).
The minimal length of a fixed with line when writing output.
If the sum of the length of the cells is less than this minlength then the line is filled with the fillcharacter so that the output will be at least of length minlength.
The length of the cell, i.e. the number of
characters it occupies.
Defines the cell alignment. The remaining space is filled with the pad character. Have to be one of the following: left right center
Specifies the pad character to use to pad cells that are not reaching the minimum length. The alignment attribute specifies if padding should be done to the right, to the left or both. Default is the pad character specified on line level.
If set to false, pad characters are not removed while parsing. Default is true.
If set to false, leading space characters not removed while parsing in case pad character is something other than space. Default is true.
A sequence of characters that are used to delimit each cell of all lines that does not have different cell separator specified.
Specifies the syntax while parsing and composing of quoted cells.
FIRST_LAST
Quoted cells are considered quoted if and only if it begins and ends with a quote character and collected the intermediate characters are treated as is.
This is the most common scenario in delimited files since most sources that generates delimited files only adds quotes first and last of the cell without inspecting the content.
"aaa","b""bb","ccc" will be treated as three cells with the values `aaa`, `b""bb` and `ccc`. No characters will be replaced or removed between the enclosing quotes. Be aware that this mode will treat the input "aaa","b"","ccc" as three cells with the values `aaa`, `b"` and `ccc`.
While composing, the content of a cell is always written as is and enclosing quotes are just added.
RFC4180
Parsing and composing will consider the RFC 4180 regarding quotes. Any double occurrences of quote characters will be treated as if one quote character is an escape character and the other will become part of the cell value.
For instance "aaa","b""bb","ccc" will still be treated as three cells but with the values `aaa`, `b"bb` and `ccc`. This mode will treat the input "aaa","b"",bbb" as two cells with the values `aaa` and , `b",bbb`.
When composing quoted cells, collected quotes within cell value will be escaped with an additional quote character in order to make the output compliant.
According to RFC 4180, single quotes may not occur inside a quoted cell. This parser will however allow it and treat it as part of the cell value as long as it is not followed by a cell or line separator.
A sequence of characters that are used to
delimit each cell of this line.
Specifies if the first line of this type contains a header line that should be used as schema.
If set to true:
While parsing, the order of the cells and what cells to expect are denoted by the first line within the input.
Default values and other formatting information is defined in the schema as usual. Cell names
that exist in the input header line but not within the schema are considered to be of type string.
While composing, the schema is used as usual but an additional header line with the name of each
cell, is written first.
Specifies which quote character that should be used
to encapsulate cells. Default is the standard double quote character (").
The value "NONE" disables quoting entirely. Between the quotes, also the cell
separator and line separator will be regarded as content of the cell.
The quoting syntax can be specified on schema level.
Default quote behavior for the line type. Override this setting on cell level if needed. Should have one of the values:
AUTOMATIC, NEVER, REPLACE or ALWAYS.
A valid ISO Language Code. These codes are the lower-case, two-letter codes as defined by ISO-639. You can find a full list of these codes at a number of sites, such as:
http://www.loc.gov/standards/iso639-2/englangn.html
A valid ISO Country Code. These codes are the upper-case, two-letter codes as defined by ISO-3166. You can find a full list of these codes at a number of sites, such as:
http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
Contains a range for a cell value. The format of the min and max attribute are the same as for the parsing of the cell values.
The smallest value allowed for this cell.
The largest value allowed for this cell.
The maximum number of characters that are read or written to/from the cell. Input and output
value will be silently truncated to this length. If you want to get an error when field is to
long, use the format regexp pattern instead.
Quote behavior on cell level. Overrides line behavior.
Available since 2.0
Regular expression pattern that needs to match the content of the cell.
The value that the cell value that needs to be equal to.
If true, ignore case.
AUTOMATIC = (default) If quote character is specified: Quote only when needed. If quote
character is not specified, replace illegal characters with non-breaking space. Atomic cells,
e.g. integer cells, will not be quoted.
NEVER = Never quote
REPLACE = Don't quote but replace illegal characters with non-breaking space.
ALWAYS = Always quote