/**
* This package manages WBXML language definitions. The WBXML format
* specification assigns binary tokens (usually one byte per token)
* to the different elements of a typical XML format (tags, attribute names and
* so on). Just as XML uses a DTD or XSD to define a specific XML language,
* WBXML needs an additional definition that maps those elements to tokens.
* Here that definition is called a language definition. There are several
* WBXML language definitions, such as SyncML (Synchronization Markup Language),
* WV (Wireless Village), Exchange ActiveSync (EAS), WML (Wireless Markup
* Language) and many more.
*
* In order to parse and encode a WBXML document, a way of understanding,
* defining and loading the language is therefore required. This package does
* all of these things. The main ideas have been borrowed from the
* libwbxml C implementation.
*
* Definition
*
* A language definition is written as a simple properties file (this format
* was chosen because it is very simple and has no external
* dependencies). The following fixed keys are used to define a
* language:
*
*
* - wbxml.name: Compulsory key that identifies the language (it should be
* unique among all languages, for example SyncML 1.1 or
* WV CSP 1.1).
* - wbxml.publicid: The WBXML specification defines that a language should use
* a standard integer identifier (an mb_u_int32 integer). This id is encoded at the
* beginning of the document to indicate the language the document uses (for
* instance, SyncML 1.1 is 0x0FD3). Although it is not strictly compulsory, it is
* highly recommended (see next key).
* - wbxml.xmlpublicidentifier: The XML FPI (Formal Public Identifier), which
* is used in the DOCTYPE of any XML document. This is a string id
* defined in the DTD of the language (SyncML 1.1, again, uses
* -//SYNCML//DTD SyncML 1.1//EN as FPI). The WBXML specification allows
* the language to be identified by this string if the previous integer id
* is unknown.
* - wbxml.xmlurireference: Optional reference to the DTD that defines the
* XML structure of the language (the standard languages are usually defined
* using a DTD).
* - wbxml.rootelement: The root element of the document. It is used to guess
* the language of a WBXML document when the definition is not given beforehand.
* This key is compulsory. If the language uses namespaces,
* the root element should be prefixed.
* - wbxml.class: Optional key used with JAXB (it indicates which
* class represents the root element of the language).
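*
* As an illustration of the fixed keys above, the following minimal sketch
* reads them with java.util.Properties. The class FixedKeysSketch and its
* methods are hypothetical (they are not part of this library), and the
* root element value in the sample is only an assumed example.
*
```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the library: reads the fixed keys of a definition.
final class FixedKeysSketch {

    // Sample header of a definition file; the root element value is an assumption.
    static final String SAMPLE = String.join("\n",
            "wbxml.name=SyncML 1.1",
            "wbxml.publicid=0x0FD3",
            "wbxml.rootelement=syncml:SyncML");

    // Parses properties text held in memory.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns a compulsory key or fails loudly when it is missing or blank.
    static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing compulsory key: " + key);
        }
        return value.trim();
    }

    // The public id is optional; -1 marks "not defined" in this sketch.
    static long publicId(Properties props) {
        String value = props.getProperty("wbxml.publicid");
        return value == null ? -1L : Long.decode(value.trim());
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE);
        System.out.println(required(props, "wbxml.name"));
        System.out.println(required(props, "wbxml.rootelement"));
        System.out.println(Long.toHexString(publicId(props)));
    }
}
```
*
* Long.decode is used here so that the identifier can be written in decimal
* or hexadecimal; how the library itself parses the value is not shown here.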
*
*
* Besides those fixed keys, a WBXML language defines lists of tokens
* used to encode an XML document in binary. The token types are the following:
*
* Namespaces
*
* Some WBXML languages use one or more namespaces to define
* their tags and attributes. A list of keys defines all the namespaces
* of the language (if the language does not use namespaces, no such keys
* are present). The format of a key that defines a namespace is the following:
*
*
* wbxml.namespaces.{prefix}={namespaceURI}
*
* · The prefix is used in all the remaining properties (tags, attributes).
* · The namespaceURI is the namespace the prefix refers to.
*
*
* For example, SyncML defines two namespaces and the definition is as
* follows:
*
*
* wbxml.namespaces.syncml=SYNCML:SYNCML1.1
* wbxml.namespaces.metinf=syncml:metinf
*
* These are two namespaces (with the prefixes "syncml" and "metinf") that are
* used in the following keys (keys for tags and attributes).
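*
* A minimal sketch of how the namespace keys could be collected into a
* prefix-to-URI map. NamespaceKeysSketch is a hypothetical name used only
* for illustration; the library's own loading code may differ.
*
```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of the library.
final class NamespaceKeysSketch {

    static final String KEY_PREFIX = "wbxml.namespaces.";

    // Collects every wbxml.namespaces.{prefix} key into a prefix -> URI map.
    static Map<String, String> namespaces(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(KEY_PREFIX)) {
                String prefix = key.substring(KEY_PREFIX.length());
                result.put(prefix, props.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("wbxml.name", "SyncML 1.1");
        props.setProperty("wbxml.namespaces.syncml", "SYNCML:SYNCML1.1");
        props.setProperty("wbxml.namespaces.metinf", "syncml:metinf");
        // Keys that are not namespace keys (wbxml.name) are ignored.
        System.out.println(namespaces(props));
    }
}
```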
*
*
* Tags
*
* Following the WBXML specification, XML tags are encoded using 6 bits
* (a tag token is actually one byte in WBXML, but bits 7 and 6
* mark whether the tag has attributes and content respectively, so only the
* lower 6 bits remain for the token itself). In addition, tag tokens are grouped
* in code pages (another byte). So an XML tag is always defined by two numbers:
* the page code (one byte) and the tag token (6 bits). Tag encoding is explained
* in chapter 5.8.2. Tag Code Space of the WBXML specification. In
* the properties file each tag is defined by a single key with the following
* format:
*
*
* wbxml.tag.{pageCode}.[{prefix}:]{name}={token}
*
* · pageCode is the byte of the code page the tag token belongs to
* (it is usually written in decimal, although it is parsed with decode).
* · prefix is the prefix of the namespace if the token is defined inside
* a namespace (it is not used if the language does not use namespaces).
* · name is the tag name.
* · token is the 6-bit value of the token (usually written in hexadecimal).
*
*
* Some examples from the SyncML language (a prefixed one):
*
*
* wbxml.tag.0.syncml\:Add=0x05
* wbxml.tag.0.syncml\:Alert=0x06
* wbxml.tag.0.syncml\:Archive=0x07
* ...
* wbxml.tag.1.metinf\:Anchor=0x05
* wbxml.tag.1.metinf\:EMI=0x06
* wbxml.tag.1.metinf\:Format=0x07
*
* These are the first tags of each namespace (in SyncML each namespace also
* uses a different page).
*
*
* The SI language does not use namespaces; in that case tags are not prefixed:
*
*
* wbxml.tag.0.si=0x05
* wbxml.tag.0.indication=0x06
* wbxml.tag.0.info=0x07
* wbxml.tag.0.item=0x08
*
* SI is a very small language and these are the only four tags it uses.
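*
* The tag key format can be sketched as follows. TagKeySketch is a
* hypothetical class, not the library's parser; it assumes the key has
* already been read by java.util.Properties (so the escaped "\:" is a plain
* ":"), that numbers are parsed with Integer.decode, and that tag names
* contain no dots. The wireByte method shows how bits 7 and 6 are combined
* with the token, following chapter 5.8.2 of the WBXML specification.
*
```java
// Hypothetical helper, not part of the library.
final class TagKeySketch {

    static final String KEY_PREFIX = "wbxml.tag.";

    final int pageCode;
    final String prefix; // null when the tag is not inside a namespace
    final String name;
    final int token;

    private TagKeySketch(int pageCode, String prefix, String name, int token) {
        this.pageCode = pageCode;
        this.prefix = prefix;
        this.name = name;
        this.token = token;
    }

    // Parses wbxml.tag.{pageCode}.[{prefix}:]{name}={token}.
    static TagKeySketch parse(String key, String value) {
        if (!key.startsWith(KEY_PREFIX)) {
            throw new IllegalArgumentException("Not a tag key: " + key);
        }
        String rest = key.substring(KEY_PREFIX.length());
        int dot = rest.indexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("Missing page code: " + key);
        }
        int pageCode = Integer.decode(rest.substring(0, dot));
        String qualifiedName = rest.substring(dot + 1);
        int colon = qualifiedName.indexOf(':');
        String prefix = colon < 0 ? null : qualifiedName.substring(0, colon);
        String name = qualifiedName.substring(colon + 1);
        int token = Integer.decode(value.trim());
        // Bits 7 and 6 are flags, so the token must fit in the lower bits.
        if ((token & ~0x3F) != 0) {
            throw new IllegalArgumentException("Tag token out of range: " + value);
        }
        return new TagKeySketch(pageCode, prefix, name, token);
    }

    // Byte written in the document: bit 7 = has attributes, bit 6 = has content.
    int wireByte(boolean hasAttributes, boolean hasContent) {
        return token | (hasAttributes ? 0x80 : 0) | (hasContent ? 0x40 : 0);
    }

    public static void main(String[] args) {
        TagKeySketch add = parse("wbxml.tag.0.syncml:Add", "0x05");
        System.out.println(add.prefix + ":" + add.name + " page " + add.pageCode);
        System.out.println(Integer.toHexString(add.wireByte(false, true)));
    }
}
```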
*
*
* Attributes
*
* Attribute names in the WBXML specification are also encoded using
* one byte, but the token has to be less than 128 (bit 7 must be 0 because
* 1 is used for attribute values). Besides, an attribute token can also specify
* part of the value. A token can represent only the attribute name
* URL= or the name and part (or all) of the value PUBLIC="TRUE".
* This way the same XML attribute name can have several tokens (each one
* representing a different value). Attributes are also grouped in pages.
* The chapter that explains attributes in the WBXML specification is
* 5.8.3. Attribute Code Space (ATTRSTART and ATTRVALUE). Attributes
* in the properties file use two different keys:
*
*
* wbxml.attr.{pageCode}.[{prefix}:]{name}[.{optional-differenciator}]={token}
* {previous-key}.value={optional-value}
*
* The first key defines the token for an attribute name.
* The second key defines the optional value part.
*
*
* The SI language, for example, defines several tokens (all of them in
* page 0) for different value parts of the same attribute. The token with
* the longest matching value should be used (this way more characters are
* encoded in just one byte).
*
*
* wbxml.attr.0.href=0x0b
* wbxml.attr.0.href.httpwww=0x0d
* wbxml.attr.0.href.httpwww.value=http://www.
* wbxml.attr.0.href.http=0x0c
* wbxml.attr.0.href.http.value=http://
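*
* The "longest attribute wins" rule can be sketched like this, using the
* three href entries above. AttrStartSketch and bestPrefix are hypothetical
* names; the sketch only illustrates the selection rule and is not the
* library's encoder.
*
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the library.
final class AttrStartSketch {

    // Value part -> token for the SI href attribute (page 0).
    // The empty string stands for the token that encodes only the name.
    static final Map<String, Integer> HREF = new LinkedHashMap<>();
    static {
        HREF.put("", 0x0b);
        HREF.put("http://www.", 0x0d);
        HREF.put("http://", 0x0c);
    }

    // Returns the longest value part that the attribute value starts with.
    static String bestPrefix(Map<String, Integer> candidates, String value) {
        String best = null;
        for (String prefix : candidates.keySet()) {
            if (value.startsWith(prefix)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String value = "http://www.example.org/";
        String prefix = bestPrefix(HREF, value);
        // The rest of the value still has to be encoded after the token.
        System.out.println("token 0x" + Integer.toHexString(HREF.get(prefix))
                + ", remaining: " + value.substring(prefix.length()));
    }
}
```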
*
*
* Attribute Values
*
* The WBXML specification also allows values (or parts of
* a value) to be encoded in a byte token. An attribute value uses a token byte (in
* this case the token must be greater than or equal to 128) and also belongs
* to a page. A token simply represents the string value (or part of
* the value) that it encodes. The same chapter,
* 5.8.3. Attribute Code Space (ATTRSTART and ATTRVALUE), of the
* specification deals with them. In the properties file they use two keys
* very similar to the attribute keys:
*
*
* wbxml.attrvalue.{pageCode}[.{optional}]={token}
* {previous_key}.value={value}
*
* The first key defines the token and page.
* The second key defines the string value (it is compulsory for values).
*
*
* The following examples of attribute values are from the SI language:
*
*
* wbxml.attrvalue.0.com=0x85
* wbxml.attrvalue.0.com.value=.com/
* wbxml.attrvalue.0.edu=0x86
* wbxml.attrvalue.0.edu.value=.edu/
* wbxml.attrvalue.0.net=0x87
* wbxml.attrvalue.0.net.value=.net/
* wbxml.attrvalue.0.org=0x88
* wbxml.attrvalue.0.org.value=.org/
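*
* A minimal sketch of how such value tokens could be applied to the text of
* an attribute value: the text is split into inline strings and value
* tokens. AttrValueSketch is hypothetical, and the greedy left-to-right scan
* is an assumption made for illustration, not necessarily the strategy the
* library uses.
*
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library.
final class AttrValueSketch {

    // String value -> token, taken from the SI entries above (page 0).
    static final Map<String, Integer> VALUES = new LinkedHashMap<>();
    static {
        VALUES.put(".com/", 0x85);
        VALUES.put(".edu/", 0x86);
        VALUES.put(".net/", 0x87);
        VALUES.put(".org/", 0x88);
    }

    // Splits text into inline strings (String) and value tokens (Integer).
    static List<Object> tokenize(String text) {
        List<Object> parts = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String match = null;
            for (String candidate : VALUES.keySet()) {
                if (text.startsWith(candidate, i)
                        && (match == null || candidate.length() > match.length())) {
                    match = candidate;
                }
            }
            if (match == null) {
                // No token applies here: keep the character as inline text.
                literal.append(text.charAt(i));
                i++;
            } else {
                if (literal.length() > 0) {
                    parts.add(literal.toString());
                    literal.setLength(0);
                }
                parts.add(VALUES.get(match));
                i += match.length();
            }
        }
        if (literal.length() > 0) {
            parts.add(literal.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Integers in the output are value tokens (all of them >= 128).
        System.out.println(tokenize("example.com/"));
    }
}
```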
*
*
* Extensions
*
* WBXML also defines extensions, which are tokens that can be used to
* encode any string value in attributes or tag contents. Extensions
* are explained in chapter 5.8.4.2. Global Extension Tokens and,
* although the specification describes three types of extensions,
* libwbxml only uses one of them (presumably no languages
* use the other two). In the properties file an extension is also
* defined by two keys:
*
*
* wbxml.ext.{key_differenciator}={token}
* {previous_key}.value={value}
*
* The first key defines the token number of the extension.
* The second key defines the value for that extension.
*
*
* Among the languages that have been added, only one uses
* extensions (WV - Wireless Village). They are used to encode some attribute
* values specified as enumerated lists of values and similar things. The
* following are some of them:
*
*
* wbxml.ext.appvnd=0x04
* wbxml.ext.appvnd.value=application/vnd.wap.mms-message
* wbxml.ext.wirelessuri=0x30
* wbxml.ext.wirelessuri.value=www.wireless-village.org
* wbxml.ext.GROUP_USER_ID_AUTOJOIN=0x50
* wbxml.ext.GROUP_USER_ID_AUTOJOIN.value=GROUP_USER_ID_AUTOJOIN
* wbxml.ext.GROUP_USER_ID_JOINED=0x40
* wbxml.ext.GROUP_USER_ID_JOINED.value=GROUP_USER_ID_JOINED
* ...
*
*
* Opaque Plugins
*
* The final elements that can be specified in a language definition properties
* file are the opaque plugins. The WBXML specification defines opaque data in
* chapter 5.8.4.6. Opaque Data. It is a special way to encode tag contents
* or attribute values in an application-specific binary format. This is awkward
* because the data cannot be decoded using the standard alone
* (you have to know how the opaque data is written). Nevertheless
* several languages use it to encode DateTime values or even stranger
* things.
*
* The libwbxml library adds custom code in order to parse or encode
* the opaque data (the code is full of defines that check whether some language is
* supported in order to add methods that encode or decode the opaque data). To
* provide this feature in the Java library an OpaquePlugin interface is used.
* The language definition file can attach such a plugin to attributes or
* tags. This way, when a WBXML document is encoded, the plugin receives the
* data to encode (for the specified tags or attributes) and, when the document
* is parsed, the plugin receives the opaque byte array. In the
* language definition properties file two keys are used:
*
*
* Key for attribute plugins:
*
* wbxml.opaque.attr.{pageCode}.{name}={class}
*
* · The page code of the attribute the plugin is associated to
* · The name of the attribute
* · The class that implements the OpaquePlugin interface
*
* Key for tag plugins:
*
* wbxml.opaque.tag.{pageCode}.{name}={class}
*
* · The page code of the tag the plugin is associated to
* · The name of the tag
* · The class that implements the OpaquePlugin interface
*
*
* This way the library provides a way of handling the unusual opaque data
* that can be found in any language. WV, for example, uses opaque data
* to encode integers and DateTimes (this is optional: either
* normal string values or opaque data can be used). For those tags two
* different plugins have been added to the library:
*
*
* wbxml.opaque.tag.0.Code=es.rickyepoderi.wbxml.document.opaque.WVIntegerOpaque
* wbxml.opaque.tag.0.ContentSize=es.rickyepoderi.wbxml.document.opaque.WVIntegerOpaque
* ...
* wbxml.opaque.tag.0.DateTime=es.rickyepoderi.wbxml.document.opaque.WVDateTimeOpaque
* wbxml.opaque.tag.6.DeliveryTime=es.rickyepoderi.wbxml.document.opaque.WVDateTimeOpaque
*
*
* Linked Definitions
*
* This is a non-standard feature. Some languages (like SyncML) have
* tags that contain another WBXML document of a different language (devinf
* in the case of the SyncML language). In order to handle this a new concept
* was added: the Linked Definition. A linked definition is just that,
* another definition that can be used inside this definition in some way
* (typically inside an opaque). This way the parsing/encoding process knows the
* prefixes, namespaces and tags that can appear in any document of the
* language.
*
* These properties are only used in the SyncML definitions (several versions).
*
* The format is very simple:
*
*
* wbxml.opaque.linkeddef.{def_differenciator}={name_of_the_linked_definition}
*
* · def_differenciator is just a suffix that makes each key unique when
* several definitions are linked.
* · name_of_the_linked_definition is the name of the linked definition (the
* one specified in the properties file of that definition).
*
*
* For example, the SyncML 1.2 language uses two linked definitions
* in the following way:
*
*
* wbxml.opaque.linkeddef.devinf12=DevInf 1.2
* wbxml.opaque.linkeddef.dmddf12=DMDDF 1.2
*
*
* Initialization
*
* All the definitions are loaded into the JVM when the WbXmlInitialization
* class is initialized. There is a location
* where all the properties files should be placed. This location must be
* inside the classpath (it can be a JAR file or a normal directory).
*
* The wbxml-stream library ships the default language
* definitions in the following path:
* es/rickyepoderi/wbxml/definition/defaults
* and they are loaded into the system by default.
*
* If different definitions need to be loaded, the files should be placed
* inside the classpath (inside a JAR or in a directory) and a system
* property can be used to indicate that this path should be used instead of
* the default one:
*
*
* -Des.rickyepoderi.wbxml.definition.path=new/classpath/resource/path
*
*
* Remember that the path is a path inside the classpath, not a file system
* path.
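*
* The lookup described above can be sketched as follows. DefinitionPathSketch
* is a hypothetical class and is not how WbXmlInitialization is necessarily
* implemented; only the property name and the default path come from this
* documentation. Stripping a leading slash is an assumption made here because
* ClassLoader resource names are not written with one.
*
```java
import java.net.URL;

// Hypothetical helper, not part of the library.
final class DefinitionPathSketch {

    static final String PROPERTY = "es.rickyepoderi.wbxml.definition.path";
    static final String DEFAULT_PATH = "es/rickyepoderi/wbxml/definition/defaults";

    // Returns the classpath location to use: the system property or the default.
    static String resolvePath() {
        String path = System.getProperty(PROPERTY, DEFAULT_PATH);
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        return path;
    }

    // Looks the location up in the classpath; null means it was not found.
    static URL locate(String path) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(path);
    }

    public static void main(String[] args) {
        String path = resolvePath();
        URL url = locate(path);
        System.out.println(path + " -> " + (url == null ? "not in classpath" : url));
    }
}
```
*
* Because the lookup goes through the class loader, a file system path such
* as an absolute directory will not be found unless that directory is itself
* part of the classpath.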
*
*/
package es.rickyepoderi.wbxml.definition;