
/**
 * 

This package manages WBXML language definitions. The WBXML format specification assigns binary tokens (usually one byte per token) to the different elements of a typical XML format (tags, attribute names and so on). Just as XML uses a DTD or XSD to define a specific XML language, WBXML needs an additional definition on top of that. Here that definition is called a language definition. There are several WBXML language definitions, such as SyncML (Synchronization Markup Language), WV (Wireless Village), Exchange ActiveSync (EAS), WML (Wireless Markup Language) and many more.

* *

So, in order to parse and encode a WBXML document, a way of understanding, defining and loading the language is absolutely necessary. This package does all of these things. The main ideas have been copied from the libwbxml C implementation.

* *

Definition

* *

A language definition is written as a simple properties file (this format was chosen because it is very simple and has no external dependencies). The following fixed keys are used to define a language:

* *
  • wbxml.name: Compulsory key that identifies the language (it should be unique among all languages, for example SyncML 1.1 or WV CSP 1.1).
  • wbxml.publicid: The WBXML specification defines that a language should use a standard integer identifier (an mb_u_int32 integer). This id is encoded at the beginning of the document to indicate the language the document uses (SyncML, for instance, is 0x0FD3). Although it is not strictly compulsory, it is highly recommended (see the next key).
  • wbxml.xmlpublicidentifier: The XML FPI (Formal Public Identifier) used in the Doctype of any XML document. This is a string id defined in the DTD of the language (again, SyncML uses //SYNCML//DTD SyncML 1.1//EN as FPI). The WBXML specification allows the language reference to be encoded using this id if the previous integer id is unknown.
  • wbxml.xmlurireference: Optional reference to the DTD that defines the XML structure of the language (the standard languages are usually defined using a DTD).
  • wbxml.rootelement: The root element of the document. It is used to guess the language of a WBXML document when the definition is not given beforehand. This key is compulsory. If the language uses namespaces, the root element should be prefixed.
  • wbxml.class: Optional key used with JAXB (it tells the library which class represents the root element of the language).
* *
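The fixed keys above can be checked with plain java.util.Properties. The following is a minimal sketch, not the library's own loader: the class FixedKeysSketch and its methods are hypothetical names, and the root element value syncml:SyncML is only an illustrative assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the library: checks the fixed keys of a definition.
final class FixedKeysSketch {

    // Keys that the description above marks as compulsory.
    static final String[] COMPULSORY = {"wbxml.name", "wbxml.rootelement"};

    // Loads a definition from properties-formatted text.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the compulsory keys that are absent or blank.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : COMPULSORY) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Parses the optional public id; Long.decode accepts decimal and 0x-prefixed text.
    static Long publicId(Properties props) {
        String value = props.getProperty("wbxml.publicid");
        return value == null ? null : Long.decode(value.trim());
    }

    public static void main(String[] args) {
        Properties props = load(
                "wbxml.name=SyncML 1.1\n"
                + "wbxml.publicid=0x0FD3\n"
                + "wbxml.rootelement=syncml:SyncML\n"); // illustrative value
        System.out.println("missing keys: " + missingKeys(props));
        System.out.println("public id: " + publicId(props));
    }
}
```

Long.decode is used so that the public id can be written in either decimal or hexadecimal form.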

Besides those fixed keys, a WBXML language defines lists of tokens used to encode an XML document in binary. The kinds of tokens are the following:

* *

Namespaces

* *

Some WBXML languages use namespaces (one or more) to define their tags and attributes. A list of keys defines all the namespaces of the language (if the language does not use namespaces, no such keys are present). The format of a key that defines a namespace is the following:

* *
 * wbxml.namespaces.{prefix}={namespaceURI}
 * 
 * · The prefix will be used for all the rest of properties (tags, attributes).
 * · The namespaceURI is the namespace the prefix refers to.
 * 
* *

For example, SyncML defines two namespaces and the definition is as follows:

* *
 * wbxml.namespaces.syncml=SYNCML:SYNCML1.1
 * wbxml.namespaces.metinf=syncml:metinf
 * 
 * Two namespaces ("syncml" and "metinf" prefix) that will be used in the
 * following keys (keys for tags and attributes).
 * 
* *
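As a rough illustration of how such keys could be collected, the sketch below builds a prefix-to-URI map from a Properties object. NamespaceKeysSketch is a hypothetical name and does not reflect how the library itself reads the file.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper (not the library's loader): collects the namespace keys.
final class NamespaceKeysSketch {

    static final String KEY_PREFIX = "wbxml.namespaces.";

    // Returns prefix -> namespace URI for every wbxml.namespaces.{prefix} key.
    static Map<String, String> namespaces(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(KEY_PREFIX)) {
                result.put(key.substring(KEY_PREFIX.length()), props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("wbxml.namespaces.syncml", "SYNCML:SYNCML1.1");
        props.setProperty("wbxml.namespaces.metinf", "syncml:metinf");
        System.out.println(namespaces(props));
    }
}
```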

Tags

* *

The XML tags (following the WBXML specification) are encoded in 6 bits (a tag token in WBXML is one byte, but bits 7 and 6 mark whether the tag has attributes and content respectively, so only 6 bits remain for the token itself). Besides, tag tokens are grouped in code pages (another byte). So an XML tag is always defined by two numbers: the page code (one byte) and the tag token (6 bits). Tag encoding is explained in the WBXML specification in chapter 5.8.2. Tag Code Space. In the properties file each tag is defined by a single key with the following format:

* *
 * wbxml.tag.{pageCode}.[{prefix}:]{name}={token}
 * 
 * · pageCode is the byte for the page the tag token is grouped in
 *   (usually presented in decimal format, although it is parsed with decode).
 * · prefix is the prefix of the namespace if the token is defined inside
 *   a namespace (it is not used if the language does not use namespaces).
 * · name is the tag name.
 * · token is the 6 bits of the token (usually in hexadecimal).
 * 
* *

Some examples from the SyncML language (a prefixed one):

* *
 * wbxml.tag.0.syncml\:Add=0x05
 * wbxml.tag.0.syncml\:Alert=0x06
 * wbxml.tag.0.syncml\:Archive=0x07
 * ...
 * wbxml.tag.1.metinf\:Anchor=0x05
 * wbxml.tag.1.metinf\:EMI=0x06
 * wbxml.tag.1.metinf\:Format=0x07
 * 
 * Examples of the first tags of every namespace (which in SyncML are also
 * different pages).
 * 
* *
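A tag key can be split back into its parts with a few string operations. The sketch below assumes the key has already been read through java.util.Properties, which turns the escaped \: into a plain colon; TagKeySketch is a hypothetical class, not one provided by the library.

```java
// Hypothetical value class (not from the library): one parsed wbxml.tag key.
final class TagKeySketch {

    static final String KEY_PREFIX = "wbxml.tag.";

    final int page;
    final String prefix; // null when the key carries no namespace prefix
    final String name;
    final int token;

    TagKeySketch(int page, String prefix, String name, int token) {
        this.page = page;
        this.prefix = prefix;
        this.name = name;
        this.token = token;
    }

    // Parses wbxml.tag.{pageCode}.[{prefix}:]{name} together with its {token} value.
    static TagKeySketch parse(String key, String value) {
        if (!key.startsWith(KEY_PREFIX)) {
            throw new IllegalArgumentException("Not a tag key: " + key);
        }
        String rest = key.substring(KEY_PREFIX.length());
        int dot = rest.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("Missing tag name: " + key);
        }
        // Integer.decode accepts both decimal and 0x-prefixed numbers.
        int page = Integer.decode(rest.substring(0, dot));
        String qualifiedName = rest.substring(dot + 1);
        int colon = qualifiedName.indexOf(':');
        String prefix = colon < 0 ? null : qualifiedName.substring(0, colon);
        String name = colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
        return new TagKeySketch(page, prefix, name, Integer.decode(value.trim()));
    }

    public static void main(String[] args) {
        TagKeySketch tag = parse("wbxml.tag.1.metinf:Anchor", "0x05");
        System.out.println(tag.page + " " + tag.prefix + " " + tag.name + " " + tag.token);
    }
}
```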

The SI language does not use namespaces; in that case tags are not prefixed:

* *
 * wbxml.tag.0.si=0x05
 * wbxml.tag.0.indication=0x06
 * wbxml.tag.0.info=0x07
 * wbxml.tag.0.item=0x08
 * 
 * SI is a very small language and those are the only four tags it uses.
 * 
* *
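To make the role of the two flag bits concrete, the following sketch combines a tag token with the "has attributes" (bit 7) and "has content" (bit 6) flags and splits the byte again. TagByteSketch is a hypothetical helper written for this explanation, not code taken from the library.

```java
// Hypothetical helper (not from the library): composes and splits a WBXML tag byte.
final class TagByteSketch {

    static final int HAS_ATTRIBUTES = 0x80; // bit 7
    static final int HAS_CONTENT = 0x40;    // bit 6
    static final int FLAG_BITS = HAS_ATTRIBUTES | HAS_CONTENT;

    // Combines the token from the definition with the two flag bits.
    static int encode(int token, boolean hasAttributes, boolean hasContent) {
        if (token < 0 || token > 0xFF || (token & FLAG_BITS) != 0) {
            throw new IllegalArgumentException("Token overlaps the flag bits: " + token);
        }
        int result = token;
        if (hasAttributes) {
            result |= HAS_ATTRIBUTES;
        }
        if (hasContent) {
            result |= HAS_CONTENT;
        }
        return result;
    }

    // Recovers the token by clearing the two flag bits.
    static int token(int tagByte) {
        return tagByte & 0xFF & ~FLAG_BITS;
    }

    static boolean hasAttributes(int tagByte) {
        return (tagByte & HAS_ATTRIBUTES) != 0;
    }

    static boolean hasContent(int tagByte) {
        return (tagByte & HAS_CONTENT) != 0;
    }

    public static void main(String[] args) {
        // The SI "indication" tag (0x06) with attributes and content.
        System.out.println(Integer.toHexString(encode(0x06, true, true)));
    }
}
```

With the SI tokens listed above, an indication element that has both attributes and content would be written as the byte 0xC6, and an si element with content only as 0x45.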

Attributes

* *

The attribute names in the WBXML specification are again encoded using one byte, but the token has to be less than 128 (bit 7 must be 0 because 1 is used for attribute values). Besides, an attribute name token can also carry all or part of the value: a token can represent only the attribute name (URL=) or the name together with a value (PUBLIC="TRUE"). This way the same XML attribute name can have several tokens (each one representing a different value). Attributes are also grouped in pages. The chapter that explains attributes in the WBXML specification is 5.8.3. Attribute Code Space (ATTRSTART and ATTRVALUE). The attributes in the properties file use two different keys:

* *
 * wbxml.attr.{pageCode}.[{prefix}:]{name}[.{optional-differenciator}]={token}
 * {previous-key}.value={optional-value}
 * 
 * The first key defines the token for an attribute name.
 * The second key defines the optional value part.
 * 
* *

The SI language, for example, defines several tokens (all of them in page 0) for different value parts. The token with the longest matching value part should be used (this way more characters are encoded in a single byte).

* *
 * wbxml.attr.0.href=0x0b
 * wbxml.attr.0.href.httpwww=0x0d
 * wbxml.attr.0.href.httpwww.value=http://www.
 * wbxml.attr.0.href.http=0x0c
 * wbxml.attr.0.href.http.value=http://
 * 
* *
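The longest-match rule can be sketched as a simple scan over the candidate tokens of one attribute name. AttrStartSketch and its methods are hypothetical names; the library may select the token differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the library): one attribute start token and its value part.
final class AttrStartSketch {

    final int token;
    final String valuePrefix; // empty when the token covers only the attribute name

    AttrStartSketch(int token, String valuePrefix) {
        this.token = token;
        this.valuePrefix = valuePrefix;
    }

    // Picks the candidate whose value part is the longest prefix of the real value.
    static AttrStartSketch best(List<AttrStartSketch> candidates, String value) {
        AttrStartSketch best = null;
        for (AttrStartSketch candidate : candidates) {
            if (value.startsWith(candidate.valuePrefix)
                    && (best == null
                        || candidate.valuePrefix.length() > best.valuePrefix.length())) {
                best = candidate;
            }
        }
        return best;
    }

    // The SI href tokens from the example above.
    static List<AttrStartSketch> siHref() {
        return Arrays.asList(
                new AttrStartSketch(0x0b, ""),
                new AttrStartSketch(0x0c, "http://"),
                new AttrStartSketch(0x0d, "http://www."));
    }

    public static void main(String[] args) {
        String value = "http://www.example.com/";
        AttrStartSketch chosen = best(siHref(), value);
        // The remainder still has to be encoded after the start token.
        System.out.println("0x" + Integer.toHexString(chosen.token)
                + " then \"" + value.substring(chosen.valuePrefix.length()) + "\"");
    }
}
```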

Attribute Values

* *

The WBXML specification also allows values (or parts of a value) to be encoded as a byte token. An attribute value uses a token byte (in this case the token must be greater than or equal to 128) and also belongs to a page. A token simply represents the string (the value or part of the value) that it encodes. The same chapter of the specification, 5.8.3. Attribute Code Space (ATTRSTART and ATTRVALUE), deals with them. In the properties file they use two keys very similar to the attribute keys:

* *
 * wbxml.attrvalue.{pageCode}[.{optional}]={token}
 * {previous_key}.value={value}
 * 
 * The first key marks the token and page.
 * The second the string value (it is compulsory for values).
 * 
* *

The following examples of attribute values are from the SI language:

* *
 * wbxml.attrvalue.0.com=0x85
 * wbxml.attrvalue.0.com.value=.com/
 * wbxml.attrvalue.0.edu=0x86
 * wbxml.attrvalue.0.edu.value=.edu/
 * wbxml.attrvalue.0.net=0x87
 * wbxml.attrvalue.0.net.value=.net/
 * wbxml.attrvalue.0.org=0x88
 * wbxml.attrvalue.0.org.value=.org/
 * 
* *
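One way to picture how value tokens are applied is a greedy left-to-right scan that replaces every known value string with its token and keeps the rest as literal text. This is only a sketch under that assumption; AttrValueSketch is a hypothetical name and the library's real encoder may work differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from the library): splits a value into literals and value tokens.
final class AttrValueSketch {

    // The SI attribute value tokens from the example above (value string -> token).
    static Map<String, Integer> siValues() {
        Map<String, Integer> values = new LinkedHashMap<>();
        values.put(".com/", 0x85);
        values.put(".edu/", 0x86);
        values.put(".net/", 0x87);
        values.put(".org/", 0x88);
        return values;
    }

    // Returns String items for literal text and Integer items for tokens, in order.
    static List<Object> split(Map<String, Integer> valueTokens, String text) {
        List<Object> out = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String match = null;
            for (String candidate : valueTokens.keySet()) {
                if (!candidate.isEmpty() && text.startsWith(candidate, i)
                        && (match == null || candidate.length() > match.length())) {
                    match = candidate; // prefer the longest match at this position
                }
            }
            if (match == null) {
                literal.append(text.charAt(i));
                i++;
            } else {
                if (literal.length() > 0) {
                    out.add(literal.toString());
                    literal.setLength(0);
                }
                out.add(valueTokens.get(match));
                i += match.length();
            }
        }
        if (literal.length() > 0) {
            out.add(literal.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split(siValues(), "example.com/"));
    }
}
```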

Extensions

* *

WBXML also defines extensions, which are tokens that can be used to encode any string value in attributes or tag contents. The extensions are explained in chapter 5.8.4.2. Global Extension Tokens and, although the specification talks about three types of extensions, libwbxml only uses one of them (it is assumed that no languages use the other two). In the properties file an extension is also defined by two keys:

* *
 * wbxml.ext.{key_differenciator}={token}
 * {previous_key}.value={value}
 * 
 * The first key defines the token number of the extension.
 * The second key the value for that extension.
 * 
* *

Among the languages that have been added, only one uses extensions (WV - Wireless Village). They are used to encode values that are specified as an enumerated list of values and similar things. The following are some of them:

* *
 * wbxml.ext.appvnd=0x04
 * wbxml.ext.appvnd.value=application/vnd.wap.mms-message
 * wbxml.ext.wirelessuri=0x30
 * wbxml.ext.wirelessuri.value=www.wireless-village.org
 * wbxml.ext.GROUP_USER_ID_AUTOJOIN=0x50
 * wbxml.ext.GROUP_USER_ID_AUTOJOIN.value=GROUP_USER_ID_AUTOJOIN
 * wbxml.ext.GROUP_USER_ID_JOINED=0x40
 * wbxml.ext.GROUP_USER_ID_JOINED.value=GROUP_USER_ID_JOINED
 * ...
 * 
* *
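The pairing of each extension key with its .value key could be read as in the sketch below. ExtensionKeysSketch is a hypothetical name, and treating a missing .value key as an error is an assumption made only for this example.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper (not the library's loader): pairs wbxml.ext keys with their values.
final class ExtensionKeysSketch {

    static final String KEY_PREFIX = "wbxml.ext.";
    static final String VALUE_SUFFIX = ".value";

    // Returns token -> string value for every extension in the properties.
    static Map<Integer, String> extensions(Properties props) {
        Map<Integer, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(KEY_PREFIX) || key.endsWith(VALUE_SUFFIX)) {
                continue; // not an extension token key
            }
            String value = props.getProperty(key + VALUE_SUFFIX);
            if (value == null) {
                // Assumption of this sketch: every extension token needs a value.
                throw new IllegalArgumentException("Missing key " + key + VALUE_SUFFIX);
            }
            result.put(Integer.decode(props.getProperty(key).trim()), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("wbxml.ext.appvnd", "0x04");
        props.setProperty("wbxml.ext.appvnd.value", "application/vnd.wap.mms-message");
        props.setProperty("wbxml.ext.wirelessuri", "0x30");
        props.setProperty("wbxml.ext.wirelessuri.value", "www.wireless-village.org");
        System.out.println(extensions(props));
    }
}
```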

Opaque Plugins

* *

The final element that can be specified in a language definition properties file is the opaque plugin. The WBXML specification defines opaque data in chapter 5.8.4.6. Opaque Data. It is a special way to encode tag contents or attribute values in an unspecified binary format. This is unusual because the data is encoded in a way that cannot be decoded using the standard alone (you have to know how the opaque data is written). Nevertheless, several languages use opaque data to encode DateTime values or even stranger things.

* *

The libwbxml library adds custom code in order to parse or encode the opaque data (the code is full of defines that check whether some language is supported in order to add methods that encode or decode the opaque data). To provide this feature in the Java library, an OpaquePlugin interface is used. The language definition file can attach such a plugin to attributes or tags. This way, when a WBXML document is encoded the plugin receives the data to encode (for the specified tags or attributes), and when the document is parsed the plugin receives the opaque byte array. In the language definition properties file two keys are used:

* *
 * Key for attribute plugins:
 * 
 * wbxml.opaque.attr.{pageCode}.{name}={class}
 *
 * · The page code of the attribute the plugin is associated to
 * · The name of the attribute
 * · The class that implements the OpaquePlugin interface
 *
 * Key for tag plugins:
 * 
 * wbxml.opaque.tag.{pageCode}.{name}={class}
 * 
 * · The page code of the tag the plugin is associated to
 * · The name of the tag
 * · The class that implements the OpaquePlugin interface
 * 
* *

This way the library offers a way of dealing with the opaque data that can be found in any language. WV, for example, uses opaque data to encode integers and DateTimes (this is optional: normal string values can be used instead of opaque data). For those tags two different plugins have been added to the library:

* *
 * wbxml.opaque.tag.0.Code=es.rickyepoderi.wbxml.document.opaque.WVIntegerOpaque
 * wbxml.opaque.tag.0.ContentSize=es.rickyepoderi.wbxml.document.opaque.WVIntegerOpaque
 * ...
 * wbxml.opaque.tag.0.DateTime=es.rickyepoderi.wbxml.document.opaque.WVDateTimeOpaque
 * wbxml.opaque.tag.6.DeliveryTime=es.rickyepoderi.wbxml.document.opaque.WVDateTimeOpaque
 * 
* *
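The two directions described above (text in, opaque bytes out when encoding; opaque bytes in, text out when parsing) can be pictured with the sketch below. Everything in it is an assumption made for illustration: OpaqueCodecSketch is not the library's OpaquePlugin interface, whose real signature is not shown here, and the big-endian layout is not claimed to be what WVIntegerOpaque does.

```java
// Hypothetical contract for illustration only; NOT the library's OpaquePlugin interface.
interface OpaqueCodecSketch {
    byte[] encode(String text);   // called when a document is encoded
    String decode(byte[] opaque); // called when a document is parsed
}

// Illustrative codec: unsigned integers as 1 to 4 big-endian bytes (assumed layout).
final class IntegerOpaqueSketch implements OpaqueCodecSketch {

    @Override
    public byte[] encode(String text) {
        long value = Long.parseLong(text.trim());
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("Out of range: " + text);
        }
        // Use the smallest number of bytes that can hold the value.
        int length = 1;
        while (length < 4 && (value >>> (8 * length)) != 0) {
            length++;
        }
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = (byte) (value >>> (8 * (length - 1 - i)));
        }
        return out;
    }

    @Override
    public String decode(byte[] opaque) {
        long value = 0;
        for (byte b : opaque) {
            value = (value << 8) | (b & 0xFF);
        }
        return Long.toString(value);
    }

    public static void main(String[] args) {
        IntegerOpaqueSketch codec = new IntegerOpaqueSketch();
        byte[] bytes = codec.encode("1000");
        System.out.println(bytes.length + " bytes, decoded back to " + codec.decode(bytes));
    }
}
```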

Linked Definitions

* *

This is a non-standard feature. Some languages (like SyncML) have tags that contain another WBXML document of a different language (devinf in the case of SyncML). In order to handle this, a new concept was added: the Linked Definition. A linked definition is just that: another definition that can be used inside this definition in some way (always through opaque data). This way the parsing/encoding process knows the prefixes, namespaces and tags that can appear in any document of the language.

* *

These properties are only used in the SyncML definitions (several versions).

* *

The format is very simple:

* *
 * wbxml.opaque.linkeddef.{def_differenciator}={name_of_the_linked_definition}
 * 
 * · def_differenciator is just something to link several definitions.
 * · name_of_the_linked_definition is the name of the linked definition (the
 *   one specified in the linked properties file for that definition).
 * 
* *

For example, the SyncML 1.2 language uses two linked definitions in the following way:

* *
 * wbxml.opaque.linkeddef.devinf12=DevInf 1.2
 * wbxml.opaque.linkeddef.dmddf12=DMDDF 1.2
 * 
* *
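Since a linked definition is referenced by name, a loader has to check that each referenced name belongs to a definition that was actually loaded. The sketch below shows one possible check; LinkedDefinitionsSketch and its methods are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper (not the library's loader): reads and checks linked definitions.
final class LinkedDefinitionsSketch {

    static final String KEY_PREFIX = "wbxml.opaque.linkeddef.";

    // Returns the linked definition names, ordered by their property key.
    static List<String> linkedNames(Properties props) {
        Map<String, String> byKey = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(KEY_PREFIX)) {
                byKey.put(key, props.getProperty(key));
            }
        }
        return new ArrayList<>(byKey.values());
    }

    // Returns the linked names that do not match any loaded definition name.
    static List<String> unresolved(Properties props, Set<String> loadedNames) {
        List<String> missing = new ArrayList<>();
        for (String name : linkedNames(props)) {
            if (!loadedNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("wbxml.opaque.linkeddef.devinf12", "DevInf 1.2");
        props.setProperty("wbxml.opaque.linkeddef.dmddf12", "DMDDF 1.2");
        System.out.println(linkedNames(props));
    }
}
```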

Initialization

* *

The initialization or loading of all the definitions into the JVM is done when the WbXmlInitialization class is initialized. There is a location where all the properties files should be placed. This location should be inside the classpath (it can be inside a JAR file or a normal directory).

* *

The wbxml-stream library ships the default language definitions in the path es/rickyepoderi/wbxml/definition/defaults, and they are loaded into the system by default.

* *

If different definitions need to be loaded, the files should be placed in the classpath (inside a JAR or in a directory) and a system property can be used to indicate that this path should be used instead of the default one:

* *
 * -Des.rickyepoderi.wbxml.definition.path=new/classpath/resource/path
 * 
* *

Remember that the path is a path inside the classpath, not a file system path.
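The choice between the default path and the overridden one can be sketched with System.getProperty and a classpath lookup. DefinitionPathSketch is a hypothetical helper; how WbXmlInitialization really resolves and scans the path is not shown here.

```java
// Hypothetical helper (not the library's WbXmlInitialization): resolves the definition path.
final class DefinitionPathSketch {

    static final String PROPERTY = "es.rickyepoderi.wbxml.definition.path";
    static final String DEFAULT_PATH = "es/rickyepoderi/wbxml/definition/defaults";

    // Returns the path from the system property, or the default when it is not set.
    static String resolvePath() {
        return System.getProperty(PROPERTY, DEFAULT_PATH);
    }

    // Checks that the path is visible as a classpath resource (not a file system path).
    static boolean existsOnClasspath(String path) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(path) != null;
    }

    public static void main(String[] args) {
        String path = resolvePath();
        System.out.println(path + " on classpath: " + existsOnClasspath(path));
    }
}
```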

 */
package es.rickyepoderi.wbxml.definition;



