org.xmlcml.cml.base.elements.xsd
The abundance of an isotope.
The abundance of an isotope in an isotopeList. Values are expressed in percentages.
An action which might occur in scientific data or narrative.
An action which might occur in scientific data or narrative. The definition is deliberately vague, intending to collect examples of possible usage. Thus an action could be addition of materials, measurement, application of heat or radiation. The content model is unrestricted. _action_ itself is normally a child of _actionList_.
The start, end and duration attributes should be interpreted as XSD dateTimes and XSD durations. This allows precise recording of time of day, etc., or duration after start of actionList. A convention="xsd" attribute should be used to enforce XSD.
a numerical value, with a units attribute linked to a dictionary.
a human-readable string (unlikely to be machine processable)
startCondition and endCondition values are not constrained, which allows XSL-like test attribute values. The semantics of the conditions are yet to be defined and at present are simply human readable.
The order of the action elements in the document may, but will not always, define the order that they actually occur in.
A delay can be shown by an action with no content. Repeated actions or actionLists are indicated through the count attribute.
Number of times the action should be repeated.
A container for a group of actions.
ActionList contains a series of actions or nested actionLists.
An alternative name for an entry.
At present a child of _entry_ which represents an alternative string that refers to the concept. There is a partial controlled vocabulary in _alternativeType_ with values such as :
synonym
acronym
abbreviation
The amount of a substance.
The units attribute is mandatory and can be customised to support mass, volumes, moles, percentages, or ratios (e.g. ppm).
An angle between three atoms.
It can be used for:
Recording experimentally determined bond angles (e.g. in a crystallographic paper).
Providing the angle component for internal coordinates (e.g. z-matrix).
A documentation container similar to annotation in XML Schema.
A documentation container similar to annotation in XML Schema. At present this is experimental and designed to be used for dictionaries, units, etc. One approach is to convert these into XML Schemas when the documentation and appinfo children will emerge in their correct position in the derived schema.
It is possible that this may develop as a useful tool for annotating components of complex objects such as molecules.
A container similar to appinfo in XML Schema.
A container for machine processable documentation for an entry. This is likely to be platform and/or language specific. It is possible that XSLT, RDF or XBL will emerge as generic languages. See _annotation_ and _documentation_ for further information.
An example in XSLT where an element _foo_ calls a bespoke template .
Allows a processor to inspect the role of the appinfo and process accordingly.
An argument for a function.
Arguments can be typed and have explicit or free values. They can also carry out substitutions in the parent element and its children (substitute, still experimental) and delete themselves after this.
2006-02-14: PMR. Added atomType as child
2006-05-21: PMR. Added substitute and delete attributes
A homogeneous one-dimensional array of similar objects.
These can be encoded as strings (i.e. XSD-like datatypes) and are concatenated as string content. The size of the array should always be >= 1. The default delimiter is whitespace. The _normalize-space()_ function of XSLT could be used to normalize all whitespace to single spaces and this should not affect the value of the array elements. To extract the elements __java.lang.StringTokenizer__ could be used. If the elements themselves contain whitespace then a different delimiter must be used and is identified through the delimiter attribute. This method is mandatory if it is required to represent empty strings. If a delimiter is used it MUST start and end the array; leading and trailing whitespace is ignored. Thus size+1 occurrences of the delimiter character are required. If non-normalized whitespace is to be encoded (e.g. newlines, tabs, etc.) you are recommended to translate it character-wise to XML character entities.
Note that normal Schema validation tools cannot validate the elements of array (they are defined as string). However, if the string is split, a temporary schema can be constructed from the type and used for validation. Also the type can be contained in a dictionary and software could decide to retrieve this and use it for validation.
When the elements of the array are not simple scalars (e.g. scalars with a value and an error), the scalars should be used as the elements. Although this is verbose, it is simple to understand. If there is a demand for more compact representations, it will be possible to define the syntax in a later version.
The size attribute is not mandatory but provides a useful validity check.
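The delimiter rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the CML libraries: the function name `split_cml_array` and its parameters are hypothetical, and the `size` argument plays the role of the optional size attribute as a validity check.

```python
def split_cml_array(content, delimiter=None, size=None):
    """Split the string content of an array into its elements (illustrative sketch)."""
    text = content.strip()  # leading and trailing whitespace is ignored
    if delimiter is None:
        # Default: any run of whitespace separates elements.
        items = text.split()
    else:
        # An explicit delimiter MUST start and end the array,
        # so size+1 delimiters enclose size elements.
        if len(text) < 2 * len(delimiter) or not (
            text.startswith(delimiter) and text.endswith(delimiter)
        ):
            raise ValueError("delimiter must start and end the array")
        inner = text[len(delimiter):-len(delimiter)]
        items = inner.split(delimiter)  # empty strings are preserved
    if size is not None and size != len(items):
        raise ValueError(f"size={size} but found {len(items)} elements")
    return items
```

With whitespace as the delimiter, empty strings cannot be represented, which is why the explicit delimiter form is needed for them.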
Alternative to units
Must be used in conjunction with unitType
2005-10-26: added
Alternative to units
Must be used in conjunction with unitType
2005-10-26: added
Alternative to units
Must be used in conjunction with multiplierToSI and/or constantToSI
2005-10-26: added
A list of arrays or lists.
A major use of arrayList is to contain data within rectangular tables. However there is no absolute requirement and the table can have any shape. The shape attribute should be used to assert rectangularity.
2006-11-03: created
An atom.
Atoms can only be chosen from the periodic table and superatoms such as "Phe" or "Tyr" are not allowed. The elementType of an atom is identified by that attribute. There are two additional elementTypes: "Du", for an object which does not have an identifiable nucleus but is useful in calculations and definitions (such as a centroid); and "R", which describes a generic fragment. Although atoms have an elementType, they do not, by default, support arbitrary atomTypes for which the <atomType> element should be used.
2006-01-12: PMR. Added vector3 child to support accelerations, velocities, dipole, etc.
2006-06-01: PMR. Added documentation.
The main content model of the atom.
name can be used for atom labels, etc. More than one name can be used if required.
scalar contains any scalar properties of the atom (examples are chemical shift, B-value, etc.) linked through dictRef (CmlDictRefType).
array contains any properties of the atom describable by a homogeneous array linked through dictRef (CmlDictRefType).
matrix contains any properties of the atom describable by a homogeneous matrix linked through dictRef (CmlDictRefType). An example is the polarizability tensor.
atomParity (CmlAtomParityElement) is the required way of defining atom-based chirality.
electron is a way of associating electron(s) with the atom.
Most useful in _formula_ but possibly useful in _atomArray_ where coordinates and connectivity are not defined. No formal default, but assumed to be 1.
This can be used to describe the purpose of atoms whose _elementType_s are __dummy__ or __locant__. Vocabulary not controlled.
2005-11-27: Added PMR
2005-11-27: Added PMR
A container for a list of atoms.
A child of _molecule_ and contains _atom_ information. There are two strategies:
Create individual _atom_ elements under _atomArray_ (in any order). This gives the greatest flexibility but is the most verbose.
Create *Array attributes (e.g. of _elementTypeArrayType_) under _atomArray_. This requires all arrays to be of identical lengths with explicit values for all atoms in every array. This is NOT suitable for complexType atom children such as _atomParity_. It also cannot be checked as easily by schema- and schematron validation. The _atomIDArray_ attribute is mandatory. It is allowed (though not yet recommended) to add _*Array_ children such as _floatArray_.
The attributes are directly related to the scalar attributes under _atom_ which should be consulted for more info.
Example - these are exactly equivalent representations
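As a rough illustration of how the two strategies relate, the Python sketch below expands the compact array form into individual atom children. The concrete attribute names (`atomID`, `elementType`), the helper name `expand_atom_array` and the absence of XML namespaces are assumptions made for the example; it also treats every attribute on the atomArray as an array, which is a simplification.

```python
import xml.etree.ElementTree as ET

def expand_atom_array(atom_array):
    """Return a new atomArray with one atom child per entry of the array attributes."""
    ids = atom_array.get("atomID", "").split()  # assumed name of the ID array attribute
    if not ids:
        raise ValueError("the array form requires the atom ID array")
    columns = {}
    for name, value in atom_array.attrib.items():
        if name == "atomID":
            continue
        values = value.split()
        # All arrays must have identical lengths, with a value for every atom.
        if len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} values, expected {len(ids)}")
        columns[name] = values
    expanded = ET.Element("atomArray")
    for index, atom_id in enumerate(ids):
        atom = ET.SubElement(expanded, "atom", id=atom_id)
        for name, values in columns.items():
            atom.set(name, values[index])
    return expanded

compact = ET.fromstring('<atomArray atomID="a1 a2 a3" elementType="O H H"/>')
verbose = expand_atom_array(compact)
```

Here `verbose` holds three atom children carrying the same IDs and element types as the compact form.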
An atomicBasisFunction.
An atomic basis function which can be linked to atoms, eigenvalues/vectors, etc. Normally contained within _basisSet_.
Normally these are atom-centered functions, but they can also serve as "ghost" functions which are centered on points. These can be dummy atoms so that the atomRef mechanism can still be used.
This information is required to interpret the eigenvector components and map them onto the atom list. However this mapping is normally implicit in the program and so it may be necessary to generate basisSet information for some programs before XML technology can be automatically used to link the components of the CCML document.
The atom owning this atomicBasisFunction. This reference is required to tie the reported eigenvector components to the list of atoms.
This is provided for completeness but we do not see it being widely used and the symbolic representation (lm) is more valuable.
This is a local annotation of the ABF and unlikely to be enumerable. Thus a split s-orbital could have 3 ABFs with "s", "s'", "s''" but they would all have lm="s".
This is a "standard" representation of the ABF, but not enumerated until we decide whether it can be formalised. Examples are "px", "dxy", etc. Note that d-orbitals and higher may be represented with redundant ABFs, e.g. 6 d-orbitals. The more standard the representation, the more useful this will be for searching.
The stereochemistry round an atom centre.
It follows the convention of the MIF format, and uses 4 distinct atoms to define the chirality. These can be any atoms (though they are normally bonded to the current atom). There is no default order and the order is defined by the atoms in the atomRefs4 attribute. If there are only 3 ligands, the current atom should be included in the 4 atomRefs.
The value of the parity is a signed number. (It can only be zero if two or more atoms are coincident or the configuration is planar). The sign is the sign of the chiral volume created by the four atoms (a1, a2, a3, a4):
| 1  1  1  1  |
| x1 x2 x3 x4 |
| y1 y2 y3 y4 |
| z1 z2 z3 z4 |
Note that atomParity cannot be used with the *Array syntax for atoms.
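The sign of the chiral volume can be computed directly from the determinant above. The following Python sketch is illustrative only: the function names and the tolerance `tol` used to detect the planar or coincident case are assumptions, and it returns just the sign (+1, -1 or 0) rather than any particular magnitude.

```python
def determinant(matrix):
    """Determinant by Laplace expansion along the first row (fine for a 4x4 matrix)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for col, value in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * value * determinant(minor)
    return total

def atom_parity_sign(a1, a2, a3, a4, tol=1e-9):
    """Sign of the chiral volume of four (x, y, z) points, in the order given."""
    points = (a1, a2, a3, a4)
    # Row of ones, then the x, y and z rows, matching the determinant in the text.
    matrix = [[1.0] * 4] + [[float(p[axis]) for p in points] for axis in range(3)]
    volume = determinant(matrix)
    if abs(volume) < tol:
        return 0  # planar configuration or coincident atoms
    return 1 if volume > 0 else -1
```

Swapping any two atoms in the argument order flips the sign, which is why the order in atomRefs4 matters.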
A set of references to atoms.
An atomSet consists of a number of unique references to atoms through their ids. atomSets need not be related to molecules (which are generally created by aggregation of explicit atoms). Two or more atomSets may reference the same atom, and atomSets may be empty.
atomSets have many potential uses such as:
identifying functional groups
results of substructure matching
identifying atoms with particular roles in a calculation
The atomSet may be referenced from elsewhere in the document and you are encouraged to use locally unique id attributes on atomSets.
An atomType.
atomTypes are used in a wide variety of ways in computational chemistry. They are normally labels added to existing atoms (or dummy atoms) in the molecule and have a number of defined properties. These properties are usually in addition to those deducible from the elementType of the atom. AtomTypes usually depend on the chemical or geometrical environment of the atom and are frequently assigned by algorithms with chemical perception. However they are often set or "tweaked" by humans initiating a program run.
AtomTypes on an atom have no formal relation to its elementType, which only describes the number of protons in the nucleus. It is not unknown (though potentially misleading) to use an "incompatible" atomType to alter the computational properties of an atom (e.g. pretend this K+ is a Ca++ to increase its effective charge).
atomTypes will also be required to describe pseudoAtoms such as "halogen" (generic) or "methyl group" (unified atom). Atoms in computations can therefore have an atomType child with a "ref" attribute.
An atomType contains numeric or other quantities associated with it (charges, masses, use in force-fields, etc.) and also description of any perception algorithms (chemical and/or geometrical) which could be used to compute or constrain it. This is still experimental.
atomTypes are referred to by their mandatory name attribute. An atom refers to one or more atomTypes through atomType/@ref children.
Examples not yet tested.
The name will usually be namespaced as 'gulp:si', 'tripos:c.3', etc. It must occur except for atomType/@ref.
A container for one or more atomTypes.
It can contain several atomTypes.
A band or Brillouin zone.
Not yet finalised.
2006-01-21: PMR. added kpointRef and deprecated kpointList.
Band energies associated with this kpoint.
The energy units must be given.
kpoints should be described in kpointList and referenced.
A container for bands.
Experimental.
A container for one or more atomicBasisFunctions.
This can contain several orbitals.
A bond between atoms, or between atoms and bonds.
_bond_ is a child of _bondArray_ and contains bond information. Bond must refer to at least two atoms (normally using _atomRefs2_) but may also refer to more for multicentre bonds. Bond is often EMPTY but may contain _electron_, _length_ or _bondStereo_ elements.
Validate Bonds
Atom Refs for 2-atom bond
Are atoms distinct? Reported as: BOND (...): ATOMS not distinct: ...
Do both atoms exist in current molecule context? Reported for each atom as: BOND (...): ATOMREF not found: ...
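The two checks listed above (distinct atoms, and atom references that resolve within the molecule) might be sketched in Python as follows. The function name `validate_bonds`, the choice to put the bond id inside the parentheses of each message, and the lack of namespace handling are all assumptions for illustration; the sketch also searches every atom below the given molecule element, which is looser than a strict "current molecule context".

```python
import xml.etree.ElementTree as ET

def validate_bonds(molecule):
    """Return a list of problem messages for the two-atom bonds of a molecule element."""
    atom_ids = {atom.get("id") for atom in molecule.iter("atom")}
    problems = []
    for bond in molecule.iter("bond"):
        refs = (bond.get("atomRefs2") or "").split()
        if len(refs) != 2:
            continue  # only two-atom bonds are checked in this sketch
        bond_id = bond.get("id", "?")  # hypothetical placeholder when the bond has no id
        if refs[0] == refs[1]:
            problems.append(f"BOND ({bond_id}): ATOMS not distinct: {refs[0]}")
        for ref in refs:
            if ref not in atom_ids:
                problems.append(f"BOND ({bond_id}): ATOMREF not found: {ref}")
    return problems

example = ET.fromstring(
    '<molecule><atomArray><atom id="a1"/><atom id="a2"/></atomArray>'
    '<bondArray><bond id="b1" atomRefs2="a1 a2"/>'
    '<bond id="b2" atomRefs2="a1 a1"/>'
    '<bond id="b3" atomRefs2="a2 a9"/></bondArray></molecule>'
)
messages = validate_bonds(example)
```

In this example `b1` passes, `b2` is flagged for repeated atoms and `b3` for a reference to a missing atom.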
One or more electrons associated with the bond.
The _bondRef_ on the _electron_ should point to the id on the bond. We may relax this later and allow reference by context.
The stereo convention for the bond.
only one convention allowed.
This is designed for multicentre bonds (as in delocalised systems or electron-deficient centres). The semantics are experimental at this stage. As an example, a B-H-B bond might be described as <bond atomRefs="b1 h2 b2"/>.
This is designed for pi-bonds and other systems where formal valence bonds are not drawn to atoms. The semantics are experimental at this stage. As an example, a Pt-|| bond (as the Pt-ethene bond in Zeise's salt) might be described as <bond atomRefs="pt1" bondRefs="b32"/>.
A user- or machine- assertion about the cyclic nature of a bond. Need NOT agree with the apparent cyclicity from the connection table.
A container for a number of bonds.
_bondArray_ is a child of _molecule_ and contains _bond_ information. There are two strategies:
Create individual bond elements under bondArray (in any order). This gives the greatest flexibility but is the most verbose.
Create *Array attributes (e.g. of orderArrayType) under bondArray. This requires all arrays to be of identical lengths with explicit values for all bonds in every array. This is NOT suitable for complexType bond children such as _bondStereo_ nor can IDs be added to bonds. It also cannot be checked as easily by schema- and schematron validation. The _atomRef1Array_ and _atomRef2Array_ attributes are then mandatory. It is allowed (though not yet recommended) to add _*Array_ children such as _floatArray_.
The attributes are directly related to the scalar attributes under _atom_ which should be consulted for more info.
Example - these are exactly equivalent representations
A set of references to bonds.
A bondSet consists of a number of unique references to bonds through their ids. bondSets need not be related to molecules (which are generally created by aggregation of explicit bonds). Two or more bondSets may reference the same bond, and bondSets may be empty.
bondSets have many potential uses such as:
identifying functional groups
results of substructure matching
identifying bonds with particular roles in a calculation
The bondSet may be referenced from elsewhere in the document and you are encouraged to use locally unique id attributes on bondSets.
A container supporting cis trans wedge hatch and other stereochemistry.
An explicit list of atomRefs must be given, or it must be a child of bond. There are no implicit conventions such as E/Z. This will be extended to other types of stereochemistry.
At present the following are supported:
No atomRefs attribute. Deprecated, but probably unavoidable. This must be a child of bond where it picks up the two atomRefs in the atomRefs2 attribute. Possible values are C/T (which only makes sense if there is exactly one ligand at each end of the bond) and W/H. The latter should be replaced by atomParity wherever possible. Note that W/H makes no sense without 2D atom coordinates.
atomRefs4 attribute. The 4 atoms represent a cis or trans configuration. This may or may not be a child of bond; if so the second and third atomRefs should be identical with the two atomRefs in the bond. This structure can be used to guide processors in processing stereochemistry and is recommended, since there is general agreement on the semantics. The semantics of bondStereo not related to bonds are less clear (e.g. cumulenes, substituted ring nuclei, etc.). It is currently an error to have more than one bondStereo referring to the same ordered 4-atom list.
atomRefs attribute. There are other stereochemical conventions such as cis/trans for metal complexes which require a variable number of reference atoms. This allows users to create their own; at present we do not see CML creating exhaustive tables. For example cis/trans square-planar complexes might require 4 (or 5) atoms for their definition, octahedral 6 or 7, etc. In principle this is very powerful and could supplement or replace the use of cis-, mer-, etc.
The atomRefs and atomRefs4 attributes cannot be used simultaneously.
The type of a bond.
Bond types are used to describe the behaviour of bonds in forcefields, functional groups, reactions and many other domains. They are not as well formalised as atomTypes and we provide less semantic support. BondTypes are referred to by their mandatory _name_ attribute.
The bondType name. The name will usually be namespaced as 'gulp:si', 'tripos:c.3', etc. It must occur except when the ref attribute is given.
A container for one or more bondTypes.
_bondTypeList_ can contain several bondTypes.
A set of 3 cell parameters.
Either 3 lengths or 3 angles.
A general container for CML elements.
Often the root of the CML (sub)document. Has no explicit function but can serve to hold the dictionary and namespace and version information, and is a useful tag to alert CML processors and search/XMLQuery tools that there is chemistry in the document. Can contain any content, but usually a list of molecules and other CML components. The fileId attribute can be used to preserve the origin of the information, though metadata should also be used. Can be nested.
No specific restrictions.
An element to hold any combination of heterogeneous element children
complexObject can be used as it stands but will often be extended by schema definitions in dictionary entries.
A container for one or more experimental conditions.
This can contain several conditions. These include (but are not limited to) intensive physical properties (temperature, pressure, etc.), apparatus (test-tube, rotary evaporator, etc.). Actions can be represented elsewhere by actionList and solvents or other substances by substanceList.
A crystallographic cell.
Required if fractional coordinates are provided for a molecule. Originally there were precisely SIX child scalars to represent the cell lengths and angles in that order. There are no default values; the spacegroup is also included. This is now deprecated and replaced by cellParameter.
2006-03-06 PMR: added cellParameter child
OLD STYLE: All 6 cell parameters must be given, even where angles are fixed by symmetry. The order is fixed as a,b,c,alpha,beta,gamma and software can neglect any title or dictRef attributes. Error estimates can be given if required. Any units can be used, but the defaults are Angstrom (10^-10 m) and degrees.
NEW STYLE: Two cellParameter children are given
The definition for an entry.
The definition should be a short nounal phrase defining the subject of the entry. Definitions should not include commentary, implementations, equations or formulae (unless the subject is one of these) or examples.
The definition can be in any markup language, but normally XHTML will be used, perhaps with links to other XML namespaces such as CML for chemistry.
From the IUPAC Dictionary of Medicinal Chemistry
Descriptive information.
This can occur in objects which require textual comment such as entry.
Entries should have at least one separate definition. description is then used for most of the other information, including examples. The class attribute has an uncontrolled vocabulary and can be used to clarify the purposes of the description elements.
A dictionary.
A dictionary is a container for _entry_ elements. Dictionaries can also contain unit-related information. The dictRef attribute on a dictionary element sets a namespace-like prefix allowing the dictionary to be referenced from within the document. In general dictionaries are referenced from an element using the __dictRef__ attribute.
2005-12-15. PMR. added namespace and dictionaryPrefix.
A dimension supporting scientific unit.
This will be primarily used within the definition of units. Two dimensions are of the same type if their 'name' attributes are (case-sensitive) identical. Dimensions of the same type can be algebraically combined using the 'power' attributes. Normally dimensions will be aggregated and cancelled algebraically, but the 'preserve' attribute can be used to prevent this. Thus a velocity gradient over length can be defined as:
whereas cancelling the dimensions would give:
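The aggregation and cancelling of dimensions described here can be illustrated with a small Python sketch. It is not taken from the CML code base: the tuple layout `(name, power, preserve)` and the function name `combine_dimensions` are assumptions, and which dimensions carry the preserve flag in the velocity gradient example is a guess made for illustration.

```python
def combine_dimensions(dimensions):
    """Combine (name, power, preserve) tuples, cancelling powers unless preserved."""
    preserved = []  # kept exactly as given
    combined = {}   # names are compared case-sensitively
    for name, power, preserve in dimensions:
        if preserve:
            preserved.append((name, power))
        else:
            combined[name] = combined.get(name, 0) + power
    # Dimensions whose powers sum to zero cancel out completely.
    cancelled = [(name, power) for name, power in combined.items() if power != 0]
    return preserved + cancelled

# Velocity (length^1 time^-1) per unit length (length^-1), lengths preserved.
kept = combine_dimensions([("length", 1, True), ("time", -1, False), ("length", -1, True)])
# The same quantity with ordinary algebraic cancelling.
reduced = combine_dimensions([("length", 1, False), ("time", -1, False), ("length", -1, False)])
```

In the first call both length terms survive alongside time^-1; in the second only time^-1 remains.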
Documentation in the annotation of an entry.
A container similar to documentation in XML Schema. This is NOT part of the textual content of an entry but is designed to support the transformation of dictionary entries into schemas for validation. This is experimental and should only be used for dictionaries, units, etc. One approach is to convert these into XML Schemas when the documentation and appinfo children will emerge in their correct position in the derived schema.
Do NOT confuse documentation with the description or the definition which are part of the content of the dictionary.
It will probably only be used when there is significant appinfo in the entry or where the entry defines an XSD-like datatype of an element in the document.
An element to hold eigenstuff.
Holds an array of eigenvalues and a matrix of eigenvectors.
No current semantics.
Suggest it is developed for the chemical/physical role, e.g. "molecular orbitals", "inertial matrix", "vibrational modes", "phonons", etc.
An electron.
Since there is very little use of electrons in current chemical information this is a fluid concept. I expect it to be used for electron counting, input and output of theochem operations, descriptions of orbitals, spin states, oxidation states, etc. Electrons can be associated with atoms, bonds and combinations of these. At present there are no hardcoded semantics. However, _atomRef_ and similar attributes can be used to associate electrons with atoms or bonds.
A dictionary entry.
The original design for validation with attribute-based constraints is ponderous and fragile. In future constraints will be added through appinfo in annotation. We shall develop this further in the near future.
2003-03-30: added metadataList to content model.
2007-01-20: added unitType.
2007-01-20: deprecated alternative, relatedEntry. These require approaches outside CMLSchema (e.g. RDF)
An enumeration of values.
An enumeration of string values. Used where a dictionary entry constrains the possible values in a document instance. The dataTypes (if any) must all be identical and are defined by the dataType of the containing element.
An expression that can be evaluated.
Experimental. This is essentially a mathematical function, expressed currently in reverse Polish notation but we expect to move to MathML.
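For readers unfamiliar with reverse Polish notation, the Python sketch below evaluates a whitespace-separated postfix expression with a stack. The token syntax, the supported operators and the `variables` mapping are assumptions made for this example; the source does not define the concrete expression syntax.

```python
import operator

# Binary operators supported by this sketch (an assumed, minimal set).
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def evaluate_rpn(expression, variables=None):
    """Evaluate a whitespace-separated reverse Polish expression."""
    variables = variables or {}
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in variables:
            stack.append(float(variables[token]))
        else:
            stack.append(float(token))  # raises ValueError for unknown tokens
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]
```

For example, `evaluate_rpn("3 4 + 2 *")` corresponds to the infix expression (3 + 4) * 2.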
CML-1 dataType DEPRECATED.
CML-1 dataType DEPRECATED.
A molecular formula.
It is defined by atomArrays each with a list of elementTypes and their counts (or default=1). All other information in the atomArray is ignored.
formula are nestable so that aggregates (e.g. hydrates, salts, etc.) can be described. CML does not require that formula information is consistent with (say) crystallographic information; this allows for experimental variance.
An alternative briefer representation is also available through the concise attribute. This must include whitespace round all elements and their counts, which must be explicit.
2005-10-16. The semantics are now the following. A formula must have one or both:
A concise attribute.
A single atomArray child, using array format.
It must also have a formalCharge attribute if atomArray is used and the charge is non-zero.
The concise, formalCharge and atomArray information must always be consistent and software should throw an error if not.
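A consistency check between the concise string and the array form could look like the following Python sketch. The function names are hypothetical, counts are read as floats because fractional components are allowed, and charge handling is deliberately left out of this sketch.

```python
def parse_concise(concise):
    """Parse a concise formula such as 'C 2 H 6 O 1' into {element: count}."""
    tokens = concise.split()
    if len(tokens) % 2 != 0:
        raise ValueError("every element needs an explicit count")
    counts = {}
    for symbol, count in zip(tokens[0::2], tokens[1::2]):
        counts[symbol] = counts.get(symbol, 0.0) + float(count)
    return counts

def check_consistent(concise, element_types, element_counts):
    """Raise ValueError if the concise string disagrees with the array form."""
    from_array = {}
    for symbol, count in zip(element_types.split(), element_counts.split()):
        from_array[symbol] = from_array.get(symbol, 0.0) + float(count)
    if parse_concise(concise) != from_array:
        raise ValueError("concise and atomArray information are inconsistent")
    return True
```

A call such as `check_consistent("C 2 H 6 O 1", "C H O", "2 6 1")` passes, while a mismatch in any count raises an error, in line with the rule that software should reject inconsistent formulae.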
Until now there was no way of holding inline formula other than concise (although JUMBO5.0 is capable of reading them). We now extend formula.xsd to incorporate this through the attribute "inline" which requires the use of the "convention" attribute. The contents of inline are purely textual. It can be used with or without atomArray or concise but there is no guarantee that it can be interpreted as a meaningful chemical formula or that there is consistency. In some cases a document supplies several formula representations (e.g. the IUCr's CIF). In this case a molecule (or crystal) element might contain several formula children. The semantics of which to use are application dependent.
Allows for fractional components.
The charge on the formula. Mandatory if non-zero (i.e. cannot rely on concise)
An inline representation of the formula. There are no controlled semantics and it need not be compatible with concise or atomArray.
A container for a fragment
fragment is a container for a molecule, potentially to be joined to other fragments. In addition there may be fragmentLists which represent branches from the molecule. There may also be a join child which is normally only found if there is a @countExpression.
2006-11-23: created
fragment normally contains molecules
branches from the molecule.
the inter-fragment join.
Normally it only makes sense with @countExpression.
No formal semantics (yet).
A container for one or more fragments and joins.
fragmentList can contain several fragments and joins. The normal content model is
join fragment join fragment...
2006-07-20: PMR Added
2007-01-03: PMR Added role attribute
A gradient.
A container for a quantity or quantities representing the gradient of other quantities. At present just takes a scalar child.
A structured identifier.
Supports compound identifiers such as IChI. At present uses the V0.9 IChI XML representation verbatim but will almost certainly change with future IChIs.
The inclusion of elements from other namespaces causes problems with validation. The content model is deliberately LAX but the actual elements in IChI will fail the validation as they are not declared in CML. For simple scalar values the value attribute can be used with empty content. Where an identifier has several components a series of label elements can be used.
2003-07-10: Fixed count on identifier children.
2003-03-12: Added isotopic and atoms.
CML-1 dataType DEPRECATED.
CML-1 dataType DEPRECATED.
A specific isotope.
Defines an isotope in terms of exact mass and spin. Differentiate from isotopeList which defines a mixture of isotopes.
A container for one or more isotopes.
Can contain several isotopes. These may be related in several ways. This allows the definition of natural abundance and averaged enrichment.
Command to join two groups.
EXPERIMENTAL. join will normally use atomRefs2 to identify 2 R atoms (i.e. elementType="R") that should be joined. The atoms to which the R atoms are attached are then joined by a new bond and the R groups are then deleted. It is currently an error if these atoms already have a connecting bond.
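The join operation described above can be sketched on a very simple connection table. In this Python illustration the data model (a dict of atom id to elementType and a set of two-atom frozensets for bonds) and the function name `join_fragments` are assumptions, not the CML API; each R atom is also assumed to have exactly one neighbour.

```python
def join_fragments(atoms, bonds, ref1, ref2):
    """Join the atoms attached to two R atoms, then delete the R atoms."""
    for ref in (ref1, ref2):
        if atoms.get(ref) != "R":
            raise ValueError(f"{ref} is not an R atom")

    def attached(r_atom):
        neighbours = [next(iter(bond - {r_atom})) for bond in bonds if r_atom in bond]
        if len(neighbours) != 1:
            raise ValueError(f"{r_atom} must be attached to exactly one atom")
        return neighbours[0]

    n1, n2 = attached(ref1), attached(ref2)
    if n1 == n2 or {n1, n2} & {ref1, ref2}:
        raise ValueError("R atoms must be attached to two distinct ordinary atoms")
    new_bond = frozenset((n1, n2))
    if new_bond in bonds:
        raise ValueError("atoms to be joined already have a connecting bond")
    # Drop the R atoms and their bonds, then add the new bond.
    new_atoms = {k: v for k, v in atoms.items() if k not in (ref1, ref2)}
    new_bonds = {b for b in bonds if ref1 not in b and ref2 not in b}
    new_bonds.add(new_bond)
    return new_atoms, new_bonds

atoms = {"a1": "C", "r1": "R", "a2": "O", "r2": "R"}
bonds = {frozenset(("a1", "r1")), frozenset(("a2", "r2"))}
joined_atoms, joined_bonds = join_fragments(atoms, bonds, "r1", "r2")
```

After the call, only `a1` and `a2` remain and they are connected by a single new bond.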
2006-05-20: PMR added.
2006-11-24: PMR deleted @left, @linkOnParent, @right, @repeat.
2006-11-24: PMR modified content model
2006-11-24: PMR added @moleculeRefs2
A kpoint.
Not yet finalised.
2006-01-21: PMR. Created
A container for kpoints.
Experimental.
A text string qualifying an object.
A label can be used to identify or distinguish elements, add keywords or classifications and similar processes. It is usually interpretable by domain-aware humans (e.g. C3'-endo, but not a34561). It is usually either built in a semantically rich fashion (e.g. C2'-alpha-H) or belongs to a controlled vocabulary. It is possibly accessed by software in a domain-specific manner. It differs from description which is free text. The distinction between titles, names and labels is fuzzy, but we think this is worth making. Labels may be necessary to identify objects within programs, while names are more likely to be reserved for database searches. Titles are likely to be freer text and not recommended for precise object retrieval.
Labels should not contain whitespace. Punctuation marks are often necessary, but should not be gratuitously used. Punctuation clashing with XML character entities should be avoided; if this is not possible it should be escaped.
From IUPAC Dictionary of Medicinal Chemistry
A lattice of dimension 3 or less.
Lattice is a general approach to describing periodic systems. It can have variable dimensionality or periodicity, and could be finite.
_lattice_ is more general than _crystal_ in cmlCore which is used primarily for reporting crystallographic experiments. A lattice can be described by latticeVectors, cell axes and angles, or metric tensors, etc. (only axes/angles are allowed under crystal). The dimensionality is enforced through a _system_ parent element.
All appropriate cell parameters must be given, even where angles are fixed by symmetry. The order is fixed as a,b,c,alpha,beta,gamma and software can neglect any title or dictRef attributes. Error estimates can be given if required. Any units can be used, but the defaults are Angstrom (10^-10 m) and degrees. To be developed for lower dimensionality.
A vector3 representing a lattice axis.
A lattice can be represented by 1-3 linearly independent latticeVectors. If the dimensionality is less than 3, latticeVectors are the preferred method. Similarly, if the axes show a mixture of periodicity and non-periodicity, latticeVectors can support this. The number of periodic vectors must correspond with the periodicity attribute on a system element.
The vector must not be zero and units must be given. (Zero vectors must not be used to reduce dimensionality).
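The constraints on latticeVectors (one to three vectors, none zero, none linearly dependent on the others) can be checked with elementary vector algebra. The Python sketch below is an illustration under assumptions: the function names and the absolute tolerance `tol` are hypothetical, and vectors are plain (x, y, z) tuples in consistent units.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def check_lattice_vectors(vectors, tol=1e-9):
    """Return the number of vectors if they are valid, else raise ValueError."""
    if not 1 <= len(vectors) <= 3:
        raise ValueError("a lattice needs between 1 and 3 lattice vectors")
    for vector in vectors:
        if dot(vector, vector) <= tol ** 2:
            raise ValueError("lattice vectors must not be zero")
    if len(vectors) == 2:
        normal = cross(vectors[0], vectors[1])
        if dot(normal, normal) <= tol ** 2:
            raise ValueError("lattice vectors are parallel")
    if len(vectors) == 3:
        # The scalar triple product vanishes when the vectors are coplanar.
        if abs(dot(vectors[0], cross(vectors[1], vectors[2]))) <= tol:
            raise ValueError("lattice vectors are linearly dependent")
    return len(vectors)
```

The returned count is the number of vectors supplied, not the periodicity; matching the periodic vectors against the periodicity attribute would be a separate check.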
A lattice vector defaults to periodic.
Any or all of the axes may be periodic or aperiodic. An example could be a surface where 2 periodic axes (not necessarily orthogonal) are used to describe the coordinates in the surface, perhaps representing lattice vectors of a 3D crystal or 2D layer. The third vector is orthogonal and represents coordinates normal to the surface. In this case only the direction, not the magnitude of the vector is important.
A length between two atoms.
This is either an experimental measurement or used to build up internal coordinates (as in a z-matrix) (only one allowed). We expect to move length as a child of _molecule_ and remove it from here.
A line in 3-space.
A line characterised by one or two endpoints.
2006-01-02: the 6-number content has caused much confusion and will be obsoleted in favour of the point3 and vector3 attributes
An internal or external link to other objects.
Semantics are similar to XLink, but simpler and only a subset is implemented. This is intended to make the instances easy to create and read, and software relatively easy to implement. The architecture is:
A single element (link) used for all linking purposes.
The link types are determined by the type attribute and can be:
locator. This points to a single target and must carry either a ref or href attribute. locator links are usually children of an extended link.
arc. This is a 1:1 link with both ends (from and to) defined.
extended. This is usually a parent of several locator links and serves to create a grouping of link ends (i.e. a list of references in documents).
Many-many links can be built up from arcs linking extended elements.
All links can have optional role attributes. The semantics of this are not defined; you are encouraged to use a URI as described in the XLink specification.
There are two address spaces:
The
href attribute on locators behaves in the same way as
href in HTML and is of type
xsd:anyURI . Its primary use is to use XPointer to reference elements outside the document.
The
ref attribute on locators and the
from and
to attributes on
arc s refer to IDs (
without the '#' syntax).
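A minimal sketch of how software might walk these link types is shown here. The sample document, the `http://example.org/...` URI and the function name `resolve_links` are hypothetical; only the `type`, `ref`, `href`, `from` and `to` attributes come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: one arc, and one extended link grouping two locators.
DOC = """
<list id="root">
  <atom id="a1"/>
  <atom id="a2"/>
  <link type="arc" from="a1" to="a2"/>
  <link type="extended" id="refs">
    <link type="locator" ref="a1"/>
    <link type="locator" href="http://example.org/other.xml#b7"/>
  </link>
</list>
"""


def resolve_links(root):
    """Return (kind, ...) tuples for every arc and locator under root."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    resolved = []
    for link in root.iter("link"):
        kind = link.get("type")
        if kind == "locator":
            ref, href = link.get("ref"), link.get("href")
            if ref is None and href is None:
                raise ValueError("locator must carry a ref or an href")
            if ref is not None:
                # ref is a bare ID (no '#') inside this document.
                if ref not in by_id:
                    raise KeyError(ref)
                resolved.append(("locator", "internal", ref))
            else:
                # href is a URI pointing outside the document; kept as-is.
                resolved.append(("locator", "external", href))
        elif kind == "arc":
            start, end = link.get("from"), link.get("to")
            if start not in by_id or end not in by_id:
                raise KeyError((start, end))
            resolved.append(("arc", start, end))
        # "extended" links only group their children, so nothing to resolve.
    return resolved
```

Calling `resolve_links(ET.fromstring(DOC))` yields one arc and two locators; the extended link contributes nothing itself.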
Note: several other specific linking mechanisms are defined elsewhere in STM.
relatedEntry should be used in dictionaries, and
dictRef should be used to link to dictionaries. There are no required uses of
link in STMML but we have used it to map atoms, electrons and bonds in reactions in CML
Relation to XLink . At present (2002) we are not aware of generic XLink processors from which we would benefit, so the complete implementation brings little extra value. Among the simplifications from XLink are:
type supports only
extended ,
locator and
arc
label is not supported and
id s are used as targets of links.
show and
actuate are not supported.
xlink:title is not supported (all STM elements can have a
title attribute).
xlink:role supports any string (i.e. does not have to be a namespaced resource). This mechanism can, of course, still be used and we shall promote it where STM benefits from it
The
to and
from attributes point to IDs rather than labels
The xlink namespace is not used
It is not intended to create independent linkbases, although some collections of links may have this property and stand outside the documents they link to
The type of the object/element in the 'from' attributes. Requires the objects referenced by the 'from' attributes to have a given elementType. Can be overridden by 'from' attributes in individual links. 2005-06-18: created
The type of the object/element in the 'to' attributes. Requires the objects referenced by the 'to' attributes to have a given elementType. Can be overridden by 'to' attributes in individual links. 2005-06-18: created
The set of ids in the base of the link. 2005-06-18: created
The set of ids in the target of the link. 2005-06-18: created
The id of the ancestral element of objects referenced by 'from' attributes. Provides a context for uniquifying the references in the 'from' attributes. Thus atoms referenced by ids should be unique within a given molecule and the id of this could be the 'fromContext'. 2005-06-18: created
The id of the ancestral element of objects referenced by 'to' attributes. Provides a context for uniquifying the references in the 'to' attributes. Thus atoms referenced by ids should be unique within a given molecule and the id of this could be the 'toContext'. 2005-06-18: created
The role of the link. XLink adds semantics through a URI; we shall not be this strict. We shall not normally use this mechanism and use dictionaries instead.
The target of the (locator) link, outside the document.
A generic container with no implied semantics.
A generic container with no implied semantics. It just contains things and can have attributes which bind conventions to it. It could often act as the root element in an STM document.
A container for links
Usage is now standardized with map as the container and link as the individual links. The links are often effectively typed pointers to other parts of the document. The type can be set for all links by the 'fromType' and 'toType' attributes, either in the map, which then applies to all links by default, or in individual links, when it overrides the map setting. Since ids may not be unique within a document the refs can be given context with the 'fromRef' and 'toRef' attributes in the map element. If more than one context is used it may be better to use multiple maps. The role of map, and its relationship to RDF is still being developed.
Currently (2005) map has primarily been used to map atoms between reactants and products, but we also expect shortly to extend it to peak assignments and several other areas. A map consists of a number of links, which can be directional, relating two elements through their ids. Reference is through the mandatory 'to' and 'from' attributes which must point to existing id attributes on elements. The type of the dereferenced element can be specified in 'toType' and 'fromType' which, while redundant, is an aid to software and acts as a check on referential type integrity.
In principle any element can be linked to any other, with 1:1, 1:n, and n:m topology. We expect maps to be used for precise chemical concepts such as reactions, peak assignments, electron management, molecular superpositions, etc. and that these are supported by bespoke code. For other links, especially with complex topology, users should consider whether RDF may be more appropriate.
In some cases partial mapping is known (e.g. one set of atoms maps to another set), but the precise links are unknown. (This is not the same as n:m mapping where n*m precise links would be expected). In some cases there may be objects such as atomSets or peakGroups which could be linked to support this. Alternatively the 'fromSet' and 'toSet' attributes can be used to hold a list of ids. Thus from='a1 a2' to='b3 b4' might imply that there were two precise links (either {a1=>b3, a2=>b4} or {a1=>b4, a2=>b3}). This is most likely to be used in intermediate documents where more precise semantics can be added later. The ids must all refer to elements of the same type. Note that a 'to' link referencing a single atomSet (toType='atomSet') is not the same as a 'toSet' of toType='atom' with multiple atomIds. The first would require an 'atomSet' element in the document; the second would not. The precise semantics such as the order of ids are application-dependent. If the order is known in both the toSet and fromSet then individual links should be used rather than adding the burden of deconstruction on the implementer.
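The ambiguity of a set-to-set link can be made concrete by enumerating the precise mappings it might stand for. This is a sketch under two assumptions that the text does not state: both sets have the same number of ids and the hidden mapping is one-to-one. The function name `candidate_mappings` is hypothetical.

```python
from itertools import permutations


def candidate_mappings(from_set: str, to_set: str) -> list:
    """List every one-to-one mapping consistent with two whitespace-separated id lists."""
    sources = from_set.split()
    targets = to_set.split()
    if len(sources) != len(targets):
        raise ValueError("this sketch assumes sets of equal size")
    # Each ordering of the targets gives one possible set of precise links.
    return [dict(zip(sources, ordering)) for ordering in permutations(targets)]
```

For the sets 'a1 a2' and 'b3 b4' this returns the two alternatives named above. The number of candidates grows factorially with set size, which is one reason to record individual links whenever the order is known.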
2005-06-18: added typing and role and updated docs.
2006-08-05: added ref attribute.
The type of the object/element in the 'from' attributes. Requires the objects referenced by the 'from' attributes to have a given elementType. Can be overridden by 'from' attributes in individual links. 2005-06-18: created
The type of the object/element in the 'to' attributes. Requires the objects referenced by the 'to' attributes to have a given elementType. Can be overridden by 'to' attributes in individual links. 2005-06-18: created
The id of the ancestral element of objects referenced by 'from' attributes. Provides a context for uniquifying the references in the 'from' attributes. Thus atoms referenced by ids should be unique within a given molecule and the id of this could be the 'fromContext'. 2005-06-18: created
The id of the ancestral element of objects referenced by 'to' attributes. Provides a context for uniquifying the references in the 'to' attributes. Thus atoms referenced by ids should be unique within a given molecule and the id of this could be the 'toContext'. 2005-06-18: created
The role of the map. Semantics are undefined, and can be used to provide a small semi-controlled vocabulary for identifying maps of different types. 2005-06-18: created
A rectangular matrix of any quantities.
By default
matrix represents a rectangular matrix of any quantities representable as XSD or STMML dataTypes. It consists of
rows*columns elements, where
columns is the fastest moving index. Assuming the elements are counted from 1 they are ordered
V[1,1],V[1,2],...V[1,columns],V[2,1],V[2,2],...V[2,columns], ...V[rows,1],V[rows,2],...V[rows,columns]
By default whitespace is used to separate matrix elements; see
array for details. There are NO characters or markup delimiting the end of rows; authors must be careful! The
columns and
rows attributes have no default values; a row vector requires a
rows attribute of 1.
matrix also supports many types of square matrix, but at present we require all elements to be given, even if the matrix is symmetric, antisymmetric or banded diagonal. The
matrixType attribute allows software to validate and process the type of matrix.
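Because nothing in the content marks the end of a row, a reader has to rely entirely on the rows and columns attributes. A minimal sketch of such a reader follows; the function name `parse_matrix` and its `convert` parameter are assumptions for illustration, and only whitespace-delimited content is handled.

```python
def parse_matrix(text: str, rows: int, columns: int, convert=float) -> list:
    """Split whitespace-delimited content into `rows` lists of `columns` values.

    Columns are the fastest moving index, so the tokens are taken in
    row-major order: V[1,1], V[1,2], ... V[1,columns], V[2,1], ...
    """
    tokens = text.split()
    if len(tokens) != rows * columns:
        raise ValueError(
            f"expected {rows * columns} elements, found {len(tokens)}"
        )
    values = [convert(token) for token in tokens]
    return [values[r * columns:(r + 1) * columns] for r in range(rows)]
```

Line breaks in the content carry no meaning here: "1 2 3 4 5 6" with rows=2 and columns=3 gives the same result whether or not the author broke the line after the third value.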
The mechanism of a reaction.
In some cases this may be a simple textual description or reference within a controlled vocabulary. In others it may describe the complete progress of the reaction, including topological or cartesian movement of atoms, bonds and electrons and annotation with varying quantities (e.g. energies).
For named reaction mechanisms ("Diels-Alder", "ping-pong", "Claisen rearrangement", etc.) the
name element should be used. For classification (e.g. "hydrolysis"), the
label may be more appropriate.
In more detailed cases the mechanism refers to components of the
reaction element. Thus bond23 might be cleaved while bond19 is transformed (mapped) to bond99. The
mechanismComponent can be used to refer to components and add annotation. This is still experimental.
IUPAC Compendium of Chemical Terminology 2nd Edition (1997) describes a mechanism as:
A detailed description of the process leading from the reactants to the products of a reaction, including a characterization as complete as possible of the composition, structure, energy and other properties of reaction intermediates, products and transition states. An acceptable mechanism of a specified reaction (and there may be a number of such alternative mechanisms not excluded by the evidence) must be consistent with the reaction stoichiometry, the rate law and with all other available experimental data, such as the stereochemical course of the reaction. Inferences concerning the electronic motions which dynamically interconvert successive species along the reaction path (as represented by curved arrows, for example) are often included in the description of a mechanism. It should be noted that for many reactions all this information is not available and the suggested mechanism is based on incomplete experimental data. It is not appropriate to use the term mechanism to describe a statement of the probable sequence in a set of stepwise reactions. That should be referred to as a reaction sequence, and not a mechanism.
CMLReact provides reactionScheme and annotations to describe the reaction sequence and both it and
mechanism could co-occur within a reactionScheme container.
2006-02-28 PMR: changed content model to choice.
An information component within a reaction mechanism.
Information components can represent both physical constituents of the reaction or abstract concepts (types of bond cleavage, thermodynamics, etc.). There are several ways that components of the reaction can be annotated and/or quantified. One approach will be to refer to specific bonds and atoms through their ids and use mechanismComponent to describe their role, properties, etc. Another is to use mechanismComponent to identify types of bond formed/broken without reference to actual atoms and bonds (initially through the
name element). Yet another will be to include information on the reaction profile.
This is still experimental.
A general container for metadata.
A general container for metadata, including at least Dublin Core (DC) and CML-specific metadata
In its simple form each element provides a name and content in a similar fashion to the
meta element in HTML.
metadata may have simpleContent (i.e. a string for adding further information - this is not controlled).
A general container for metadata elements.
MetadataLists can have local roles (e.g. a bibliographic reference could be a single metadataList with, say, 3-6 components). The role attribute is used in an uncontrolled manner for this. MetadataLists can also be nested, but metadata and metadataList children should not occur on the same level of the hierarchy.
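The rule that metadata and metadataList children should not be mixed at one level is easy to check in software. The sketch below is illustrative only: the function name `mixed_levels` and the sample fragments are hypothetical, and un-namespaced element names are assumed.

```python
import xml.etree.ElementTree as ET


def mixed_levels(root) -> list:
    """Return the ids of metadataList elements that mix both kinds of child."""
    offenders = []
    for metadata_list in root.iter("metadataList"):
        child_tags = {child.tag for child in metadata_list}
        if "metadata" in child_tags and "metadataList" in child_tags:
            offenders.append(metadata_list.get("id"))
    return offenders


# Hypothetical fragments: a flat bibliographic list, and a list that mixes levels.
FLAT = """
<metadataList id="ref1" role="bibliographicReference">
  <metadata name="dc:creator" content="A. Author"/>
  <metadata name="dc:date" content="2006"/>
</metadataList>
"""
MIXED = """
<metadataList id="outer">
  <metadata name="dc:title" content="Example"/>
  <metadataList id="inner"/>
</metadataList>
"""
```

Here `mixed_levels(ET.fromstring(FLAT))` is empty, while the second fragment reports "outer".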
A module in a calculation.
Many programs are based on discrete modules which produce chunks of output. There are also conceptual chunks such as initialisation, calculation and summary/final which often have finer submodules such as cycle, iteration, snapshot, etc. There is no controlled vocabulary but a typical structure is shown in the example. One of the challenges of CCML is to find communality between different programs and to use agreed abstractions for the modules.
The module can have a program-specific name through its title or dictRef (e.g. "MINIM", "l201") and a generic role ("dynamicsCalculation", "equilibration", etc.). In general role will be controlled by CCML.
A container for atoms, bonds and submolecules.
molecule is a container for atoms, bonds and submolecules along with properties such as crystal and non-builtin properties. It should either contain
molecule or *Array for atoms and bonds. A molecule can be empty (e.g. we just know its name, id, etc.)
"Molecule" need not represent a chemically meaningful molecule. It can contain atoms with bonds (as in the solid-state) and it could simply carry a name (e.g. "taxol") without formal representation of the structure. It can contain "sub molecules", which are often discrete subcomponents (e.g. guest-host).
Molecule can contain a <list> element to contain data related to the molecule. Within this can be string/float/integer and other nested lists
Normally molecule will not contain fragment or fragmentList
Revised content model to allow any order of lengths, angles, torsions 2003-01-01.
Added role attribute 2003-03-19.
2006-05-21. PMR changed content model to (A|B|C...)*
2006-11-24. PMR removed @tail, @head, @countExpression, @repeat
The float|integer|string children are for compatibility with CML-1 and are deprecated. scalar|array|matrix should be used instead.
No formal semantics (yet). The role describes the purpose of the molecule element at this stage in the information. Examples can be "conformation", "dynamicsStep", "vibration", "valenceBondIsomer", etc. This attribute may be used by applications to determine how to present a set of molecule elements.
A container for one or more molecules.
moleculeList can contain several molecules. These may be related in many ways and there are no controlled semantics. However it should not be used for a molecule consisting of descendant molecules, for which molecule should be used. A moleculeList can contain nested moleculeLists.
2006-07-20: PMR Added
metadataList contains
metadata .
list is for experimental and other data.
moleculeList normally contains
molecule s but we make provision for nested moleculeLists if required. The
molecule s can be a set of reference molecules which occur in the
molecule s and can be referenced. This makes the molecules more readable and normalizes data when molecules are used more than once.
A string identifying an object.
name is used for chemical names (formal and trivial) for molecules and also for identifiers such as CAS registry and RTECS. It can also be used for labelling atoms. It should be used in preference to the
title attribute because it is repeatable and can be linked to a dictionary.
Constraining patterns can be described in the dictionary and used to validate
name s.
An object which might occur in scientific data or narrative.
Deliberately vague. Thus an instrument might be built from sub component objects, or a program could be composed of smaller modules (objects).
object could be used to encapsulate graphical primitives (e.g. in reaction schemes, drawings of apparatus, etc.). Unrestricted content model.
An observation or occurrence.
A container for any events that need to be recorded, whether planned or not. They can include notes, measurements, conditions that may be referenced elsewhere, etc. There are no controlled semantics.
An operator within an expression.
Experimental. An operator acts on one or more arguments (at present the number is fixed by the type). The formulation is reverse Polish so the result (with its dataType) is put on a stack for further use.
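The reverse Polish formulation can be illustrated with a small stack evaluator. This is only a sketch of the general technique: the operator names (`plus`, `minus`, `times`, `negate`), their arities and the function name `evaluate_rpn` are hypothetical and are not taken from the schema.

```python
import operator

# Hypothetical operator table: name -> (number of arguments, implementation).
OPERATORS = {
    "plus": (2, operator.add),
    "minus": (2, operator.sub),
    "times": (2, operator.mul),
    "negate": (1, operator.neg),
}


def evaluate_rpn(tokens) -> float:
    """Evaluate a reverse Polish token sequence and return the single result."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            arity, function = OPERATORS[token]
            if len(stack) < arity:
                raise ValueError(f"operator {token!r} needs {arity} argument(s)")
            # Pop the arguments, apply the operator, push the result back.
            arguments = stack[-arity:]
            del stack[-arity:]
            stack.append(function(*arguments))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]
```

For example, the tokens `3 4 plus 2 times` leave (3 + 4) * 2 on the stack. The number of arguments is fixed per operator, matching the statement above that the number is fixed by the type.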
A parameter describing the computation.
A parameter is a broad concept and can describe numeric quantities, objects, keywords, etc. The distinction between keywords and parameters is often fuzzy. ("MINIM" might mean "minimize", while "MINIM=3" might require three iterations to be run.) It may help to think of control keywords as boolean parameters.
Numeric parameters can describe values in molecules, forcefields or other objects. Often the parameters will be refined or otherwise varied during the calculation. Some parameters may be fixed at particular values or relaxed at different stages in the calculation. Parameters can have errors, gradients and other indications of uncertainty.
String/character parameters are often abbreviated in program input, and this is supported through the
regex and
ignoreCase attributes.
Parameters will usually be defined separately from the objects and use the
ref attribute to reference them.
Parameters can be used to describe additional constraints. This will probably require the development of a microlanguage and until then may use program-specific mechanisms. A common approach will be to use an array of values (or objects) to represent different input values for (parts of) the calculation. Thus a conformational change could be specified by an array of several torsion angles.
A parameter will frequently have a
dictRef pointing to a dictionary which may have more information about how the parameter is to be used or the values it can take.
The allowable content of
parameter s may be shown by a "template" in the
appinfo ; this is still experimental.
This is a shorthand for a single scalar value of the parameter. It should only be used with the
ref attribute as it inherits all the dataTyping of the referenced element. It must not be used for defining new parameters as it has no mechanism for units and dataTyping. [This may change?].
Used to define concepts such as independent and dependent variables
A container for one or more parameters.
parameterList can contain several parameters.
2006-02-16:PMR. Added parameterList as child
An object in space carrying a set of properties.
particles have many of the characteristics of
atom s but without an atomic nucleus. It does not have an elementType and cannot be involved in bonding, etc. It has coordinates, may carry charge and might have a mass. It represents some aspect of a computational model and should not be used for purely geometrical concepts such as centroid. Examples of particles are "shells" (e.g. in GULP) which are linked to atoms for modelling polarizability or lonepairs and approximations to multipoles. Properties such as charge, mass should be scalar/array/matrix children.
Used in a similar manner to
atomType . Examples might be "lonePair", "polarizable Oxygen", etc.
A peak; annotated by human or machine.
A
peak can describe:
A single point in a spectrum. Usually a maximum but could be a shoulder, inflexion or indeed any point of interest.
A continuous range of values within a spectrum, defined by maximum and minimum values on either/both axes
The finer structure of the peak can be given with one or more peakStructure children
The units should always be given. (The raw spectral data may unfortunately use different units and no assumptions should be made).
The content model includes atom, bond, molecule, but these are deprecated and should be replaced by atomRefs, etc.
2005-11-22: PMR. Added moleculeRefs
Allows
inter alia the provenance of the peak assignment to be recorded.
2005-11-9. DEPRECATED; use atomRefs
2005-11-9. DEPRECATED; use bondRefs
2005-11-9. DEPRECATED; use moleculeRefs when developed
2005-11-9. PMR, added
Atoms contributing to this peak
The primary set of atoms responsible for the peak such as an NMR peak. Coupling constants and similar splitting should not use this but peakStructure. At present there is no substructure to this attribute or concept and only one attribute is allowed. It may be combined with bondRefs. Even single atoms should use atomRefs, not atomRef.
Bonds contributing to this peak
The primary set of bonds responsible for the peak such as an IR frequency. At present there is no substructure to this attribute or concept and only one attribute is allowed. It may be combined with atomRefs.
Molecule(s) contributing to this peak
The molecule or molecules responsible for the peak. At present there is no substructure to this attribute or concept and only one attribute is allowed. This might, for example, be used to manage a mass spectrum or chromatogram.
A list of closely related peaks or peakGroups.
Distinguish between
peakList (primarily a navigational container) and
peakGroup where the peaks (or groups) have some close relation not shared by all peaks. All descendants must use consistent units.
2005-11-22. added atomRefs, bondRefs and moleculeRefs and deprecated atom, bond, molecule children
Allows
inter alia the provenance of the peak assignment to be recorded.
2005-11-22. DEPRECATED; use atomRefs
2005-11-22. DEPRECATED; use bondRefs
2005-11-22. DEPRECATED; use moleculeRefs
Atoms contributing to this peak
The primary set of atoms responsible for the peak such as an NMR peak. Coupling constants and similar splitting should not use this but peakStructure. At present there is no substructure to this attribute or concept and only one attribute is allowed. It may be combined with bondRefs. Even single atoms should use atomRefs, not atomRef.
Bonds contributing to this peak
The primary set of bonds responsible for the peak such as an IR frequency. At present there is no substructure to this attribute or concept and only one attribute is allowed. It may be combined with atomRefs.
Molecule(s) contributing to this peak
The molecule or molecules responsible for the peak. At present there is no substructure to this attribute or concept and only one attribute is allowed. This might, for example, be used to manage a mass spectrum or chromatogram.
A list of peaks or peakGroups.
Distinguish between
peakList (primarily a navigational container) and
peakGroup where the peaks (or groups) have some close relation not shared by all peaks. All peaks and peakGroups should use the same units.
Allows
inter alia the provenance of the peak assignment to be recorded.
The structure of a peak.
Primarily to record couplings and other fine structure. At present we have tested this on HNMR spectra, C13 NMR and simple IR. We believe that other types of spectroscopy (ESR, NQR, etc) can be represented to some extent, but there may be systems beyond the current expressive power.
For molecules without symmetry we believe that most of the important types of NMR coupling can be represented. Thus an atom which gives rise to two couplings can have two child PeakStructures, and this is shown in example1.
<cml xmlns="http://www.xml-cml.org/schema">
  <!-- Ha ... Hb ... Hc1, Hc2 -->
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="H"> <label value="Ha"/> </atom>
      <atom id="a2" elementType="H"> <label value="Hb"/> </atom>
      <atom id="a3" elementType="H"> <label value="Hc1"/> </atom>
      <atom id="a4" elementType="H"> <label value="Hc2"/> </atom>
    </atomArray>
  </molecule>
  <spectrum id="spectrum2" title="test peaks">
    <peakList>
      <peak id="p1" title="Ha" atomRefs="a1" peakShape="sharp" xUnits="unit:ppm" xValue="6.0">
        <peakStructure type="coupling" peakMultiplicity="doublet11" value="12" units="unit:hertz" atomRefs="a2"/>
      </peak>
      <peak id="p2" title="Hb" atomRefs="a2" peakShape="sharp" xUnits="unit:ppm" xValue="7.0">
        <peakStructure type="coupling" peakMultiplicity="doublet11" value="12" units="unit:hertz" atomRefs="a1"/>
        <peakStructure type="coupling" peakMultiplicity="triplet121" value="15" units="unit:hertz" atomRefs="a3 a4"/>
      </peak>
      <peak id="p3" title="Hc" atomRefs="a3 a4" peakShape="sharp" xUnits="unit:ppm" xValue="8.0">
        <peakStructure type="coupling" peakMultiplicity="doublet11" value="15" units="unit:hertz" atomRefs="a2"/>
      </peak>
    </peakList>
  </spectrum>
</cml>
Where a peak is due to symmetry-related atoms there are different couplings to symmetrical atoms. Thus in an AA'BB' system there can be two couplings to the A atoms and we need nested peakStructures to represent these. In this case the order of the atoms in the peak@atomRefs maps to the order of the grandchildren. See example2.
<cml xmlns="http://www.xml-cml.org/schema">
  <!-- AA'BB' where there are 2 Ha and 2 Hb with two couplings
       J1 Ha ... Hb and Ha' ... Hb'
       J2 Ha ... Hb' and Ha' ... Hb -->
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="H"> <label value="Ha"/> </atom>
      <atom id="a2" elementType="H"> <label value="Ha'"/> </atom>
      <atom id="a3" elementType="H"> <label value="Hb"/> </atom>
      <atom id="a4" elementType="H"> <label value="Hb'"/> </atom>
    </atomArray>
  </molecule>
  <spectrum id="spectrum2" title="test peaks">
    <peakList>
      <!-- the ORDER of a1 and a2 is linked to the ORDER of the grandchildren elements,
           i.e. a1 couples to atoms in ps11 and ps21 while a2 relates to atoms in ps12 and ps22 -->
      <peak id="p1" title="Ha" atomRefs="a1 a2" peakShape="sharp" xUnits="unit:ppm" xValue="6.0">
        <peakStructure id="ps1" type="coupling" peakMultiplicity="doublet" value="10" units="unit:hertz">
          <peakStructure id="ps11" atomRefs="a3"/>
          <peakStructure id="ps12" atomRefs="a4"/>
        </peakStructure>
        <peakStructure id="ps2" type="coupling" peakMultiplicity="doublet" value="2" units="unit:hertz">
          <peakStructure id="ps21" atomRefs="a4"/>
          <peakStructure id="ps22" atomRefs="a3"/>
        </peakStructure>
      </peak>
    </peakList>
  </spectrum>
</cml>
Allows
inter alia the provenance of the peakStructure assignment to be recorded.
Allows identification of couplings in symmetric systems. May also be usable for other complicated systems.
The atoms to which the peakStructure refers.
Allows identification of the atoms to which the peak is coupled (not the atoms contributing to the primary reference for which
peak should be used). It may be combined with bondRefs. Even single atoms should use atomRefs, not atomRef.
Bonds contributing to this peakStructure
Even a single bond should use bondRefs, not bondRef
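As a sketch of how an application might read simple (non-nested) couplings, the following extracts each peak's own atoms and the atoms it is coupled to. The fragment is the "Hb" peak from the first example; the function name `couplings` is hypothetical, and nested peakStructures for symmetric systems are deliberately not handled.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}

# The "Hb" peak from the first example, reduced to the attributes used here.
SPECTRUM = """
<cml xmlns="http://www.xml-cml.org/schema">
  <spectrum id="spectrum2">
    <peakList>
      <peak id="p2" atomRefs="a2" xUnits="unit:ppm" xValue="7.0">
        <peakStructure type="coupling" peakMultiplicity="doublet11"
                       value="12" units="unit:hertz" atomRefs="a1"/>
        <peakStructure type="coupling" peakMultiplicity="triplet121"
                       value="15" units="unit:hertz" atomRefs="a3 a4"/>
      </peak>
    </peakList>
  </spectrum>
</cml>
"""


def couplings(root) -> list:
    """Return (peak id, peak atoms, coupled atoms, value, units) per coupling."""
    found = []
    for peak in root.iterfind(".//cml:peak", NS):
        peak_atoms = tuple(peak.get("atomRefs", "").split())
        # Only direct peakStructure children are read in this sketch.
        for structure in peak.findall("cml:peakStructure", NS):
            if structure.get("type") != "coupling":
                continue
            coupled = tuple(structure.get("atomRefs", "").split())
            found.append(
                (peak.get("id"), peak_atoms, coupled,
                 float(structure.get("value")), structure.get("units"))
            )
    return found
```

The split between `peak_atoms` and `coupled` mirrors the distinction made above: atomRefs on peak names the atoms giving rise to the peak, while atomRefs on peakStructure names the atoms it couples to.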
A plane in 3-space.
An oriented plane of indefinite extent.
A point in 3-space.
An explicit potential.
This represents the actual function for the potential (i.e. with explicit values) rather than the functional form, which will normally be referenced from this.
The functional form of a potential.
This has generic arguments and parameters rather than explicit ones. It is essentially a mathematical function, expressed currently in reverse Polish notation.
A container for explicit potentials.
Experimental.
A product within a productList.
product describes a product species which is produced in a reaction. See
reactant for discussion of catalysis and solvents.
A product will normally be identified by name(s), formula, or molecule and at least one of these should normally be given. Amount(s) of product can be given after this identification and can describe mass, volume, percent yield, etc. but not stoichiometry
A container for one or more products.
productList can contain several products. These may be related in several ways, including
single list of products
grouping of products of parallel reactions
A productList can contain nested productLists. The semantics of this are currently undefined.
The number of copies of the productList involved in the stoichiometric reaction. Probably not useful for simple reactions but could be used for parallel reactions.
A container for a property.
property can contain one or more children, usually
scalar ,
array or
matrix . The
dictRef attribute is required, even if there is a single scalar child with the same dictRef. The property may have a different dictRef from the child, thus providing an extension mechanism.
Properties may have a
state attribute to distinguish the state of matter
Semantics are not yet controlled but could include thermochemistry, kinetics or other common properties.
A container for one or more properties.
propertyList can contain several properties. These include (but are not limited to) observations, or numeric quantities.
A reactant within a reactantList.
reactant describes a reactant species which takes part in a reaction. Catalysts and supports are not normally classified as reactants, but this is subjective. Enzymes (or parts of enzymes) may well be reactants, as could be substances which underwent chemical change but were restored to their original state.
reactant is a powerful concept as it can support stoichiometry (atom and molecule counting), mapping (for mechanisms), etc. Solvents are best contained within substanceList.
A reactant will normally be identified by name(s), formula, or molecule and at least one of these should normally be given. Amount(s) of reactant can be given after this identification and can describe mass, volume, etc. but not stoichiometry.
The role of the reactant within a reactantList. Semantics are not yet controlled but could be limiting, oxidant, etc. TODO: a reactant might have multiple roles so this may have to become an element.
The number of copies of the reactant involved in the stoichiometric reaction. Could be non-integer but should not be used for actual ratios of materials added (for which amount should be used).
A container for one or more reactants.
reactantList can contain several reactants. These may be related in several ways, including
lists of related reactants
reactant schemes
multi-step reactants
parallel and/or coupled reactants
A reactantList can contain nested reactantLists. The semantics of this are currently undefined.
A chemical reaction or reaction step.
reaction is a container for reactants, products, conditions, properties and possibly other information relating to the reaction, often within a reactionList. Partial semantics exist:
name the name(s) of the reaction
reactantList (normally only one) the grouped reactants
spectatorList substances with well-defined chemistry which are involved in the reaction but do not change. Examples are side groups in proteins, cofactors, etc. The division between spectator and substance is subjective.
substance or
substanceList substances present in the reaction but not classified as reactants. Examples might be enzymes, catalysts, solvents, supports, workup, etc.
condition conditions of the reaction. These may be text strings, but ideally will have clearer semantics such as scalars for temperature, etc.
productList the grouped products. This allows for parallel reactions or other semantics.
property properties (often physical) associated with the reaction. Examples might be heat of formation, kinetics or equilibrium constant.
Reaction normally refers to an overall reaction or a step within a reactionList. For a complex "reaction", such as in enzymes or chain reactions, it may be best to use
reactionScheme to hold the overall
reaction and a
reactionList of the individual
reaction steps.
The semantics of the content model are
metadataList for general metadata
label for classifying or describing the reaction (e.g. "hydrolysis")
identifier for unique identification. This could be a classification such as EC (enzyme commission) or an IChI-like string generated from the components.
these are followed by the possible components of the reaction and/or a reactionList of further details.
This allows any objects to be attached to the reaction, but particularly graphical primitives such as lines, arrows, etc. These should be provided as elements where possible (e.g. SVG) and should have references to the chemical objects they interact with (i.e. not simply relying on geometry). Markers with IDs can be included as part of the graphics object and their ids linked to the chemical elements using
link .
The yield of the reaction. Note that this lies in the range 0-1.
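Since yields are usually quoted as percentages, software writing a reaction has to convert to the 0-1 range. The sketch below builds a bare reaction skeleton with one reactantList and one productList; the attribute name `yield`, the function name `build_reaction` and the un-namespaced element names are assumptions made for illustration, not confirmed by the text above.

```python
import xml.etree.ElementTree as ET


def build_reaction(reactants, products, percent_yield: float):
    """Build a minimal reaction element, storing the yield as a 0-1 fraction."""
    if not 0.0 <= percent_yield <= 100.0:
        raise ValueError("percent yield must lie between 0 and 100")
    reaction = ET.Element("reaction")
    # Assumed attribute name; the value is a fraction, not a percentage.
    reaction.set("yield", str(percent_yield / 100.0))

    reactant_list = ET.SubElement(reaction, "reactantList")
    for name in reactants:
        reactant = ET.SubElement(reactant_list, "reactant")
        ET.SubElement(reactant, "name").text = name

    product_list = ET.SubElement(reaction, "productList")
    for name in products:
        product = ET.SubElement(product_list, "product")
        ET.SubElement(product, "name").text = name
    return reaction
```

Each species is identified here by name only; a formula or molecule child would serve equally well, and amounts would follow that identification.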
A container for one or more reactions or reactionSchemes with no interrelations.
A reactionList aggregates reactions and reactionSchemes but implies no semantics. The most common uses are to create small collections of reactions (e.g. databases or publications).
A container for two or more related reactions and their relationships.
Where reactions are closely related (and often formally dependent on each other) they should be contained within the reactionStepList of a reactionScheme. The semantics which have informed this design include:
Steps within an organic synthesis.
Two or more individual (primitive) steps providing the detailed mechanism for an overall reaction.
Coupled or sequential reactions within biochemical pathways.
This design is general because "reaction" is used in several ways. A biochemical pathway (e.g. oxidation of glucose to CO2 and water) involves many coupled enzyme reactions proceeding both in parallel and in sequence. Each of these steps ("reactions" in their own right) is itself complex and can include several mechanistic steps which are themselves reactions with products, reactants, etc.
reactionScheme can therefore include reactionStepLists (with more reactionScheme children) which provide a more detailed view of the individual components.
Where a set of reactions are primitives...
The semantics of the content model are
metadataList for general metadata
label for classifying or describing the reaction (e.g. "hydrolysis")
identifier for unique identification. This could be a classification such as EC (enzyme commission) or an IChI-like string generated from the components.
these are followed by the possible components of the reaction and/or a reactionList of further details.
A child of reactionStepList and a container for reaction or reactionScheme.
reactionStep is always contained within reactionStepList and is designed to manage "sub-reactions" which have close relationships. These will often involve reactions which, taken together, describe a higher level reaction or reaction type. Examples are:
biochemical pathways
synthetic reaction schemes
multi-step reactions
parallel and/or coupled reactions
A reactionStep normally contains a single reaction or reactionScheme. It can have attributes such as yield and ratio which can be used by the parent reactionStepList.
The
name applies to the overall schema of reactions.
label is for additional textual information and classification.
reactionStepList normally contains
reaction s but we make provision for nested reactionSchemes if required.
The yield of the reactionStep. Note that this lies in the range 0-1.
The ratio of this step to one or more sibling steps. Note that this lies in the range 0-1. It is meaningless to use this unless there are siblings, in which case it refers to the relative molar fluxes through each. Any "percentage yields" will need to be transformed to this range. The fluxes through a group of siblings are not required to sum to 1.0, though they should not sum to more.
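The constraints on ratio can be checked mechanically. The following is a minimal sketch, assuming an un-namespaced reactionStepList whose reactionStep children carry a ratio attribute; the function names and the tolerance are illustrative and are not part of the schema.

```python
import xml.etree.ElementTree as ET


def percent_to_ratio(percent: float) -> float:
    """Transform a 'percentage' figure into the 0-1 range used by ratio."""
    return percent / 100.0


def check_sibling_ratios(step_list_xml: str, tol: float = 1e-9) -> float:
    """Return the summed ratio of sibling reactionSteps.

    Raises ValueError if any ratio is outside 0-1 or if the sum exceeds 1.0.
    Element and attribute names follow the text; namespaces are ignored
    (an assumption made to keep the sketch short).
    """
    root = ET.fromstring(step_list_xml)
    ratios = [
        float(step.get("ratio"))
        for step in root.findall("reactionStep")
        if step.get("ratio") is not None
    ]
    for ratio in ratios:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"ratio {ratio} is outside the range 0-1")
    total = sum(ratios)
    if total > 1.0 + tol:
        raise ValueError(f"sibling ratios sum to {total}, which exceeds 1.0")
    return total
```

A sum below 1.0 is accepted, since the text does not require the sibling fluxes to account for all of the material.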
A container for one or more related reactionSteps.
reactionStepList is always contained within reactionScheme and is designed to manage "sub-reactions" which have close relationships. These will often involve reactions which, taken together, describe a higher level reaction or reaction type. Examples are:
biochemical pathways
synthetic reaction schemes
multi-step reactions
parallel and/or coupled reactions
A reactionStepList contains reactionSteps (each of which contains reactions and/or reactionSchemes, e.g. where part of the process is known in greater detail). It may not directly contain child reactionStepLists.
The child reactionSteps can have attributes such as yield and ratio which describe the relationship of the component steps.
Guidance on use:
reactionScheme describes a complex of reactions with metadata, one (or more) overall reactions and a reactionStepList with the overall component reactions.
reactionStepList aggregates and structures the individual subreactions.
reactionList is a container for reactions and reactionSchemes with no semantics (e.g. a book or database of selected reactions).
The
name applies to the overall schema of reactions.
label is for additional textual information and classification.
reactionStepList normally contains
reactionStep s.
The reactiveCentre in a reaction.
This describes the set(s) of bonds and atoms involved in the reaction. The semantics are flexible, but a common usage would be to create atomSet(s) and bondSet(s) mapping to groups which undergo changes.
A region of the system.
Under development. A subdivision of the system to which special protocols or properties may be attached. Typical regions could be defined by the presence of atoms belonging to an atomSet or geometrical boundaries.
A region element will not always contain other elements, but may have references from other elements. It may create a protocol, e.g. atoms within a region might be replaced by a continuum model or be subject to a field. Semantics yet to be determined.
Regions can be created by the unions of two or more regions. This allows a region to be built from a series of (say) spheres or boxes filling space.
An entry related in some way to a dictionary entry.
The range of relationships is not restricted but should include parents, aggregation, seeAlso and so on. DataCategories from ISO12620 can be referenced through the namespaced mechanism.
The related entry.
An analytical or spectral sample.
The
sample should contain information on what things were in the sample and their roles. It can include
molecule ,
substance and
substanceList . Typical roles include solvent, mulling agents, salt disks, molecular supports, etc. but should not cover apparatus or conditions.
A molecular description.
A substance in the sample.
A list of substances in the sample.
An element to hold scalar data.
scalar holds scalar data under a single generic container. The semantics are usually resolved by linking to a dictionary.
scalar defaults to a scalar string but has attributes which affect the type.
scalar does not necessarily reflect a physical object (for which
object should be used). It may reflect a property of an object such as temperature, size, etc.
Note that normal Schema validation tools cannot validate the data type of
scalar (it is defined as
string ), but that a temporary schema can be constructed from the type and used for validation. Also the type can be contained in a dictionary and software could decide to retrieve this and use it for validation.
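Because the schema declares the content of scalar as a string, type checking has to happen in application code. The following is a minimal sketch, assuming the type is given by a dataType attribute with values such as xsd:double; the parser table and function names are illustrative, and only a few types are covered.

```python
import xml.etree.ElementTree as ET


def _parse_boolean(text: str) -> bool:
    # XSD booleans allow "true"/"false" and "1"/"0".
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an XSD boolean: {text!r}")


# Hypothetical mapping from dataType values to Python parsers.
PARSERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:double": float,
    "xsd:boolean": _parse_boolean,
}


def scalar_value(scalar_xml: str):
    """Parse a scalar element and convert its content according to dataType.

    A missing dataType is treated as a string, as described in the text.
    Surrounding whitespace is stripped before conversion.
    """
    element = ET.fromstring(scalar_xml)
    data_type = element.get("dataType", "xsd:string")
    if data_type not in PARSERS:
        raise ValueError(f"unsupported dataType: {data_type}")
    text = (element.text or "").strip()
    return PARSERS[data_type](text)
```

A dictionary-driven validator would look up the type from the entry referenced by the element rather than from a local attribute; that lookup is not shown here.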
Alternative to units
Must be used in conjunction with unitType
2005-10-26: added
Alternative to units
Must be used in conjunction with unitType
2005-10-26: added
Alternative to units
Must be used in conjunction with multiplierToSI and/or constantToSI
2005-10-26: added
A spectator object in a reaction.
Objects are often present during a reaction which are not formally involved in bond breaking/formation and which are not modified during the reaction. They may be catalysts, but may also be objects which in some way constrain or help the reaction to take place (surfaces, micelles, groups in enzyme active sites, etc.). In some cases molecules present in a reaction mixture may act as spectators in steps in which they are not transformed.
No controlled vocabulary. Examples could be 'host', 'hydrophobic ligand', 'charge-stabilizer', etc.
A container for spectators in a reaction.
A spectrum and relevant data or metadata.
The
spectrum construct can hold
metadataList ,
sample (which can contain molecule),
conditionList (mainly for physical/chemical conditions, not instrumental),
spectrumData for the actual data and instrumental settings/procedure and
peakList for the assigned peaks. This approach puts the spectrum as the primary object of interest. It could also be possible to make
spectrum a child of
molecule (although a reference using
ref might be preferable).
A (complete) description of the thing to which the spectrum relates. May contain
molecule or
substanceList . Solvents, mulls, etc should be described here.
The conditions relating to the spectrum (complementary to substanceList).
A list of peaks. This may occur independently of the xaxis/yaxis data.
The molecule to which the spectrum refers.
Although this may also be contained in the
sample element it is useful to state it here. No default.
Data for the spectrum.
This is primarily to record the data in interchangeable format together with machine and manufacturer's settings, and can include other MLs in this area (AniML, SpectroML, etc.). We recommend ASCII representations of data and this is the only format that CMLSpect implementers have to support, but we also allow for the carriage of JCAMP and other data (in ML wrappers such as AniML). All numeric data should carry units and dictionary references if possible to allow for semantic interoperability.
The x-axis/es, usually including the list of points at which data are recorded. Mandatory if y-axis data are given. Multiple x-axes are initially reserved for multiple scales rather than different measurements (for which an additional spectrum should be used).
The y-axis/es, usually including the list of points at which data are recorded. Mandatory if x-axis data are given. Multiple y-axes are initially reserved for multiple scales rather than different measurements (for which an additional spectrum should be used).
A container for one or more spectra.
spectrumList can contain several spectra. These may be related in several ways, including
lists of related spectra
bundle of common analytical spectra (NMR, IR, UV...)
repeat measurements
A spectrumList can contain nested spectrumLists.
metadataList contains
metadata .
list is for experimental and other data.
spectrumList normally contains
spectrum s but we make provision for nested spectrumLists if required. The
molecule s can be a set of reference molecules which occur in the
spectrum s and can be referenced. This makes the spectra more readable and normalizes data when molecules are used more than once.
A sphere in 3-space.
An element to hold stmml data.
stmml holds stmml data under a single generic container. Other namespaces may be present as children. No semantics implied.
CML-1 dataType (DEPRECATED).
CML-1 dataType DEPRECATED.
A chemical substance.
substance represents a
chemical substance which is deliberately very general. It can represent things that may or may not be molecules, can and cannot be stored in bottles and may or may not be microscopic. Solutions and mixtures can be described by _substanceList_s of substances. The
type attribute can be used to give qualitative information characterising the substance ("granular", "90%", etc.) and _role_ to describe the role in process ("desiccant", "support", etc.). There is currently no controlled vocabulary. Note that
reaction is likely to have more precise semantics. The amount of a substance is controlled by the optional _amount_ child.
Added property as a child 2002-12-29
role depends on context, and indicates some purpose associated with the substance. It might indicate 'catalyst', 'solvent', 'antioxidant', etc. but is not limited to any vocabulary.
A list of chemical substances.
Deliberately very general - see substance. substanceList is designed to manage solutions, mixtures, etc. and there is a small enumerated controlled vocabulary, but this can be extended through dictionaries.
substanceList can have an amount child. This can indicate the amount of a solution or mixture; this example describes 100 ml of 0.1M NaOH(aq). Although apparently longwinded, it is precise and fully machine-interpretable.
Added role attribute, 2003-03-12.
Molecular, crystallographic or other symmetry.
symmetry provides a label and/or symmetry operations for molecules or crystals. Point and spacegroups can be specified by strings, though these are not enumerated, because of variability in syntax (spaces, case-sensitivity, etc.), potential high symmetries (e.g. TMV disk is D17) and non-standard spacegroup settings. Provision is made for explicit symmetry operations through <matrix> child elements.
By default the axes of symmetry are defined by the symbol - thus C2v requires z to be the unique axis, while P21/c requires b/y. Spacegroups imply the semantics defined in International Tables for Crystallography, (Int Union for Cryst., Munksgaard). Point groups are also defined therein.
The element may also be used to give a label for the symmetry species (irreducible representation) such as "A1u" for a vibration or orbital.
The matrices should be 3x3 for point group operators and 3x4 for spacegroup operators. The use of crystallographic notation ("x,1/2+y,-z") is not supported - this would be <matrix>1 0 0 0.0 0 1 0 0.5 0 0 -1 0.0</matrix>.
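Since the crystallographic notation is not accepted directly, software has to expand it into the 3x4 form. The following is a minimal sketch of such a conversion; the function names are illustrative, and the parser handles only simple terms of the kind shown in the text (a signed x, y or z, optionally with a coefficient, plus a fractional or decimal translation).

```python
import re
from fractions import Fraction
from typing import List


def symop_to_matrix(symop: str) -> List[List[float]]:
    """Convert e.g. "x,1/2+y,-z" into a 3x4 operator (rotation | translation)."""
    components = symop.replace(" ", "").lower().split(",")
    if len(components) != 3:
        raise ValueError("expected three comma-separated components")
    rows = []
    for component in components:
        row = [Fraction(0)] * 4  # coefficients of x, y, z, then the translation
        # Split the component into signed terms such as "1/2", "+y" or "-z".
        for term in re.findall(r"[+-]?[^+-]+", component):
            sign = -1 if term[0] == "-" else 1
            body = term.lstrip("+-")
            if body[-1] in "xyz":
                coefficient = Fraction(body[:-1]) if body[:-1] else Fraction(1)
                row["xyz".index(body[-1])] += sign * coefficient
            else:
                row[3] += sign * Fraction(body)
        rows.append([float(value) for value in row])
    return rows


def matrix_text(rows: List[List[float]]) -> str:
    """Flatten the operator row by row into whitespace-separated text."""
    return " ".join(f"{value:g}" for row in rows for value in row)
```

For the operator quoted in the text, the third row carries -1 in the z column, reflecting the -z term.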
The default convention for point group symmetry is
Schoenflies and for spacegroups is "H-M". Other conventions (e.g. "Hall") must be specified through the
convention attribute.
This element implies that the Cartesians or fractional coordinates in a molecule are oriented appropriately. In some cases it may be useful to specify the symmetry of an arbitrarily oriented molecule and the <molecule> element has the attribute
symmetryOriented for this purpose.
It may be better to use transform3 to hold the symmetry operations, as these have a fixed shape and better-defined mathematical operators.
2005-11-03 PMR. Added transform3 as children.
The rotational symmetry number. Used for calculation of entropy, etc.
The complete system of components in a calculation.
There is no controlled vocabulary.
A rectangular table of any quantities.
By default
table represents a rectangular table of any simple quantities representable as XSD or CML dataTypes. There are three layouts, columnwise, rowwise and without markup. In all cases it is essential that the columns, whether explicit or otherwise, are homogeneous within the column. Also the metadata for each column must be given explicitly.
- columns: There is a single
arrayList child containing (homogeneous) child elements (
array or
list ) of size
rows . This is the "normal" orientation of data tables but the table display could be transposed by XSLT transformation if required. Access is to columns, and thence to the data within them. DataTyping, delimiters, etc are delegated to the arrays or lists, which must all be of the same size.
- rows: with explicit
trows. The metadata is carried in a
theader element of size
cols. Within each trow the data are contained in tcells.
- content: The metadata is carried in a
theader element of size
cols. The data are contained in a single
tableContent with columns moving fastest. Within the content the data are whitespace (or delimiter) separated.
For verification it is recommended that tables carry
rows and
columns attributes. The type of the tables should also be carried in a
tableType attribute.
Validity constraints (XPath expressions in table context)
type | @tableType | @rows | actual rowCount | @columns | actual columnCount | tableHeader | arrayList | tableRowList | tableContent
column based | columnBased | recommended | ./arrayList/@size or arrayList/*[self::array or self::list]/@size | optional | ./arrayList/@size or count(arrayList/*[self::array or self::list]) | forbidden | required | forbidden | forbidden
row based | rowBased | recommended | ./tableRowList/@size or count(tableRowList/tableRow) | recommended | count(tableHeader/tableHeaderCell) or count(tableRowList/tableRow/tableCell) | required | forbidden | required | forbidden
content based | contentBased | required | only by analysing the table | recommended | count(tableHeader/tableHeaderCell) | required | forbidden | forbidden | required
A cell in a row of a table.
tableCell is a data container of the table and only occurs as a child of tableRow. Normally it contains simpleContent, but may also contain a single child element (which could itself have complex or mixed content). However tableCell should NOT directly contain multiple children of any sort or mixed content. (It is declared as mixed content here to allow either text or element content, but not both.) The metadata for tableCells must be declared in a tableHeader/tableHeaderCell system.
Unmarked content of a table.
This only occurs as the simpleContent of a tableContent element. It contains table/@rows * table/@columns items arranged rowwise (i.e. columns is fastest moving). Metadata for columns must be defined in tableHeader. The items of the table are ASCII strings. They can be separated by whitespace or by a defined single character delimiter as in
array . The data must be rectangular and each implicit column must have consistent semantics. It can be used to hold CSV-like data (indeed CSV data can be entered directly as long as there are no quoted commas, in which case a different delimiter, or the safer tableRowList, should be used). Unlike tableRowList or arrayList (both of which can hold ASCII strings or XML elements), tableContent can only hold strings.
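Reading unmarked content back into rows follows directly from the rowwise ordering. The following is a minimal sketch, assuming the rows and columns counts have already been read from the parent table; the function name and signature are illustrative, and when a delimiter is given the content is assumed to be separated by that single character only.

```python
from typing import List, Optional


def parse_table_content(
    text: str, rows: int, columns: int, delimiter: Optional[str] = None
) -> List[List[str]]:
    """Split tableContent text into a rectangular list of string rows.

    With no delimiter, items are separated by any whitespace. Items are
    arranged rowwise, so the column index moves fastest.
    """
    if delimiter is None:
        items = text.split()
    else:
        items = [item.strip() for item in text.split(delimiter)]
    if len(items) != rows * columns:
        raise ValueError(
            f"expected {rows * columns} items, found {len(items)}"
        )
    return [items[r * columns:(r + 1) * columns] for r in range(rows)]
```

The item-count check is the only rectangularity test available for this layout, because the content itself carries no row markers.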
Header for a table.
Used for rowBased or contentBased tables, where it is mandatory. Contains the metadata as tableHeaderCells which should match the (implicit) columns in number and semantic type. It is forbidden for arrayList tables as each array/list contains the metadata.
Metadata for a column of a table.
Only used when in rowBased or contentBased tables, and then as a direct child of tableHeader. There must be as many tableHeaderCells as there are implicit columns in tableRowList or tableContent. These cells carry the metadata and/or semantics for each column. These are similar to the attributes in
array but without the list of minValue, errors etc. However they can (and should) carry all the units metadata.
Alternative to units
Must be used in conjunction with unitType
2005-10-26: added
Alternative to units
Must be used in conjunction with unitType
2005-10-26: added
Alternative to units
Must be used in conjunction with multiplierToSI and/or constantToSI
2005-10-26: added
A row in a rowBased table.
A direct child of tableRowList containing tableCells. At present all tableRows in a tableRowList must have the same count of tableCells and their semantics must correspond to the tableHeader in the table. No cells can be omitted and there is no spanning of cells. There is no need for a size attribute as the count is simply
count(tableCell) .
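The rule that every tableRow holds the same number of tableCells as there are header cells can be checked with a few counts. The following is a minimal sketch, assuming an un-namespaced rowBased table using the element names tableHeader, tableHeaderCell, tableRowList, tableRow and tableCell; the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_row_based_table(table_xml: str) -> Tuple[int, int]:
    """Return (row count, column count) for a consistent rowBased table.

    Raises ValueError if any row's cell count differs from the number of
    tableHeaderCells, or if optional rows/columns attributes disagree
    with the actual counts.
    """
    table = ET.fromstring(table_xml)
    n_columns = len(table.findall("tableHeader/tableHeaderCell"))
    table_rows = table.findall("tableRowList/tableRow")
    for index, row in enumerate(table_rows, start=1):
        n_cells = len(row.findall("tableCell"))  # count(tableCell)
        if n_cells != n_columns:
            raise ValueError(
                f"row {index} has {n_cells} cells, header has {n_columns}"
            )
    for name, actual in (("rows", len(table_rows)), ("columns", n_columns)):
        declared = table.get(name)
        if declared is not None and int(declared) != actual:
            raise ValueError(f"@{name}={declared} but actual count is {actual}")
    return len(table_rows), n_columns
```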
List of rows in rowBased table.
Metadata for rows must be defined in tableHeader.
A torsion angle ("dihedral") between 4 distinct atoms.
The atoms need not be formally bonded. It can be used for:
Recording experimentally determined torsion angles (e.g. in a crystallographic paper).
Providing the torsion component for internal coordinates (e.g. z-matrix).
Note that the order of atoms is important.
2006-02-07: PMR. Fixed torsionAngleUnits
A transform in 3-space.
A 3-D transform. Conventionally a 4x4 matrix.
The transition state in a reaction.
This will normally contain a
molecule which in its 2D representation will have partial bonds. These are yet to be formalized for the
molecule element.
Although spectators may stabilise or otherwise interact with the transitionState they are not contained within it.
A
propertyList is provided to capture transitionState properties.
Still experimental.
A scientific unit.
A scientific unit. Units are of the following types:
SI Units. These may be one of the seven fundamental types (e.g. meter) or may be derived (e.g. joule). An SI unit is identifiable because it has no parentSI attribute and will have a unitType attribute. 2005-12-17 - this may be obsolete; PMR
nonSI Units. These will normally have a parent SI unit (e.g. calorie has joule as an SI parent).
Constructed units. These use a syntax of the form:
<unit id="g.s-1" name="gram per second" unitType="myUnitType:massPerTime">
  <unit units="units:g" power="1"/>
  <unit units="siUnits:s" power="-1"/>
</unit>
This defines a new unit (g.s-1) composed from two existing units (units:g and siUnits:s). The conversion to SI is computed from the two child units and may be added as a 'multiplierToSI' attribute. Only siUnits or units with 'multiplierToSI' can be used as child units; 'constantToSI' cannot be used yet. If the new unit points to a unitType then the dimension can be checked. Thus if the published dimension of massPerTime does not agree with mass.time-1 an error is throwable. Alternatively a new unitType can be added as a child.
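The conversion factor of a constructed unit is the product of each child's multiplierToSI raised to its power. The following is a minimal sketch of that computation; the lookup table of known multipliers is hypothetical (a real implementation would resolve the units references against the units dictionaries), and the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical multipliers for the two child units used in the text:
# gram -> kilogram is 0.001, and the second is already SI.
KNOWN_MULTIPLIERS: Dict[str, float] = {"units:g": 1.0e-3, "siUnits:s": 1.0}


def multiplier_to_si(unit_xml: str, known: Dict[str, float]) -> float:
    """Compute multiplierToSI for a unit built from child unit elements."""
    unit = ET.fromstring(unit_xml)
    multiplier = 1.0
    for child in unit.findall("unit"):
        reference = child.get("units")
        power = float(child.get("power"))
        if reference not in known:
            # Child units without a known multiplierToSI cannot be used.
            raise ValueError(f"no multiplierToSI known for {reference}")
        multiplier *= known[reference] ** power
    return multiplier


EXAMPLE = """
<unit id="g.s-1" name="gram per second" unitType="myUnitType:massPerTime">
  <unit units="units:g" power="1"/>
  <unit units="siUnits:s" power="-1"/>
</unit>
"""
```

Calling multiplier_to_si(EXAMPLE, KNOWN_MULTIPLIERS) gives the factor that converts grams per second to kilograms per second. Units that need a constantToSI offset are deliberately not handled, matching the restriction stated above.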
The relationship of a unit to its SI parent is potentially complex and inconsistencies may arise. The following are available:
parentSI. This points to the ID of a parent SI unit. If this ID is the same as the current unit the implication is that this is an SI unit.
isSI. a boolean indicating whether the current unit is SI.
multiplierToSI and constantToSI. If these are 1.0 and 0.0 (or missing) the implication is that this unit is SI. However this is fragile as units can be defined without these attributes and a unit could coincidentally have no numeric differences but not be an SI unit.
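Because these three indicators can disagree, software needs a rule for combining them. The following is a minimal sketch of one possible rule; the precedence (isSI first, then parentSI, and no conclusion from the numeric attributes alone) is an assumption made for illustration and is not specified by the schema.

```python
from typing import Dict, Optional


def si_status(attributes: Dict[str, str]) -> Optional[bool]:
    """Decide whether a unit is SI from its attributes.

    Returns True or False when isSI or parentSI settles the question, and
    None when only multiplierToSI/constantToSI are available, since the
    text describes that inference as fragile.
    """
    is_si = attributes.get("isSI")
    if is_si is not None:
        return is_si in ("true", "1")
    parent = attributes.get("parentSI")
    if parent is not None:
        # A unit whose SI parent is itself is taken to be an SI unit.
        return parent == attributes.get("id")
    return None
```

Returning None rather than guessing leaves the caller free to apply the numeric heuristic explicitly if it is willing to accept the risk.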
2003:04-09 Description or parentSI attribute enhanced.
2006:03-21 Added metadata and metadataList to content.
Child unit used to build new unit.
These children must have 'units' and 'power' attributes.
Child unitType describing type of new unit.
This can be added by the author (in which case they are responsible for checking consistency) or calculated by the software from the child units.
Reference to a unit.
This is used for the identification of child units when new units are composed from existing ones. Although the syntax looks unusual it takes advantage of the tools for resolving units. See above for syntax.
Abbreviation for the unit.
This may be obsolete and symbol should be preferred.
Symbol for the unit.
This may be used for typographical display but NOT for identification as there is considerable variation in use.
2006-01-29: PMR. Added attribute.
Power of unit used to create new one.
Only allowed on child units
A container for several unit entries.
Usually forms the complete units dictionary (along with metadata). Note: this used to hold both units and unitTypes (though in separate files). This was unwieldy and unitTypeList has been created to hold unitTypes. Implementers are recommended to change any unitList/unitType to unitTypeList/unitType
2005-12-15. PMR. added namespace and dictionaryPrefix.
2005-12-17. PMR. added siNamespace .
2006-01-28. PMR. deprecated use for holding unitType.
2006-01-28: PMR. use unitTypeList.
2006-01-28: PMR. use unitTypeList.
Maps dictRef prefix to the location of a dictionary. This requires the prefix and the physical URI address to be contained within the same file. We can anticipate that better mechanisms will arise - perhaps through XMLCatalogs. At least it works at present.
The type of a scientific unit.
Mandatory for SI Units, optional for nonSI units since they should be able to obtain this from their parent. For complex derived units without parents it may be useful.
Used within a unitList
Distinguish carefully from
unitsType which is primarily used for attributes describing the units that elements carry
2006-02-06: PMR. Added preserve and symbol attributes.
A container for several unitType entries.
Usually forms the complete unitTypes dictionary (along with metadata). Note: unitTypes used to be held under unitList, but this was complicated to implement and unitTypeList makes a clean separation.
2006-01-28. PMR. created.
Maps dictRef prefix to the location of a dictionary. This requires the prefix and the physical URI address to be contained within the same file. We can anticipate that better mechanisms will arise - perhaps through XMLCatalogs. At least it works at present.
A vector in 3-space.
The vector may have magnitude but is not rooted on any points (use line3).
The x-axis.
A container for all information relating to the x-axis (including scales, offsets, etc.) and the data themselves (in an
array ). Note: AniML uses "xValues" so avoid confusion with this.
The x-data. These must match the y-data in number and order. There are tools to allow scaling and transformation (though unscaled data must be very carefully defined).
The y-axis.
A container for all information relating to the y-axis (including scales, offsets, etc.) and the data themselves (in an
array ).
The y-data. These must match the x-data in number and order. There are tools to allow scaling and transformation (though unscaled data must be very carefully defined).
A zMatrix.
A container for
length ,
angle and
torsion , which must be arranged in the conventional zMatrix format.