hirdparty.forester.1.039.source-code.phyloxml.xsd Maven / Gradle / Ivy
Applications and software libraries for evolutionary biology and comparative genomics research
phyloXML is an XML language to describe evolutionary trees and associated data. Version: 1.10.
License: dual-licensed under the LGPL or Ruby's License. Copyright (c) 2008-2011 Christian M Zmasek.
'phyloxml' is the name of the root element. Phyloxml contains an arbitrary number of
'phylogeny' elements (each representing one phylogeny) possibly followed by elements from other namespaces.
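A minimal sketch of this top-level structure, read with Python's standard `xml.etree.ElementTree`. It assumes the namespace URI `http://www.phyloxml.org`; the clade names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed phyloXML namespace; prefix "phy" is only a local alias.
NS = {"phy": "http://www.phyloxml.org"}

# Illustrative document: one 'phyloxml' root holding two 'phylogeny' elements.
DOC = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true"><clade><name>A</name></clade></phylogeny>
  <phylogeny rooted="false"><clade><name>B</name></clade></phylogeny>
</phyloxml>"""

root = ET.fromstring(DOC)
phylogenies = root.findall("phy:phylogeny", NS)

# 'rooted' is the required attribute of each phylogeny.
rooted_flags = [p.get("rooted") for p in phylogenies]
print(len(phylogenies), rooted_flags)
```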
Element Phylogeny is used to represent a phylogeny. The required attribute 'rooted' is used
to indicate whether the phylogeny is rooted or not. The attribute 'rerootable' can be used to indicate that
the phylogeny is not allowed to be rooted differently (e.g. because it is associated with root-dependent
data, such as gene duplications). The attribute 'type' can be used to indicate the type of phylogeny (e.g.
'gene tree'). It is recommended to use the attribute 'branch_length_unit' if the phylogeny has branch
lengths. Element 'clade' is used in a recursive manner to describe the topology of a phylogenetic
tree.
Element Clade is used in a recursive manner to describe the topology of a phylogenetic tree.
The parent branch length of a clade can be described either with the 'branch_length' element or the
'branch_length' attribute (it is not recommended to use both at the same time, though). Usage of the
'branch_length' attribute allows for a less verbose description. Element 'confidence' is used to indicate
the support for a clade/parent branch. Element 'events' is used to describe such events as gene-duplications
at the root node/parent branch of a clade. Element 'width' is the branch width for this clade (including
parent branch). Both 'color' and 'width' elements apply to the whole clade unless overridden in
sub-clades. Attribute 'id_source' is used to link other elements to a clade (on the xml-level).
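The recursive nature of 'clade' and the two ways of giving a branch length can be sketched as follows. This is an illustration only: the helper names `parent_branch_length` and `leaf_depths` and all tree values are hypothetical, and preferring the attribute when both forms are present is a choice made here, not something the schema text prescribes.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace, in ElementTree notation

# Illustrative tree: branch lengths given as attributes and, for B, as an element.
DOC = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade branch_length="0.06">
        <clade branch_length="0.102"><name>A</name></clade>
        <clade><name>B</name><branch_length>0.23</branch_length></clade>
      </clade>
      <clade branch_length="0.4"><name>C</name></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def parent_branch_length(clade):
    """Read the parent branch length from the attribute, else from the element, else 0."""
    attr = clade.get("branch_length")
    if attr is not None:
        return float(attr)
    elem = clade.find(P + "branch_length")
    return float(elem.text) if elem is not None else 0.0


def leaf_depths(clade, depth=0.0):
    """Recursively collect root-to-leaf distances keyed by leaf name."""
    depth += parent_branch_length(clade)
    children = clade.findall(P + "clade")
    if not children:
        return {clade.findtext(P + "name"): depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


root_clade = ET.fromstring(DOC).find(f"{P}phylogeny/{P}clade")
depths = leaf_depths(root_clade)
print(depths)
```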
Element Taxonomy is used to describe taxonomic information for a clade. Element 'code' is
intended to store UniProt/Swiss-Prot style organism codes (e.g. 'APLCA' for the California sea hare 'Aplysia
californica') or other styles of mnemonics (e.g. 'Aca'). Element 'authority' is used to keep the authority,
such as 'J. G. Cooper, 1863', associated with the 'scientific_name'. Element 'id' is used for a unique
identifier of a taxon (for example '6500' with 'ncbi_taxonomy' as 'provider' for the California sea hare).
Attribute 'id_source' is used to link other elements to a taxonomy (on the xml-level).
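As a small illustration, the California sea hare example from the text could be written and read like this. The `id_source` value `tax1` is hypothetical, and the namespace URI is assumed.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Taxonomy fragment built from the values quoted in the description above.
FRAGMENT = """<taxonomy xmlns="http://www.phyloxml.org" id_source="tax1">
  <id provider="ncbi_taxonomy">6500</id>
  <code>APLCA</code>
  <scientific_name>Aplysia californica</scientific_name>
  <authority>J. G. Cooper, 1863</authority>
</taxonomy>"""

tax = ET.fromstring(FRAGMENT)
id_elem = tax.find(P + "id")

# Flatten the taxonomy into a plain dictionary.
record = {
    "provider": id_elem.get("provider"),
    "id": id_elem.text,
    "code": tax.findtext(P + "code"),
    "scientific_name": tax.findtext(P + "scientific_name"),
    "authority": tax.findtext(P + "authority"),
    "id_source": tax.get("id_source"),
}
print(record)
```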
Element Sequence is used to represent a molecular sequence (Protein, DNA, RNA) associated
with a node. 'symbol' is a short (at most 20 characters) symbol of the sequence (e.g. 'ACTM') whereas
'name' is used for the full name (e.g. 'muscle Actin'). 'gene_name' can be used when protein and gene names differ.
'location' is used for the location of a sequence on a genome/chromosome. The actual sequence can be stored with the
'mol_seq' element. Attribute 'type' is used to indicate the type of sequence ('dna', 'rna', or 'protein').
One intended use for 'id_ref' is to link a sequence to a taxonomy (via the taxonomy's 'id_source') in case
of multiple sequences and taxonomies per node.
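The 'id_ref' to 'id_source' link might be resolved as in the sketch below. All identifiers, codes, and symbols (`tax_a`, `SPECA`, `GENEA`, and so on) are hypothetical placeholders, and the namespace URI is assumed.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# One node carrying two taxonomies and two sequences (all values hypothetical).
DOC = """<clade xmlns="http://www.phyloxml.org">
  <taxonomy id_source="tax_a"><code>SPECA</code></taxonomy>
  <taxonomy id_source="tax_b"><code>SPECB</code></taxonomy>
  <sequence type="protein" id_ref="tax_a"><symbol>GENEA</symbol></sequence>
  <sequence type="protein" id_ref="tax_b"><symbol>GENEB</symbol></sequence>
</clade>"""

clade = ET.fromstring(DOC)

# Index taxonomies by their 'id_source' so sequences can point at them.
taxonomies = {t.get("id_source"): t for t in clade.findall(P + "taxonomy")}

# Map each sequence symbol to the code of the taxonomy named by its 'id_ref'.
sequence_to_taxon = {
    seq.findtext(P + "symbol"): taxonomies[seq.get("id_ref")].findtext(P + "code")
    for seq in clade.findall(P + "sequence")
}
print(sequence_to_taxon)
```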
Element 'mol_seq' is used to store molecular sequences. The 'is_aligned' attribute is used
to indicate that this molecular sequence is aligned with all other sequences in the same phylogeny for
which 'is_aligned' is true as well (which, in most cases, means that gaps were introduced, and that all
sequences for which 'is_aligned' is true must have the same length).
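The equal-length requirement for aligned sequences can be checked with a few lines. This is a sketch under stated assumptions: the residues are invented, the namespace URI is assumed, and both `true` and `1` are accepted as boolean true.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Three sequences (invented residues); only A and B are flagged as aligned.
DOC = """<phylogeny xmlns="http://www.phyloxml.org" rooted="true">
  <clade>
    <clade><name>A</name><sequence><mol_seq is_aligned="true">MK-AL</mol_seq></sequence></clade>
    <clade><name>B</name><sequence><mol_seq is_aligned="true">MKTAL</mol_seq></sequence></clade>
    <clade><name>C</name><sequence><mol_seq is_aligned="false">MKT</mol_seq></sequence></clade>
  </clade>
</phylogeny>"""

root = ET.fromstring(DOC)

# Collect every mol_seq in the phylogeny whose 'is_aligned' attribute is true.
aligned = [
    m.text for m in root.iter(P + "mol_seq")
    if m.get("is_aligned") in ("true", "1")
]

# All aligned sequences must share one length.
lengths = {len(s) for s in aligned}
consistent = len(lengths) <= 1
print(aligned, consistent)
```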
Element Accession is used to capture the local part in a sequence identifier (e.g. 'P17304'
in 'UniProtKB:P17304', in which case the 'source' attribute would be 'UniProtKB').
Used to store accessions to additional resources.
This is used to describe the domain architecture of a protein. Attribute 'length' is the total
length of the protein.
Used to represent an individual domain in a domain architecture. The name/unique identifier is
described via the 'id' attribute. 'confidence' can be used to store, for example, E-values.
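A hedged sketch of a domain architecture follows. Only 'length', 'id', and 'confidence' are named in the text above; the 'from' and 'to' attributes, the text content of 'domain', and every value are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Hypothetical protein of length 300 with two domains.
# 'from'/'to' positions are assumed attribute names; ids and E-values are invented.
FRAGMENT = """<domain_architecture xmlns="http://www.phyloxml.org" length="300">
  <domain from="10" to="120" confidence="1e-20" id="DOM_A">domain A</domain>
  <domain from="150" to="280" confidence="3e-5" id="DOM_B">domain B</domain>
</domain_architecture>"""

arch = ET.fromstring(FRAGMENT)
protein_length = int(arch.get("length"))

# Collect (id, start, end, E-value) for each domain.
domains = [
    (d.get("id"), int(d.get("from")), int(d.get("to")), float(d.get("confidence")))
    for d in arch.findall(P + "domain")
]

# Sanity check: every domain must lie within the protein.
all_within = all(0 <= start <= end <= protein_length for _, start, end, _ in domains)
print(domains, all_within)
```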
Events at the root node of a clade (e.g. one gene duplication).
The names and/or counts of binary characters present, gained, and lost at the root of a
clade.
A literature reference for a clade. It is recommended to use the 'doi' attribute instead of
the free text 'desc' element whenever possible.
The annotation of a molecular sequence. It is recommended to annotate by using the optional
'ref' attribute (some examples of acceptable values for the ref attribute: 'GO:0008270',
'KEGG:Tetrachloroethene degradation', 'EC:1.1.1.1'). Optional element 'desc' allows for a free text
description. Optional element 'confidence' is used to state the type and value of support for an annotation.
Similarly, optional attribute 'evidence' is used to describe the evidence for an annotation as free text
(e.g. 'experimental'). Optional element 'property' allows for further, typed and referenced annotations from
external resources.
Property allows for typed and referenced properties from external resources to be attached
to 'Phylogeny', 'Clade', and 'Annotation'. The value of a property is its mixed (free text) content.
Attribute 'datatype' indicates the type of a property and is limited to xsd-datatypes (e.g. 'xsd:string',
'xsd:boolean', 'xsd:integer', 'xsd:decimal', 'xsd:float', 'xsd:double', 'xsd:date', 'xsd:anyURI'). Attribute
'applies_to' indicates the item to which a property applies (e.g. 'node' for the parent node of a clade,
'parent_branch' for the parent branch of a clade). Attribute 'id_ref' allows a property to be attached
specifically to one element (on the xml-level). Optional attribute 'unit' is used to indicate the unit of
the property. An example: <property datatype="xsd:integer" ref="NOAA:depth" applies_to="clade"
unit="METRIC:m"> 200 </property>
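The property example above could be interpreted as in the following sketch. The `CONVERTERS` table and the `property_value` helper are hypothetical; the mapping from xsd datatypes to Python types is one reasonable choice, not part of phyloXML.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The example property from the text, with the (assumed) phyloXML namespace added.
FRAGMENT = (
    '<property xmlns="http://www.phyloxml.org" datatype="xsd:integer" '
    'ref="NOAA:depth" applies_to="clade" unit="METRIC:m"> 200 </property>'
)

# Hypothetical mapping from xsd datatypes to Python converters.
CONVERTERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:float": float,
    "xsd:double": float,
    "xsd:boolean": lambda s: s in ("true", "1"),
    "xsd:anyURI": str,
}


def property_value(elem):
    """Convert the free-text content of a property according to its 'datatype'."""
    convert = CONVERTERS.get(elem.get("datatype"), str)
    return convert(elem.text.strip())


prop = ET.fromstring(FRAGMENT)
value = property_value(prop)
print(prop.get("ref"), value, prop.get("unit"), prop.get("applies_to"))
```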
A uniform resource identifier. In general, this is expected to be a URL (for example, to
link to an image on a website, in which case the 'type' attribute might be 'image' and 'desc' might be
'image of a California sea hare').
A general purpose confidence element. For example this can be used to express the bootstrap
support value of a clade (in which case the 'type' attribute is 'bootstrap').
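For instance, bootstrap values stored this way could be used to pick out well-supported clades. The clade names, support values, the `bootstrap` helper, and the cut-off of 70 are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Two sub-clades with invented bootstrap support values.
DOC = """<clade xmlns="http://www.phyloxml.org">
  <clade><name>AB</name><confidence type="bootstrap">89</confidence></clade>
  <clade><name>CD</name><confidence type="bootstrap">42</confidence></clade>
</clade>"""


def bootstrap(clade):
    """Return the bootstrap support of a clade, or None if it has none."""
    for conf in clade.findall(P + "confidence"):
        if conf.get("type") == "bootstrap":
            return float(conf.text)
    return None


root = ET.fromstring(DOC)
THRESHOLD = 70.0  # hypothetical cut-off, chosen only for this example

well_supported = [
    c.findtext(P + "name")
    for c in root.findall(P + "clade")
    if (bootstrap(c) or 0.0) >= THRESHOLD
]
print(well_supported)
```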
A general purpose identifier element. Allows the provider (or authority) of an
identifier to be indicated.
The geographic distribution of the items of a clade (species, sequences), intended for
phylogeographic applications. The location can be described by free text in the 'desc' element and/or
by the coordinates of one or more 'Points' (similar to the 'Point' element in Google's KML format) or by
'Polygons'.
The coordinates of a point with an optional altitude (used by element 'Distribution').
The required attribute 'geodetic_datum' indicates the geodetic datum used (also called 'map datum';
for example, Google's KML uses 'WGS84'). Attribute 'alt_unit' is the unit for the altitude (e.g. 'meter').
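A point inside a distribution might look like the fragment parsed below. The child element names 'lat', 'long', and 'alt' are not stated in the text above and are assumed here; the coordinates and description are invented.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Hypothetical distribution with one point; 'lat', 'long', 'alt' are assumed names.
FRAGMENT = """<distribution xmlns="http://www.phyloxml.org">
  <desc>hypothetical collection site</desc>
  <point geodetic_datum="WGS84" alt_unit="meter">
    <lat>32.85</lat>
    <long>-117.26</long>
    <alt>-5</alt>
  </point>
</distribution>"""

dist = ET.fromstring(FRAGMENT)
point = dist.find(P + "point")

# Gather coordinates together with the datum and altitude unit they depend on.
location = {
    "datum": point.get("geodetic_datum"),
    "lat": float(point.findtext(P + "lat")),
    "long": float(point.findtext(P + "long")),
    "alt": float(point.findtext(P + "alt")),
    "alt_unit": point.get("alt_unit"),
}
print(dist.findtext(P + "desc"), location)
```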
A polygon defined by a list of 'Points' (used by element 'Distribution').
A date associated with a clade/node. Its value can be numerical by using the 'value' element
and/or free text with the 'desc' element (e.g. 'Silurian'). If a numerical value is used, it is recommended
to employ the 'unit' attribute to indicate the type of the numerical value (e.g. 'mya' for 'million years
ago'). The elements 'minimum' and 'maximum' are used to indicate a range/confidence
interval.
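A date with a range could be read as sketched here. The numbers are illustrative only and are not taken from the schema; the namespace URI is assumed.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Illustrative date: free-text description plus a numeric value and range in 'mya'.
FRAGMENT = """<date xmlns="http://www.phyloxml.org" unit="mya">
  <desc>Silurian</desc>
  <value>425</value>
  <minimum>416</minimum>
  <maximum>443.7</maximum>
</date>"""

date = ET.fromstring(FRAGMENT)
unit = date.get("unit")
value = float(date.findtext(P + "value"))
minimum = float(date.findtext(P + "minimum"))
maximum = float(date.findtext(P + "maximum"))

# The point estimate is expected to fall inside the stated range.
in_range = minimum <= value <= maximum
print(date.findtext(P + "desc"), value, unit, (minimum, maximum), in_range)
```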
This indicates the color of a clade when rendered (the color applies to the whole clade
unless overridden by the color(s) of sub-clades).
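The inheritance rule for colors can be sketched as a recursive lookup. The child element names 'red', 'green', and 'blue' are assumed (they are not named in the text above), and `read_color` and `effective_colors` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# The outer clade is red; sub-clade B overrides this with blue, A inherits red.
DOC = """<clade xmlns="http://www.phyloxml.org">
  <color><red>255</red><green>0</green><blue>0</blue></color>
  <clade><name>A</name></clade>
  <clade>
    <name>B</name>
    <color><red>0</red><green>0</green><blue>255</blue></color>
  </clade>
</clade>"""


def read_color(clade):
    """Return the clade's own (r, g, b) color, or None if it declares none."""
    color = clade.find(P + "color")  # direct child only, not descendants
    if color is None:
        return None
    return tuple(int(color.findtext(P + c)) for c in ("red", "green", "blue"))


def effective_colors(clade, inherited=None):
    """Map each named clade to its own color or the one inherited from above."""
    own = read_color(clade)
    current = own if own is not None else inherited
    result = {}
    name = clade.findtext(P + "name")
    if name is not None:
        result[name] = current
    for child in clade.findall(P + "clade"):
        result.update(effective_colors(child, current))
    return result


colors = effective_colors(ET.fromstring(DOC))
print(colors)
```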
This is used to express a typed relationship between two sequences. For example it could be
used to describe an orthology (in which case attribute 'type' is 'orthology').
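A sequence relation might be resolved to the sequences it connects as in the sketch below. The attribute names 'id_ref_0' and 'id_ref_1', the placement of 'sequence_relation' inside 'phylogeny', and all identifiers and symbols are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

P = "{http://www.phyloxml.org}"  # assumed phyloXML namespace

# Two sequences with hypothetical ids, linked by an orthology relation.
DOC = """<phylogeny xmlns="http://www.phyloxml.org" rooted="true">
  <clade>
    <clade><sequence id_source="seq_a"><symbol>GENEA</symbol></sequence></clade>
    <clade><sequence id_source="seq_b"><symbol>GENEB</symbol></sequence></clade>
  </clade>
  <sequence_relation id_ref_0="seq_a" id_ref_1="seq_b" type="orthology"/>
</phylogeny>"""

phylogeny = ET.fromstring(DOC)

# Index all sequences in the phylogeny by their 'id_source'.
sequences = {s.get("id_source"): s for s in phylogeny.iter(P + "sequence")}

# Resolve each relation to (symbol of first, symbol of second, relation type).
relations = [
    (
        sequences[r.get("id_ref_0")].findtext(P + "symbol"),
        sequences[r.get("id_ref_1")].findtext(P + "symbol"),
        r.get("type"),
    )
    for r in phylogeny.findall(P + "sequence_relation")
]
print(relations)
```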
This is used to express a typed relationship between two clades. For example it could be
used to describe multiple parents of a clade.