.collect.collect-core.4.0.93.source-code.idml3.1.1.xsd Maven / Gradle / Ivy
This module is used by Collect Desktop, Collect Mobile and Collect Earth. It manages the storage and update of surveys and records.
Inventory Data Modeling Language 3.0-rev1
March 2013
Inventory Data Modeling Language (IDML) is an XML implementation of the Inventory Data Metamodel (IDMM), providing a formal language for defining and describing inventory data models. It is an abstraction which enables the creation of software that is flexible and reusable in a wider range of scenarios. The IDMM also allows tools to interoperate on the same field data, regardless of platform or implementation. For example, mobile or personal data recorder (PDR) software, web or desktop-based data management software and data modeling tools that are designed to work with the IDMM would all be able to work together and exchange data, regardless of vendor or inventory-specific details or functionality.
A specific usage or instance of the IDMM is called an inventory data model (IDM). An inventory data model formally defines the variables recorded during an inventory, as well as metadata needed for transparent, consistent interpretation of collected data. This includes definition of each entity and attribute and their respective data quality checks, precision, units, code lists and spatial reference systems. Names and documentation for each of these elements may be specified in any number of languages. These may then be used to generate standard reports for use in field manuals, online reporting and/or software help systems, for example. This documentation is also essential for proper interpretation of primary data and results calculation.
Other features include versioning, which allows data models to evolve over time in a managed way. New elements may be added and old ones deprecated; however, operations which would compromise data integrity and comparability are inherently prohibited. This also facilitates creation of easy-to-use tools for data and database management. Custom expressions are represented in the W3C standard XPath language. String patterns may also be specified using standard POSIX-like regular expressions.
© Copyright 2013 Open Foris
G. Miceli, M. Togna, S. Ricci, A. Pekkarinen
Release Notes
Changes from 3.0:
* Code list hierarchy levels no longer contain id attribute; their natural order determines hierarchy rank
* Languages in Survey
* Added attribute types (integer/real)
A unique identifier of objects in the schema. This allows for data consistency when renaming or moving nodes. Once assigned, these may not be changed.
The path to one or more attributes or entities using a simplified form of W3C-standard XML Path Language (XPath). See idml:expression for details.
A literal (constant) value used to specify default attribute values. May be boolean (true, false), numeric (12, 24.4) or text (ABC). Note that text values do not need to be quoted.
Expressions contain formulas and custom logic used for data checking, relevance features, default calculation and other features dependent on the values entered at run-time. A subset of the W3C-standard XML Path Language (XPath) Version 1.0 is used for this purpose.
Paths and expressions are evaluated relative to one level above the entity or attribute where they are defined. For example:
This defines a tree with numeric height and dbh attributes, and a check on the tree's dbh which flags an error if it is not less than or equal to 8x the tree's height.
Standard XPath axes are not supported (".", "..", "ancestor", "child", "descendant", etc.). Instead, the following custom functions and special variable are provided to consistently resolve relative paths:
parent()
the parent of the entity or attribute where the expression is defined
$this
the value associated with the attribute where the expression is defined
node(id)
to be implemented in 3.0-rev2; resolves to the relative path of the nearest instance of nodes with the specified definition id. In future versions, this will replace parent() and $this
Additional helper functions:
blank(node)
true if the attribute at the specified node has no value (symbols, units and other properties are ignored)
idm:lookup(table, valcol, idcol1, idexpr1, idcol2, idexpr2, ...)
retrieves a coordinate from specified 'table' and 'column', matching value of each 'idexpr' with value in each 'idcol'. This is implementation dependent and assumes the presence of table-based data which may be referenced.
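As an illustration only, the matching behavior of the lookup function can be sketched in Python over an in-memory table. The `lookup` helper, the row layout and the sample `sampling_design` rows are hypothetical; real implementations are implementation dependent and would typically query a database table.

```python
from typing import Any, Optional, Sequence


def lookup(rows: Sequence[dict], valcol: str, *criteria: Any) -> Optional[Any]:
    """Return `valcol` from the first row whose id columns match.

    `criteria` is a flat sequence: idcol1, idvalue1, idcol2, idvalue2, ...
    (the id expressions are assumed to be already evaluated to values).
    """
    if len(criteria) % 2 != 0:
        raise ValueError("criteria must be given as (column, value) pairs")
    pairs = list(zip(criteria[0::2], criteria[1::2]))
    for row in rows:
        if all(row.get(col) == val for col, val in pairs):
            return row[valcol]
    return None  # no matching row


# Hypothetical table contents, for illustration only.
sampling_design = [
    {"cluster": "10", "plot": "", "coordinate": (500000.0, 9200000.0)},
    {"cluster": "10", "plot": "1", "coordinate": (500100.0, 9200050.0)},
]
```

Here `lookup(sampling_design, "coordinate", "cluster", "10", "plot", "")` returns the coordinate stored for cluster 10 with an empty plot id.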
A short text or number used to distinguish between versions of the data model and related field forms (e.g. DEC12 or 1.0).
A single entry in a code list. In hierarchical lists, may also contain other sub-items.
The code (e.g. 100, XYZ) associated with a code list item. By default, codes within a given code list must be unique. Special behavior may be defined using the element.
A short, human-readable label for a particular code list item (e.g. "Other wooded land").
Human-readable text describing the code list item.
For hierarchical lists, sub-items are defined here. In this case, the element should be used to define two or more list levels.
Unique identifier of items in a code list. Once assigned, this may not be changed.
Specifies if additional information in the form of text may be provided when the code is used. The most common use of this is for codes indicating "Other" or any other code with the parenthetical "(please specify)". In these cases, the code may be associated with additional textual information qualifying the selection.
The version of the model in which a particular list item was introduced. The code list item is considered undefined before the specified version.
The version of the model from which a particular list item was no longer used. The code list item is considered undefined on and after the specified version.
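The "introduced in" and "no longer used from" rules above amount to a half-open interval over the ordered list of model versions. A minimal Python sketch, assuming versions are identified by name and listed in chronological order (the function name, the `since` and `deprecated` parameters and the sample version names are illustrative, not taken from the schema):

```python
from typing import Optional, Sequence


def is_defined(versions: Sequence[str], current: str,
               since: Optional[str] = None,
               deprecated: Optional[str] = None) -> bool:
    """True if an element is defined in `current`.

    `versions` is assumed to be ordered from oldest to newest.
    The element is undefined before `since` and on/after `deprecated`.
    """
    idx = versions.index(current)
    if since is not None and idx < versions.index(since):
        return False
    if deprecated is not None and idx >= versions.index(deprecated):
        return False
    return True


versions = ["pilot", "1.0", "2.0"]  # hypothetical version names
```

With these versions, an item introduced in "1.0" and deprecated in "2.0" is defined only in "1.0".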
Top level element representing an instance of a survey's data model.
Human-readable name of the survey (e.g. Archenland NFI).
Globally unique uniform resource identifier for the survey model. This identifier is used to identify the data model when importing and updating metadata. To guarantee uniqueness, it is recommended that the URL of the responsible institution be used, appending a unique code (UUID) or the name of the survey (e.g. http://www.openforis.org/idm/idnfi).
Name of survey project cycle in human-readable text (e.g. 2010-2015).
Human-readable text describing the survey project.
ISO 639-1 code of a language used in this survey model. Each code consists of two lower-case letters (e.g. "en", "es", "de").
Application-specific configuration and options.
A set of application-specific options, identified by type attribute.
Application-specific XML or text. Software applications may use this to store UI and form layout, options, etc.
Type of application options contained in this element.
Container element for information on data model versions. Versioning allows certain elements of the survey model to be added or removed without breaking data consistency. Versions may also be used to document the changes made to a survey during the course of its execution. This allows software and reports to display only values that apply to the point in time that data were collected.
Defines a single version of the data model. Usually there will be one version for each revision of the field forms or mobile computer software used to collect field data.
Human-readable name of the model version (e.g. Pilot, Main Phase, Extension).
Human-readable description of the model version.
Date from which data was collected using this version of the model.
Unique identifier of versions in the schema. Once assigned, this identifier must not be changed.
Unique system name for this version of the data model to be referenced from other model elements (e.g. pilot, 1.0, 2.0).
Definitions of code lists used by the survey.
Definition of a single code list used by the survey. Code lists may be flat (one level) or hierarchical (multiple levels). Additional information about the structure of the list is defined in the and elements.
Human-readable name of the code list being defined.
The type of label being defined (to be removed in future release)
Indicates that the label refers to items in the list (e.g. Administrative unit). Note that for hierarchical code lists, additional per-level labels are defined in the element.
Indicates that the label refers to the list itself (e.g. Administrative units).
Application-specific attributes.
Human-readable description of the code list.
Additional settings on how code values are treated and interpreted.
For hierarchical code lists, defines at which level codes are considered unique.
Codes are unique across the entire coding scheme (default). The same code may not be used twice, even if under different parents in a hierarchical list. This implies that each code in a hierarchy completely identifies its related item. For example, in a hierarchy of administrative units with two levels "Province" and "District", district code "002" may not be used under both province "A" and "B". Instead, the district codes "A002" and "B002" could be used, both of which fully identify the district in question.
Codes are unique across the item's parent (in hierarchical lists only). This implies that the same code may be used twice at different points in the hierarchy. In this case, the code alone is not enough to identify the item in question; all ancestors are required to do so. In the above example, district code "002" would then be allowed below both province "A" and "B". However, to identify the district in this example, both province and district codes would be required.
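The difference between the two uniqueness scopes can be sketched in Python as follows. This is only an illustration: the scope labels "scheme" and "parent", the `duplicate_codes` helper and the tuple layout are assumptions made for the example, not names defined by the schema.

```python
from typing import Iterable, List, Tuple

# Each item is (parent_path, code); parent_path holds the ancestor codes.
Item = Tuple[Tuple[str, ...], str]


def duplicate_codes(items: Iterable[Item], scope: str = "scheme") -> List[str]:
    """Return the codes that violate uniqueness under the given scope."""
    if scope not in ("scheme", "parent"):
        raise ValueError(f"unknown scope: {scope!r}")
    seen = set()
    duplicates = []
    for parent_path, code in items:
        # Whole-scheme scope: the code alone is the key.
        # Per-parent scope: the code is only compared among siblings.
        key = code if scope == "scheme" else (parent_path, code)
        if key in seen:
            duplicates.append(code)
        seen.add(key)
    return duplicates


# Provinces "A" and "B", each with a district coded "002".
items = [((), "A"), ((), "B"), (("A",), "002"), (("B",), "002")]
```

Under the whole-scheme scope the second "002" is reported as a duplicate; under the per-parent scope the same list is valid.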
Must be defined for code lists that are hierarchical in nature.
Represents a level in the hierarchy, in order from highest to lowest (e.g. country, region, distr)
The human-readable label of the hierarchy level.
Human-readable text describing the details or significance of the hierarchy level.
A short, unique system name for the level. In relational database terms, this may be used as the name of the table containing the hierarchy level.
Container for all items in a code list.
Describes a single entry in a predefined code list.
Unique identifier of a code list in the schema. Once assigned, this may not be changed.
A short, unique system name for the code list. In relational database terms, this may be used as the name of the table containing the code list.
Used to link large code lists (e.g. cluster and plot ids or detailed lists of administrative units) to an external table. This is an implementation dependent feature, which typically will pull codes from a database table, using the names of the levels in the hierarchy as the names of the columns containing the codes. For example:
...will take codes for the first level from sampling_design.cluster, and codes for the second level from sampling_design.plot.
The version of the model in which a particular code list was introduced. The code list is considered undefined before the specified version.
The version of the model from which a particular code list was no longer used. The code list is considered undefined on and after the specified version.
Describes the units of measure used by the inventory. Units are always stored with the primary data in order to avoid misinterpretation and to facilitate conversion between units.
Describes a single unit of measure referred to by the inventory data.
The full name of the unit of measurement (e.g. kilometers, metres per hectare).
The standard abbreviation for the unit of measurement (e.g. km, m/ha)
Unique identifier of the unit of measurement in the schema. Once assigned, this may not be changed.
A short name for the unit of measurement.
The dimension to which the unit belongs (e.g. length, volume, time).
The conversion factor used to convert to other units in the same dimension.
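One plausible reading of the conversion factor is that it gives the size of one unit expressed in a common base unit of its dimension. Under that assumption (which the schema text does not state explicitly), conversion between two units of the same dimension can be sketched in Python as follows; the `Unit` class and `convert` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    dimension: str
    # Assumed meaning: size of one unit in the dimension's base unit.
    conversion_factor: float


def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert `value` from `source` to `target` within one dimension."""
    if source.dimension != target.dimension:
        raise ValueError("units belong to different dimensions")
    return value * source.conversion_factor / target.conversion_factor


km = Unit("km", "length", 1000.0)
m = Unit("m", "length", 1.0)
```

For example, `convert(2.0, km, m)` yields 2000.0 and `convert(500.0, m, km)` yields 0.5.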
Defines specific coordinate reference systems and their respective map projections used by the inventory. This information is required for correct interpretation of coordinates and for related geospatial calculations (e.g. distance).
Defines a single coordinate reference system used by the system. Note that when using Universal Transverse Mercator coordinate system (UTM), each zone is to be encoded as a separate spatial reference system.
Human-readable name of the spatial reference system (e.g. UTM Zone 38).
Human-readable description of the spatial reference system.
Definition of the spatial reference system's parameters and projection in well-known text (WKT) format. WKT is a standard defined in the Open Geospatial Consortium (OGC)'s Simple Feature Access and Coordinate Transformation Service specifications.
User-defined unique ID of the spatial reference system (e.g. EPSG:21036)
Defines information about the reference data, like the sampling points or the taxon list
Contains the definition of the structure and contents of primary data, including types of entities studied and measurements or observations that will be made. Quality checks to be applied during data collection and initial data processing are also defined here.
A schema may contain one or more root entities which are considered the main unit of work for a survey.
An entity represents an observation unit; a thing or concept for which observations and measurements will be made. Some common examples include "plot", "tree" and "household". In IDM, entities may have any number of attributes and may contain one or more other entities (e.g. a "plot" contains many "tree" entities). Top-level entities are special in that they are the container for all other entities and attributes, and represent the main unit of data collection. For example, if socio-economic surveys are done as an integrated part of biophysical surveys, both "field_plot" and "household" may fall under one "cluster" entity. However, if the interviews and field measurements are done separately and are not directly related, "household" and "cluster" may both be better represented as top-level entities.
Contains the last id used for uniquely identifying objects in this survey model. To avoid conflicts between instances and uses of the model, ids should not be reused (i.e. lastId must increase each time a new object is added to the IDM).
If true, the survey is in use by some systems and cannot be entirely modified
A common base type for entities and attributes.
The human-readable label associated with an entity, entities, attribute value or values when displaying results, calculation and reporting.
Indicates that the label applies to a single instance of an entity (e.g. "Tree") or attribute (e.g. "Use").
Indicates that the label applies to multiple instances of an entity (e.g. "Trees") or attribute (e.g. "Uses"). Frequently used as a heading when displaying or printing lists of values or entities.
Indicates the number associated with the attribute or entity. This code commonly appears on field forms and PC or PDR software, and may be numeric (1, 2, 3) or alphanumeric (2a, IV).
Application-specific attributes imported from other namespaces.
The human-readable text used when eliciting a value or response from a user, field team, or interviewee.
Question, phrase or statement posed to an interviewee during interviews.
Text appearing next to the item associated with this entity or attribute on printed field forms.
Prompt shown on handheld computers when collecting data electronically in the field.
Prompt shown on PC-based software (e.g. Open Foris Collect) when entering data in the field or office.
Application-specific attributes imported from other namespaces.
Human-readable description of additional details regarding the entity or attribute (e.g. plot size/shape, how the value is obtained). Used for documentation and software help systems.
Unique identifier of the entity or attribute in the schema. Once assigned, this may not be changed.
A short, unique system name for the entity or attribute. In relational database terms, this may be used as the name of the table containing the instances of an entity, or the prefix of the columns containing an attribute's values.
The version of the model in which a particular entity or attribute was introduced. The entity or attribute is considered undefined before the specified version.
The version of the model from which a particular entity or attribute was no longer used. The entity or attribute is considered undefined on and after the specified version.
Application-specific attributes imported from other namespaces.
Common base type for entities. An entity represents an observation unit; a thing or concept for which observations and measurements will be made. Some common examples include "plot", "tree" and "household". In IDM, entities may have any number of attributes and may contain one or more other entities (e.g. a "plot" contains many "tree" entities).
An entity represents an observation unit; a thing or concept for which observations and measurements will be made. Some common examples include "plot", "tree" and "household". In IDM, entities may have any number of attributes and may contain one or more other entities (e.g. a "plot" contains many "tree" entities).
A single real or integer number (2.3, 5.1). Units of measurement are also kept as a separate property of each value.
A range consists of either a single number (e.g. 12, 15.2, -12) or two numbers separated by a hyphen/dash (e.g. 4-12, 2-6.7) indicating a minimum and maximum value for the attribute. The first number must always be less than or equal to the second. Note that single values (e.g. 5) are treated as both 'from' and 'to' values (e.g. 5-5). As with other number attributes, units of measurement are also managed.
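The textual form of a range described above can be parsed with a short regular expression. The following Python sketch is one possible interpretation; `parse_range` is a hypothetical helper, and treating a leading minus sign as part of the first number (so that "-12" is a single value) is an assumption drawn from the examples.

```python
import re
from typing import Tuple

_NUMBER = r"-?\d+(?:\.\d+)?"
# One number, optionally followed by a hyphen and a second number.
_RANGE = re.compile(rf"^({_NUMBER})(?:-({_NUMBER}))?$")


def parse_range(text: str) -> Tuple[float, float]:
    """Parse '12', '-12', '4-12' or '2-6.7' into a (from, to) pair."""
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid range: {text!r}")
    low = float(match.group(1))
    # A single value is treated as both 'from' and 'to'.
    high = float(match.group(2)) if match.group(2) is not None else low
    if low > high:
        raise ValueError("the first number must not exceed the second")
    return low, high
```

For instance, `parse_range("5")` gives `(5.0, 5.0)`, while `parse_range("12-4")` is rejected.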
A Gregorian date consisting of a year, month and day.
A check that the date occurs before, on, or after another date.
A user-defined check on the date.
A time, consisting of the hour and minutes.
A check that the time occurs before, at, or after another time.
A user-defined check on the time.
An attribute whose value may be true, false or null/blank.
A custom check on the state of the attribute (true, false, blank)
Indicates that "false" or "no" values cannot be specified explicitly by the user, but are instead implied by the absence of an affirmative action. Concretely, if paper or electronic forms present two checkboxes, e.g.:
Living [ ] Yes [x] No
...the user has three choices; to check "Yes", to check "No", or to check neither. Each of these actions corresponds to true, false and '', respectively. In this case, affirmativeOnly should be false, as the data may also contain negative values (false).
In cases where an attribute is represented by a single check box, e.g.:
[ ] Living
...the user has only two choices; to check "Yes", or to do nothing. These actions correspond to true and '', respectively. In this case, affirmativeOnly would be true, as there is no explicit way to specify a negative value.
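The mapping from checkbox state to stored value in the two cases above can be sketched in Python. The `boolean_value` helper and its parameter names are hypothetical, and blank is represented here by `None`.

```python
from typing import Optional


def boolean_value(yes_checked: bool, no_checked: bool = False,
                  affirmative_only: bool = False) -> Optional[bool]:
    """Map checkbox state to true, false or blank (None)."""
    if affirmative_only and no_checked:
        # A single-checkbox form has no way to express an explicit "No".
        raise ValueError("no 'No' box exists when affirmative_only is true")
    if yes_checked and no_checked:
        raise ValueError("'Yes' and 'No' cannot both be checked")
    if yes_checked:
        return True
    if no_checked:
        return False
    return None  # nothing checked: the value is blank
```

With two checkboxes the result may be true, false or blank; with `affirmative_only=True` it can only be true or blank.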
Represents an image or file attachment.
Maximum file size expressed in number of bytes.
Space-separated list of allowed file extensions. Implementations may treat jpg or other extensions as special cases, presenting thumbnails or de-identifying metadata where appropriate.
Code attributes consist of one or more code values, with qualifying text when appropriate (e.g. Other (specify)).
If true, the values are limited to those in the code list. When false, unlisted values may be assigned to the attribute, indicated by a warning state on the attribute.
The name of the code list from which valid values for this attribute are defined.
If the code attribute refers to the second or lower level of a hierarchical code list, the parent must point to the code attribute containing the value of the parent level. The valid values for descendants will then depend on the value specified for the ancestor levels. For example, given the following code list:
...
...two attributes could then be defined, one for each level:
In this case, the meaning and validity of the district code would then depend on the region code specified in the 'region' code attribute.
Determines whether the code is to be included as part of the key which uniquely identifies the associated unit of observation within its parent. Keys are not assumed to be globally unique, but rather unique relative to their parent entity. For example, code 'plot_id' and text 'plot_section' may be used as keys for a 'plot' entity within a cluster, while 'cluster_code' may uniquely identify a 'cluster' top-level entity within a survey.
A coordinate consists of an X value, Y value, and spatial reference system identifier (SRS ID). The SRS defines the projection and other parameters necessary to correctly interpret and convert the X and Y values to other coordinate reference systems (latitude/longitude, UTM, etc.)
A taxon consists of a reference to a family, genus, species and/or sub-species by code, scientific name. Optionally, it may also contain a local or common vernacular name, qualified by the name's ISO 639-3 language code.
The name of the taxonomic checklist from which species codes and names are taken. Implementation independent.
Specifies the highest taxonomic rank allowed. For example, if genus is selected, the value may refer to a genus, species or subspecies, but not a family.
Implementation specified; a comma-separated list of expressions used to filter lookups in the checklist by one or more run-time values (e.g. to limit species by region).
An attribute containing free text.
Determines whether the text value is to be included as part of the key which uniquely identifies the associated unit of observation within its parent. Keys are not assumed to be globally unique, but rather unique relative to their parent entity. For example, code 'plot_id' and text 'plot_section' may be used as keys for a 'plot' entity within a cluster, while 'cluster_code' may uniquely identify a 'cluster' top-level entity within a survey.
The sub-type of the text attribute.
Short text consists of a single line of upper-case text, numbers and symbols of up to 255 characters in length. Formatting is ignored, including leading and trailing spaces and carriage returns.
Memo fields contain any formatted text, up to 1024 characters in length. Leading and trailing whitespace is trimmed, but other formatting (capitalization, carriage returns, etc.) is maintained.
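A rough Python sketch of how the two text sub-types might be normalized is given below. The sub-type labels "short" and "memo", the `normalize_text` helper, collapsing internal whitespace runs for short text, and rejecting (rather than truncating) over-long values are all assumptions made for illustration.

```python
def normalize_text(value: str, subtype: str = "short") -> str:
    """Normalize a text value according to its sub-type."""
    if subtype == "short":
        # Formatting is ignored: collapse all whitespace (including
        # carriage returns) to single spaces and store in upper case.
        cleaned = " ".join(value.split()).upper()
        limit = 255
    elif subtype == "memo":
        # Only leading and trailing whitespace is removed.
        cleaned = value.strip()
        limit = 1024
    else:
        raise ValueError(f"unknown text sub-type: {subtype!r}")
    if len(cleaned) > limit:
        raise ValueError(f"text exceeds {limit} characters")
    return cleaned
```

For example, `normalize_text("  oak\r\nstand ")` yields a single upper-case line, whereas the memo form keeps internal line breaks and capitalization.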
A conditional expression which determines whether the entity is relevant. For example, if stumps are only observed in permanent plots, the stump entity definition in plot may declare relevant="permanent". All validation errors on associated attributes are reduced to warnings when this expression evaluates to false.
A boolean value indicating that one or more instances of the entity is always required (deprecated, marked for removal).
A conditional expression which determines whether one or more of an entity is required (deprecated, marked for removal)
Indicates that more than one instance of an entity may be specified (deprecated, marked for removal)
The minimum number of instances of the entity which must be provided.
The maximum number of instances of the entity which may be provided.
Defines an inventory attribute. An attribute defines a variable for which a value will be recorded, including all field measurements, observations and survey questions.
Defines the behavior when values are missing. When this is applied to field data is implementation dependent (e.g. Collect applies defaults when data is submitted for cleansing).
A constant numeric, text or range to be applied as the default value.
An expression that evaluates to the default numeric, text or range value.
A conditional expression indicating whether or not the default should be applied to a missing value. Default rules are to be evaluated in order until an applicable rule is found (i.e. this expression evaluates to true)
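The "first applicable rule wins" evaluation order can be sketched in Python. In this illustration, conditions and value expressions are modeled as plain callables over a record dictionary; the `apply_default` helper, the rule layout and the sample attribute `plot_type` are hypothetical.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# (condition, value expression); a condition of None always applies.
Rule = Tuple[Optional[Callable[[dict], bool]], Callable[[dict], Any]]


def apply_default(record: dict, rules: Sequence[Rule]) -> Any:
    """Return the value of the first rule whose condition holds."""
    for condition, value_of in rules:
        if condition is None or condition(record):
            return value_of(record)
    return None  # no applicable rule: the value stays missing


rules = [
    (lambda r: r.get("plot_type") == "permanent", lambda r: 20),
    (None, lambda r: 10),  # unconditional fallback
]
```

Rules placed after an unconditional rule are never reached, so the order of the list matters.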
A conditional expression defining when the attribute is relevant. All validation errors on this are reduced to warnings when this expression evaluates to false.
Indicates whether or not a value is always required.
A conditional expression indicating when an attribute value is required (deprecated, marked for merging with @required attribute)
Indicates whether multiple-responses are allowed for code attributes.
For multiple attributes, indicates the minimum number of values allowed (deprecated, marked for removal).
For multiple attributes, indicates the maximum number of values allowed (deprecated, marked for removal).
Indicates whether values are automatically calculated or user input is required to fill in values.
The short name of the unit of measurement.
The number of decimal digits to round to when displaying values in reports and query results.
Indicates whether this unit is the default unit of measurement for this attribute. If not specified, the first element will be taken as the default. The default unit of measurement is intended to be used when exporting and importing data to other systems and formats.
Determines whether the number is to be included as part of the key which uniquely identifies the associated unit of observation within its parent. Keys are not assumed to be globally unique, but rather unique relative to their parent entity. For example, code 'plot_id' and text 'plot_section' may be used as keys for a 'plot' entity within a cluster, while 'cluster_code' may uniquely identify a 'cluster' top-level entity within a survey.
Specifies the sub-type of numeric attributes (number and range).
Real values may be positive or negative numbers with zero or more decimal digits (e.g. 12, -4, 6.3, -18.422).
Integer values may be positive or negative numbers with no decimal digits (e.g. 12, -4, 6, -18).
The base type for checks, which define rules executed on primary data during data entry and data cleansing.
A custom message to be used when a check fails. This is especially useful for custom checks, where the failed check is not clear from the default message.
With what severity to flag the value when the check fails.
The 'error' flag indicates that the value is invalid and must not be used for analysis.
The 'warn' flag indicates that the value is suspicious, but may still be correct.
A conditional expression which determines whether or not the check will be applied. If the expression evaluates to 'false', the check should be completely ignored.
Asserts that a value is within a certain range.
A constant or expression which evaluates to the minimum allowed value (inclusive).
A constant or expression which evaluates to the maximum allowed value (inclusive).
A constant or expression which evaluates to the minimum allowed value (exclusive).
A constant or expression which evaluates to the maximum allowed value (exclusive).
A constant or expression which evaluates to the exact required value.
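The five bounds above combine as a conjunction: a value passes only if it satisfies every bound that is given. A minimal Python sketch, in which the function and keyword names are hypothetical and the bounds are assumed to be already evaluated to numbers:

```python
from typing import Optional


def passes_comparison(value: float, *,
                      gte: Optional[float] = None,  # minimum, inclusive
                      lte: Optional[float] = None,  # maximum, inclusive
                      gt: Optional[float] = None,   # minimum, exclusive
                      lt: Optional[float] = None,   # maximum, exclusive
                      eq: Optional[float] = None) -> bool:
    """True if `value` satisfies every bound that was supplied."""
    if gte is not None and not value >= gte:
        return False
    if lte is not None and not value <= lte:
        return False
    if gt is not None and not value > gt:
        return False
    if lt is not None and not value < lt:
        return False
    if eq is not None and value != eq:
        return False
    return True
```

For example, a dbh of 40 passes a check requiring it to be at most 8 times a height of 10, via `passes_comparison(40, lte=8 * 10)`.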
Asserts that a coordinate value is at least/most a certain distance from a known location. Coordinates are typically matched by one or more ids and retrieved from a lookup table. For example, the following coordinate checks:
...states that it is an error when the coordinate is more than 100km from the planned location, and a warning when it is more than 50km. The lookup() function retrieves the 'coordinate' column in the 'sampling_design' table, matching the 'cluster' column to the attribute 'id' of the entity where the coordinate is located, and the 'plot' column with the empty string ''.
A constant or expression that evaluates to the minimum allowed distance, in meters.
A constant or expression that evaluates to the maximum allowed distance, in meters.
An expression that evaluates to a coordinate from which the distance should be checked (default is $this).
An expression that evaluates to a coordinate to which the distance should be checked. This is usually a lookup function to retrieve the location from an implementation-dependent table (see example above).
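The two-threshold distance check described above (warning beyond 50 km, error beyond 100 km) can be sketched in Python. This illustration assumes both coordinates are already expressed in the same projected reference system with units of meters, so a planar distance is adequate; real implementations would first transform coordinates between reference systems. The function names and default thresholds are hypothetical.

```python
import math
from typing import Optional, Tuple

# (x, y) in meters; both points assumed to share one projected SRS.
Coordinate = Tuple[float, float]


def distance_m(a: Coordinate, b: Coordinate) -> float:
    """Planar (Euclidean) distance between two projected coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def check_distance(observed: Coordinate, planned: Coordinate,
                   warn_max_m: float = 50_000.0,
                   error_max_m: float = 100_000.0) -> Optional[str]:
    """Return 'error', 'warn' or None depending on the distance."""
    d = distance_m(observed, planned)
    if d > error_max_m:
        return "error"
    if d > warn_max_m:
        return "warn"
    return None
```

The error threshold is tested first so that a value far from the planned location is flagged with the higher severity only.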
Asserts that the value of a text or code attribute matches a regular expression pattern.
A regular expression which must match the value of the given code or text attribute. In computing, regular expressions (regex) are used to define patterns for matching text. IDM uses the POSIX-style regular expressions provided with Java Platform SE 6. See http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html for detailed reference.
Asserts that the expression specified in the test attribute is true.
Generates an error or warning when this expression evaluates to false.
Indicates that values in the context indicated by the check expression must be unique.
Evaluates to a list of nodes (nodeset) whose values must be unique.
A short name for use in various representations of data and metadata (e.g. XML element names, database table/column names). The name may only contain lowercase letters ("a"-"z"), numeric digits ("0"-"9") and underscores ("_"), and must start with a lowercase letter. Uppercase letters, spaces and special symbols are not allowed.
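The naming rule above corresponds to a simple regular expression. A short Python sketch of a validator follows (`is_valid_name` is a hypothetical helper, not part of the schema):

```python
import re

# A lowercase letter, then any number of lowercase letters, digits or "_".
_NAME = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """True if `name` follows the short-name rule."""
    return _NAME.fullmatch(name) is not None
```

Names such as `plot_id` and `tree2` are accepted, while `Plot`, `2nd_plot` and `plot id` are rejected.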
A single line of human-readable text. Leading and trailing spaces are ignored.
Two-letter ISO 639-1 language code (e.g. "en", "es", "de").
Human-readable text for use in documentation, user interfaces and interactive help.
Whitespace (spaces, returns, tabs, etc.) in the text is maintained.
Two-letter ISO 639-1 language code (e.g. "en", "es", "de").