# NeTEx

NeTEx is a European standard for exchanging transit data. OTP can import NeTEx into its internal 
model. The XML parser supports the entire NeTEx specification and is not limited to a specific
profile, but not every part of it is mapped into OTP; only a small subset of the entities is
supported. When loading NeTEx data, OTP should print a warning for each NeTEx data type that is not loaded.

OTP is tested with data from Entur, which uses the [Nordic NeTEx profile](https://enturas.atlassian.net/wiki/spaces/PUBLIC/pages/728891481/Nordic+NeTEx+Profile). If you find that some part of your data is not 
imported or supported by OTP, you will need to add support for it in this model. NeTEx is huge, and
ONLY data relevant for travel planning should be imported. 

OTP assumes the data is valid, and as a general rule the data is not cleaned or improved inside OTP. Poor
data quality should be fixed BEFORE loading the data into OTP. OTP will try to _ignore_ invalid 
data, allowing the rest to be imported. 


## Design Goals

- Import transit data from NeTEx XML files
- Handle large input file sets (10 GB)
- Allow some data to be shared and group other data together in an isolated scope 
- Support reading data fast and multi-threaded (the design supports this, but it is not implemented yet)   
- Warn or report issues on poor data, but keep building a graph so one "bad" line does not block the 
  entire import.
- The import should not put any restrictions on the order of XML types in the files. If ServiceJourney
  comes before Authority in the XML file, that should be ok. The file hierarchy is an optional 
  way to group and scope data.  
 
 
## Design

The two main classes are the [`NetexModule`](NetexModule.java) and the [`NetexBundle`](NetexBundle.java).
The `NetexModule` is a `GraphBuilderModule` and is responsible for building all bundles, while a bundle 
is responsible for importing one NeTEx bundle, normally a zip file containing a NeTEx data set. You may 
start OTP with as many bundles as you like, and you may mix GTFS and NeTEx bundles in the same build. 

![Design overview](DegignOverview.png)

The NeTEx files are _XML files_, and one data set can be more than 5 GB in size. There is no fixed 
relationship between file names and content as there is in GTFS, where for example `stops.txt` 
contains all stops. Instead, OTP imports NeTEx data based on a file hierarchy. 
 

### Netex File Bundle

As seen above, the _netex-file-bundle_ is organized in a hierarchy. This is done to support 
loading large data sets and to avoid keeping XML DOM entities in memory. The hierarchy also 
prevents entities in different files at the same level from referencing each other. The hierarchy 
allows OTP to go through the steps of parsing XML data into NeTEx POJOs, validating the relationships 
and mapping these POJOs into OTP's internal data model for *each set/group of files*.  

The general rule is that entities referencing other entities should be in the same file or placed
at a lower level in the hierarchy, so the referenced object already exists when mapping an entity. 
There are exceptions to this, for example trip-to-trip interchanges. 

The shared data is available during the entire mapping process. The _group data_ is kept in memory 
for the duration of _parsing_ and _mapping_ each group. Data in one group is not visible to another
group.
 
Within each group there are also _shared-group-files_ and _group-files_ (leaf files). 

- Entities in _group-files_ can reference other entities in the same file and entities in the 
  _shared-group-files_ and in the global _shared-files_, but not entities in other _group-files_.
- Entities in _shared-group-files_ can reference other entities in the same file and entities in 
  the same group of _shared-group-files_ and in the global _shared-files_, but not entities in any
  _group-files_.
- Entities in global _shared-files_ can reference other entities in the same file and entities in 
  other global _shared-files_.
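
The visibility rules above can be read as a single predicate. The sketch below is hypothetical and not OTP code: `ReferenceRules`, `Level`, `Location` and `canReference(...)` are invented for illustration, and the documented exceptions (such as trip-to-trip interchanges) are not modeled.

```java
import java.util.Objects;

public class ReferenceRules {

    /** The three levels of the file hierarchy (hypothetical enum, not OTP's). */
    enum Level { SHARED, SHARED_GROUP, GROUP }

    /** Where an entity was loaded from; group is null for global shared files. */
    record Location(Level level, String group, String file) {}

    /** Returns true if an entity loaded at {@code from} may reference an entity at {@code to}. */
    static boolean canReference(Location from, Location to) {
        // Any entity may reference entities in the same file.
        if (from.file().equals(to.file())) {
            return true;
        }
        // Global shared files are visible from every level.
        if (to.level() == Level.SHARED) {
            return true;
        }
        // Shared-group files are visible only inside the same group,
        // from other shared-group files or from group (leaf) files.
        if (to.level() == Level.SHARED_GROUP) {
            return from.level() != Level.SHARED && Objects.equals(from.group(), to.group());
        }
        // Entities in other group (leaf) files are never visible.
        return false;
    }

    public static void main(String[] args) {
        Location shared = new Location(Level.SHARED, null, "shared-data.xml");
        Location rutShared = new Location(Level.SHARED_GROUP, "RUT", "RUT-stops-shared.xml");
        Location rutLine1 = new Location(Level.GROUP, "RUT", "RUT-line-1.xml");
        Location rutLine2 = new Location(Level.GROUP, "RUT", "RUT-line-2.xml");
        Location ostShared = new Location(Level.SHARED_GROUP, "OST", "OST-stops-shared.xml");

        System.out.println(canReference(rutLine1, shared));    // true
        System.out.println(canReference(rutLine1, rutShared)); // true
        System.out.println(canReference(rutLine1, rutLine2));  // false
        System.out.println(canReference(rutLine1, ostShared)); // false
        System.out.println(canReference(rutShared, rutLine1)); // false
    }
}
```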

✅ Note! You can configure how your data files are grouped into the 3 levels above using regular 
expressions in the _build-config.json_.
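
As an illustration of regex-based grouping, here is a hypothetical classifier. This document does not name the _build-config.json_ parameters, so the patterns below and the convention that the first capture group is the group name are assumptions; see the OTP configuration documentation for the real parameter names and defaults.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileClassifier {

    enum Level { SHARED, SHARED_GROUP, GROUP }

    /** Classification result: the level, and the group name (null for global shared files). */
    record Classified(Level level, String group) {}

    // Hypothetical example patterns; the first capture group is used as the group name.
    static final Pattern SHARED = Pattern.compile("shared-data\\.xml");
    static final Pattern SHARED_GROUP = Pattern.compile("(\\w{3})-.*-shared\\.xml");
    static final Pattern GROUP = Pattern.compile("(\\w{3})-.*\\.xml");

    /** Returns the level and group for a file name, or empty if no pattern matches. */
    static Optional<Classified> classify(String fileName) {
        if (SHARED.matcher(fileName).matches()) {
            return Optional.of(new Classified(Level.SHARED, null));
        }
        // Test the shared-group pattern first: shared-group names also match the group pattern.
        Matcher m = SHARED_GROUP.matcher(fileName);
        if (m.matches()) {
            return Optional.of(new Classified(Level.SHARED_GROUP, m.group(1)));
        }
        m = GROUP.matcher(fileName);
        if (m.matches()) {
            return Optional.of(new Classified(Level.GROUP, m.group(1)));
        }
        return Optional.empty(); // unmatched files are not loaded in this sketch
    }

    public static void main(String[] args) {
        String[] names = {"shared-data.xml", "RUT-stops-shared.xml", "RUT-line-1.xml", "readme.txt"};
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```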


### Load entities, validate and map into the OTP model

For each level in the hierarchy and each group of files, OTP performs the same steps:

 1. Load XML entities (NeTEx XML DOM POJOs). See [`NetexDataSourceHierarchy`](loader/NetexDataSourceHierarchy.java)
 1. Parse xml file and insert XML POJOs into the index. See [`NetexXmlParser`](loader/NetexXmlParser.java)
 1. Validate relationships. See [`Validator`](validation/Validator.java)
 1. Map XML entities to the OTP internal model. See [`NetexMapper`](mapping/NetexMapper.java)

OTP loads entities into a hierarchical [`NetexEntityDataIndex`](index/NetexEntityDataIndex.java) 
before validating and mapping each entity. Entities may appear in any order in the _xml-files_, so
doing the validation in a separate step ensures all entities are available when doing the validation.
If an entity or a required relation is missing, the validator should remove the invalid entity. 
This makes the mapping easier, because the mapper can assume all required data and entities exist.
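
The benefit of validating before mapping can be shown with a minimal sketch. `Entity`, `validate(...)` and the fixed-point loop are assumptions made for illustration, not the actual `Validator` API: entities are indexed in any order, then every entity with a missing required reference is removed, repeating until nothing changes because one removal can invalidate entities that depend on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ValidateBeforeMap {

    /** A parsed entity and the ids it requires (hypothetical, not a NeTEx POJO). */
    record Entity(String id, List<String> requiredRefs) {}

    /**
     * Removes every entity with a missing required reference, repeating until nothing
     * changes. Returns the ids of the removed entities in removal order.
     */
    static List<String> validate(Map<String, Entity> index) {
        List<String> removed = new ArrayList<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var it = index.values().iterator(); it.hasNext(); ) {
                Entity e = it.next();
                if (!index.keySet().containsAll(e.requiredRefs())) {
                    it.remove(); // safe removal while iterating
                    removed.add(e.id());
                    changed = true;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Entity> index = new LinkedHashMap<>();
        // Entities may arrive in any order: the journey is indexed before its line.
        index.put("SJ:1", new Entity("SJ:1", List.of("Line:1")));
        index.put("SJ:2", new Entity("SJ:2", List.of("Line:9"))); // Line:9 does not exist
        index.put("Line:1", new Entity("Line:1", List.of("Authority:1")));
        index.put("Authority:1", new Entity("Authority:1", List.of()));

        System.out.println("Removed: " + validate(index)); // prints "Removed: [SJ:2]"
        // The mapper can now assume every required reference resolves.
        for (Entity e : index.values()) {
            System.out.println("Map " + e.id());
        }
    }
}
```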

![Collaboration diagram](Colaboration.png)

Here is an outline of the process including the file-hierarchy traversal and the steps at each 
level:

1. Load _shared-data-files_ into _index_.
1. Validate loaded entities 
1. Map _shared-data-entries_
1. For each group:
    1. Load _shared-group-files_ into index
    1. Validate loaded entities 
    1. Map _shared-group-entries_
    1. For each leaf _group-file_:
        1. Load _group-file_ into index
        1. Validate loaded entities 
        1. Map _group-entries_
        1. Clear leaf data from index
    1. Remove group data from index
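
The outline above can be sketched as a traversal that records each step. This is a hypothetical simplification: `Group`, `traverse(...)` and the string log are not OTP types, and the `push`/`pop` entries stand in for entering and leaving a scope.

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchyTraversal {

    /** A group with its shared-group files and leaf group files (hypothetical model). */
    record Group(String name, List<String> sharedGroupFiles, List<String> groupFiles) {}

    /** Stand-in for the three steps performed for every set of files. */
    static void loadValidateMap(List<String> files, List<String> log) {
        log.add("load " + files);
        log.add("validate");
        log.add("map");
    }

    /** Walks the hierarchy and returns the steps in the order they were performed. */
    static List<String> traverse(List<String> sharedFiles, List<Group> groups) {
        List<String> log = new ArrayList<>();
        loadValidateMap(sharedFiles, log);             // global shared data, kept to the end
        for (Group group : groups) {
            log.add("push " + group.name());          // enter the group scope
            loadValidateMap(group.sharedGroupFiles(), log);
            for (String leaf : group.groupFiles()) {
                log.add("push " + leaf);              // enter the leaf scope
                loadValidateMap(List.of(leaf), log);
                log.add("pop " + leaf);               // clear leaf data
            }
            log.add("pop " + group.name());           // remove group data
        }
        return log;
    }

    public static void main(String[] args) {
        List<Group> groups = List.of(new Group("RUT",
                List.of("RUT-stops-shared.xml"),
                List.of("RUT-line-1.xml", "RUT-line-2.xml")));
        traverse(List.of("shared-data.xml"), groups).forEach(System.out::println);
    }
}
```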

The [`NetexBundle`](NetexBundle.java) repeats the exact same steps for each group/set of files.
To emulate navigation in the hierarchy, both the [`NetexEntityDataIndex`](index/NetexEntityIndex.java) 
and the [`NetexMapper`](mapping/NetexMapper.java) persist data in a stack-like structure. The 
`NetexBundle` calls `push()` and `pop()` on the index and the mapper to enter and exit each 
file set at a given level. Entities loaded at a given level are in the local scope, while 
entities loaded at a higher level are in the global scope. The index has methods to access both
local and global scoped entities, but it is only possible to add entities at the local scope.
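
A stack-like index with local and global scope might look like the hypothetical `ScopedIndex` below. It is not OTP's `NetexEntityDataIndex` or its real API; it only illustrates adding to the local scope while looking up through all enclosing scopes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stack-scoped index: add to the top (local) scope, look up through all scopes. */
public class ScopedIndex<V> {

    private final Deque<Map<String, V>> scopes = new ArrayDeque<>();

    public ScopedIndex() {
        push(); // the root scope holds the shared (global) data
    }

    /** Enter a new file set: entities added from now on go into a fresh local scope. */
    public void push() {
        scopes.push(new HashMap<>());
    }

    /** Exit the current file set, discarding everything added since the matching push(). */
    public void pop() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("Cannot pop the root scope");
        }
        scopes.pop();
    }

    /** Entities can only be added to the local (top) scope. */
    public void add(String id, V value) {
        scopes.peek().put(id, value);
    }

    /** Looks up an entity in the local scope only. */
    public Optional<V> lookupLocal(String id) {
        return Optional.ofNullable(scopes.peek().get(id));
    }

    /** Looks up an entity in the local scope first, then in the enclosing scopes. */
    public Optional<V> lookup(String id) {
        for (Map<String, V> scope : scopes) { // ArrayDeque iterates from top to bottom
            V value = scope.get(id);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ScopedIndex<String> index = new ScopedIndex<>();
        index.add("Authority:1", "shared authority");
        index.push();                                          // enter a group
        index.add("Line:1", "group line");
        System.out.println(index.lookup("Authority:1"));      // Optional[shared authority]
        System.out.println(index.lookupLocal("Authority:1")); // Optional.empty
        index.pop();                                           // leave the group
        System.out.println(index.lookup("Line:1"));           // Optional.empty
    }
}
```

In this sketch a local entity shadows a global entity with the same id, because lookup starts at the top of the stack.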


## Package dependencies

![Package dependencies](PackageDependencies.png)
