georocket-docs.1.0.0.source-code.index.adoc Maven / Gradle / Ivy

Show more of this group Show more artifacts with this name
Show all versions of georocket-docs
GeoRocket is a high-performance data store for geospatial files
There is a newer version: 1.3.0
= GeoRocket User Documentation
Michel Krämer 
v1.0.0
:toc: right
:homepage: http://georocket.io
:numbered:
:docinfo1:
:icons: font
:source-highlighter: highlight.js

Copyright (C) 2015-2017 Fraunhofer Institute for Computer Graphics Research IGD

== Introduction

GeoRocket is a high-performance data store for geospatial files. It can store 3D city models (e.g. CityGML), GML files or any other XML-based geospatial data sets. GeoRocket provides the following features:

* High performance data storage with multiple back-ends such as Amazon S3, MongoDB, distributed file systems (e.g. HDFS or Ceph), or your local hard drive (enabled by default)
* Support for high-speed search features based on the popular Open-Source framework https://www.elastic.co/[Elasticsearch]. You can perform spatial queries and search for attributes, layers and tags.
* GeoRocket is made for the Cloud. Based on the Open-Source toolkit http://vertx.io[Vert.x] it is reactive and can handle big files and a large number of parallel requests.
* GeoRocket exists in two editions--an Open-Source version and a Pro edition for enterprise applications

=== Architecture

GeoRocket has a http://www.reactivemanifesto.org/[reactive], scalable and asynchronous software architecture. Imported files are split into chunks that are indexed individually. The data store keeps unprocessed chunks. This enables you to later retrieve the original file that you put into GeoRocket without losing any information.footnote:[Exported files might have a slightly different formatting. Whitespaces between chunks might be different, but other than that, exported files contain the exact same information as imported ones.]

The following figure depicts the software architecture of GeoRocket.

[[figure-georocket-architecture]]
.The architecture of GeoRocket
image::images/architecture.svg[alt="GeoRocket architecture diagram", width="70%", align="center"]

The import process starts in the upper left corner. Every imported file is first split into individual chunks. Depending on the input format chunks have different meanings. CityGML files, for example, are split into individual `cityObjectMember` objects which are typically the buildings of a city model.

Attached to each chunk is metadata containing additional information describing the chunk. This includes tags specified by the client and other automatically generated attributes.

The chunks are put into the GeoRocket data store. There are several data store implementations supporting different back-ends such as Amazon S3, HDFS, MongoDB or the local hard drive (default).

Immediately after a chunk has been put into the data store the indexer starts working asynchronously in the background. It reads new chunks from the data store and analyses them for known patterns. It recognises spatial coordinates, attributes and other content. The indexer creates a directory of every item found--the '`index`'.

The export process starts with querying the indexer for chunks matching the <> supplied by the client. These chunks are then retrieved from the data store (together with their metadata) and merged into a result file.

==== Secondary data store

GeoRocket's architecture allows for the creation of secondary data stores that co-exist with the main data store where the original chunks are kept. The following figure depicts the process:

.Secondary data store
image::images/secondary-data-store.svg[alt="Secondary data store", width="52%", align="center"]

Whenever a new chunk is added to the data store a custom processor can retrieve it to create a secondary data store. Data from this store can then be served directly to the client without further processing. Possible use cases for this scenario are:

* Optimize 3D scenes for web-based visualisation. Create a secondary data store that contains https://www.khronos.org/gltf[glTF] files. glTF is a specification for the efficient transmission of 3D scenes to the browser.
* Convert all chunks stored in CityGML version 2 to CityGML version 1 for clients that are incompatible to version 2.
* Process a 3D city model and derive LOD1 buildings from LOD2 or LOD3.

The advantage of keeping a secondary data store is that it is created automatically in the background when new data is added to GeoRocket. This avoids manual processing. Individual processors may even keep the secondary data store up to date incrementally and only re-create those parts that have changed since it has been created or updated the last time.

== Getting started

GeoRocket consists of two components: the server and the command-line interface (CLI). Download the _Server_ and _CLI_ bundles from the GeoRocket website and extract them to a directory of your choice.

NOTE: GeoRocket requires http://www.oracle.com/technetwork/java/index.html[Java 8] or higher to be installed on your system.

Open your command prompt and change to the directory where you installed GeoRocket Server. Execute `georocketd` to run the server.

  cd georocket-server-1.0.0/bin
  ./georocketd

Please wait a couple of seconds until you see the following message:

  GeoRocket launched successfully.

The server has launched and now waits for incoming HTTP requests on port `63020` (default).

Next open another command prompt and change to the directory where you installed GeoRocket CLI. Run `georocket` to access the server through a convenient command-line application.

  cd georocket-cli-1.0.0/bin
  ./georocket

You can now import your first geospatial file. Suppose your file is called `/home/user/my_file.gml`. Issue the following command to import it to GeoRocket.

  ./georocket import /home/user/my_file.gml

GeoRocket CLI will now send the file to the server. Depending on the size of the dataset this will take a couple of seconds up to a few minutes (for very large datasets).

Finally, export the contents of the whole store to a file using the `export` command.

  ./georocket export / > my_new_file.gml

TIP: You can also search for individual features (chunks) and export only a part of the previously imported file. Refer to the <> section.

That's it! You have successfully imported your first file into GeoRocket.

== Command-line application

GeoRocket comes with a handy command-line interface (CLI) letting you interact with the server in a convenient way on your command prompt. The interface provides a number of commands. The following sections describe each command and their parameters in detail.

[NOTE]
====
In the following sections it is assumed that you have the `georocket` executable in your path. If you have not done so already, you may add it to your path with the following command (Linux):

  export PATH=/path/to/georocket-cli-1.0.0/bin:$PATH

Or under Windows do:

  set PATH=C:\path\to\georocket-cli-1.0.0\bin;%PATH%
====

=== Help command

Display help for the command-line interface and exit.

Examples:

  georocket

or

  georocket --help

or

  georocket help

The help command also gives information on specific CLI commands. Just provide the name of the command you would like to have help for. For example, the following command displays help for the <>:

  georocket help import

[[import-command]]
=== Import command

Import one or more files into GeoRocket. Specify the name of the file to import as follows.

  georocket import myfile.xml

You can also import the file to a certain layer. The layer will automatically be created for you. The following command imports the file `myfile.xml` to the layer `CityModel`.

  georocket import --layer CityModel myfile.xml

Use slashes to import to sub-layers.

  georocket import --layer CityModel/LOD1/Center myfile.xml

You may attach tags to imported files. Tags are human-readable labels that you can use to search for files or chunks stored in GeoRocket. Use a comma to separate multiple tags.

  georocket import --tags city,district,lod1 myfile.xml

=== Export command

Export a layer stored in GeoRocket. Provide the name of the layer you want to export.

  georocket export CityModel/LOD1

By default the export command writes to standard out (your console). Redirect output to a file as follows.

  georocket export CityModel/LOD1 > lod1.xml

You may also export the whole data store. Just provide the root layer `/` to the export command.

  georocket export /

WARNING: Exporting the whole data store may take a while depending on how much data you have stored in GeoRocket.

[[search-command]]
=== Search command

Search the GeoRocket data store and export individual geospatial features (chunks). Provide a <> to the search command as follows.

  georocket search myquery

You can also search individual layers.

  georocket search --layer CityModel myquery

By default the search command writes to standard out (your console). Redirect output to a file as follows.

  georocket search myquery > results.xml

Use a space character to separate multiple query terms. Search results will be combined by logical OR.

See the <> section for a full description of all possible terms in a query.

=== Delete command

Remove geospatial features (chunks) or whole layers from the GeoRocket data store. Provide a <> to the delete command to select the features to delete.

  georocket delete myquery

You can also restrict the delete command to a certain layer.

  georocket delete --layer CityModel myquery

Delete a whole layer (including all its chunks and sub-layers) as follows.

  georocket delete --layer CityModel/LOD1

You may even delete the whole data store by specifying the root layer `/`.

  georocket delete --layer /

CAUTION: This is a dangerous operation. It will remove everything that is stored in your GeoRocket instance. There is no safety net--no confirmation prompt and no recycle bin.

== HTTP interface

GeoRocket Server provides an (REST-like) HTTP interface that you can use to interact with the data store as well as to embed GeoRocket in your application. By default GeoRocket listens to incoming connections on port 63020.

=== GET information

Get information about GeoRocket (application name, version, etc.).

===== Resource URL

  /

===== Parameters

None

===== Status codes

[cols="1,2"]
|===
| *200*
| The operation was successful
|===

===== Example request

----
GET / HTTP/1.1
----

==== Example response

----
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 100

{
  "name" : "GeoRocket",
  "version" : "1.0.0",
  "tagline" : "It's not rocket science!"
}
----

=== GET file

Search the data store for chunks that match a given <>. Merge the chunks found and return the result as a file.

===== Resource URL

  /store/:path

===== Parameters

[cols="1,2"]
|===
| *path* +
  _(optional)_
| The absolute path to a layer to search. Omit this parameter to query the whole data store.
| *search* +
  _(optional)_
| A URL-encoded <>. If no query string is provided all chunks from the requested layer will be returned.
|===

===== Status codes

[cols="1,2"]
|===
| *200*
| The operation was successful
| *400*
| The provided information was invalid (e.g. malformed query)
| *404*
| The requested chunks were not found or the query returned an empty result
| *500*
| An unexpected error occurred on the server side
|===

===== Example request

  GET /store/CityModel?search=LOD1+textured+13.378,52.515,13.380,52.517 HTTP/1.1

===== Example response

----
HTTP/1.1 200 OK
Transfer-Encoding: chunked



  ...

----

=== POST file

Import a file into GeoRocket. Split the file into chunks and put them into the data store.

===== Resource URL

  /store/:path

===== Parameters

[cols="1,2"]
|===
| *path* +
  _(optional)_
| The absolute path to a layer where the chunks from the imported file should be stored. Omit this parameter to put the chunks into the data store's root layer `/`.
| *tags* +
  _(optional)_
| A comma-separated list of tags (i.e. labels) to attach to each imported chunk.
|===

===== Status codes

[cols="1,2"]
|===
| *202*
| The operation was successful. The file was accepted for importing and is now being processed asynchronously.
| *400*
| The provided information was invalid (e.g. malformed input file)
| *500*
| An unexpected error occurred on the server side
|===

===== Example request

----
POST /store/CityModel?tags=LOD1,textured HTTP/1.1
Content-Length: 35903517



  ...

----

==== Example response

----
HTTP/1.1 202 Accepted file - importing in progress
Content-Length: 0
----

=== DELETE chunks

Delete chunks or layers from the data store.

===== Resource URL

  /store/:path

===== Parameters

[cols="1,2"]
|===
| *path* +
  _(optional)_
| The absolute path to the layer from which chunks matching the given query should be deleted. If no query is given this is the path to the layer to delete (including all its contents--sub-layers and chunks).
| *search* +
  _(optional)_
| A URL-encoded <> specifying which chunks should be deleted. If no query string is provided the whole layer is deleted.
|===

CAUTION: If you don't specify a layer (`path`) nor a query (`search`) then the whole contents of the GeoRocket data store will be deleted.

===== Status codes

[cols="1,2"]
|===
| *204*
| The operation was successful. The matching chunks were deleted from the data store.
| *400*
| The provided information was invalid (e.g. malformed query)
| *500*
| An unexpected error occurred on the server side
|===

NOTE: This HTTP method is idempotent. Even if the given query returns no results (i.e. if there is nothing to delete) the operation completes successfully with a status code of `204`.

===== Example request

----
DELETE /store/CityModel?search=LOD1 HTTP/1.1
----

==== Example response

----
HTTP/1.1 204 No Content
Content-Length: 0
----

[[query-language]]
== Query language

NOTE: As of version 1.0.0 the query language is rather limited. At the moment you can only specify strings and bounding boxes.

=== Strings

GeoRocket performs a full-text search for strings in every tag and every indexed attribute.

Example:

  string

=== Bounding boxes

Bounding boxes can be specified using four floating point numbers separated by a comma. The format is:

  left,bottom,right,top

or

  minimum_longitude,minimum_latitude,maximum_longitude,maximum_latitude

NOTE: As of version 1.0.0 GeoRocket only supports spatial queries given in WGS84 coordinates (longitude/latitude). However, data stored in GeoRocket can have an arbitrary spatial reference system as long as it is specified in the original file.

Example:

  13.378,52.515,13.380,52.517

=== Logical operators

The operators <>, <> and <> can be used to logically combine terms in a query. They are applied using the following notation:

  (  ... )

Operands are separated by a space character. Logical operations can be nested.

Examples:

  AND(a b)
  AND(a NOT(b))
  OR(NOT(a) NOT(b))

==== OR

Use the logical OR operator to search for chunks that match at least one of the given operands.

Example:

  OR(foo 13.378,52.515,13.380,52.517 bar)

This example matches all chunks that have a tag or indexed attribute with the value `foo` or `bar` as well as those that are within the bounding box `13.378,52.515,13.380,52.517`.

By default, if you don't specify a logical operation, all top-level terms in a query are combined by OR. Just use a space character to separate operands. The following query is a shorthand for the example above.

Example:

  foo 13.378,52.515,13.380,52.517 bar

==== AND

Use the logical AND operator to search for chunks that match all of the given operands.

Example:

  AND(13.378,52.515,13.380,52.517 foobar)

This example matches all chunks that are within the bounding box `13.378,52.515,13.380,52.517` and that have a tag or indexed attribute with a value of `foobar`.

==== NOT

Use the logical NOT operator to search for chunks that match none of the given operands.

Example:

  NOT(13.378,52.515,13.380,52.517 foobar)

This example matches all chunks that are not within the bounding box `13.378,52.515,13.380,52.517` and that don't have a tag or indexed attribute with a value of `foobar`.

== Client configuration

You can configure GeoRocket's command-line application (CLI) by editing the file `conf/georocket.yaml` in the application directory. The file must be a valid YAML file. The following sections describe possible configuration keys and values.

Keys are specified using the dot notation. You can use the keys in your file as they are specified here or use normal YAML notation instead. For example, the following configuration item

  georocket.host: localhost

is identical to:

  georocket:
    host: localhost

=== Server connection

[cols="1,2"]
|===
| *georocket.host* +
  _(default: "localhost")_
| The host where GeoRocket Server is running.
| *georocket.port* +
  _(default: 63020)_
| The TCP port GeoRocket Server is listening on.
|===

== Server configuration

You can configure GeoRocket Server by editing the file `conf/georocketd.yaml` in the application directory. The file must be a valid YAML file. The following sections describe possible configuration keys and values.

Keys are specified using the dot notation. You can use the keys in your file as they are specified here or use normal YAML notation instead. For example, the following configuration item

  georocket.storage.class: io.georocket.storage.file.FileStore

is identical to:

  georocket:
    storage:
      class: io.georocket.storage.file.FileStore

=== HTTP interface

[cols="1,2"]
|===
| *georocket.host* +
  _(default: "127.0.0.1")_
| The host GeoRocket should bind to. By default GeoRocket only listens to incoming connections from `127.0.0.1` (`localhost`). If you want it to listen to connections coming from arbitrary clients set this configuration item to `0.0.0.0`.
| *georocket.port* +
  _(default: 63020)_
| The TCP port GeoRocket should listen on.
|===

=== Back-ends

[cols="1,2"]
|===
| *georocket.storage.class* +
  _(defaults to the <>)_
| The data store implementation to use. Possible values include: +
  `io.georocket.storage.file.FileStore` +
  `io.georocket.storage.hdfs.HDFSStore` +
  `io.georocket.storage.mongodb.MongoDBStore` +
  `io.georocket.storage.s3.S3Store`
|===

[[config-backend-file]]
==== File back-end

===== Data store implementation

  io.georocket.storage.file.FileStore

===== Configuration

[cols="2,2"]
|===
| *georocket.storage.file.path* +
  _(required)_
| The path on the local hard drive where the data store should be located.
|===

==== HDFS

===== Data store implementation

  io.georocket.storage.hdfs.HDFSStore

===== Configuration

[cols="2,2"]
|===
| *georocket.storage.hdfs.defaultFS* +
  _(required)_
| The endpoint of the HDFS NameNode
| *georocket.storage.hdfs.path* +
  _(required)_
| The path on the distributed file system where the chunks should be stored. The directory must exist and write permissions must have been granted to the user executing GeoRocket.
|===

==== MongoDB

===== Data store implementation

  io.georocket.storage.mongodb.MongoDBStore

===== Configuration

[cols="2,2"]
|===
| *georocket.storage.mongodb.connectionString* +
  _(required)_
| The connection string URI used to connect to MongoDB. For example:
`mongodb://localhost:27017`
| *georocket.storage.mongodb.database* +
  _(required)_
| The database where the chunks should be stored
|===

==== Amazon S3

===== Data store implementation

  io.georocket.storage.s3.S3Store

===== Configuration

[cols="2,2"]
|===
| *georocket.storage.s3.accessKey* +
  _(required)_
| The Amazon S3 Access Key used for authentication
| *georocket.storage.s3.secretKey* +
  _(required)_
| The Amazon S3 Secret Key used for authentication
| *georocket.storage.s3.host* +
  _(required)_
| The host of the S3 endpoint
| *georocket.storage.s3.port* +
  _(default: 80)_
| The port of the S3 endpoint
| *georocket.storage.s3.bucket* +
  _(required)_
| The S3 bucket where chunks should be stored
| *georocket.storage.s3.pathStyleAccess* +
  _(default: true)_
| `true` if path-style access to the S3 bucket is used or `false` if a sub-domain is used
| *georocket.storage.s3.forceSignatureV2* +
  _(default: false)_
| `true` if S3 requests should be signed using the old Signature V2 algorithm instead of newer versions
| *georocket.storage.s3.requestExpirySeconds* +
  _(default: 600)_
| The number of seconds a pre-signed S3 request should stay valid
|===

[[indexer-elasticsearch]]
=== Elasticsearch

The GeoRocket distribution contains a version of Elasticsearch that will
automatically be started together with GeoRocket. You can disable this
behaviour and use a remote Elasticsearch instance instead.

Set the following configuration items to disable the provided Elasticsearch
instance and to configure the host and port of the remote one:

  georocket:
    index:
      elasticsearch:
        embedded: false
        host: 127.0.0.1
        port: 9200

==== Configuration

[cols="2,2"]
|===
| *georocket.index.elasticsearch.embedded* +
  _(default: true)_
| `true` if GeoRocket should launch the provided Elasticsearch instance. `false`
if it should connect to an existing instance.
| *georocket.index.elasticsearch.host* +
  _(default: "localhost")_
| Elasticsearch host address
| *georocket.index.elasticsearch.port* +
  _(default: 9200)_
| Elasticsearch TCP port
|===