
GAS Engine API
The GAS (Gather, Apply, Scatter) API was developed for PowerGraph
(aka GraphLab 2.1). This is a port of that API to the Java platform
and to schema-flexible attributed graphs using RDF.
Graph algorithms are stated using the GAS (Gather, Apply,
Scatter) API. This API provides a vertex-centric approach to graph
processing ("think like a vertex") that can be used to write a large
number of graph algorithms (PageRank, triangle counting, connected
components, SSSP, betweenness centrality, etc.). The GAS API allows
the GATHER operation to be efficiently decomposed using fine-grained
parallelism over a cluster.
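To make the "think like a vertex" style concrete, here is a minimal single-threaded sketch of a GAS loop that computes unweighted SSSP (hop counts). It is not the bigdata GAS API: the class `GasSsspSketch`, the `Edge` record, and the method `shortestHops` are hypothetical names invented for this illustration, and the sketch assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: none of these names come from the bigdata GAS API. */
class GasSsspSketch {

    /** A directed edge; a hypothetical stand-in for an RDF statement. */
    record Edge(String source, String target) {}

    /** Returns the hop count from start to every reachable vertex. */
    static Map<String, Integer> shortestHops(List<Edge> edges, String start) {
        // Index in-edges (read by GATHER) and out-edges (used by SCATTER).
        Map<String, List<String>> in = new HashMap<>();
        Map<String, List<String>> out = new HashMap<>();
        for (Edge e : edges) {
            in.computeIfAbsent(e.target(), k -> new ArrayList<>()).add(e.source());
            out.computeIfAbsent(e.source(), k -> new ArrayList<>()).add(e.target());
        }

        Map<String, Integer> dist = new HashMap<>();
        dist.put(start, 0);
        // The start vertex "scatters" to its out-neighbours to seed the frontier.
        Set<String> frontier = new HashSet<>(out.getOrDefault(start, List.of()));

        while (!frontier.isEmpty()) {
            Map<String, Integer> updates = new HashMap<>();
            for (String v : frontier) {
                // GATHER: reduce over in-edges with min(dist(u) + 1).
                int best = Integer.MAX_VALUE;
                for (String u : in.getOrDefault(v, List.of())) {
                    Integer d = dist.get(u);
                    if (d != null) {
                        best = Math.min(best, d + 1);
                    }
                }
                // APPLY: accept the gathered value only if it improves the vertex state.
                Integer old = dist.get(v);
                if (best != Integer.MAX_VALUE && (old == null || best < old)) {
                    updates.put(v, best);
                }
            }
            dist.putAll(updates);

            // SCATTER: schedule the out-neighbours of every vertex that changed.
            Set<String> next = new HashSet<>();
            for (String v : updates.keySet()) {
                next.addAll(out.getOrDefault(v, List.of()));
            }
            frontier = next;
        }
        return dist;
    }
}
```

The GATHER step here is a `min` reduction, which is associative and commutative. That property is what would allow the per-edge work to be split across threads or machines and the partial results combined afterwards.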
Part of our effort under the XDATA program is to examine how
fine-grained parallelism can be leveraged on GPUs and other many-core
devices to deliver extreme performance on graph algorithms. We are
looking at how the GAS abstraction can be evolved to expose more
parallelism.
The interfaces of this API are stated in terms of RDF {@link
org.openrdf.model.Value} objects (for vertices) and {@link
org.openrdf.model.Statement} objects (for edges). Link attributes are
handled efficiently by the bigdata implementation, which co-locates
them in the indices with the links and then applies prefix compression
to deliver a compact on-disk footprint. See the section on
Reification Done Right (below) for more details.
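As a rough illustration of why co-locating link attributes with their links compresses well, the sketch below applies front coding (a simple form of prefix compression) to sorted keys. It is not the bigdata index format: the class `FrontCodingSketch`, the `Entry` record, and the `|`-delimited string keys are assumptions made only for this example, which assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy front coder, not the bigdata on-disk format. */
class FrontCodingSketch {

    /** One compressed key: how many leading chars to reuse, plus the new tail. */
    record Entry(int sharedPrefixLength, String suffix) {}

    /** Encodes each key relative to the key immediately before it. */
    static List<Entry> compress(List<String> sortedKeys) {
        List<Entry> entries = new ArrayList<>();
        String previous = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int limit = Math.min(previous.length(), key.length());
            while (shared < limit && previous.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            entries.add(new Entry(shared, key.substring(shared)));
            previous = key;
        }
        return entries;
    }

    /** Rebuilds the original keys by replaying the entries in order. */
    static List<String> decompress(List<Entry> entries) {
        List<String> keys = new ArrayList<>();
        String previous = "";
        for (Entry e : entries) {
            String key = previous.substring(0, e.sharedPrefixLength()) + e.suffix();
            keys.add(key);
            previous = key;
        }
        return keys;
    }
}
```

With keys such as `alice|knows|bob` followed by `alice|knows|bob|since|2010`, the second entry stores only the shared length and the short tail `|since|2010`, because the attribute key sorts directly after the link it describes.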
Reification Done Right and Property Graphs
Reification Done Right (RDR) explains the relationship between the somewhat
opaque concept of RDF reification (which we use only for interchange)
and statements about statements (more generally, the ability to turn
any edge into a vertex and make statements about that vertex). There
are different ways to handle statements about statements efficiently
in the database, but these are internal physical schema design
questions. From a user perspective, the main concern should be the
performance of the database platform when using this feature. Bigdata
uses a combination of inlining and prefix compression to provide a
dense, fast, bi-directional encoding of statements about statements and
fast access paths whether querying by vertices, property values, or
link attributes. You can also write queries using a high-level query
language (SPARQL) that are automatically optimized and executed
against the graph.
The RDR approach is more general than the Property Graph Model:
anything that you can do with a property graph you can do at least
as efficiently in an intelligently designed
RDF database. Further, RDF graphs allow efficient handling of the
following cases that are disallowed under the property graph model:
- A vertex may have multiple property values for the same key.
- A link may have multiple link attributes for the same key.
- A link may serve as a vertex - thus you may have links whose
sources or targets are other links (hypergraphs).
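The three cases above can be sketched with a small in-memory model in which a statement is itself a node. This is only an illustration under stated assumptions: `RdrSketch`, `Node`, `Resource`, `Literal`, `Statement`, `objects`, and `sampleGraph` are hypothetical names, not the bigdata or openrdf types, and the `ex:` identifiers are made up. It assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: hypothetical types, not the bigdata or openrdf API. */
class RdrSketch {

    /** Anything that may appear as the subject or object of a statement. */
    interface Node {}

    record Resource(String iri) implements Node {}

    record Literal(String label) implements Node {}

    /** A link. It is itself a Node, so other statements can refer to it. */
    record Statement(Node subject, String predicate, Node object) implements Node {}

    /** Returns every object stored for (subject, predicate); there may be several. */
    static List<Node> objects(Set<Statement> graph, Node subject, String predicate) {
        List<Node> result = new ArrayList<>();
        for (Statement s : graph) {
            if (s.subject().equals(subject) && s.predicate().equals(predicate)) {
                result.add(s.object());
            }
        }
        return result;
    }

    /** Builds a tiny graph that exercises all three cases. */
    static Set<Statement> sampleGraph() {
        Resource alice = new Resource("ex:alice");
        Resource bob = new Resource("ex:bob");
        Statement knows = new Statement(alice, "ex:knows", bob);

        Set<Statement> graph = new LinkedHashSet<>();
        graph.add(knows);
        // Case 1: a vertex with two values for the same property key.
        graph.add(new Statement(alice, "ex:email", new Literal("alice@example.org")));
        graph.add(new Statement(alice, "ex:email", new Literal("alice@example.com")));
        // Case 2: a link with two attributes for the same key.
        graph.add(new Statement(knows, "ex:source", new Literal("survey")));
        graph.add(new Statement(knows, "ex:source", new Literal("email-log")));
        // Case 3: a link whose target is another link.
        graph.add(new Statement(bob, "ex:disputes", knows));
        return graph;
    }
}
```

Because the graph is just a set of statements with no per-key cardinality rule, adding a second `ex:email` or a second `ex:source` needs no schema change; it is simply one more member of the set.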
Because RDF is fully general and places no cardinality constraints
on property values, RDF data sets may be freely combined and then
leveraged. Data-level collisions simply do not occur.