/**
Copyright (C) SYSTAP, LLC DBA Blazegraph 2006-2016.  All rights reserved.

Contact:
     SYSTAP, LLC DBA Blazegraph
     2501 Calvert ST NW #106
     Washington, DC 20008
     licenses@blazegraph.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/
package com.bigdata.rdf.rules;
import com.bigdata.bop.IPredicate;
import com.bigdata.btree.BTree;
import com.bigdata.journal.Journal;
import com.bigdata.journal.TemporaryStore;
import com.bigdata.rdf.inf.InferenceEngine;
import com.bigdata.rdf.inf.TruthMaintenance;
import com.bigdata.rdf.store.AbstractTripleStore;
import com.bigdata.rdf.store.LocalTripleStore;
import com.bigdata.rdf.store.TempTripleStore;
import com.bigdata.relation.accesspath.IBuffer;
import com.bigdata.relation.rule.IProgram;
import com.bigdata.relation.rule.IRule;
import com.bigdata.service.IBigdataFederation;
/**
*
 * Type-safe enumeration capturing the primary use cases for rule execution.
*
*
 * The use cases here reduce to two basic variants: (a) Query using a
* read-consistent view; and (b) Rules that write on a database. The latter has
* two twists: for {@link #TruthMaintenance} the rules write on a
* {@link TempTripleStore} while for {@link #DatabaseAtOnceClosure} they write
* directly on the knowledge base.
*
*
 * Note: The scale-out architecture imposes a concurrency control layer such
 * that conflicts for access to the unisolated indices cannot arise; that
 * layer is therefore not relevant to the rest of this discussion.
*
*
 * For the use cases that write on a database without the concurrency control
 * layer (regardless of whether it is the focusStore or the main knowledge
 * base) there is a concurrency control issue that can be resolved in one of
 * two different ways. The basic issue is that rule execution populates
 * {@link IBuffer}s that are automatically flushed when they become full (or
 * when a sequential step in an {@link IProgram} is complete). If there are
 * iterator(s) reading concurrently on the same view of the index on which
 * the buffer(s) write, then this violates the contract for the {@link BTree},
 * which is safe for concurrent readers -or- a single writer. The parallel
 * execution of more than one rule makes this a problem even when the
 * iterators are fully buffered (the newer asynchronous iterators have the
 * same problem even when only one rule is running).
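 *
 * To make the auto-flush hazard concrete, the following is a minimal,
 * hypothetical analogue of such a buffer (AutoFlushBuffer and writeOnIndex
 * are illustrative names, not part of the bigdata API): a call to add() can
 * trigger a write on the target index as a side effect, so from the point of
 * view of a concurrent reader on the same unisolated index a writer may
 * appear at any time.
 *
 * <pre>
 * import java.util.ArrayList;
 * import java.util.List;
 *
 * class AutoFlushBuffer {
 *
 *     private final int capacity;
 *     private final List buf; // raw type keeps this Javadoc-friendly.
 *
 *     AutoFlushBuffer(final int capacity) {
 *         this.capacity = capacity;
 *         this.buf = new ArrayList(capacity);
 *     }
 *
 *     // add() flushes as a side effect once the buffer is full.
 *     public void add(final Object e) {
 *         buf.add(e);
 *         if (buf.size() == capacity)
 *             flush();
 *     }
 *
 *     public long flush() {
 *         final long n = writeOnIndex(buf); // hypothetical index write.
 *         buf.clear();
 *         return n;
 *     }
 *
 *     private long writeOnIndex(final List chunk) {
 *         return chunk.size(); // placeholder for the actual mutation.
 *     }
 * }
 * </pre>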
*
*
* Note: For {@link #TruthMaintenance} we actually read from two different
* sources: a focusStore and the knowledge base. In this situation we are free
* to read on the knowledge base using an unisolated view because truth
* maintenance requires exclusive write access and therefore no other process
* will be writing on the knowledge base.
*
*
 * We can do two things to avoid violating the {@link BTree} concurrency
 * contract:
 *
 * - Read using a read-committed view (for the source on which the rules will
 * write) and write on the unisolated view. The main drawback with this
 * approach is that we must checkpoint (for a {@link TemporaryStore}) or
 * commit (for a {@link Journal}) after each sequential step of an
 * {@link IProgram} (including after each round of closure as a special
 * case). This slows down inference and, for {@link TruthMaintenance}, can
 * cause the {@link TemporaryStore} to be flushed to disk when otherwise it
 * might be fully buffered and never touch the disk.
 *
 * - Read and write on the unisolated {@link BTree} and use a mutex lock to
 * coordinate access to that index (see the sketch below). The mutex lock
 * must serialize the (concurrent) readers and the (single) writer. The
 * writer gains the lock when it needs to flush a buffer, at which point any
 * reader(s) on the unisolated {@link BTree}s block and grant access to the
 * writer and then resume their operations when the writer releases the lock.
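 *
 * The following is a minimal sketch of the second option, assuming one lock
 * per index (or per relation) on which the rules write; the class and method
 * names are illustrative, not part of the bigdata API. A
 * {@link java.util.concurrent.locks.ReentrantReadWriteLock} provides exactly
 * the required policy: any number of concurrent readers -or- a single
 * writer.
 *
 * <pre>
 * import java.util.concurrent.locks.ReentrantReadWriteLock;
 *
 * class UnisolatedIndexGate {
 *
 *     private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
 *
 *     // Iterators wrap each chunk-at-a-time read in the read lock, so any
 *     // number of rules may read on the unisolated index concurrently.
 *     public void readChunk(final Runnable readOneChunk) {
 *         lock.readLock().lock();
 *         try {
 *             readOneChunk.run();
 *         } finally {
 *             lock.readLock().unlock();
 *         }
 *     }
 *
 *     // The writer takes the write lock to flush a buffer, which blocks new
 *     // readers and waits out in-flight chunk reads before writing.
 *     public void flushBuffer(final Runnable writeOnIndex) {
 *         lock.writeLock().lock();
 *         try {
 *             writeOnIndex.run();
 *         } finally {
 *             lock.writeLock().unlock();
 *         }
 *     }
 * }
 * </pre>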
*
*
 * For a single rule, only an asynchronous iterator can conflict with the
 * task flushing the buffer. However, when more than one rule is being
 * executed concurrently, it is possible for conflicts to arise even with
 * fully buffered iterators.
*
*
 * The advantage of this approach is that we can use only the unisolated
 * indices (better buffer management) and we do not need to either checkpoint
 * (for a {@link TempTripleStore}) or commit (for a {@link LocalTripleStore}).
 * For a {@link TempTripleStore} this can mean that we never even touch the
 * disk, while for a {@link LocalTripleStore} it means that we only commit
 * when the closure operation is complete.
*
 *
 * @todo We have to jump through hoops whenever we are doing
 * {@link #TruthMaintenance} with a focusStore backed by a
 * {@link TemporaryStore} (which is the only way we can do it today).
*
* For database at once closure, we only need to jump through hoops when
* the database is on a {@link Journal}. If it is on an
* {@link IBigdataFederation} then the concurrency control layer ensures
* that none of the problems can arise.
*
 * @todo We need to recognize the use case and then recognize which relations
* (and their indices) belong to the focusStore and the knowledge base so
* that we can choose the appropriate view for each.
*
 * @todo Flushing the {@link IBuffer} for mutation operations needs to
 * coordinate with both the fully buffered and the asynchronous iterators.
 * This is only required for {@link #TruthMaintenance} or when the knowledge
 * base is on a {@link Journal}. There must be one mutex per named index on
 * which we will write (in fact, that can be simplified to one mutex per
 * relation on which we will write since the relations always update all of
 * their indices).
*
* @todo Use the readTimestamp for query (so we can query for a historical
* commit time) but ignore it for {@link #DatabaseAtOnceClosure} and
* {@link #TruthMaintenance} (presuming that we are operating on the
* current state of the kb)?
*/
public enum RuleContextEnum {
/**
*
* Database at once closure is the most efficient way to compute the closure
* over the model theory for the KB. In general, database-at-once closure is
* requested when you bulk load a large amount of data into a knowledge
* base. You request database-at-once closure using
* {@link InferenceEngine#computeClosure(AbstractTripleStore)} WITHOUT the
* optional focusStore.
*
*
* As long as justifications are enabled, you can incrementally assert or
* retract statements using {@link #TruthMaintenance}. If justifications are
* NOT enabled, then you can re-compute the closure of the database after
* adding assertions. If you have retracted assertions, then you first need
* to delete all inferences from the knowledge base and then recompute the
* closure of the database.
*
*
* Database-at-once closure reads and writes on the persistent knowledge
* base and does not utilize a {@link TempTripleStore}.
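 *
 * A minimal usage sketch, assuming an open {@link LocalTripleStore} named kb
 * (the getInferenceEngine() accessor is assumed here; passing null for the
 * optional focusStore selects database-at-once closure, per
 * {@link InferenceEngine#computeClosure(AbstractTripleStore)}):
 *
 * <pre>
 * // Bulk load the data first, then compute the closure of the database.
 * final InferenceEngine inf = kb.getInferenceEngine();
 *
 * // A null focusStore means database-at-once closure over the entire KB.
 * inf.computeClosure(null);
 * </pre>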
*
*/
DatabaseAtOnceClosure,
/**
*
* Truth maintenance must be used when you incrementally assert or
* retract a set of explicit (or told) statements (or assertions or
* triples). Each time new assertions are made or retracted the closure of
* the knowledge base must be updated, causing entailments (or
* inferred statements) to be either asserted or retracted. This is
* handled by {@link TruthMaintenance} and {@link InferenceEngine}.
*
*
 * Adding assertions is relatively straightforward since all the existing
* entailments will remain valid, but new entailments might be computable
* based on the new assertions. The only real twist is that we record
* justifications (aka proof chains) to support truth maintenance when
* statements are retracted.
*
*
 * Retractions require additional effort since entailments already in the
 * knowledge base MIGHT NOT be supported once some explicit statements are
 * retracted. Attempting to directly retract an inference or an axiom has no
 * effect since they are entailed by some combination of the model theory and
 * the explicit statements. However, when an explicit statement in the
 * knowledge base is retracted, a search must be performed to identify
 * whether or not the statement is still provable based on the remaining
 * statements. In the current implementation we chase justifications in order
 * to decide whether the explicit statement will be converted to an inference
 * (or an axiom) or retracted from the knowledge base. This process is
 * recursive since a statement that gets retracted (rather than being
 * converted to an inference) can cause other entailments to no longer be
 * supported.
*
*
 * When asserting or retracting statements using truth maintenance, the
 * statements are first loaded into a {@link TempTripleStore} known as the
 * focusStore. Next we compute the closure of the focusStore against the
 * assertions already in the knowledge base. This is done using
 * {@link TMUtility} to rewrite the {@link IProgram} into a new (and larger)
 * set of rules. For each original {@link IRule}, we derive N new rules,
 * where N is the number of tail {@link IPredicate}s in the rule. These
 * derived rules read from either the focusStore or the fused view of the
 * focusStore and the knowledge base, and they write on the focusStore. Once
 * the closure of the focusStore against the knowledge base has been
 * computed, all statements in that closure are either asserted against or
 * retracted from the knowledge base (depending on whether the original set
 * of statements was being asserted or retracted). That final step is done
 * using either a bulk statement copy or a bulk statement remove operation.
*
*
* Since the state of the knowledge base does not change while we are
* computing the closure of the focusStore against the knowledge base we can
* use a read-consistent view of the knowledge base throughout the
* operation. At the same time, we are both reading from and writing on the
* focusStore.
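 *
 * A minimal usage sketch of this workflow, assuming an open
 * {@link LocalTripleStore} named kb (the {@link TruthMaintenance}
 * constructor and its newTempTripleStore() and assertAll() methods are
 * assumed here, so verify the signatures against the version in use;
 * retractAll() would be the corresponding entry point for retraction):
 *
 * <pre>
 * final TruthMaintenance tm = new TruthMaintenance(kb.getInferenceEngine());
 *
 * // 1. Load the statements to be asserted into a focusStore.
 * final TempTripleStore focusStore = tm.newTempTripleStore();
 * // ... write the new explicit statements on the focusStore ...
 *
 * // 2. Compute the closure of the focusStore against the KB, then copy
 * //    the statements in that closure onto the KB.
 * tm.assertAll(focusStore);
 * </pre>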
*
*/
TruthMaintenance,
/**
*
* High-level queries (SPARQL) can in general be translated into a rule that
* is directly executed by the bigdata rule execution layer. This provides
* extremely efficient query answering. The same approach can be used with
* custom rule evaluation - there is no difference once it gets down to the
* execution of the rule(s).
*
*
 * The generated rule SHOULD be executed against a read-consistent view of
 * the knowledge base (NOT read-committed, since that can result in dirty
 * reads). When the knowledge base is unchanging this is very efficient, as
 * it allows full concurrency among the readers with no overhead for
 * concurrency control. In addition, concurrent writes on the knowledge base
 * are still allowed.
*
*
* New readers SHOULD use a read-consistent timestamp that reflects the
* desired (generally, most recent) commit point corresponding to a closure
* of the knowledge base.
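 *
 * A minimal sketch of selecting such a timestamp, assuming the KB lives on a
 * {@link Journal} named journal under the namespace "kb" (the
 * getLastCommitTime() and resource locator calls are assumed here, so verify
 * them against the version in use):
 *
 * <pre>
 * // Read behind the most recent commit point rather than read-committed so
 * // that the query sees one coherent closure of the knowledge base.
 * final long readTimestamp = journal.getLastCommitTime();
 *
 * final AbstractTripleStore view = (AbstractTripleStore) journal
 *         .getResourceLocator().locate("kb", readTimestamp);
 * </pre>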
*
*/
HighLevelQuery;
}