org.apache.geode.cache.query.internal.index.package.html
Design: Indexes in GemFire Querying
This
document describes the creation, use, and maintenance of indexes in
the GemFire query processor, as designed for the 4.0 release.
The index types that will be supported in that release include
“functional sorted” indexes and “primary key”
indexes.
Types of Indexes
Functional Sorted
Indexes
A functional index is so named because it can be
used to index on data using any function of the region entries that
make up the data. It is a sorted index, so it supports comparisons
using any of the relational operators (<, >, <=, >=, =,
<>).
Primary Key
Indexes
A primary key index is a cover for the keys that are already in a
Region. Creating a primary key index allows the query processor to
use the keys in a region to improve performance in query
evaluation. A primary key index provides the query service with
information about the relationship between the keys in the region
and the values in the region. Since the keys are not sorted in a
region, a primary key index is only used for queries using the =
operator. As an example of a query that can use a primary key
index, say the /Portfolios region has Portfolio objects keyed by
its ID. The primary key index is created with the parameters
fromClause=“/Portfolios” and indexedExpression=“ID”.
The projectionAttributes must default to “*”. This
primary key index could then be used for a query such as:
SELECT
DISTINCT * FROM /Portfolios WHERE ID = '3434'
In
GemFire, the primary key index does not require any extra structure
or maintenance since it is implicit in the Region implementation
itself. The rest of this document, therefore, describes the
functional sorted indexes only.
Structure of Indexes
Indexes are stored in
instances of IndexManager. Each region in the cache
has an IndexManager. When a region is destroyed, so is its
IndexManager and the indexes stored therein.
An index contains two
maps, a forward map and a reverse map. The forward
map is used when evaluating a query, and the reverse map is used
when updating the index when a region is modified.
Forward Map
The forward map has
the structure:
SortedMap<key: Object value:
Map<key: RegionEntry value: Object>>
The
keys of the SortedMap are the values of the indexedExpression
as evaluated for each element
in the base collection. The value
is a Map that maintains an association between the context
object from the Region and the targets derived from
that context.
The context object
is a RegionEntry. This object is used to determine which
entry in the region the target objects are derived from.
The target
objects are values that will be used in the results of the SELECT
expression. This target object is an element from the base
collection if the projectionAttributes is *, or is an
element from the base collection transformed by the function
defined by the projectionAttributes. The target is
conceptually a Set of values, but to conserve on resources in the
index, it is represented as follows:
When it is a single
object, it is stored as such in the index.
If
it is a set of objects, it is stored in a collection which
itself is an instance of SelectResults,
either a ResultsBag or a StructBag.
Reverse Map
The
reverse map has the structure:
Map<key:
RegionEntry, value: Object>,
where the value is the key(s) in the forward map. If there is
more than one key in the forward map that references a
RegionEntry, then the value of this map is a collection of all
those keys. The reverse map is used to quickly discover which
keys in the forward map have a reference to a particular
RegionEntry. It is used when a region entry is modified and the
indexes of the region need to be updated.
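The two maps can be sketched in Java as follows. This is a minimal illustration, not the product implementation: the class name SortedIndexSketch is hypothetical, the type parameter E stands in for RegionEntry, and as a simplification it keeps a single target per entry per key and always stores the reverse-map value as a Set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only: E stands in for RegionEntry, K for the indexed value. */
final class SortedIndexSketch<E, K extends Comparable<K>> {
    // Forward map: indexed value -> (region entry -> target object).
    private final TreeMap<K, Map<E, Object>> forward = new TreeMap<>();
    // Reverse map: region entry -> forward-map keys that reference it.
    private final Map<E, Set<K>> reverse = new HashMap<>();

    void add(E entry, K indexedValue, Object target) {
        forward.computeIfAbsent(indexedValue, k -> new HashMap<>()).put(entry, target);
        reverse.computeIfAbsent(entry, e -> new HashSet<>()).add(indexedValue);
    }

    /** Removes everything derived from the entry, using the reverse map to find the keys. */
    void remove(E entry) {
        Set<K> keys = reverse.remove(entry);
        if (keys == null) {
            return;
        }
        for (K key : keys) {
            Map<E, Object> targets = forward.get(key);
            targets.remove(entry);
            if (targets.isEmpty()) {
                forward.remove(key);
            }
        }
    }

    /** Targets whose indexed value is strictly greater than the bound, in key order. */
    List<Object> greaterThan(K bound) {
        List<Object> result = new ArrayList<>();
        for (Map<E, Object> targets : forward.tailMap(bound, false).values()) {
            result.addAll(targets.values());
        }
        return result;
    }
}
```

Because the forward map is sorted, a comparison such as sharesOutstanding > 10000 becomes a range view on the map, while the reverse map avoids scanning the forward map when an entry changes.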
Index
Creation
Indexes are specified with three parameters: the
fromClause, the indexedExpression, and the
projectionAttributes. These parameters are analogous and
equivalent to the corresponding parts of the SELECT expression that
the indexes will be used to evaluate.
FromClause
The
fromClause defines
the base collection of objects that are being selected from, and
optionally defines iterator variables that can be referred to in
the indexedExpression or
the projectionAttributes. The fromClause
can be a single expression that
specifies a collection, or it can be a list of expressions that
drill down into or join across a complex object
structure and define a namespace of objects that can be referenced
in the query. Each expression in the fromClause sets up a nested
iteration.
If there is one
expression in the fromClause, then the base
collection is the value of that expression.
If there are
multiple, comma-delimited, expressions in the fromClause,
then the base collection is a struct that consists of a field for
each of the expressions in order. These structs represent the
cartesian product of each collection specified in the fromClause.
Examples of
fromClauses that
could be used to create an index include:
/root/employees
/root/employees e
/portfolios ptfo, ptfo.positions pos
(in the last example the base collection will be of type struct<Portfolio, Position>)
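As a rough illustration of how a two-expression fromClause such as "/portfolios ptfo, ptfo.positions pos" sets up a nested iteration, the sketch below builds the base collection of structs by hand. The Portfolio class and the use of Object[] as a struct are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.List;

final class FromClauseSketch {
    /** Hypothetical stand-in for a Portfolio holding a collection of position names. */
    static final class Portfolio {
        final String id;
        final List<String> positions;

        Portfolio(String id, List<String> positions) {
            this.id = id;
            this.positions = positions;
        }
    }

    /** Evaluates "/portfolios ptfo, ptfo.positions pos"; each row plays the role of struct<Portfolio, Position>. */
    static List<Object[]> baseCollection(List<Portfolio> portfolios) {
        List<Object[]> structs = new ArrayList<>();
        for (Portfolio ptfo : portfolios) {        // first iterator definition
            for (String pos : ptfo.positions) {    // second iterator, dependent on the first
                structs.add(new Object[] {ptfo, pos});
            }
        }
        return structs;
    }
}
```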
IndexedExpression
The
indexedExpression
specifies the value that is indexed
on. For a sorted index, it specifies the expression that will be
used to compare with another value that is independent of objects
in the from clause (i.e. a constant) using the relational operators
(<, =, >, <=, >=, <>). Example indexedExpressions
that could be used to create an
index include:
empId
ptfo.active
pos.sharesOutstanding
ptfo.someMethodReturningAComparable(element(select distinct * from
positions.values where sharesOutstanding > 10000))
ProjectionAttributes
[Open
issue: Is it really practical to put projectionAttributes on an
index? Does Sybase/Oracle support this at all? Considering the
extra complexity it introduces in the implementation (as seen
later) this may be a lower priority than other features].
The
projectionAttributes is
an expression that does a transformation on the results of a query,
and this projection can be pre-computed for each element in the
results and stored in the index as well. If there is a
comma-delimited list of expressions in the projectionAttributes,
then a struct is created with the value of each expression as a
field in the struct. An identifier can also be included to
explicitly name the fields in the struct. If
no identifiers are provided, then the field names are derived from
the tail attribute names or generated by the query processor. If
the projectionAttributes is
* then
there is no transformation on the results. Example
projectionAttributes that
could be used to create an index include:
*
empId
e.key, e.value, e.value.id
key: e.key, value: e.value, id: e.value.id
e.key AS k, e.value AS v, e.value.id AS id
Algorithm for
Creating Indexes
For the purposes of this section, assume we
have a region of Portfolios with Positions as defined in the use
cases section of the functional specification with region path
“/portfolios”. An index is created to speed up queries
that are similar to:
SELECT DISTINCT posn
FROM
/portfolios ptfo, ptfo.positions.values posn
WHERE ptfo.status =
'active' AND posn.sharesOutstanding > 10000
The
createIndex method is called with the following
parameters:
fromClause = “/portfolios ptflo,
ptflo.positions.values posn”;
indexedExpression =
“posn.sharesOutstanding”;
projectionAttributes =
“posn”;
Extract region
information from the fromClause.
In order to create an
index, the fromClause must reference one and only one region using
a regionPath, and the fromClause, indexedExpression, and
projectionAttributes must not have any query parameters in them.
If these restrictions do not hold, then the createIndex method
will throw an exception.
Determine the one and only one
region path in the fromClause.
For the above example, the
region information from the fromClause determines that the region
path is “/portfolios”.
Transform the
parameters.
Make the following transformation on the
fromClause:
Where
it references the regionPath, substitute $1.
For the working
example, we now have the fromClause
$1 ptflo,
ptflo.positions.values posn
Construct and
Execute the Index Maintenance Query (IMQ)
A
special Query is created for an index which we will call the Index
Maintenance Query (IMQ). This query is constructed as follows.
Construct
IMQ
Using the transformed fromClause as described above
and the other index parameters, construct the query as:
SELECT
DISTINCT idxExpr: indexedExpression, target:
projectionAttributes
FROM
fromClause
Save
this query with the index and compile into bytecodes if
possible, as it will be re-used for index maintenance as well as
for index creation.
For our working example, the IMQ would
be:
SELECT DISTINCT idxExpr:
posn.sharesOutstanding, target:
posn
FROM $1
ptflo, ptflo.positions.values posn
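The transformation and construction steps can be sketched as plain string handling. The class ImqSketch and its methods are hypothetical, and the textual replacement of the region path is a simplification; an implementation would more plausibly operate on the parsed fromClause.

```java
final class ImqSketch {
    /** Replaces the region path in the fromClause with the $1 query parameter. */
    static String transformFromClause(String fromClause, String regionPath) {
        // Simplification: textual replacement of the first occurrence only.
        int at = fromClause.indexOf(regionPath);
        if (at < 0) {
            throw new IllegalArgumentException("fromClause does not reference " + regionPath);
        }
        return fromClause.substring(0, at) + "$1" + fromClause.substring(at + regionPath.length());
    }

    /** Builds the Index Maintenance Query text from the three index parameters. */
    static String buildImq(String fromClause, String indexedExpression,
                           String projectionAttributes, String regionPath) {
        return "SELECT DISTINCT idxExpr: " + indexedExpression
            + ", target: " + projectionAttributes
            + " FROM " + transformFromClause(fromClause, regionPath);
    }
}
```

Calling buildImq with the parameters of the working example yields the IMQ text shown above, with $1 in place of /portfolios.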
Execute
IMQ.
Iterate through
each RegionEntry in the region, and for each RegionEntry:
Create
a special instance of QRegion that contains exactly one entry,
the current RegionEntry. Execute the IMQ using this QRegion as
the $1 query parameter. The result of this query
provides structs that contain the index values and target
values needed for the index for this region entry.
[Note:
We need to implement a light-weight read-only Region
implementation that has just one RegionEntry in it, and use this
Region to construct this special QRegion instance.]
Use the results to
build the index.
These structs can now be
used directly to build the index by iterating over them and
collecting the indexed values and constructing the map of
RegionEntry=>target objects, and adding the RegionEntry to the
reverse map.
Concurrency during
Index Creation
For
indexes that are specified in the cache.xml, the indexes are
created during initialization before a reference to the region is
released to application threads.
Whenever an index is being
created, modifications to the region by other threads must be
blocked, i.e. a local region write lock is obtained. Threads that
only read from the region are not blocked. [TBD – do we
already have a local region write lock, or does the query group
need to implement this?]
Index Use while
Executing a Query
To
determine if an index is compatible for a particular query,
the fromClause, indexedExpression, and projectionAttributes of an
index must be compatible with a query as described below. In the
future, histograms should be added to indexes, along with query
transformations and cost-based estimation in the query processor, so that
more intelligent heuristics can be used to determine whether the use
of a particular index is actually worthwhile. In 4.0, if an
index is deemed to be compatible then it will be used. The
algorithms described here make some simplifying assumptions and an
index may not necessarily be used in all cases where it could be
if the algorithms were more sophisticated. This could be improved on
in future releases.
Canonicalization
To
facilitate the matching algorithms described later, the index
parameters (fromClause, indexedExpression, and
projectionAttributes) and the query being executed are first put
into a canonicalized form so there are no variables in the query.
Canonicalization is done as follows:
Queries and index
parameters are first compiled into a tree of nodes which are
instances of CompiledValue. The term “compiled”
here should not be confused with byte-code compilation which is a
lower level of compilation that will most likely not be
implemented in this release due to resource restrictions. See the
package.html for org.apache.geode.cache.query for further
details on byte-code compilation.
Each iterator
definition in the fromClause is assigned a placeholder that
represents its runtime iteration. The product currently has an
internal class that implements these placeholders, named
RuntimeIterator. For the purpose of this document, we
will refer to these placeholders as itr1, itr2, ..., itrN.
All identifiers in the query and index parameters that refer to
explicitly declared iterator variables are replaced with references
to these placeholders. Any implicit references to attributes or
methods are resolved to determine which iterator they operate on,
and these implicit references are made explicit and given a
reference to the appropriate placeholder.
Remove any
unreferenced iterator definitions in the fromClause, i.e. any
definitions that are not referenced anywhere else in the query or
index parameters. Note, however, that a * projectionAttributes
implicitly references all iterator definitions in the fromClause.
Given
the example query
SELECT
DISTINCT posn
FROM /portfolios, positions.values posn
WHERE
status = 'active' AND sharesOutstanding > $1
the
canonicalized form would be a tree of CompiledValue nodes which
could be transcribed as:
SELECT DISTINCT itr2
FROM
/portfolios itr1, itr1.positions.values
itr2
WHERE
itr1.status =
'active' AND itr2.sharesOutstanding
> $1
Compatible
fromClause
The
fromClause of an index is compatible with a query if the index
fromClause is a sublist of the query fromClause (a sublist includes
the case of being equivalent lists).
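A minimal sketch of this test, assuming each fromClause has already been canonicalized into a list of iterator definitions represented as strings, and assuming that "sublist" means a contiguous run of definitions (the text above does not say whether gaps are allowed):

```java
import java.util.Collections;
import java.util.List;

final class FromClauseCompatibility {
    /** True if the index fromClause appears as a contiguous sublist of the query fromClause. */
    static boolean isCompatible(List<String> indexFromClause, List<String> queryFromClause) {
        // Collections.indexOfSubList returns -1 when the target list does not occur in the source.
        return Collections.indexOfSubList(queryFromClause, indexFromClause) >= 0;
    }
}
```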
Compatible
projectionAttributes
The
projectionAttributes of an index is automatically compatible if it
is * (no projection), or if it is equivalent to the
projectionAttributes in the query.
Compatible
indexedExpression
Equivalence:
The indexedExpression (tree) is passed into the whereClause
(tree) of the query using a method that calculates compatibility
by potentially recursively visiting child nodes that represent
subexpressions. By default, a node in the whereClause will answer
true only if the indexedExpression is equivalent to the node
itself. E.g., the expression element(sub_expr1) in
the indexedExpression
is compatible with element(sub_expr2) in
the whereClause if and only if sub_expr1 and sub_expr2
are equivalent.
Some
types of expression nodes, however, define compatibility in other
ways.
An
AND node in the
whereClause is compatible with an indexedExpression not only if
it is equivalent based on its unordered terms, but also if
any of its terms are compatible.
An
OR node in this
release is compatible only if it is equivalent based on its
unordered terms.
A
relational expression (using one of the operators <, >, <=,
>=, or <>) is compatible if equivalent or if both of the
following are true:
one
of its terms is compatible
the
other term is a constant, i.e. is not dependent on any
of the iterators.
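The rules above can be sketched as a small expression tree in which each node type overrides the default equivalence test. The classes below are illustrative stand-ins, not the product's CompiledValue hierarchy; equivalence is approximated by comparing a canonical string, with the terms of AND and OR sorted so that their order does not matter, and a query parameter such as $1 is modelled as a constant because it does not depend on any iterator.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative stand-ins for compiled expression nodes; all names are hypothetical. */
final class CompatibilitySketch {
    abstract static class Expr {
        abstract String canonical();
        boolean isConstant() { return false; }
        boolean equivalentTo(Expr other) { return canonical().equals(other.canonical()); }
        /** Default rule: compatible only if equivalent to the indexedExpression. */
        boolean isCompatibleWith(Expr indexedExpr) { return equivalentTo(indexedExpr); }
    }

    static final class Path extends Expr {
        private final String text;
        Path(String text) { this.text = text; }
        @Override String canonical() { return text; }
    }

    static final class Constant extends Expr {
        private final Object value;
        Constant(Object value) { this.value = value; }
        @Override String canonical() { return "const(" + value + ")"; }
        @Override boolean isConstant() { return true; }
    }

    static final class Relational extends Expr {
        private final String op;
        private final Expr left;
        private final Expr right;
        Relational(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
        @Override String canonical() { return "(" + left.canonical() + " " + op + " " + right.canonical() + ")"; }
        @Override boolean isCompatibleWith(Expr indexedExpr) {
            // Equivalent, or one term is compatible and the other is a constant.
            return equivalentTo(indexedExpr)
                || (left.isCompatibleWith(indexedExpr) && right.isConstant())
                || (right.isCompatibleWith(indexedExpr) && left.isConstant());
        }
    }

    static final class Junction extends Expr {
        private final boolean isAnd;
        private final List<Expr> terms;
        Junction(boolean isAnd, List<Expr> terms) { this.isAnd = isAnd; this.terms = terms; }
        @Override String canonical() {
            return terms.stream().map(Expr::canonical).sorted()
                .collect(Collectors.joining(isAnd ? " AND " : " OR ", "(", ")"));
        }
        @Override boolean isCompatibleWith(Expr indexedExpr) {
            if (equivalentTo(indexedExpr)) {
                return true;
            }
            // Only AND may be satisfied by a compatible term; OR requires equivalence.
            return isAnd && terms.stream().anyMatch(t -> t.isCompatibleWith(indexedExpr));
        }
    }
}
```

With an index on itr2.sharesOutstanding, the canonicalized whereClause of the example query is compatible because its AND node contains the term itr2.sharesOutstanding > $1; the same two terms joined by OR would not be.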
Compatible
Index
If
all of these compatibility tests for a query pass for a particular
index, then the query will use that index. Note that it is possible
for a query to use multiple compatible indexes as explained below.
Query
Evaluation
There
are two ways a query can be evaluated, by iteration or
“filtering”
(for the lack of a better term).
Iteration
A
query is evaluated by brute-force, most likely by iteration and
cartesian product of the collections in the fromClause if there
are no compatible indexes and the where clause is dependent on at
least one of the iterators. The whereClause tree is visited for
each element in the iteration across the cartesian product. Those
elements for which the whereClause evaluates to true are kept
in the result set, with the projection attributes applied to them,
and those elements for which the whereClause does not evaluate to
true are discarded.
Filtered
Evaluation
When
there is at least one compatible index, then the query is
evaluated by “filtering” which means it does
intersections or iterations on intermediate result sets obtained
from indexes instead of the entire base collection. For filtered
evaluation, the compiled whereClause tree is visited recursively,
but instead of doing this for each iteration of the base
collection, the entire result set is built up as it visits the
nodes in the tree. An expression that is compatible with an index
will produce a result set using that index. When combined with
other terms in an AND expression, either other result sets will be
intersected with each other to produce a result set for the entire
AND expression, or some terms will produce result sets that are
intersected and other terms that cannot use an index will be
evaluated against the intermediate results from other terms by
iteration, causing elements in the intermediate results to be
dropped if they don't evaluate to TRUE for the other term(s).
Terms in AND expressions that use indexes should always be
evaluated first before terms that require iteration; this
minimizes the size of the iteration required.
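The evaluation order for an AND expression can be sketched as follows: result sets obtained from indexes are intersected first, and only then are the remaining terms applied by iterating over the intermediate result. The class and method names are hypothetical, and the index lookups are assumed to have already produced their result sets.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class FilterEvaluationSketch {
    /**
     * Evaluates an AND expression by filtering: index result sets are intersected,
     * then terms that cannot use an index are applied to the smaller intermediate result.
     */
    static <T> Set<T> evaluateAnd(List<Set<T>> indexResults, List<Predicate<T>> unindexedTerms) {
        if (indexResults.isEmpty()) {
            throw new IllegalArgumentException("filtered evaluation needs at least one index result");
        }
        Set<T> intermediate = new LinkedHashSet<>(indexResults.get(0));
        for (Set<T> other : indexResults.subList(1, indexResults.size())) {
            intermediate.retainAll(other);               // intersection of index results
        }
        for (Predicate<T> term : unindexedTerms) {
            intermediate.removeIf(term.negate());        // drop elements that do not evaluate to true
        }
        return intermediate;
    }
}
```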
Projections
on Indexes
When
computing the result set from an index lookup that contains a
projection, the RegionEntry where the target objects are derived
from should be kept as well. This result set may need to be
intersected with the results of another index lookup that may or
may not be projected, or may need to be iterated across with an
expression that refers to data that is not in the projection.
Each of these cases is described as follows.
If
two index lookup result sets are intersected and they both have
the same projection or they both have no projection, then do the
intersection normally, keeping the context information (i.e. the
RegionEntries) in the result.
If
two index lookup result sets need to be intersected and one has
a projection and the other does not, then apply the projection
to the result set that does not have a projection and then do
the intersection.
If
an index lookup result set has no projection and an expression
is being applied through iteration, then nothing special needs
to be done.
If
an index lookup result set does have a projection and an
expression is being applied through iteration, then first
determine if the expression refers to any iterator variable that
is not available in the projection. If so, then the full
(unprojected) base collection element(s) (i.e. the part of the
cartesian product this entry contributes to) should be computed
from the index results for each element using the RegionEntry
instead of the projection retrieved from the index before the
expression is applied to determine whether the projection(s)
from the index should be added to the intermediate results.
Choice
of multiple Indexes
Where
there is an expression that is compatible with more than one
available index, then if one index has a projection and the other
does not, then prefer the index with the projection. In some
cases an index for a complex expression is available as well as
an index for a simpler expression that is part of the more
complex expression. In this case the index for the more complex
expression is preferred. [to do – provide example]
Independent
whereClause. In the corner case where the whereClause is not
dependent at all on any of the iterators then iteration is also
not necessary as the result will simply be the entire projected
base collection or the empty set.
Index Maintenance
Synchronous
Synchronous
index maintenance implies that the thread that makes a modification
to region data does not return until the indexes for that region
are updated. This guarantees that a thread that makes modifications
to a region and then does a query will get results that reflect the
changes.
Asynchronous
Asynchronous
index maintenance uses a background thread that does the index
maintenance. Operations to update the index are queued as Runnables
added to a QueuedExecutor, and a background thread takes operations
off the queue and updates the indexes. Although the operations
should be done in order with respect to a particular region, we
should try to avoid using a thread per region. One thread per
cache, however, may not be sufficient so some compromise may need
to be made with respect to the number of threads. The size of the
index maintenance queue may need to be made configurable by the
user to prevent index maintenance from lagging behind region
updates too much. Other than the use of background threads, index
maintenance is the same for both synchronous and asynchronous.
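A minimal sketch of the queueing idea, using the JDK's ExecutorService as a stand-in for the QueuedExecutor mentioned above. The class name is hypothetical, and the single-thread executor used here has an unbounded queue, so the configurable queue size discussed above is not modelled.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncIndexUpdaterSketch {
    // One worker thread: queued updates are applied in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Called by the thread that modified the region; returns without waiting for the update. */
    void enqueue(Runnable indexUpdate) {
        executor.execute(indexUpdate);
    }

    /** Stops accepting work and waits for queued updates; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```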
Upon region
modification. When a region that has indexes is modified, an
updateIndexes call is made to the region's index manager to update
the indexes. A reference to the RegionEntry that is being modified
is provided, and information regarding whether the modification was
a create, destroy, or update. If it was an update, then including
the “old value” is not necessary since the old data in
the indexes is completely identified by the RegionEntry.
Remove old data.
If
the operation is not a create, the old data that is associated
with the RegionEntry is removed from the indexes, using the
reverse map of the index.
Add
new data.
If the operation is not a destroy, the new data
is calculated and added to the index in both the forward and
reverse maps. If it is a destroy, then skip the next section on
computing the new data.
New Data: Compute
the new index values and target values.
Execute the IMQ
using the same procedure that
was used for index creation, but only use the one created or
updated RegionEntry to construct a QRegion. This provides the
index values and target values to use to update the index for this
entry.
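The maintenance steps can be sketched as below. This is an illustration under several assumptions: the class, the Operation enum, and the Row type are hypothetical; a Function stands in for executing the IMQ against a one-entry QRegion; and indexed values are fixed to Integer to keep the example short. The entry type is assumed to use identity-based hashing, as a region entry would, so that it can still be found in the reverse map after its value changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class IndexMaintenanceSketch<E> {
    enum Operation { CREATE, UPDATE, DESTROY }

    /** One struct from the IMQ result: the indexed value and its target. */
    static final class Row {
        final int idxExpr;
        final Object target;
        Row(int idxExpr, Object target) { this.idxExpr = idxExpr; this.target = target; }
    }

    private final TreeMap<Integer, Map<E, Object>> forward = new TreeMap<>();
    private final Map<E, List<Integer>> reverse = new HashMap<>();
    private final Function<E, List<Row>> imq;   // stand-in for running the IMQ on one entry

    IndexMaintenanceSketch(Function<E, List<Row>> imq) { this.imq = imq; }

    void updateIndexes(E entry, Operation op) {
        if (op != Operation.CREATE) {            // remove old data, located via the reverse map
            List<Integer> oldKeys = reverse.remove(entry);
            if (oldKeys != null) {
                for (Integer key : oldKeys) {
                    Map<E, Object> targets = forward.get(key);
                    if (targets != null) {
                        targets.remove(entry);
                        if (targets.isEmpty()) {
                            forward.remove(key);
                        }
                    }
                }
            }
        }
        if (op != Operation.DESTROY) {           // compute and add new data to both maps
            for (Row row : imq.apply(entry)) {
                forward.computeIfAbsent(row.idxExpr, k -> new HashMap<>()).put(entry, row.target);
                reverse.computeIfAbsent(entry, e -> new ArrayList<>()).add(row.idxExpr);
            }
        }
    }

    /** Number of (entry, target) associations whose indexed value exceeds the bound. */
    int countGreaterThan(int bound) {
        int count = 0;
        for (Map<E, Object> targets : forward.tailMap(bound, false).values()) {
            count += targets.size();
        }
        return count;
    }
}
```

Note that the old value is never passed in: on an update, the reverse map alone identifies which forward-map keys must be cleaned up.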
Concurrency
Each
index has a ReadWriteLock.
An instance of
the backport of the JDK 1.5 ReentrantReadWriteLock can be used. The
read lock allows multiple readers and the write lock is exclusive.
During index maintenance, all the indexes are write-locked up
front. When a query needs to use an index, it obtains a read lock
on the index while it uses it.
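The locking discipline can be sketched with the ReentrantReadWriteLock that ships in java.util.concurrent.locks, used here in place of the backport mentioned above. The class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

final class LockedIndexSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Queries hold the read lock while they use the index; many readers may hold it at once. */
    <T> T query(Supplier<T> lookup) {
        lock.readLock().lock();
        try {
            return lookup.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Index maintenance holds the exclusive write lock for the duration of the update. */
    void maintain(Runnable update) {
        lock.writeLock().lock();
        try {
            update.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```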
Futures
Improved Concurrency
During Region Maintenance
Multiple Reader,
Multiple Writer ReadWriteLock
<TBD>