org.apache.geode.cache.query.internal.index.package.html

Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing




	
	
	
	
	
	
	



	
	Design: Indexes in GemFire Querying

This
document describes the creation, use, and maintenance of indexes in
the GemFire query processor, as designed for the 4.0 release.
The index types that will be supported in that release include
“functional sorted” indexes and “primary key”
indexes. 


	
	Types of Indexes
	
		
		Functional Sorted
		Indexes
A functional index is so named because it can be
		used to index on data using any function of the region entries that
		make up the data. It is a sorted index, so it supports comparisons
		using any of the relational operators (<, >, <=, >=, =,
		<>).
		

		Primary Key Indexes
A primary key index is a cover for the keys that are already in a
		region. Creating a primary key index allows the query processor to
		use the keys in a region to improve performance in query
		evaluation. A primary key index provides the query service with
		information about the relationship between the keys in the region
		and the values in the region. Since the keys are not sorted in a
		region, a primary key index is only used for queries using the =
		operator. As an example of a query that can use a primary key
		index, say the /Portfolios region has Portfolio objects keyed by
		their ID. The primary key index is created with the parameters
		fromClause=”/Portfolios” and indexedExpression=”ID”.
		The projectionAttributes must default to “*”. This
		primary key index could then be used for a query such as:
SELECT DISTINCT * FROM /Portfolios WHERE ID = '3434'
In GemFire, the primary key index does not require any extra structure
		or maintenance since it is implicit in the Region implementation
		itself. The rest of this document, therefore, describes the
		functional sorted indexes only.
	
	

	Structure of Indexes
	
		
		Indexes are stored in
		instances of IndexManager. Each region in the cache
		has an IndexManager. When a region is destroyed, so is its
		IndexManager and the indexes stored therein.
		

		An index contains two
		maps, a forward map and a reverse map. The forward
		map is used when evaluating a query, and the reverse map is used
		when updating the index when a region is modified.
		
			
			Forward Map
			
				
				The forward map has
				the structure:
SortedMap<key: Object, value: Map<key: RegionEntry, value: Object>>
The keys of the SortedMap are the values of the indexedExpression
				as evaluated for each element
				in the base collection. The value
				is a Map that maintains an association between the context
				object from the Region and the targets derived from
				that context.
				

				The context object
				is a RegionEntry. This object is used to determine which
				entry in the region the target objects are derived from.
				

				The target
				objects are values that will be used in the results of the SELECT
				expression. This target object is an element from the base
				collection if the projectionAttributes is *, or is an
				element from the base collection transformed by the function
				defined by the projectionAttributes. The target is
				conceptually a Set of values, but to conserve on resources in the
				index, it is represented as follows:
				
					
					When it is a single
					object, it is stored as such in the index.
					

					If
					it is a set of objects, it is stored in a collection which
					itself is an instance of SelectResults,
					either a ResultsBag or a StructBag.
				
			
			

			Reverse Map
			
				
				The
				reverse map has the structure:
Map<key: RegionEntry, value: Object>,
				where the value is the key(s) in the forward map. If there is
				more than one key in the forward map that references a
				RegionEntry, then the value of this map is a collection of all
				those keys. The reverse map is used to quickly discover which
				keys in the forward map have a reference to a particular
				RegionEntry. It is used when a region entry is modified and the
				indexes of the region need to be updated.
			
		
	
	

	Index
	Creation
Indexes are specified with three parameters: the
	fromClause, the indexedExpression, and the
	projectionAttributes. These parameters are analogous and
	equivalent to the corresponding parts of the SELECT expression that
	the indexes will be used to evaluate.
	
		
		FromClause
The
		fromClause defines
		the base collection of objects that are being selected from, and
		optionally defines iterator variables that can be referred to in
		the indexedExpression or
		the projectionAttributes. The fromClause
		can be a single expression that
		specifies a collection, or it can be a list of expressions that
		drill down into or join across a complex object
		structure and define a namespace of objects that can be referenced
		in the query. Each expression in the fromClause sets up a nested
		iteration.
		
			
			Base Collection
			
				
				If there is one
				expression in the fromClause, then the base
				collection is the value of that expression.
				

				If there are
				multiple, comma-delimited, expressions in the fromClause,
				then the base collection is a struct that consists of a field for
				each of the expressions in order. These structs represent the
				cartesian product of each collection specified in the fromClause.
			
			

			Examples of
			fromClauses that
			could be used to create an index include: 
			
			/root/employees
/root/employees e
/portfolios ptfo, ptfo.positions pos
(in the last example the base collection will be of type struct<Portfolio, Position>)
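
To make the struct-valued base collection concrete, here is a minimal sketch of the nested iteration that a two-expression fromClause such as `/portfolios ptfo, ptfo.positions pos` sets up. The `Portfolio` class, the use of `String` for positions, and the use of `Object[]` as a struct are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class FromClauseSketch {
    // Hypothetical portfolio: an id plus the names of its positions.
    static final class Portfolio {
        final String id;
        final List<String> positions;
        Portfolio(String id, List<String> positions) {
            this.id = id;
            this.positions = positions;
        }
    }

    // Base collection for "/portfolios ptfo, ptfo.positions pos":
    // one two-field struct (modeled as Object[]) per (portfolio, position) pair.
    static List<Object[]> baseCollection(Collection<Portfolio> portfolios) {
        List<Object[]> structs = new ArrayList<>();
        for (Portfolio ptfo : portfolios) {        // first iterator
            for (String pos : ptfo.positions) {    // second iterator, nested in the first
                structs.add(new Object[] { ptfo, pos });
            }
        }
        return structs;
    }
}
```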
		
		

		IndexedExpression
The
		indexedExpression
		specifies the value that is indexed
		on. For a sorted index, it specifies the expression that will be
		used to compare with another value that is independent of objects
		in the from clause (i.e. a constant) using the relational operators
		(<, =, >, <=, >=, <>). Example indexedExpressions
		that could be used to create an
		index include: 
		
		empId
ptfo.active
pos.sharesOutstanding
ptfo.someMethodReturningAComparable(element(select distinct * from
  positions.values where sharesOutstanding > 10000))
		

		ProjectionAttributes
[Open
		issue: Is it really practical to put projectionAttributes on an
		index? Does Sybase/Oracle support this at all? Considering the
		extra complexity it introduces in the implementation (as seen
		later) this may be a lower priority than other features].
  
The
		projectionAttributes is
		an expression that does a transformation on the results of a query,
		and this projection can be pre-computed for each element in the
		results and stored in the index as well. If there is a
		comma-delimited list of expressions in the projectionAttributes,
		then a struct is created with the value of each expression as a
		field in the struct. An identifier can also be included to
		explicitly name the fields in the struct. If
		no identifiers are provided, then the field names are derived from
		the tail attribute names or generated by the query processor. If
		the projectionAttributes is * then
		there is no transformation on the results. Examples of
		projectionAttributes that
		could be used to create an index include:
		*
empId
e.key, e.value, e.value.id
key: e.key, value: e.value, id: e.value.id
e.key AS k, e.value AS v, e.value.id AS id

		

		Algorithm for
		Creating Indexes
For the purposes of this section, assume we
		have a region of Portfolios with Positions as defined in the use
		cases section of the functional specification with region path
		“/portfolios”. An index is created to speed up queries
		that are similar to:

SELECT DISTINCT posn
FROM /portfolios ptfo, ptfo.positions.values posn
WHERE ptfo.status = 'active' AND posn.sharesOutstanding > 10000

The
		createIndex method is called with the following
		parameters:

fromClause = “/portfolios ptflo,
		ptflo.positions.values posn”;
indexedExpression =
		“posn.sharesOutstanding”;
projectionAttributes =
		“posn”;
		
			
			Extract region
			information from the fromClause.
In order to create an
			index, the fromClause must reference one and only one region using
			a regionPath, and the fromClause, indexedExpression, and
			projectionAttributes must not have any query parameters in them.
			If these restrictions do not hold, then the createIndex method
			will throw an exception.

Determine the one and only one
			region path in the fromClause.

For the above example, the
			region information from the fromClause determines that the region
			path is “/portfolios”.
			

			Transform the
			parameters.
Make the following transformation on the
			fromClause:
			
				
				Where
				it references the regionPath, substitute $1.
				

				For the working
				example, we now have the fromClause
$1 ptflo,
				ptflo.positions.values posn
			
			

			Construct and
			Execute the Index Maintenance Query (IMQ)
A
			special Query is created for an index which we will call the Index
			Maintenance Query (IMQ). This query is constructed as follows.
			
				
				Construct
				IMQ
Using the transformed fromClause as described above
				and the other index parameters, construct the query as:
SELECT DISTINCT idxExpr: indexedExpression, target: projectionAttributes
FROM fromClause
Save this query with the index and compile it into bytecodes if
				possible, as it will be re-used for index maintenance as well as
				for index creation.
For our working example, the IMQ would
				be:
SELECT DISTINCT idxExpr: posn.sharesOutstanding, target: posn
FROM $1 ptflo, ptflo.positions.values posn
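
As a rough illustration, the fromClause transformation and the IMQ text can be assembled with plain string operations. The class `ImqSketch` and the method `buildImq` are hypothetical; a real implementation would presumably work on the compiled query tree rather than on strings, and this sketch does not cover a projectionAttributes of `*`.

```java
final class ImqSketch {
    // Replaces the region path in the fromClause with the $1 parameter and
    // assembles the text of the Index Maintenance Query.
    static String buildImq(String fromClause, String regionPath,
                           String indexedExpression, String projectionAttributes) {
        // String.replace is a literal replacement, so "$1" is not treated as a regex group.
        String transformedFrom = fromClause.replace(regionPath, "$1");
        return "SELECT DISTINCT idxExpr: " + indexedExpression
             + ", target: " + projectionAttributes
             + " FROM " + transformedFrom;
    }
}
```

Called with the working example's parameters, this yields the IMQ text shown above on a single line.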
				

				Execute
				IMQ.
Iterate through
				each RegionEntry in the region, and for each RegionEntry:
				
					
					Create
					a special instance of QRegion that contains exactly one entry,
					the current RegionEntry. Execute the IMQ using this QRegion as
					the $1 query parameter. The result of this query
					provides structs that contain the index values and target
					values needed for the index for this region entry.
[Note:
					We need to implement a light-weight read-only Region
					implementation that has just one RegionEntry in it, and use this
					Region to construct this special QRegion instance.]
				
			
			

			Use the results to
			build the index. 
These structs can now be
			used directly to build the index by iterating over them and
			collecting the indexed values and constructing the map of
			RegionEntry=>target objects, and adding the RegionEntry to the
			reverse map.
			

			Concurrency during
			Index Creation
For
			indexes that are specified in the cache.xml, the indexes are
			created during initialization before a reference to the region is
			released to application threads.
Whenever an index is being
			created, modifications to the region by other threads must be
			blocked, i.e. a local region write lock is obtained. Threads that
			only read from the region are not blocked.  [TBD – do we
			already have a local region write lock, or does the query group
			need to implement this?]
		
	
	

	Index Use while
	Executing a Query
To
	determine whether an index can be used for a particular query,
	the fromClause, indexedExpression, and projectionAttributes of the
	index must each be compatible with the query as described below. In the
	future, histograms should be added to indexes, along with query
	transformations and cost-based estimation in the query processor, so that
	more intelligent heuristics can be used to determine whether the use
	of a particular index is actually worthwhile. In 4.0, if an
	index is deemed to be compatible then it will be used. The
	algorithms described here make some simplifying assumptions, so an
	index may not be used in all cases where it could be
	if the algorithms were more sophisticated. This could be improved
	in future releases.
	
		
		Canonicalization
To
		facilitate the matching algorithms described later, the index
		parameters (fromClause, indexedExpression, and
		projectionAttributes) and the query being executed are first put
		into a canonicalized form so there are no variables in the query.
		Canonicalization is done as follows:
		
			
			Queries and index
			parameters are first compiled into a tree of nodes which are
			instances of CompiledValue. The term “compiled”
			 here should not be confused with byte-code compilation, which is a
			lower level of compilation that will most likely not be
			implemented in this release due to resource restrictions. See the
			package.html for org.apache.geode.cache.query for further
			details on byte-code compilation.
			

			Each iterator
			definition in the fromClause is assigned a placeholder that
			represents its runtime iteration. The product currently has an
			internal class that implements these placeholders, named
			RuntimeIterator. For the purpose of this document, we
			will refer to these placeholders as itr1, itr2, ..., itrN.
			All identifiers in the query and index parameters that refer to
			explicitly declared iterator variables are replaced with references
			to these placeholders. Any implicit references to attributes or
			methods are resolved to determine which iterator they operate on,
			and these implicit references are made explicit and given a
			reference to the appropriate placeholder. 
			
			

			Remove any
			unreferenced iterator definitions in the fromClause, i.e.  any
			definitions that are not referenced anywhere else in the query or
			index parameters. Note, however, that a * projectionAttributes
			implicitly references all iterator definitions in the fromClause.
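
The replacement of explicitly declared iterator variables with placeholders can be sketched at the string level as below. This is a deliberate simplification: the design operates on a tree of CompiledValue nodes, and resolving implicit references is not attempted here. `CanonicalizeSketch` and `canonicalize` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

final class CanonicalizeSketch {
    // Rewrites explicitly declared iterator variables to itr1..itrN.
    // declaredVars lists the variable names in fromClause order.
    // Assumption: variable names do not occur inside string literals.
    static String canonicalize(String text, List<String> declaredVars) {
        String out = text;
        for (int i = 0; i < declaredVars.size(); i++) {
            String var = declaredVars.get(i);
            // \b keeps "posn" from matching inside a longer identifier.
            out = out.replaceAll("\\b" + Pattern.quote(var) + "\\b", "itr" + (i + 1));
        }
        return out;
    }
}
```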
			

			Given
			the example query
SELECT DISTINCT posn
FROM /portfolios, positions.values posn
WHERE status = 'active' AND sharesOutstanding > $1
the canonicalized form would be a tree of CompiledValue nodes which
			could be transcribed as:
SELECT DISTINCT itr2
FROM /portfolios itr1, itr1.positions.values itr2
WHERE itr1.status = 'active' AND itr2.sharesOutstanding > $1
		
		

		Compatible
		fromClause
The
		fromClause of an index is compatible with a query if the index
		fromClause is a sublist of the query fromClause (a sublist includes
		the case of being equivalent lists).
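
A minimal sketch of the sublist test, assuming each fromClause has already been canonicalized into a list of iterator definitions and that "sublist" means a contiguous run (the text does not say whether gaps are allowed). `FromClauseCompatSketch` is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;

final class FromClauseCompatSketch {
    // Both lists hold canonicalized iterator definitions in fromClause order.
    // Collections.indexOfSubList returns -1 when indexFrom is not a contiguous
    // sublist of queryFrom; equal lists count as a match at index 0.
    static boolean isCompatible(List<String> indexFrom, List<String> queryFrom) {
        return Collections.indexOfSubList(queryFrom, indexFrom) >= 0;
    }
}
```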
		

		Compatible
		projectionAttributes
The
		projectionAttributes of an index is automatically compatible if it
		is * (no projection), or if it is equivalent to the
		projectionAttributes in the query.
		

		Compatible
		indexedExpression
		
			
			Equivalence:
			The indexedExpression (tree) is passed into the whereClause
			(tree) of the query using a method that calculates compatibility
			by potentially recursively visiting child nodes that represent
			subexpressions. By default, a node in the whereClause will answer
			true only if the indexedExpression is equivalent to the node
			itself. E.g., the expression element(sub_expr1) in
			the indexedExpression
			is compatible with element(sub_expr2) in
			the whereClause if and only if sub_expr1 and sub_expr2
			are equivalent.
			

			Some
			types of expression nodes, however, define compatibility in other
			ways.
			
				
				An
				AND node in the
				whereClause is compatible with an indexedExpression not only if
				it is equivalent based on its unordered terms, but also if
				any of its terms are compatible.
				

				An
				OR node in this
				release is compatible only if it is equivalent based on its
				unordered terms.
				

				A
				relational expression (using one of the operators <, >, <=,
				>=, or <>) is compatible if equivalent or if both of the
				following are true:
				
					
					one
					of its terms is compatible
					

					the
					other term is a constant, i.e. is not dependent on any
					of the iterators.
				
			
		
		

		Compatible
		Index
If
		all of these compatibility tests for a query pass for a particular
		index, then the query will use that index. Note that it is possible
		for a query to use multiple compatible indexes as explained below.
		

		Query
		Evaluation
There
		are two ways a query can be evaluated, by iteration or
		“filtering”
		(for the lack of a better term).
		
			
			Iteration
A
			query is evaluated by brute force, most likely by iteration over the
			cartesian product of the collections in the fromClause, if there
			are no compatible indexes and the whereClause is dependent on at
			least one of the iterators. The whereClause tree is visited for
			each element in the iteration across the cartesian product.
			Those elements for which the whereClause evaluates to true are kept
			in the result set, with the projection attributes applied to them,
			and those elements for which the whereClause does not evaluate
			to true are discarded.
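
Evaluation by iteration amounts to a filter-then-project loop over the base collection. The sketch below assumes the whereClause and the projection are available as plain Java functions; `IterationEvalSketch` and `evaluate` are hypothetical names.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

final class IterationEvalSketch {
    // Brute-force evaluation: visit every element of the base collection,
    // keep those for which the whereClause is true, and apply the projection.
    static <T, R> Set<R> evaluate(Collection<T> baseCollection,
                                  Predicate<T> whereClause,
                                  Function<T, R> projection) {
        Set<R> results = new LinkedHashSet<>();   // a Set models SELECT DISTINCT
        for (T element : baseCollection) {
            if (whereClause.test(element)) {
                results.add(projection.apply(element));
            }
        }
        return results;
    }
}
```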
			

			Filtered
			Evaluation
When
			there is at least one compatible index, then the query is
			evaluated by “filtering” which means it does
			intersections or iterations on intermediate result sets obtained
			from indexes instead of the entire base collection. For filtered
			evaluation, the compiled whereClause tree is visited recursively,
			but instead of doing this for each iteration of the base
			collection, the entire result set is built up as it visits the
			nodes in the tree. An expression that is compatible with an index
			will produce a result set using that index. When combined with
			other terms in an AND expression, either other result sets will be
			intersected with each other to produce a result set for the entire
			AND expression, or some terms will produce result sets that are
			intersected and other terms that cannot use an index will be
			evaluated against the intermediate results from other terms by
			iteration, causing elements in the intermediate results to be
			dropped if they don't evaluate to TRUE for the other term(s). 
			Terms in AND expressions that use indexes should always be
			evaluated first before terms that require iteration; this
			minimizes the size of the iteration required.
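
For an AND expression, the ordering described above (indexed terms first, iterated terms afterwards) can be sketched as follows. The index lookups are assumed to have already produced result sets, and the non-indexed terms are modeled as predicates; `FilteredEvalSketch` and `evaluateAnd` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class FilteredEvalSketch {
    // indexResults: one result set per term that could use an index.
    // iteratedTerms: the remaining terms, applied by iteration.
    static <T> Set<T> evaluateAnd(List<Set<T>> indexResults,
                                  List<Predicate<T>> iteratedTerms) {
        if (indexResults.isEmpty()) {
            throw new IllegalArgumentException("filtered evaluation needs an index result");
        }
        // Indexed terms first: intersect their result sets.
        Set<T> current = new LinkedHashSet<>(indexResults.get(0));
        for (Set<T> other : indexResults.subList(1, indexResults.size())) {
            current.retainAll(other);
        }
        // Then iterate only over the intermediate result, dropping elements
        // for which a remaining term is not true.
        for (Predicate<T> term : iteratedTerms) {
            current.removeIf(term.negate());
        }
        return current;
    }
}
```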
			
				
				Projections
				on Indexes
When
				computing the result set from an index lookup that contains a
				projection, the RegionEntry where the target objects are derived
				from should be kept as well. This result set may need to be
				intersected with the results of another index lookup that may or
				may not be projected, or may need to be iterated across with an
				expression that refers to data that is not in the projection.
				Each of these cases is described as follows.
				
					
					If
					two index lookup result sets are intersected and they both have
					the same projection or they both have no projection, then do the
					intersection normally, keeping the context information (i.e. the
					RegionEntries) in the result.
					

					If
					two index lookup result sets need to be intersected and one has
					a projection and the other does not, then apply the projection
					to the result set that does not have a projection and then do
					the intersection.
					

					If
					an index lookup result set has no projection and an expression
					is being applied through iteration, then nothing special needs
					to be done.
					

					If
					an index lookup result set does have a projection and an
					expression is being applied through iteration, then first
					determine if the expression refers to any iterator variable that
					is not available in the projection. If so, then the full
					(unprojected) base collection element(s) (i.e. the part of the
					cartesian product this entry contributes to) should be computed
					from the index results for each element using the RegionEntry
					instead of the projection retrieved from the index, before the
					expression is applied to determine whether the projection(s)
					from the index should be added to the intermediate results.
				
				

				Choice
				of multiple Indexes
When
				an expression is compatible with more than one
				available index, and one index has a projection and the other
				does not, prefer the index with the projection. In some
				cases an index for a complex expression is available as well as
				an index for a simpler expression that is part of the more
				complex expression. In this case the index for the more complex
				expression is preferred.  [to do – provide example]
			
			

			Independent
			whereClause. In the corner case where the whereClause is not
			dependent at all on any of the iterators then iteration is also
			not necessary as the result will simply be the entire projected
			base collection or the empty set. 
			
		
	
	

	Index Maintenance
	
		
		Synchronous
Synchronous
		index maintenance implies that the thread that makes a modification
		to region data does not return until the indexes for that region
		are updated. This guarantees that a thread that makes modifications
		to a region and then does a query will get results that reflect the
		changes.
		

		Asynchronous
Asynchronous
		index maintenance uses a background thread that does the index
		maintenance. Operations to update the index are wrapped in Runnables
		and added to a QueuedExecutor, and a background thread takes operations
		off the queue and updates the indexes. Although the operations
		should be done in order with respect to a particular region, we
		should try to avoid using a thread per region. One thread per
		cache, however, may not be sufficient so some compromise may need
		to be made with respect to the number of threads. The size of the
		index maintenance queue may need to be made configurable by the
		user to prevent index maintenance from lagging behind region
		updates too much. Other than the use of background threads, index
		maintenance is the same for both synchronous and asynchronous.
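
The queue-plus-background-thread arrangement can be sketched with the standard `java.util.concurrent` API. Note the substitution: the text names a QueuedExecutor, while this sketch uses `Executors.newSingleThreadExecutor()` as a stand-in with the same in-order, single-worker behavior. The class name and the `applied` list are hypothetical, and the queue here is unbounded, so the configurable queue size mentioned above is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncMaintenanceSketch {
    // One worker thread drains the queue, so updates are applied in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Records applied updates; a stand-in for the real index structures.
    final List<String> applied = Collections.synchronizedList(new ArrayList<>());

    // Called by the thread that modified the region; returns without waiting.
    void enqueueUpdate(String entryKey) {
        executor.execute(() -> applied.add(entryKey));
    }

    // Stops accepting work and waits for the queued updates to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```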
		

		Upon region
		modification. When a region that has indexes is modified, an
		updateIndexes call is made to the region's index manager to update
		the indexes. A reference to the RegionEntry that is being modified
		is provided, and information regarding whether the modification was
		a create, destroy, or update.  If it was an update, then including
		the “old value” is not necessary since the old data in
		the indexes is completely identified by the RegionEntry.
		
			
			Remove old data.
If
			the operation is not a create, the old data that is associated
			with the RegionEntry is removed from the indexes, using the
			reverse map of the index.
			

			Add
			new data.
If the operation is not a destroy, the new data
			is calculated and added to the index in both the forward and
			reverse maps. If it is a destroy, then skip the next section on
			computing the new data.
			

			New Data: Compute
			the new index values and target values.
Execute the IMQ
			using the same procedure  that
			was used for index creation, but only use the one created or
			updated RegionEntry to construct a QRegion. This provides the
			index values and target values to use to update the index for this
			entry.
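
Put together, the update procedure might look like the sketch below. It is illustrative only: `IndexMaintenanceSketch`, `Op`, and the `imq` function are hypothetical, a `String` stands in for a RegionEntry, the reverse map always stores a Set of keys, and executing the IMQ against a single-entry QRegion is reduced to a function returning indexed value to target pairs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

final class IndexMaintenanceSketch {
    enum Op { CREATE, UPDATE, DESTROY }

    // forward: indexed value -> (entry -> target); reverse: entry -> forward keys.
    final TreeMap<Integer, Map<String, Object>> forward = new TreeMap<>();
    final Map<String, Set<Integer>> reverse = new HashMap<>();

    // imq stands in for running the Index Maintenance Query for one entry.
    void updateIndexes(Op op, String entry,
                       Function<String, Map<Integer, Object>> imq) {
        if (op != Op.CREATE) {
            // Remove old data, visiting only the forward keys listed in the reverse map.
            Set<Integer> oldKeys = reverse.remove(entry);
            if (oldKeys != null) {
                for (Integer k : oldKeys) {
                    Map<String, Object> byEntry = forward.get(k);
                    byEntry.remove(entry);
                    if (byEntry.isEmpty()) forward.remove(k);
                }
            }
        }
        if (op != Op.DESTROY) {
            // Compute the new values and add them to both maps.
            for (Map.Entry<Integer, Object> e : imq.apply(entry).entrySet()) {
                forward.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                       .put(entry, e.getValue());
                reverse.computeIfAbsent(entry, k -> new HashSet<>()).add(e.getKey());
            }
        }
    }
}
```

No old value has to be passed in for an update, since the reverse map already identifies everything the index holds for that entry.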
		
		

		Concurrency
Each
		index has a ReadWriteLock.
		This lock can be an instance of
		the backport of the JDK 1.5 ReentrantReadWriteLock. The
		read lock allows multiple readers and the write lock is exclusive.
		During index maintenance, all the indexes are write-locked up
		front. When a query needs to use an index, it obtains a read lock
		on the index while it uses it.
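
A minimal sketch of the locking discipline, using `java.util.concurrent.locks.ReentrantReadWriteLock` from JDK 1.5 and later rather than the backport. `LockedIndexSketch` and its simplified one-level map are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockedIndexSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Simplified index: indexed value -> target.
    private final TreeMap<Integer, String> data = new TreeMap<>();

    // Index maintenance takes the exclusive write lock.
    void put(int indexedValue, String target) {
        lock.writeLock().lock();
        try {
            data.put(indexedValue, target);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A query holds the shared read lock while it uses the index.
    List<String> greaterThan(int bound) {
        lock.readLock().lock();
        try {
            // Copy the range view so the result stays valid after the lock is released.
            return new ArrayList<>(data.tailMap(bound, false).values());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```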
	
	

	Futures
	
		
		Improved Concurrency
		During Region Maintenance
		
			
			Multiple Reader,
			Multiple Writer ReadWriteLock
<TBD>