com.sindicetech.siren.search.node.package-info Maven / Gradle / Ivy

SIREn core module
/**
 * Programmatic API to search node-based inverted indexes.
 *
 * Introduction
 *
 * This package contains the API for building queries to search JSON data
 * over node-based inverted indexes. For an introduction to Lucene's
 * search API, see the {@link org.apache.lucene.search} package documentation.
 *
 * Search Basics
 *
 * In contrast to Lucene's {@link org.apache.lucene.search.Query} API,
 * which provides complex querying capabilities to search for documents, SIREn
 * provides a {@link com.sindicetech.siren.search.node.NodeQuery} API with
 * complex querying capabilities to search for both nodes and documents. The
 * information retrieved consists not only of the matching documents, but also
 * of the matching nodes within these documents.
 *
 * 
 *
 * SIREn offers a wide variety of
 * {@link com.sindicetech.siren.search.node.NodeQuery} implementations. Most of them
 * are similar to the ones provided by Lucene's
 * {@link org.apache.lucene.search.Query} API. For example, while Lucene
 * provides a {@link org.apache.lucene.search.TermQuery} implementation
 * to search documents that contain a specific term, SIREn provides a {@link
 * com.sindicetech.siren.search.node.NodeTermQuery} implementation to search nodes
 * and documents that contain a specific term.
 *
 * 
 * Level and Range Constraints
 *
 * The {@link com.sindicetech.siren.search.node.NodeQuery} provides methods to set
 * constraints on the nodes matched by the query. There are two types of
 * constraints:
 * 
 *    Level constraint: this constraint filters out all nodes that do
 *   not belong to the specified level of the tree.
 *
 *    Interval constraint: this constraint filters out all nodes whose
 *   Dewey code vector ends with an integer that is not contained in the
 *   specified interval.
 * 
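 * The effect of the two constraints can be illustrated without the SIREn
 * classes. The sketch below is a standalone model, not SIREn's implementation.
 * It assumes that a node is identified by its Dewey code vector (an int array),
 * that the level of a node equals the length of that vector, and that the
 * interval is inclusive on both ends. The class DeweyConstraintSketch and its
 * methods are hypothetical names chosen for this illustration.
 *
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model of the level and interval constraints; not SIREn code.
final class DeweyConstraintSketch {

  // Assumption: the level of a node is the length of its Dewey code vector.
  static boolean matchesLevel(int[] dewey, int level) {
    return dewey.length == level;
  }

  // Assumption: the interval [lower, upper] is inclusive on both ends.
  static boolean matchesInterval(int[] dewey, int lower, int upper) {
    if (dewey.length == 0) {
      return false;
    }
    int last = dewey[dewey.length - 1];
    return last >= lower && last <= upper;
  }

  // Keeps only the nodes that satisfy both constraints.
  static List<int[]> filter(List<int[]> nodes, int level, int lower, int upper) {
    List<int[]> kept = new ArrayList<>();
    for (int[] node : nodes) {
      if (matchesLevel(node, level) && matchesInterval(node, lower, upper)) {
        kept.add(node);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<int[]> nodes = Arrays.asList(
        new int[] {0}, new int[] {0, 1}, new int[] {0, 4}, new int[] {0, 1, 2});
    // Level 2 and interval [0, 2]: only the node labelled 0.1 remains.
    for (int[] node : filter(nodes, 2, 0, 2)) {
      System.out.println(Arrays.toString(node));
    }
  }
}
```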
 *
 * Query Classes
 *
 * {@link com.sindicetech.siren.search.node.NodeTermQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodeTermQuery} matches all the
 * nodes that contain the specified {@link org.apache.lucene.index.Term},
 * which is a word that occurs in a certain
 * {@link org.apache.lucene.document.Field} containing JSON data.
 * 
 * Constructing a {@link com.sindicetech.siren.search.node.NodeTermQuery} is as
 * simple as:
 * 
 *      NodeTermQuery tq = new NodeTermQuery(new Term("json-field", "term"));
 * 
 *
 * In this example, the {@link com.sindicetech.siren.search.node.NodeQuery}
 * identifies all {@link org.apache.lucene.document.Document}s that have the
 * {@link org.apache.lucene.document.Field} named "json-field"
 * where a node contains the word "term".
 *
 * {@link com.sindicetech.siren.search.node.NodePhraseQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodePhraseQuery} matches all the nodes
 * containing the specified phrase. A phrase is defined as a sequence of
 * {@link org.apache.lucene.index.Term}s.
 *
 * {@link com.sindicetech.siren.search.node.NodeBooleanQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodeBooleanQuery} matches all the
 * nodes containing the specified boolean combination of queries.
 * A {@link com.sindicetech.siren.search.node.NodeBooleanQuery} contains multiple
 * {@link com.sindicetech.siren.search.node.NodeBooleanClause}s, where each clause
 * contains a sub-query
 * ({@link com.sindicetech.siren.search.node.NodeQuery} instance) and an
 * operator (from {@link com.sindicetech.siren.search.node.NodeBooleanClause.Occur})
 * describing how that sub-query is combined with the other clauses. The
 * semantics of {@link com.sindicetech.siren.search.node.NodeBooleanClause.Occur} are
 * identical to those of {@link org.apache.lucene.search.BooleanClause.Occur}.
 *
 * {@link com.sindicetech.siren.search.node.NodeTermRangeQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodeTermRangeQuery} matches all
 * nodes containing a term that occurs in the inclusive or exclusive range of a
 * lower {@link org.apache.lucene.index.Term Term} and an upper
 * {@link org.apache.lucene.index.Term Term} according to
 * {@link org.apache.lucene.index.TermsEnum#getComparator TermsEnum.getComparator()}.
 * It is not intended for numerical ranges; use
 * {@link com.sindicetech.siren.search.node.NodeNumericRangeQuery} instead.
 *
 * {@link com.sindicetech.siren.search.node.NodeNumericRangeQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodeNumericRangeQuery} matches all
 * nodes containing a value that occurs in a numeric range. For
 * NodeNumericRangeQuery to work, you must index the values with the datatypes
 * configured with the appropriate numeric analyzers
 * ({@link com.sindicetech.siren.analysis.NumericAnalyzer}).
 *
 * {@link com.sindicetech.siren.search.node.NodePrefixQuery},
 *     {@link com.sindicetech.siren.search.node.NodeWildcardQuery},
 *     {@link com.sindicetech.siren.search.node.NodeRegexpQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodePrefixQuery} matches all nodes
 * containing terms that begin with the specified string. A
 * {@link com.sindicetech.siren.search.node.NodeWildcardQuery} generalizes this
 * by allowing for the use of + (matches 1 or more characters),
 * * (matches 0 or more characters) and
 * ? (matches exactly one character) wildcards. Note that a
 * {@link com.sindicetech.siren.search.node.NodeWildcardQuery} can be quite slow. Also
 * note that a {@link com.sindicetech.siren.search.node.NodeWildcardQuery} pattern should
 * not start with +, * or ?, as leading wildcards are extremely slow.
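 *
 * The wildcard semantics listed above can be made concrete by translating a
 * pattern into a regular expression. This is only a sketch of the matching
 * rules as described here; it makes no claim about how
 * NodeWildcardQuery is implemented. The class WildcardSketch and its
 * methods are hypothetical names.
 *
```java
import java.util.regex.Pattern;

// Standalone model of the wildcard semantics described above; not SIREn code.
final class WildcardSketch {

  // Translates a wildcard pattern into an equivalent regular expression.
  static Pattern toRegex(String wildcard) {
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < wildcard.length(); i++) {
      char c = wildcard.charAt(i);
      switch (c) {
        case '+':
          regex.append(".+"); // one or more characters
          break;
        case '*':
          regex.append(".*"); // zero or more characters
          break;
        case '?':
          regex.append('.'); // exactly one character
          break;
        default:
          // Every other character is matched literally.
          regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  // Returns true if the whole term matches the wildcard pattern.
  static boolean matches(String wildcard, String term) {
    return toRegex(wildcard).matcher(term).matches();
  }

  public static void main(String[] args) {
    System.out.println(matches("te*", "te"));
    System.out.println(matches("te+", "te"));
    System.out.println(matches("t?rm", "term"));
  }
}
```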
 * Some QueryParsers may not allow this by default, but provide a
 * setAllowLeadingWildcard method to remove that protection.
 * The {@link com.sindicetech.siren.search.node.NodeRegexpQuery} is even more
 * general than NodeWildcardQuery, matching all nodes with terms that match a
 * regular expression pattern.
 *
 * {@link com.sindicetech.siren.search.node.NodeFuzzyQuery}
 *
 * A {@link com.sindicetech.siren.search.node.NodeFuzzyQuery} matches nodes that
 * contain terms similar to the specified term. Similarity is determined using
 * Levenshtein (edit)
 * distance.
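 *
 * For reference, the classic Levenshtein distance counts the minimum number of
 * single-character insertions, deletions and substitutions needed to turn one
 * string into another. The sketch below is the textbook dynamic-programming
 * formulation, given only to illustrate the measure; the way
 * NodeFuzzyQuery computes similarity internally may differ. The class
 * EditDistanceSketch is a hypothetical name.
 *
```java
// Textbook Levenshtein distance; an illustration, not SIREn code.
final class EditDistanceSketch {

  static int levenshtein(String a, String b) {
    // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // turning the empty string into b[0..j) takes j insertions
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // turning a[0..i) into the empty string takes i deletions
      for (int j = 1; j <= b.length(); j++) {
        int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
        int insertion = curr[j - 1] + 1;
        int deletion = prev[j] + 1;
        curr[j] = Math.min(substitution, Math.min(insertion, deletion));
      }
      // Swap the rows so that curr becomes prev for the next iteration.
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    System.out.println(levenshtein("term", "team"));
    System.out.println(levenshtein("kitten", "sitting"));
  }
}
```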
 *
 * {@link com.sindicetech.siren.search.node.TwigQuery}
 *
 * A {@link com.sindicetech.siren.search.node.TwigQuery} combines
 * {@link com.sindicetech.siren.search.node.NodeQuery}s through a Parent-Child or
 * Ancestor-Descendant relation. It is the basic building block for
 * tree-shaped queries.
 *
 * 
 *
 * A {@link com.sindicetech.siren.search.node.TwigQuery} is composed of a root and
 * of one or more children or descendants:
 *
 *   The root is a {@link com.sindicetech.siren.search.node.NodeQuery} instance.
 *       An empty root is considered a wildcard node query and matches all
 *       nodes. We call "root nodes" the set of nodes that are retrieved by the
 *       root query.
 *
 *   A descendant is a {@link com.sindicetech.siren.search.node.NodeQuery}
 *       associated with an operator (from
 *       {@link com.sindicetech.siren.search.node.NodeBooleanClause.Occur}). A
 *       descendant query matches all the nodes for which there exists a path
 *       to a root node. A descendant is associated with a node level, which
 *       corresponds to the relative distance (in terms of levels) from the root.
 *
 *   A child is a descendant that is exactly one level deeper than the root
 *       (that is, at the root level plus one).
 * 
 *
 * 
 *
 * A twig query is always associated with a level. If no level is specified,
 * the level defaults to 1. When a twig query is used as a child or
 * descendant of another twig query, its level is automatically updated
 * according to the level of the parent twig query. For example, consider
 * the following instructions:
 * 
 *      TwigQuery tw1 = new TwigQuery();
 *      TwigQuery tw2 = new TwigQuery();
 *      tw1.addChild(tw2, Occur.MUST);
 * 
 *
 * In this example, the first twig query tw1 is defined at the default
 * level 1. The second twig query tw2, after the call to
 * {@link com.sindicetech.siren.search.node.TwigQuery#addChild(NodeQuery, com.sindicetech.siren.search.node.NodeBooleanClause.Occur)},
 * will have its level updated to 2, since it is now a child of a twig query at
 * level 1.
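 *
 * The level update described above can be modelled in a few lines. The sketch
 * below is a standalone illustration, not the TwigQuery source: the class
 * TwigLevelSketch is hypothetical, it ignores the Occur operator, and it
 * assumes that the update also propagates to twigs already nested under the
 * added child.
 *
```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of twig level propagation; not SIREn code.
final class TwigLevelSketch {

  // A twig starts at the default level 1.
  private int level = 1;
  private final List<TwigLevelSketch> children = new ArrayList<>();

  int getLevel() {
    return level;
  }

  // Adding a child places it one level deeper than this twig.
  void addChild(TwigLevelSketch child) {
    children.add(child);
    child.setLevel(level + 1);
  }

  // Assumption: a level change is pushed down to every nested child.
  private void setLevel(int newLevel) {
    level = newLevel;
    for (TwigLevelSketch child : children) {
      child.setLevel(newLevel + 1);
    }
  }

  public static void main(String[] args) {
    TwigLevelSketch tw1 = new TwigLevelSketch();
    TwigLevelSketch tw2 = new TwigLevelSketch();
    tw1.addChild(tw2);
    System.out.println(tw1.getLevel() + " " + tw2.getLevel());
  }
}
```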
 *
 * The Scorer Class
 *
 * The {@link com.sindicetech.siren.search.node.NodeScorer} abstract class provides
 * common scoring functionality for all the node scorer implementations which
 * are the heart of the SIREn scoring process.
 *
 * 
 *
 * The implementation of the query processing framework follows a node-at-a-time
 * approach, where the query operators (i.e., {@link com.sindicetech.siren.search.node.NodeScorer})
 * process one node at a time. The query processing framework has been
 * designed for high efficiency processing:
 *
 *    All the query operators leverage as much as possible the lazy-loading
 *   feature of the
 *   {@link com.sindicetech.siren.index.codecs.siren10.Siren10PostingsReader}. For
 *   example, there is no concept of next matching document (i.e.,
 *   {@link org.apache.lucene.search.Scorer#nextDoc()}) in the
 *   {@link NodeScorer} API, but instead the concept of next candidate
 *   document (i.e.,
 *   {@link com.sindicetech.siren.search.node.NodeScorer#nextCandidateDocument()}).
 *   This enables {@link com.sindicetech.siren.search.node.NodeConjunctionScorer} to
 *   efficiently iterate over the document identifiers without having to
 *   decode the node labels until a potential candidate is found.
 *
 *    The node label array (i.e., {@link org.apache.lucene.util.IntsRef})
 *   being processed is the same in all the query operators, which means that
 *   the same array is reused across operators and no new arrays are created
 *   during query processing.
 *
 *    The node label array is itself a slice of the array of the
 *   uncompressed node block. The node label array is created by sliding a
 *   window (i.e., {@link org.apache.lucene.util.IntsRef}) over the array of the
 *   uncompressed node block.
 * 
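 * The candidate-document idea from the first point can be sketched as a
 * conjunction that leapfrogs over two sorted lists of document identifiers and
 * asks for node labels only once both lists agree on a document. This is a
 * simplified standalone model, not the NodeConjunctionScorer source: the
 * classes CandidateDocSketch and Postings, the method advanceToCandidate and
 * the decode counter are hypothetical, and a real scorer would still have to
 * compare the decoded node labels before reporting a match.
 *
```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of candidate-document iteration; not SIREn code.
final class CandidateDocSketch {

  static final int NO_MORE = Integer.MAX_VALUE;

  // Hypothetical postings: sorted document ids whose node labels are
  // decoded only on request.
  static final class Postings {
    private final int[] docs;
    private int pos = 0;
    int decodeCount = 0;

    Postings(int... docs) {
      this.docs = docs;
    }

    // Moves to the first document id >= target without touching node labels.
    int advanceToCandidate(int target) {
      while (pos < docs.length && docs[pos] < target) {
        pos++;
      }
      return pos < docs.length ? docs[pos] : NO_MORE;
    }

    // Stands in for the costly decoding of the current document's node labels.
    void decodeNodes() {
      decodeCount++;
    }
  }

  // Leapfrogs over two postings and decodes nodes only for shared document ids.
  static List<Integer> conjunction(Postings a, Postings b) {
    List<Integer> candidates = new ArrayList<>();
    int docA = a.advanceToCandidate(0);
    int docB = b.advanceToCandidate(0);
    while (docA != NO_MORE && docB != NO_MORE) {
      if (docA < docB) {
        docA = a.advanceToCandidate(docB);
      } else if (docB < docA) {
        docB = b.advanceToCandidate(docA);
      } else {
        // Both sides agree on the document: only now are node labels needed.
        a.decodeNodes();
        b.decodeNodes();
        candidates.add(docA);
        docA = a.advanceToCandidate(docA + 1);
        docB = b.advanceToCandidate(docB + 1);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    Postings a = new Postings(1, 3, 5, 8, 13);
    Postings b = new Postings(2, 3, 8, 21);
    System.out.println(conjunction(a, b) + ", decodes per list: " + a.decodeCount);
  }
}
```
 *
 * In this model the five and four identifiers of the two lists lead to only
 * two decode calls per list, one for each shared document.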
 *
 */
package com.sindicetech.siren.search.node;