/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Support for index-time and query-time joins.
*
* Index-time joins
*
* Index-time joining resolves the join at search time, but requires that the joined
* documents be indexed together as a single document block using
* {@link org.apache.lucene.index.IndexWriter#addDocuments IndexWriter.addDocuments()}.
* This is useful for any normalized content (XML documents or database tables). In
* database terms, all rows of all joined tables that match a single row of the primary
* table must be indexed as a single document block, with the parent document
* last in the block.
*
* When you index in this way, the documents in your index are divided
* into parent documents (the last document of each block) and child
* documents (all others). You provide a {@link org.apache.lucene.search.join.BitSetProducer} that identifies the
* parent documents, as Lucene does not currently record any information
* about doc blocks.
*
* At search time, use {@link
* org.apache.lucene.search.join.ToParentBlockJoinQuery} to remap/join
* matches from any child {@link org.apache.lucene.search.Query} (i.e., a
* query that matches only child documents) up to the parent document
* space. The resulting query can then be used as a clause in any query that
* matches parent documents.
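*
* As a rough illustration of this remapping, the sketch below models the block layout in plain Java.
* It is not Lucene code and does not reflect how ToParentBlockJoinQuery is implemented: the class
* ToParentJoinSketch, the method joinToParents and the use of java.util.BitSet to mark parent
* documents are all assumptions made for the example. It relies only on the rule stated above, that
* the parent is the last document of its block.
*
```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of a child-to-parent block join; hypothetical names, not Lucene code.
final class ToParentJoinSketch {

    // Maps child hits (doc IDs in ascending order) to the distinct parents of their blocks.
    // parents has one bit set per parent document.
    static List<Integer> joinToParents(int[] childHits, BitSet parents) {
        List<Integer> parentHits = new ArrayList<>();
        int lastParent = -1;
        for (int child : childHits) {
            // The parent is the first marked doc ID at or after the child.
            int parent = parents.nextSetBit(child);
            // Skip children with no closing parent and parents that were already emitted.
            if (parent >= 0 && parent != lastParent) {
                parentHits.add(parent);
                lastParent = parent;
            }
        }
        return parentHits;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2); // block 1: children 0-1, parent 2
        parents.set(5); // block 2: children 3-4, parent 5
        System.out.println(joinToParents(new int[] {0, 1, 4}, parents));
    }
}
```
*
* With parents at doc IDs 2 and 5, the child hits 0, 1 and 4 map to the parent hits 2 and 5.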
*
* If you care about which child documents matched for each parent document,
* use {@link org.apache.lucene.search.join.ParentChildrenBlockJoinQuery} to
* retrieve, for each matched parent document, the child documents that caused the
* parent document to match in the first place. This query should be used after your main query
* has been executed. For each hit, execute a
* {@link org.apache.lucene.search.join.ParentChildrenBlockJoinQuery}:
*
* TopDocs results = searcher.search(mainQuery, 10);
* for (int i = 0; i < results.scoreDocs.length; i++) {
*   ScoreDoc scoreDoc = results.scoreDocs[i];
*
*   // Run ParentChildrenBlockJoinQuery to figure out the top matching child docs:
*   ParentChildrenBlockJoinQuery parentChildrenBlockJoinQuery =
*       new ParentChildrenBlockJoinQuery(parentFilter, childQuery, scoreDoc.doc);
*   TopDocs topChildResults = searcher.search(parentChildrenBlockJoinQuery, 3);
*   // Process top child hits...
* }
*
*
* To map/join in the opposite direction, use {@link
* org.apache.lucene.search.join.ToChildBlockJoinQuery}. This wraps
* any query matching parent documents, creating the joined query
* matching only child documents.
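*
* The opposite direction can be modeled in the same spirit. The sketch below is plain Java, not
* Lucene code, and does not reflect how ToChildBlockJoinQuery is implemented: the class
* ToChildJoinSketch, the method joinToChildren and the use of java.util.BitSet to mark parent
* documents are assumptions made for the example. It also assumes that every child of a
* matching parent is returned.
*
```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of a parent-to-child block join; hypothetical names, not Lucene code.
final class ToChildJoinSketch {

    // Expands each matching parent doc ID into the child doc IDs of its block.
    // parents marks every parent document, whether or not it matched.
    static List<Integer> joinToChildren(int[] parentHits, BitSet parents) {
        List<Integer> childHits = new ArrayList<>();
        for (int parent : parentHits) {
            // The block starts right after the previous parent, or at doc 0 for the first block.
            int firstChild = parents.previousSetBit(parent - 1) + 1;
            for (int doc = firstChild; doc < parent; doc++) {
                childHits.add(doc);
            }
        }
        return childHits;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2); // block 1: children 0-1, parent 2
        parents.set(5); // block 2: children 3-4, parent 5
        System.out.println(joinToChildren(new int[] {5}, parents));
    }
}
```
*
* With parents at doc IDs 2 and 5, a hit on parent 5 expands to the child doc IDs 3 and 4.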
*
*
* Query-time joins
*
*
* Query-time joining is based on index terms and is implemented as a two-pass search. The first pass
* collects all the terms in the fromField of the documents that match the fromQuery. The second pass
* returns all documents whose toField contains any of the terms collected in the first pass.
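*
* The two passes can be pictured with the following plain-Java sketch. It is a simplified model,
* not Lucene's implementation: documents are represented as maps from field name to a single
* string value, and the names TwoPassJoinSketch, join and fromMatches are hypothetical. The
* fromMatches argument stands in for the documents matched by the fromQuery.
*
```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of a two-pass term join; hypothetical names, not Lucene code.
final class TwoPassJoinSketch {

    // fromMatches holds the indexes of the from-side documents matched by the from query.
    static List<Integer> join(List<Map<String, String>> fromDocs, int[] fromMatches, String fromField,
                              List<Map<String, String>> toDocs, String toField) {
        // Pass 1: collect the fromField terms of the matching from-side documents.
        Set<String> terms = new HashSet<>();
        for (int doc : fromMatches) {
            String term = fromDocs.get(doc).get(fromField);
            if (term != null) {
                terms.add(term);
            }
        }
        // Pass 2: return every to-side document whose toField holds a collected term.
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < toDocs.size(); doc++) {
            String term = toDocs.get(doc).get(toField);
            if (term != null && terms.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> fromDocs = List.of(Map.of("id", "a"), Map.of("id", "b"));
        List<Map<String, String>> toDocs =
            List.of(Map.of("ref", "b"), Map.of("ref", "a"), Map.of("ref", "b"));
        // Only from-side document 1 (id "b") matched the from query.
        System.out.println(join(fromDocs, new int[] {1}, "id", toDocs, "ref"));
    }
}
```
*
* In this model, the term "b" collected from the from side selects to-side documents 0 and 2.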
*
* Query-time joining takes the following input:
*
* fromField: The field to join from.
* fromQuery: The query executed to collect the from terms. This is usually the user-specified query.
* multipleValuesPerDocument: Whether the fromField contains more than one value per document.
* scoreMode: Defines how scores are translated to the other side of the join. If you don't care about
*   scoring, use {@link org.apache.lucene.search.join.ScoreMode#None} mode. This disables scoring and is
*   therefore more efficient (it requires less memory and is faster).
* toField: The field to join to.
*
*
* Query-time joining is accessible through a single static method. The caller supplies
* the input described above and the IndexSearcher from which the from terms are collected. The returned
* query can be executed with the same IndexSearcher, or with a different one.
* Example usage of {@link org.apache.lucene.search.join.JoinUtil#createJoinQuery(String, boolean, String, org.apache.lucene.search.Query, org.apache.lucene.search.IndexSearcher, org.apache.lucene.search.join.ScoreMode)
* JoinUtil.createJoinQuery()}:
*
*
* // Name of the from field
* String fromField = "from";
* // Set to true only when your fromField has multiple values per document in your index
* boolean multipleValuesPerDocument = false;
* // Name of the to field
* String toField = "to";
* // Defines how the scores are translated into the other side of the join
* ScoreMode scoreMode = ScoreMode.Max;
* // Query executed to collect the from values to join to the to values
* Query fromQuery = new TermQuery(new Term("content", searchTerm));
*
* Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
* // Note: toSearcher can be the same as the fromSearcher
* TopDocs topDocs = toSearcher.search(joinQuery, 10);
* // Render topDocs...
*
*/
package org.apache.lucene.search.join;