org.apache.lucene.search.package-info

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Code to search indices.
 *
 * 

Table Of Contents

  1. Search Basics
  2. The Query Classes
  3. Scoring: Introduction
  4. Scoring: Basics
  5. Changing the Scoring
  6. Appendix: Search Algorithm

Search Basics

Lucene offers a wide variety of {@link org.apache.lucene.search.Query} implementations, most of which are in this package or the queries module. These implementations can be combined in a wide variety of ways to provide complex querying capabilities along with information about where matches took place in the document collection. The Query Classes section below highlights some of the more important Query classes. For details on implementing your own Query class, see Custom Queries -- Expert Level below.

Make sure to look at the {@link org.apache.lucene.search.Query} factory methods on the {@link org.apache.lucene.index.IndexableField}s that you feed into the index writer: they are convenient to use and sometimes more efficient than a naively constructed {@link org.apache.lucene.search.Query}. See {@link org.apache.lucene.document.LongField#newRangeQuery(String, long, long)} for instance.

To perform a search, applications usually call {@link org.apache.lucene.search.IndexSearcher#search(Query,int)}.

Once a Query has been created and submitted to the {@link org.apache.lucene.search.IndexSearcher IndexSearcher}, the scoring process begins. After some infrastructure setup, control finally passes to the {@link org.apache.lucene.search.Weight Weight} implementation and its {@link org.apache.lucene.search.Scorer Scorer} or {@link org.apache.lucene.search.BulkScorer BulkScorer} instances. See the Algorithm section for more notes on the process.

Query Classes

{@link org.apache.lucene.search.TermQuery TermQuery}

Of the various implementations of {@link org.apache.lucene.search.Query Query}, the {@link org.apache.lucene.search.TermQuery TermQuery} is the easiest to understand and the most often used in applications. A {@link org.apache.lucene.search.TermQuery TermQuery} matches all the documents that contain the specified {@link org.apache.lucene.index.Term Term}, which is a word that occurs in a certain {@link org.apache.lucene.document.Field Field}. Thus, a {@link org.apache.lucene.search.TermQuery TermQuery} identifies and scores all {@link org.apache.lucene.document.Document Document}s that have a {@link org.apache.lucene.document.Field Field} with the specified string in it. Constructing a {@link org.apache.lucene.search.TermQuery TermQuery} is as simple as:

 * TermQuery tq = new TermQuery(new Term("fieldName", "term"));
 * 
In this example, the {@link org.apache.lucene.search.Query Query} identifies all {@link org.apache.lucene.document.Document Document}s that have the {@link org.apache.lucene.document.Field Field} named "fieldName" containing the word "term".

{@link org.apache.lucene.search.BooleanQuery BooleanQuery}

Things start to get interesting when one combines multiple {@link org.apache.lucene.search.TermQuery TermQuery} instances into a {@link org.apache.lucene.search.BooleanQuery BooleanQuery}. A {@link org.apache.lucene.search.BooleanQuery BooleanQuery} contains multiple {@link org.apache.lucene.search.BooleanClause BooleanClause}s, where each clause contains a sub-query ({@link org.apache.lucene.search.Query Query} instance) and an operator (from {@link org.apache.lucene.search.BooleanClause.Occur BooleanClause.Occur}) describing how that sub-query is combined with the other clauses:

  1. {@link org.apache.lucene.search.BooleanClause.Occur#SHOULD SHOULD}: Use this operator when a clause can occur in the result set, but is not required. If a query is made up of all SHOULD clauses, then every document in the result set matches at least one of these clauses.
  2. {@link org.apache.lucene.search.BooleanClause.Occur#MUST MUST}: Use this operator when a clause is required to occur in the result set and should contribute to the score. Every document in the result set will match all such clauses.
  3. {@link org.apache.lucene.search.BooleanClause.Occur#FILTER FILTER}: Use this operator when a clause is required to occur in the result set but should not contribute to the score. Every document in the result set will match all such clauses.
  4. {@link org.apache.lucene.search.BooleanClause.Occur#MUST_NOT MUST_NOT}: Use this operator when a clause must not occur in the result set. No document in the result set will match any such clauses.

Boolean queries are constructed by adding two or more {@link org.apache.lucene.search.BooleanClause BooleanClause} instances. If too many clauses are added, a {@link org.apache.lucene.search.IndexSearcher.TooManyClauses TooManyClauses} exception will be thrown during searching. This most often occurs when a {@link org.apache.lucene.search.Query Query} is rewritten into a {@link org.apache.lucene.search.BooleanQuery BooleanQuery} with many {@link org.apache.lucene.search.TermQuery TermQuery} clauses, for example by {@link org.apache.lucene.search.WildcardQuery WildcardQuery}. The default setting for the maximum number of clauses is 1024, but this can be changed via the static method {@link org.apache.lucene.search.IndexSearcher#setMaxClauseCount(int)}.
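The matching rules of the four operators can be illustrated without any Lucene classes. The following is a minimal sketch in plain Java, not Lucene's implementation: `BooleanClauseSketch`, its `Clause` type and the string-based matchers are hypothetical stand-ins, and scoring is left out entirely. It assumes the default behavior in which SHOULD clauses become optional as soon as a MUST or FILTER clause is present.

```java
import java.util.List;
import java.util.function.Predicate;

class BooleanClauseSketch {
    enum Occur { SHOULD, MUST, FILTER, MUST_NOT }

    /** Hypothetical stand-in for a clause: a match test plus its operator. */
    static final class Clause {
        final Predicate<String> matcher;
        final Occur occur;

        Clause(Predicate<String> matcher, Occur occur) {
            this.matcher = matcher;
            this.occur = occur;
        }
    }

    /** Returns true if the document text satisfies the combination of clauses. */
    static boolean matches(List<Clause> clauses, String doc) {
        boolean hasRequired = false;      // saw at least one MUST or FILTER clause
        boolean anyShouldMatched = false; // at least one SHOULD clause matched
        for (Clause c : clauses) {
            boolean hit = c.matcher.test(doc);
            switch (c.occur) {
                case MUST:
                case FILTER:
                    hasRequired = true;
                    if (!hit) return false; // required clause missing
                    break;
                case MUST_NOT:
                    if (hit) return false;  // prohibited clause present
                    break;
                case SHOULD:
                    anyShouldMatched |= hit;
                    break;
            }
        }
        // With required clauses, SHOULD is optional; otherwise one SHOULD must match.
        return hasRequired || anyShouldMatched;
    }
}
```

Under these assumptions, a query made only of MUST_NOT clauses matches nothing, because no clause ever selects a document.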

Phrases

Another common search is to find documents containing certain phrases. This is handled in different ways:

  1. {@link org.apache.lucene.search.PhraseQuery PhraseQuery}: Matches a sequence of {@link org.apache.lucene.index.Term Term}s. {@link org.apache.lucene.search.PhraseQuery PhraseQuery} uses a slop factor to determine how many positions may occur between any two terms in the phrase and still be considered a match. The slop is 0 by default, meaning the phrase must match exactly.
  2. {@link org.apache.lucene.search.MultiPhraseQuery MultiPhraseQuery}: A more general form of PhraseQuery that accepts multiple Terms for a position in the phrase. For example, this can be used to perform phrase queries that also incorporate synonyms.
  3. Interval queries in the Queries module

{@link org.apache.lucene.search.PointRangeQuery PointRangeQuery}

The {@link org.apache.lucene.search.PointRangeQuery PointRangeQuery} matches all documents that occur in a numeric range. For PointRangeQuery to work, you must index the values using one of the numeric fields ({@link org.apache.lucene.document.IntPoint IntPoint}, {@link org.apache.lucene.document.LongPoint LongPoint}, {@link org.apache.lucene.document.FloatPoint FloatPoint}, or {@link org.apache.lucene.document.DoublePoint DoublePoint}).

{@link org.apache.lucene.search.PrefixQuery PrefixQuery}, {@link * org.apache.lucene.search.WildcardQuery WildcardQuery}, {@link * org.apache.lucene.search.RegexpQuery RegexpQuery}

While the {@link org.apache.lucene.search.PrefixQuery PrefixQuery} has a different implementation, it is essentially a special case of the {@link org.apache.lucene.search.WildcardQuery WildcardQuery}. The {@link org.apache.lucene.search.PrefixQuery PrefixQuery} allows an application to identify all documents with terms that begin with a certain string. The {@link org.apache.lucene.search.WildcardQuery WildcardQuery} generalizes this by allowing for the use of * (matches 0 or more characters) and ? (matches exactly one character) wildcards. Note that the {@link org.apache.lucene.search.WildcardQuery WildcardQuery} can be quite slow. Also note that a {@link org.apache.lucene.search.WildcardQuery WildcardQuery} pattern should not start with * or ?, as such patterns are extremely slow. Some QueryParsers may not allow this by default, but provide a setAllowLeadingWildcard method to remove that protection. The {@link org.apache.lucene.search.RegexpQuery RegexpQuery} is even more general than WildcardQuery, allowing an application to identify all documents with terms that match a regular expression pattern.
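The two wildcard characters described above can be illustrated with a small matcher over a single term. This is a sketch in plain Java under the stated semantics only (`*` for zero or more characters, `?` for exactly one); `WildcardSketch` is a hypothetical name, escaping is not handled, and it says nothing about how WildcardQuery actually enumerates terms in an index.

```java
class WildcardSketch {
    /** Returns true if text matches pattern ('*' = 0+ chars, '?' = exactly one char). */
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int starP = -1, starT = -1; // position of the last '*' and the text index it started at
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;   // remember the star, tentatively let it match zero chars
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;           // single-character match
                t++;
            } else if (starP >= 0) {
                p = starP + 1; // backtrack: the last star absorbs one more character
                t = ++starT;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') p++;
        return p == pattern.length();
    }
}
```

For instance, `WildcardSketch.matches("luc*", "lucene")` and `WildcardSketch.matches("l?cene", "lucene")` both return true, while `WildcardSketch.matches("luc?", "lucene")` returns false because `?` consumes exactly one character.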

{@link org.apache.lucene.search.FuzzyQuery FuzzyQuery}

A {@link org.apache.lucene.search.FuzzyQuery FuzzyQuery} matches documents that contain terms similar to the specified term. Similarity is determined using Levenshtein distance. This type of query can be useful when accounting for spelling variations in the collection.
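To make the notion of Levenshtein distance concrete, here is a minimal sketch of the classic dynamic-programming computation in plain Java. `LevenshteinSketch` is a hypothetical helper for illustration; it is not how FuzzyQuery evaluates candidates internally, and it counts only insertions, deletions and substitutions.

```java
class LevenshteinSketch {
    /** Minimum number of single-character insertions, deletions or substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or keep
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

A term such as "lucine" is at distance 1 from "lucene", so a fuzzy match that tolerates one edit would accept it.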

* *

Scoring — Introduction

Lucene scoring is the heart of why we all love Lucene. It is blazingly fast and it hides almost all of the complexity from the user. In a nutshell, it works. At least, that is, until it doesn't work, or doesn't work as one would expect it to work. Then we are left digging into Lucene internals or asking for help on the Lucene user mailing list to figure out why a document with five of our query terms scores lower than a different document with only one of the query terms.

While this document won't answer your specific scoring issues, it will, hopefully, point you to the places that can help you figure out the what and why of Lucene scoring.

Lucene scoring supports a number of pluggable information retrieval models. These models can be plugged in via the {@link org.apache.lucene.search.similarities Similarity API}, and offer extension hooks and parameters for tuning. In general, Lucene first finds the documents that need to be scored based on boolean logic in the Query specification, and then ranks this subset of matching documents via the retrieval model. For some valuable references on the vector space model (VSM) and IR in general, refer to the Lucene Wiki IR references.

The rest of this document will cover Scoring basics and explain how to change your {@link org.apache.lucene.search.similarities.Similarity Similarity}. Next, it will cover ways you can customize the Lucene internals in Custom Queries -- Expert Level, which gives details on implementing your own {@link org.apache.lucene.search.Query Query} class and related functionality. Finally, we will finish up with some reference material in the Appendix.

Scoring — Basics

Scoring is very much dependent on the way documents are indexed, so it is important to understand indexing. See the Lucene overview before continuing with this section. Be sure to use {@link org.apache.lucene.search.IndexSearcher#explain(org.apache.lucene.search.Query, int) IndexSearcher.explain(Query, doc)} to understand how the score for a certain matching document was computed.

Generally, the Query determines which documents match (a binary decision), while the Similarity determines how to assign scores to the matching documents.

Fields and Documents

In Lucene, the objects we are scoring are {@link org.apache.lucene.document.Document Document}s. A Document is a collection of {@link org.apache.lucene.document.Field Field}s. Each Field has {@link org.apache.lucene.document.FieldType semantics} about how it is created and stored ({@link org.apache.lucene.document.FieldType#tokenized() tokenized}, {@link org.apache.lucene.document.FieldType#stored() stored}, etc). It is important to note that Lucene scoring works on Fields and then combines the results to return Documents. This is important because two Documents with the exact same content, but one having the content in two Fields and the other in one Field, may return different scores for the same query due to length normalization.

Score Boosting

Lucene allows influencing the score contribution of various parts of the query by wrapping them with {@link org.apache.lucene.search.BoostQuery}.

Changing Scoring — Similarity

Changing the scoring formula

Changing {@link org.apache.lucene.search.similarities.Similarity Similarity} is an easy way to influence scoring. This is done at index-time with {@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(org.apache.lucene.search.similarities.Similarity) IndexWriterConfig.setSimilarity(Similarity)} and at query-time with {@link org.apache.lucene.search.IndexSearcher#setSimilarity(org.apache.lucene.search.similarities.Similarity) IndexSearcher.setSimilarity(Similarity)}. Be sure to use search-time similarities that encode the length normalization factor the same way as the similarity that you used at index time. All Lucene built-in similarities use the default encoding so they are compatible, but if you use a custom similarity that changes the encoding of the length normalization factor, you are on your own: Lucene makes no effort to ensure that the index-time and the search-time similarities are compatible.

You can influence scoring by configuring a different built-in Similarity implementation, by tweaking its parameters, or by subclassing it to override behavior. Some implementations also offer a modular API which you can extend by plugging in a different component (e.g. term frequency normalizer).

Finally, you can extend the low level {@link org.apache.lucene.search.similarities.Similarity Similarity} directly to implement a new retrieval model.

See the {@link org.apache.lucene.search.similarities} package documentation for information on the built-in available scoring models and extending or changing Similarity.

Scoring multiple fields

In the real world, documents often have multiple fields with different degrees of relevance. A robust way of scoring across multiple fields is called BM25F, which is implemented via {@link org.apache.lucene.search.CombinedFieldQuery}. It scores documents with multiple fields as if their content had been indexed in a single combined field. It supports configuring per-field boosts where the value of the boost is interpreted as the number of times that the content of the field exists in the virtual combined field.

Here is an example that constructs a query on "apache OR lucene" on fields "title" with a boost of 10, and "body" with a boost of 1:

 * BooleanQuery.Builder builder = new BooleanQuery.Builder();
 * for (String term : new String[] { "apache", "lucene" }) {
 *   Query query = new CombinedFieldQuery.Builder(term)
 *         .addField("title", 10f)
 *         .addField("body", 1f)
 *         .build();
 *   builder.add(query, Occur.SHOULD);
 * }
 * Query query = builder.build();
 * 
* *
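The interpretation of a boost as "the number of times the field's content exists in the virtual combined field" can be sketched numerically. The plain-Java helper below is an illustration only: `CombinedFieldSketch` and its methods are hypothetical, and the saturation function is the textbook BM25 term-frequency component with assumed parameters `k1` and `b`, not necessarily the exact computation performed by CombinedFieldQuery.

```java
class CombinedFieldSketch {
    /**
     * Weighted sum of per-field values (term frequencies or field lengths):
     * a boost of 10 counts the field's content ten times in the combined field.
     */
    static double combine(int[] perFieldValues, float[] boosts) {
        double sum = 0;
        for (int i = 0; i < perFieldValues.length; i++) {
            sum += boosts[i] * perFieldValues[i];
        }
        return sum;
    }

    /** Textbook BM25 saturation of a term frequency, normalized by field length. */
    static double saturate(double freq, double length, double avgLength, double k1, double b) {
        return freq / (freq + k1 * (1 - b + b * length / avgLength));
    }
}
```

With a term occurring once in "title" (boost 10) and three times in "body" (boost 1), `combine(new int[] {1, 3}, new float[] {10f, 1f})` yields a combined frequency of 13, which is then saturated once for the whole document instead of once per field.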

Integrating field values into the score

While similarities help score a document relative to a query, it is also common for documents to hold features that measure the quality of a match. Such features are best integrated into the score by indexing a {@link org.apache.lucene.document.FeatureField FeatureField} with the document at index-time, and then combining the similarity score and the feature score using a linear combination. For instance, the query below matches the same documents as {@code originalQuery} and computes scores as {@code similarityScore + 0.7 * featureScore}:

 * Query originalQuery = new BooleanQuery.Builder()
 *     .add(new TermQuery(new Term("body", "apache")), Occur.SHOULD)
 *     .add(new TermQuery(new Term("body", "lucene")), Occur.SHOULD)
 *     .build();
 * Query featureQuery = FeatureField.newSaturationQuery("features", "pagerank");
 * Query query = new BooleanQuery.Builder()
 *     .add(originalQuery, Occur.MUST)
 *     .add(new BoostQuery(featureQuery, 0.7f), Occur.SHOULD)
 *     .build();
 * 
A less efficient yet more flexible way of modifying scores is to index scoring features into doc-value fields and then combine them with the similarity score using a FunctionScoreQuery from the queries module. For instance, the example below shows how to compute scores as {@code similarityScore * Math.log(popularity)} using the expressions module, assuming that values for the {@code popularity} field have been set in a {@link org.apache.lucene.document.NumericDocValuesField NumericDocValuesField} at index time:

 *   // compile an expression:
 *   Expression expr = JavascriptCompiler.compile("_score * ln(popularity)");
 *
 *   // SimpleBindings just maps variables to DoubleValuesSource instances
 *   SimpleBindings bindings = new SimpleBindings();
 *   bindings.add("_score", DoubleValuesSource.SCORES);
 *   bindings.add("popularity", DoubleValuesSource.fromIntField("popularity"));
 *
 *   // create a query that matches based on 'originalQuery' but
 *   // scores using expr
 *   Query query = new FunctionScoreQuery(
 *       originalQuery,
 *       expr.getDoubleValuesSource(bindings));
 * 

Multi-stage retrieval pipelines

The above explains how to influence the score when evaluating all matches of the query. This is expensive by design since it applies to all matches of the query, which could be millions. To apply more sophisticated ranking logic, a good approach is a retrieval pipeline that runs a simple candidate retrieval stage that retrieves e.g. 1,000 hits, followed by a more sophisticated reranking stage that reranks these 1,000 hits to select the best 100 hits among them. Since the number of hits that the reranking stage needs to operate on is bounded, it can afford to be more sophisticated.

Lucene exposes reranking via the {@link org.apache.lucene.search.Rescorer} abstract class, which has two main sub-classes:

  • {@link org.apache.lucene.search.QueryRescorer}, to rescore using a query. For instance, the query string could be parsed as phrase query using {@link org.apache.lucene.util.QueryBuilder#createPhraseQuery} instead of a boolean query in order to help boost hits which also match the query string as a phrase.
  • {@link org.apache.lucene.search.SortRescorer}, to rescore using a {@link org.apache.lucene.search.Sort}. For instance, the best 1,000 hits by BM25 score may be sorted by descending popularity in order to compute the final top-100 hits.
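The two-stage idea can be sketched in plain Java, independent of the Rescorer API. In this illustration `RerankSketch`, `Hit` and the `popularity` value are hypothetical; the first stage keeps the best candidates by score and the second stage reorders only those candidates by descending popularity, mirroring the SortRescorer example above.

```java
import java.util.ArrayList;
import java.util.List;

class RerankSketch {
    /** Hypothetical hit: a document id, its first-stage score and a popularity feature. */
    static final class Hit {
        final int doc;
        final float score;
        final long popularity;

        Hit(int doc, float score, long popularity) {
            this.doc = doc;
            this.score = score;
            this.popularity = popularity;
        }
    }

    /** Keeps the best `candidates` hits by score, then returns the `topN` most popular of those. */
    static List<Hit> retrieveThenRerank(List<Hit> matches, int candidates, int topN) {
        // Stage 1: cheap candidate retrieval by descending score.
        List<Hit> byScore = new ArrayList<>(matches);
        byScore.sort((a, b) -> Float.compare(b.score, a.score));
        List<Hit> pool = new ArrayList<>(byScore.subList(0, Math.min(candidates, byScore.size())));

        // Stage 2: rerank only the bounded candidate pool.
        pool.sort((a, b) -> Long.compare(b.popularity, a.popularity));
        return new ArrayList<>(pool.subList(0, Math.min(topN, pool.size())));
    }
}
```

Note that a very popular document that does not make it into the candidate pool is never considered by the second stage, which is the price paid for bounding the cost of reranking.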

Top hits fusion

Sometimes, multiple retrieval pipelines may make sense, having their own pros and cons. A typical example would be a lexical retrieval pipeline, matching exactly what the user requested, and a semantic retrieval pipeline, matching documents that are closest to the user's query from a semantic perspective. Combining scores is hazardous as different retrieval pipelines often produce scores that not only have different ranges, but also different distributions within this range. A robust way of combining multiple retrieval pipelines consists of combining the top hits that they produce through their ranks rather than through their scores using reciprocal rank fusion. This is exposed via {@link org.apache.lucene.search.TopDocs#rrf(int topN, int k, TopDocs[] hits)}.
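Reciprocal rank fusion itself is simple enough to sketch in plain Java. The helper below follows the commonly cited formulation, in which each document receives the sum of `1 / (k + rank)` over the lists it appears in, with ranks starting at 1. `RrfSketch` is a hypothetical name, ties are broken by document id as an arbitrary choice, and details of `TopDocs#rrf` may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RrfSketch {
    /** Fuses ranked lists of document ids; returns the topN ids by descending RRF score. */
    static List<Integer> fuse(List<List<Integer>> rankings, int topN, int k) {
        Map<Integer, Double> scores = new HashMap<>();
        for (List<Integer> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based in this sketch
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        List<Integer> docs = new ArrayList<>(scores.keySet());
        docs.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b); // tie-break on doc id
        });
        return new ArrayList<>(docs.subList(0, Math.min(topN, docs.size())));
    }
}
```

Because only ranks enter the formula, the lexical and semantic pipelines can use entirely different score scales without affecting the fused order.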

Custom Queries — Expert Level

Custom queries are an expert level task, so tread carefully and be prepared to share your code if you want help.

With the warning out of the way, it is possible to change a lot more than just the Similarity when it comes to matching and scoring in Lucene. Lucene's search is a complex mechanism that is grounded by four main classes:

  1. {@link org.apache.lucene.search.Query Query}: The abstract object representation of the user's information need.
  2. {@link org.apache.lucene.search.Weight Weight}: A specialization of a Query for a given index. This typically associates a Query object with index statistics that are later used to compute document scores.
  3. {@link org.apache.lucene.search.Scorer Scorer}: The core class of the scoring process: for a given segment, scorers return {@link org.apache.lucene.search.Scorer#iterator iterators} over matches and give a way to compute the {@link org.apache.lucene.search.Scorer#score score} of these matches.
  4. {@link org.apache.lucene.search.BulkScorer BulkScorer}: An abstract class that scores a range of documents. A default implementation simply iterates through the hits from {@link org.apache.lucene.search.Scorer Scorer}, but some queries such as {@link org.apache.lucene.search.BooleanQuery BooleanQuery} have more efficient implementations.

Details on each of these classes, and their children, can be found in the subsections below.

The Query Class

In some sense, the {@link org.apache.lucene.search.Query Query} class is where it all begins. Without a Query, there would be nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it is often responsible for creating them or coordinating the functionality between them. The {@link org.apache.lucene.search.Query Query} class has two methods that are important for derived classes:

  1. {@link org.apache.lucene.search.Query#createWeight(IndexSearcher,ScoreMode,float) createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)}: A {@link org.apache.lucene.search.Weight Weight} is the internal representation of the Query, so each Query implementation must provide an implementation of Weight. See the subsection on The Weight Interface below for details on implementing the Weight interface.
  2. {@link org.apache.lucene.search.Query#rewrite(IndexSearcher) rewrite(IndexSearcher searcher)}: Rewrites queries into primitive queries. Primitive queries are: {@link org.apache.lucene.search.TermQuery TermQuery}, {@link org.apache.lucene.search.BooleanQuery BooleanQuery}, and other queries that implement {@link org.apache.lucene.search.Query#createWeight(IndexSearcher,ScoreMode,float) createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)}.

The Weight Interface

The {@link org.apache.lucene.search.Weight Weight} interface provides an internal representation of the Query so that it can be reused. Any {@link org.apache.lucene.search.IndexSearcher IndexSearcher} dependent state should be stored in the Weight implementation, not in the Query class. The interface defines three main methods:

  1. {@link org.apache.lucene.search.Weight#scorer scorer()}: Construct a new {@link org.apache.lucene.search.Scorer Scorer} for this Weight. See The Scorer Class below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents given the Query.
  2. {@link org.apache.lucene.search.Weight#explain(org.apache.lucene.index.LeafReaderContext, int) explain(LeafReaderContext context, int doc)}: Provide a means for explaining why a given document was scored the way it was. Typically a weight such as TermWeight that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will make use of the Similarity's implementation: {@link org.apache.lucene.search.similarities.Similarity.SimScorer#explain(Explanation, long) SimScorer#explain(Explanation freq, long norm)}.
  3. {@link org.apache.lucene.search.Weight#matches matches(LeafReaderContext context, int doc)}: Give information about positions and offsets of matches. This is typically useful to implement highlighting.

The Scorer Class

The {@link org.apache.lucene.search.Scorer Scorer} abstract class provides common scoring functionality for all Scorer implementations and is the heart of the Lucene scoring process. The Scorer defines the following methods which must be implemented:

  1. {@link org.apache.lucene.search.Scorer#iterator iterator()}: Return a {@link org.apache.lucene.search.DocIdSetIterator DocIdSetIterator} that can iterate over all documents that match this Query.
  2. {@link org.apache.lucene.search.Scorer#docID docID()}: Returns the id of the {@link org.apache.lucene.document.Document Document} that contains the match.
  3. {@link org.apache.lucene.search.Scorer#score score()}: Return the score of the current document. This value can be determined in any appropriate way for an application. For instance, the {@link org.apache.lucene.search.TermScorer TermScorer} simply defers to the configured Similarity: {@link org.apache.lucene.search.similarities.Similarity.SimScorer#score(float, long) SimScorer.score(float freq, long norm)}.
  4. {@link org.apache.lucene.search.Scorer#getChildren getChildren()}: Returns any child subscorers underneath this scorer. This allows users to navigate the scorer hierarchy and receive more fine-grained details on the scoring process.

The BulkScorer Class

The {@link org.apache.lucene.search.BulkScorer BulkScorer} scores a range of documents. There is only one abstract method:

  1. {@link org.apache.lucene.search.BulkScorer#score(org.apache.lucene.search.LeafCollector,org.apache.lucene.util.Bits,int,int) score(LeafCollector,Bits,int,int)}: Score all documents up to but not including the specified max document.

Why would I want to add my own Query?

In a nutshell, you want to add your own custom Query implementation when you think that Lucene's aren't appropriate for the task that you want to do. You might be doing some cutting edge research or you need more information back out of Lucene (similar to Doug adding SpanQuery functionality).

Appendix: Search Algorithm

This section is mostly notes on stepping through the Scoring process and serves as fertilizer for the earlier sections.

In the typical search application, a {@link org.apache.lucene.search.Query Query} is passed to the {@link org.apache.lucene.search.IndexSearcher IndexSearcher}, beginning the scoring process.

Once inside the IndexSearcher, a {@link org.apache.lucene.search.Collector Collector} is used for the scoring and sorting of the search results. These important objects are involved in a search:

  1. The {@link org.apache.lucene.search.Weight Weight} object of the Query. The Weight object is an internal representation of the Query that allows the Query to be reused by the IndexSearcher.
  2. The IndexSearcher that initiated the call.
  3. A {@link org.apache.lucene.search.Sort Sort} object for specifying how to sort the results if the standard score-based sort method is not desired.

Assuming we are not sorting (since sorting doesn't affect the raw Lucene score), we call one of the search methods of the IndexSearcher, passing in the {@link org.apache.lucene.search.Weight Weight} object created by {@link org.apache.lucene.search.IndexSearcher#createWeight(org.apache.lucene.search.Query,ScoreMode,float) IndexSearcher.createWeight(Query,ScoreMode,float)} and the number of results we want. This method returns a {@link org.apache.lucene.search.TopDocs TopDocs} object, which is an internal collection of search results. The IndexSearcher creates a {@link org.apache.lucene.search.TopScoreDocCollector TopScoreDocCollector} and passes it along with the Weight to another expert search method (for more on the {@link org.apache.lucene.search.Collector Collector} mechanism, see {@link org.apache.lucene.search.IndexSearcher IndexSearcher}). The TopScoreDocCollector uses a {@link org.apache.lucene.util.PriorityQueue PriorityQueue} to collect the top results for the search.
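The idea of collecting top results with a priority queue can be sketched with the JDK's own `java.util.PriorityQueue`. This is an illustration of the general technique only: `TopNSketch` is a hypothetical helper, it tracks bare scores rather than document ids, and Lucene uses its own `org.apache.lucene.util.PriorityQueue` with additional optimizations.

```java
import java.util.PriorityQueue;

class TopNSketch {
    /** Returns the n highest scores in descending order using a bounded min-heap. */
    static float[] topScores(float[] scores, int n) {
        if (n <= 0) {
            return new float[0];
        }
        // Min-heap: the head is always the weakest of the current top n.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the weakest hit
                heap.add(s);
            }
        }
        // Drain from weakest to strongest, filling the result back to front.
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll();
        }
        return top;
    }
}
```

The heap never holds more than `n` entries, so memory stays bounded no matter how many documents match.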

At last, we are actually going to score some documents. The score method takes in the Collector (most likely the TopScoreDocCollector or TopFieldCollector) and does its business. Of course, here is where things get involved. The {@link org.apache.lucene.search.Scorer Scorer} that is returned by the {@link org.apache.lucene.search.Weight Weight} object depends on what type of Query was submitted. In most real world applications with multiple query terms, the {@link org.apache.lucene.search.Scorer Scorer} is going to be a BooleanScorer2 created from {@link org.apache.lucene.search.BooleanWeight BooleanWeight} (see the section on custom queries for info on changing this).

Assuming a BooleanScorer2, we get an internal Scorer based on the required, optional and prohibited parts of the query. Using this internal Scorer, the BooleanScorer2 then proceeds into a while loop based on the {@link org.apache.lucene.search.DocIdSetIterator#nextDoc DocIdSetIterator.nextDoc()} method. The nextDoc() method advances to the next document matching the query. This is an abstract method in the Scorer class and is thus overridden by all derived implementations. If you have a simple OR query, your internal Scorer is most likely a DisjunctionSumScorer, which essentially combines the scores from the sub scorers of the OR'd terms.
 */
package org.apache.lucene.search;




