/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Monitoring framework
 *
 * This package contains classes to allow the monitoring of a stream of documents with a set of
 * queries.
 *
 * To use, instantiate a {@link org.apache.lucene.monitor.Monitor} object, register queries with
 * it via {@link
 * org.apache.lucene.monitor.Monitor#register(org.apache.lucene.monitor.MonitorQuery...)}, and then
 * match documents against it either individually via {@link
 * org.apache.lucene.monitor.Monitor#match(org.apache.lucene.document.Document,
 * org.apache.lucene.monitor.MatcherFactory)} or in batches via {@link
 * org.apache.lucene.monitor.Monitor#match(org.apache.lucene.document.Document[],
 * org.apache.lucene.monitor.MatcherFactory)}.
 *
 * Matcher types
 *
 * A number of matcher types are included:
 *
 *   {@link org.apache.lucene.monitor.QueryMatch#SIMPLE_MATCHER}: returns only the set of
 *       query ids that a Document has matched
 *   {@link
 *       org.apache.lucene.monitor.ScoringMatch#matchWithSimilarity(org.apache.lucene.search.similarities.Similarity)}:
 *       returns the set of matching queries, with the score that each one records against a
 *       Document
 *   {@link org.apache.lucene.monitor.ExplainingMatch#MATCHER}: similar to ScoringMatch,
 *       but includes the full Explanation
 *   {@link org.apache.lucene.monitor.HighlightsMatch#MATCHER}: returns the matching
 *       queries along with the matching terms for each query
 *
 * Matchers can be wrapped in {@link org.apache.lucene.monitor.PartitionMatcher} or {@link
 * org.apache.lucene.monitor.ParallelMatcher} to increase performance in low-concurrency systems.
 *
 * Pre-filtering of queries
 *
 * Monitoring is done efficiently by extracting minimal sets of terms from queries, and using these
 * to build a query index. When a document is passed to {@link
 * org.apache.lucene.monitor.Monitor#match(org.apache.lucene.document.Document,
 * org.apache.lucene.monitor.MatcherFactory)}, it is converted into a small index, and the terms
 * dictionary from that index is then used to build a disjunction query to run against the query
 * index. Queries that match this disjunction are then run against the document. In this way, the
 * Monitor can avoid running queries that have no chance of matching. The process of extracting
 * terms and building document disjunctions is handled by a {@link
 * org.apache.lucene.monitor.Presearcher}.
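 *
 * As a rough illustration of this idea only (a plain-Java toy, not the Lucene implementation),
 * the sketch below indexes each query under a single term and uses a document's terms as a
 * disjunction over that index. The class {@code ToyQueryIndex} and its methods are hypothetical
 * names invented for this example, and whitespace/punctuation splitting stands in for analysis.
 *
```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a query index: each query id is stored under one representative term.
final class ToyQueryIndex {
  // term -> ids of the queries that were indexed under that term
  private final Map<String, Set<String>> postings = new HashMap<>();

  // Index a query under a term that every matching document must contain.
  void register(String queryId, String requiredTerm) {
    postings.computeIfAbsent(requiredTerm, t -> new HashSet<>()).add(queryId);
  }

  // Treat the document's terms as one disjunction over the query index and
  // collect every query id that is reachable from at least one of them.
  Set<String> candidates(String documentText) {
    Set<String> result = new TreeSet<>();
    for (String term : documentText.toLowerCase(Locale.ROOT).split("\\W+")) {
      Set<String> ids = postings.get(term);
      if (ids != null) {
        result.addAll(ids);
      }
    }
    return result;
  }
}
```
 *
 * Registering q1 under "lucene" and q2 under "solr", then asking for the candidates of
 * "Apache Lucene monitor", yields only q1, so q2 never has to be run against that document.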
 *
 * In addition, extra per-field filtering can be specified by passing a set of keyword fields to
 * filter on. When queries are registered with the monitor, field-value pairs can be added as
 * optional metadata for each query, and these can then be used to restrict which queries a document
 * is checked against. For example, you can specify a language that each query should apply to, and
 * documents containing a value in their language field would only be checked against queries that
 * have that same value in their language metadata. Note that when matching documents in batches,
 * all documents in the batch must have the same values in their filter fields.
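 *
 * The filtering rule described above can be pictured with the following plain-Java sketch. It
 * is not how the presearcher is implemented; {@code ToyMetadataFilter} is a hypothetical name,
 * and the choice to leave documents without a value in a filter field unrestricted is an
 * assumption of this example.
 *
```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyMetadataFilter {
  // Returns the ids of the queries whose metadata agrees with the document
  // on every filter field for which the document carries a value.
  static Set<String> eligibleQueries(
      Map<String, Map<String, String>> queryMetadata, // query id -> field/value pairs
      Map<String, String> documentFields,
      List<String> filterFields) {
    Set<String> eligible = new TreeSet<>();
    for (Map.Entry<String, Map<String, String>> query : queryMetadata.entrySet()) {
      if (agrees(query.getValue(), documentFields, filterFields)) {
        eligible.add(query.getKey());
      }
    }
    return eligible;
  }

  private static boolean agrees(
      Map<String, String> metadata, Map<String, String> documentFields, List<String> filterFields) {
    for (String field : filterFields) {
      String documentValue = documentFields.get(field);
      // Assumption: a document with no value in the field is not restricted by it.
      if (documentValue != null && !documentValue.equals(metadata.get(field))) {
        return false;
      }
    }
    return true;
  }
}
```
 *
 * With "language" as the only filter field, a document whose language is "en" is checked only
 * against queries whose metadata also says "en".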
 *
 * Query analysis uses the {@link org.apache.lucene.search.QueryVisitor} API to extract terms,
 * which will work for all basic term-based queries shipped with Lucene. The analyzer builds a
 * representation of the query called a {@link org.apache.lucene.monitor.QueryTree}, and then
 * selects a minimal set of terms, one of which must be present in a document for that document to
 * match. Individual terms are weighted using a {@link org.apache.lucene.monitor.TermWeightor},
 * which allows some selectivity when building the term set. For example, given a conjunction of
 * terms (a boolean query with several MUST clauses, or a phrase, span or interval query), we need
 * only extract one term. The TermWeightor can be configured in a number of ways; by default it will
 * weight longer terms more highly.
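 *
 * To make the term selection concrete, here is a small plain-Java sketch. It does not use
 * QueryTree or TermWeightor; {@code ToyTermSelector} is a hypothetical helper, and term length
 * is used as the weight purely to mirror the default behaviour described above.
 *
```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ToyTermSelector {
  // Assumed weighting rule for this sketch: longer terms weigh more.
  static int weight(String term) {
    return term.length();
  }

  // Conjunction: every term must occur in a matching document, so indexing
  // the single highest-weighted term is enough to filter on.
  static Set<String> forConjunction(List<String> terms) {
    String best = terms.stream()
        .max(Comparator.comparingInt(ToyTermSelector::weight))
        .orElseThrow(() -> new IllegalArgumentException("empty conjunction"));
    return Set.of(best);
  }

  // Disjunction: any one term may cause a match, so all of them are needed.
  static Set<String> forDisjunction(List<String> terms) {
    return new TreeSet<>(terms);
  }
}
```
 *
 * For the conjunction of "the", "quick" and "fox" this selects only "quick", whereas a
 * disjunction of the same terms has to keep all three.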
 *
 * For query sets that contain many conjunctions, it can be useful to extract and index different
 * minimal term combinations. For example, a phrase query on 'the quick brown fox' could index both
 * 'quick' and 'brown', and avoid being run against documents that contain only one of these terms.
 * The {@link org.apache.lucene.monitor.MultipassTermFilteredPresearcher} allows this sort of
 * indexing, taking a minimum term weight so that very common terms such as 'the' can be avoided.
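 *
 * The effect of indexing more than one term per conjunction can be sketched in plain Java as
 * follows. This is a simplified model rather than the presearcher's actual algorithm:
 * {@code ToyMultipass} is a hypothetical name, term length stands in for the term weight, and
 * one term is chosen per pass.
 *
```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class ToyMultipass {
  // Pick up to `passes` distinct terms of a conjunction, heaviest first,
  // skipping terms whose weight (here: length) is below minWeight.
  static List<String> termsPerPass(List<String> conjunction, int passes, int minWeight) {
    return conjunction.stream()
        .distinct()
        .filter(term -> term.length() >= minWeight)
        .sorted((a, b) -> Integer.compare(b.length(), a.length()))
        .limit(passes)
        .collect(Collectors.toList());
  }

  // A document stays a candidate only if it contains the term chosen for
  // every pass. An empty list of pass terms filters nothing.
  static boolean isCandidate(List<String> passTerms, Set<String> documentTerms) {
    return documentTerms.containsAll(passTerms);
  }
}
```
 *
 * With two passes and a minimum weight of 4, the phrase 'the quick brown fox' is indexed under
 * 'quick' and 'brown'; a document containing only 'quick' is then rejected before the phrase
 * query is ever run.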
 *
 * Custom Query implementations that are based on term matching, and that implement {@link
 * org.apache.lucene.search.Query#visit(org.apache.lucene.search.QueryVisitor)} will work with no
 * extra configuration; for more complicated custom queries, you can register a {@link
 * org.apache.lucene.monitor.CustomQueryHandler} with the presearcher. Included in this package is a
 * {@link org.apache.lucene.monitor.RegexpQueryHandler}, which gives an example of a different
 * method of indexing automaton-based queries by extracting fixed substrings from a regular
 * expression, and then using ngram filtering to build the document disjunction.
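 *
 * The substring-plus-ngram idea can be illustrated with a deliberately restricted plain-Java
 * sketch. It is not the RegexpQueryHandler code: {@code ToyRegexpFilter} is a hypothetical
 * name, and the sketch assumes patterns whose only operator is the ".*" wildcard.
 *
```java
import java.util.HashSet;
import java.util.Set;

final class ToyRegexpFilter {
  // Longest literal piece of a pattern. Assumption of this sketch: the
  // pattern contains no regular expression operators other than ".*".
  static String longestFixedSubstring(String pattern) {
    String best = "";
    for (String piece : pattern.split("\\.\\*")) {
      if (piece.length() > best.length()) {
        best = piece;
      }
    }
    return best;
  }

  // All substrings (ngrams of every length) of a single document token.
  static Set<String> ngrams(String token) {
    Set<String> grams = new HashSet<>();
    for (int start = 0; start < token.length(); start++) {
      for (int end = start + 1; end <= token.length(); end++) {
        grams.add(token.substring(start, end));
      }
    }
    return grams;
  }

  // The regexp query needs to be run only if some ngram of the token equals
  // its fixed substring. A pattern with no fixed substring cannot be filtered.
  static boolean isCandidate(String pattern, String token) {
    String fixed = longestFixedSubstring(pattern);
    return fixed.isEmpty() || ngrams(token).contains(fixed);
  }
}
```
 *
 * Under these assumptions the pattern "luc.*e" is indexed under "luc", so the token "lucene"
 * keeps the query as a candidate while the token "solr" does not.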
 *
 * Persistent query sets
 *
 * By default, {@link org.apache.lucene.monitor.Monitor} instances are ephemeral, storing their
 * query indexes in memory. To make a persistent monitor, build a {@link
 * org.apache.lucene.monitor.MonitorConfiguration} object and call {@link
 * org.apache.lucene.monitor.MonitorConfiguration#setIndexPath(java.nio.file.Path,
 * org.apache.lucene.monitor.MonitorQuerySerializer)} to tell the Monitor to store its query index
 * on disk. All queries registered with this Monitor will need to have a string representation that
 * is also stored, and can be re-parsed by the associated {@link
 * org.apache.lucene.monitor.MonitorQuerySerializer} when the index is loaded by a new Monitor
 * instance.
 */
package org.apache.lucene.monitor;