org.elasticsearch.search.aggregations.package-info Maven / Gradle / Ivy

Show more of this group Show more artifacts with this name
Show all versions of elasticsearch Show documentation
Elasticsearch subproject :server
There is a newer version: 8.14.0
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

/**
 * Aggregations
 * Builds analytic information over all hits in a search request.  Aggregations
 * are essentially a tool for sumarizing data, and that summary is often used
 * to generate a visualization.
 *
 * Types of aggregations
 * There are three main types of aggregations, each in their own sub package:
 * 
 *     Bucket aggregations - which group documents (e.g. a histogram)
 *     Metric aggregations - which compute a summary value from several
 *     documents (e.g. a sum)
 *     Pipeline aggregations - which run as a seperate step and compute
 *     values across buckets
 * 
 * Additionally there is a support sub package, which contains the type checking
 * and resolution logic, primarily.
 *
 * How Aggregations Work
 * TODO: Info about search phases goes here
 *
 * Aggregations operate in general as Map Reduce jobs.  The coordinating node for
 * the query dispatches the aggregation to each data node.  The data nodes all
 * instantiate an {@link org.elasticsearch.search.aggregations.AggregationBuilder}
 * of the appropriate type, which in turn builds the
 * {@link org.elasticsearch.search.aggregations.Aggregator} for that node.  This
 * collects the data from that shard, via
 * {@link org.elasticsearch.search.aggregations.Aggregator#getLeafCollector(org.apache.lucene.index.LeafReaderContext)}
 * more or less.  These values are shipped back to the coordinating node, which
 * performs the reduction on them (partial reductions in place on the data nodes
 * are also possible).
 *
 * Three modes of operation
 * When it comes to actually collecting values, there are three ways aggregations
 * operate, in general.  Which one we choose depends on limitations in the query
 * and how the data was ingested (e.g. if it is searchable).
 *
 * The easiest to understand is the Compatible (i.e. usable in
 * all situations) mode, which can be thought of as iterating each query hit and
 * collecting a value from it.  This is the least performant way to evaluate
 * aggregations, requiring looking at every hit.
 *
 * The fastest way to run an aggregation is by looking at the index structures
 * directly.  For example, Lucene just stores the minimum and maximum values
 * of fields per segment, so a min aggregation matching all documents in a segment
 * can just look up its result.  Generally speaking, this mode can be engaged when
 * there are no queries or sub-aggregations, and is gated by
 * {@link org.elasticsearch.search.aggregations.support.ValuesSourceConfig#getPointReaderOrNull()}.
 *
 * Finally, we can rewrite an aggregation into faster aggregations,
 * or ideally into just a query.  Generally, the goal here is to get to
 * filter by filters (which is an optimization on the filters aggregation
 * which runs it as a set of filter queries).  Often this process will look like rewriting
 * a DateHistogram into a DateRange, and then rewriting the DateRange into Filters.
 * If you see {@link org.elasticsearch.search.aggregations.AdaptingAggregator}, that's
 * a good clue that the rewrite mode is being used.  In general, when we rewrite aggregations,
 * we are able to detect if the rewritten agg can run in a "fast" mode, and decline the
 * rewrite if it can't.
 *
 * In general, aggs will try to use one of the fast modes, and if that's not possible,
 * fall back to running in compatible mode.
 */
package org.elasticsearch.search.aggregations;