org.elasticsearch.search.aggregations.package-info Maven / Gradle / Ivy
Show all versions of elasticsearch Show documentation
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
/**
* Aggregations
* Builds analytic information over all hits in a search request. Aggregations
* are essentially a tool for sumarizing data, and that summary is often used
* to generate a visualization.
*
* Types of aggregations
* There are three main types of aggregations, each in their own sub package:
*
* - Bucket aggregations - which group documents (e.g. a histogram)
* - Metric aggregations - which compute a summary value from several
* documents (e.g. a sum)
* - Pipeline aggregations - which run as a seperate step and compute
* values across buckets
*
* Additionally there is a support sub package, which contains the type checking
* and resolution logic, primarily.
*
* How Aggregations Work
* TODO: Info about search phases goes here
*
* Aggregations operate in general as Map Reduce jobs. The coordinating node for
* the query dispatches the aggregation to each data node. The data nodes all
* instantiate an {@link org.elasticsearch.search.aggregations.AggregationBuilder}
* of the appropriate type, which in turn builds the
* {@link org.elasticsearch.search.aggregations.Aggregator} for that node. This
* collects the data from that shard, via
* {@link org.elasticsearch.search.aggregations.Aggregator#getLeafCollector(AggregationExecutionContext)}
* more or less. These values are shipped back to the coordinating node, which
* performs the reduction on them (partial reductions in place on the data nodes
* are also possible).
*
* Three modes of operation
* When it comes to actually collecting values, there are three ways aggregations
* operate, in general. Which one we choose depends on limitations in the query
* and how the data was ingested (e.g. if it is searchable).
*
* The easiest to understand is the Compatible (i.e. usable in
* all situations) mode, which can be thought of as iterating each query hit and
* collecting a value from it. This is the least performant way to evaluate
* aggregations, requiring looking at every hit.
*
* The fastest way to run an aggregation is by looking at the index structures
* directly. For example, Lucene just stores the minimum and maximum values
* of fields per segment, so a min aggregation matching all documents in a segment
* can just look up its result. Generally speaking, this mode can be engaged when
* there are no queries or sub-aggregations, and is gated by
* {@link org.elasticsearch.search.aggregations.support.ValuesSourceConfig#getPointReaderOrNull()}.
*
* Finally, we can rewrite an aggregation into faster aggregations,
* or ideally into just a query. Generally, the goal here is to get to
* filter by filters (which is an optimization on the filters aggregation
* which runs it as a set of filter queries). Often this process will look like rewriting
* a DateHistogram into a DateRange, and then rewriting the DateRange into Filters.
* If you see {@link org.elasticsearch.search.aggregations.AdaptingAggregator}, that's
* a good clue that the rewrite mode is being used. In general, when we rewrite aggregations,
* we are able to detect if the rewritten agg can run in a "fast" mode, and decline the
* rewrite if it can't.
*
* In general, aggs will try to use one of the fast modes, and if that's not possible,
* fall back to running in compatible mode.
*/
package org.elasticsearch.search.aggregations;