java8.util.stream.Collector Maven / Gradle / Ivy
Show all versions of streamsupport Show documentation
/*
* Copyright (c) 2012, 2013, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation. Oracle designates this
* particular file as subject to the "Classpath" exception as provided
* by Oracle in the LICENSE file that accompanied this code.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*/
package java8.util.stream;
import java.util.Set;
import java8.util.function.BiConsumer;
import java8.util.function.BinaryOperator;
import java8.util.function.Function;
import java8.util.function.Supplier;
/**
* A mutable reduction operation that
* accumulates input elements into a mutable result container, optionally transforming
* the accumulated result into a final representation after all input elements
* have been processed. Reduction operations can be performed either sequentially
* or in parallel.
*
* Examples of mutable reduction operations include:
* accumulating elements into a {@code Collection}; concatenating
* strings using a {@code StringBuilder}; computing summary information about
* elements such as sum, min, max, or average; computing "pivot table" summaries
* such as "maximum valued transaction by seller", etc. The class {@link Collectors}
* provides implementations of many common mutable reductions.
*
*
A {@code Collector} is specified by four functions that work together to
* accumulate entries into a mutable result container, and optionally perform
* a final transform on the result. They are:
* - creation of a new result container ({@link #supplier()})
* - incorporating a new data element into a result container ({@link #accumulator()})
* - combining two result containers into one ({@link #combiner()})
* - performing an optional final transform on the container ({@link #finisher()})
*
*
* Collectors also have a set of characteristics, such as
* {@link Characteristics#CONCURRENT}, that provide hints that can be used by a
* reduction implementation to provide better performance.
*
*
A sequential implementation of a reduction using a collector would
* create a single result container using the supplier function, and invoke the
* accumulator function once for each input element. A parallel implementation
* would partition the input, create a result container for each partition,
* accumulate the contents of each partition into a subresult for that partition,
* and then use the combiner function to merge the subresults into a combined
* result.
*
*
To ensure that sequential and parallel executions produce equivalent
* results, the collector functions must satisfy an identity and an
* associativity constraints.
*
*
The identity constraint says that for any partially accumulated result,
* combining it with an empty result container must produce an equivalent
* result. That is, for a partially accumulated result {@code a} that is the
* result of any series of accumulator and combiner invocations, {@code a} must
* be equivalent to {@code combiner.apply(a, supplier.get())}.
*
*
The associativity constraint says that splitting the computation must
* produce an equivalent result. That is, for any input elements {@code t1}
* and {@code t2}, the results {@code r1} and {@code r2} in the computation
* below must be equivalent:
*
{@code
* A a1 = supplier.get();
* accumulator.accept(a1, t1);
* accumulator.accept(a1, t2);
* R r1 = finisher.apply(a1); // result without splitting
*
* A a2 = supplier.get();
* accumulator.accept(a2, t1);
* A a3 = supplier.get();
* accumulator.accept(a3, t2);
* R r2 = finisher.apply(combiner.apply(a2, a3)); // result with splitting
* }
*
* For collectors that do not have the {@code UNORDERED} characteristic,
* two accumulated results {@code a1} and {@code a2} are equivalent if
* {@code finisher.apply(a1).equals(finisher.apply(a2))}. For unordered
* collectors, equivalence is relaxed to allow for non-equality related to
* differences in order. (For example, an unordered collector that accumulated
* elements to a {@code List} would consider two lists equivalent if they
* contained the same elements, ignoring order.)
*
*
Libraries that implement reduction based on {@code Collector}, such as
* {@link Stream#collect(Collector)}, must adhere to the following constraints:
*
* - The first argument passed to the accumulator function, both
* arguments passed to the combiner function, and the argument passed to the
* finisher function must be the result of a previous invocation of the
* result supplier, accumulator, or combiner functions.
* - The implementation should not do anything with the result of any of
* the result supplier, accumulator, or combiner functions other than to
* pass them again to the accumulator, combiner, or finisher functions,
* or return them to the caller of the reduction operation.
* - If a result is passed to the combiner or finisher
* function, and the same object is not returned from that function, it is
* never used again.
* - Once a result is passed to the combiner or finisher function, it
* is never passed to the accumulator function again.
* - For non-concurrent collectors, any result returned from the result
* supplier, accumulator, or combiner functions must be serially
* thread-confined. This enables collection to occur in parallel without
* the {@code Collector} needing to implement any additional synchronization.
* The reduction implementation must manage that the input is properly
* partitioned, that partitions are processed in isolation, and combining
* happens only after accumulation is complete.
* - For concurrent collectors, an implementation is free to (but not
* required to) implement reduction concurrently. A concurrent reduction
* is one where the accumulator function is called concurrently from
* multiple threads, using the same concurrently-modifiable result container,
* rather than keeping the result isolated during accumulation.
* A concurrent reduction should only be applied if the collector has the
* {@link Characteristics#UNORDERED} characteristics or if the
* originating data is unordered.
*
*
* In addition to the predefined implementations in {@link Collectors}, the
* static factory methods
* {@link java8.util.stream.Collectors#of(Supplier, BiConsumer, BinaryOperator, Collector.Characteristics...)}
* can be used to construct collectors. For example, you could create a collector
* that accumulates widgets into a {@code TreeSet} with:
*
*
{@code
* Collector> intoSet =
* Collector.of(TreeSet::new, TreeSet::add,
* (left, right) -> { left.addAll(right); return left; });
* }
*
* (This behavior is also implemented by the predefined collector
* {@link Collectors#toCollection(Supplier)}).
*
* API Note:
* Performing a reduction operation with a {@code Collector} should produce a
* result equivalent to:
*
{@code
* R container = collector.supplier().get();
* for (T t : data)
* collector.accumulator().accept(container, t);
* return collector.finisher().apply(container);
* }
*
* However, the library is free to partition the input, perform the reduction
* on the partitions, and then use the combiner function to combine the partial
* results to achieve a parallel reduction. (Depending on the specific reduction
* operation, this may perform better or worse, depending on the relative cost
* of the accumulator and combiner functions.)
*
*
Collectors are designed to be composed; many of the methods
* in {@link Collectors} are functions that take a collector and produce
* a new collector. For example, given the following collector that computes
* the sum of the salaries of a stream of employees:
*
*
{@code
* Collector summingSalaries
* = Collectors.summingInt(Employee::getSalary))
* }
*
* If we wanted to create a collector to tabulate the sum of salaries by
* department, we could reuse the "sum of salaries" logic using
* {@link Collectors#groupingBy(Function, Collector)}:
*
* {@code
* Collector> summingSalariesByDept
* = Collectors.groupingBy(Employee::getDepartment, summingSalaries);
* }
*
* @see Stream#collect(Collector)
* @see Collectors
*
* @param the type of input elements to the reduction operation
* @param the mutable accumulation type of the reduction operation (often
* hidden as an implementation detail)
* @param the result type of the reduction operation
* @since 1.8
*/
public interface Collector {
/**
* A function that creates and returns a new mutable result container.
*
* @return a function which returns a new, mutable result container
*/
Supplier supplier();
/**
* A function that folds a value into a mutable result container.
*
* @return a function which folds a value into a mutable result container
*/
BiConsumer accumulator();
/**
* A function that accepts two partial results and merges them. The
* combiner function may fold state from one argument into the other and
* return that, or may return a new result container.
*
* @return a function which combines two partial results into a combined
* result
*/
BinaryOperator combiner();
/**
* Perform the final transformation from the intermediate accumulation type
* {@code A} to the final result type {@code R}.
*
* If the characteristic {@code IDENTITY_FINISH} is
* set, this function may be presumed to be an identity transform with an
* unchecked cast from {@code A} to {@code R}.
*
* @return a function which transforms the intermediate result to the final
* result
*/
Function finisher();
/**
* Returns a {@code Set} of {@code Collector.Characteristics} indicating
* the characteristics of this Collector. This set should be immutable.
*
* @return an immutable set of collector characteristics
*/
Set characteristics();
/**
* Characteristics indicating properties of a {@code Collector}, which can
* be used to optimize reduction implementations.
*/
enum Characteristics {
/**
* Indicates that this collector is concurrent, meaning that
* the result container can support the accumulator function being
* called concurrently with the same result container from multiple
* threads.
*
* If a {@code CONCURRENT} collector is not also {@code UNORDERED},
* then it should only be evaluated concurrently if applied to an
* unordered data source.
*/
CONCURRENT,
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*/
UNORDERED,
/**
* Indicates that the finisher function is the identity function and
* can be elided. If set, it must be the case that an unchecked cast
* from A to R will succeed.
*/
IDENTITY_FINISH
}
}