package org.infinispan;

import java.io.Serializable;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Optional;
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;
import java.util.function.ToIntFunction;
import java.util.function.ToLongFunction;
import java.util.stream.Collector;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;
import java.util.stream.LongStream;
import java.util.stream.Stream;

import org.infinispan.commons.util.IntSet;
import org.infinispan.stream.CacheCollectors;
import org.infinispan.util.function.SerializableBiConsumer;
import org.infinispan.util.function.SerializableBiFunction;
import org.infinispan.util.function.SerializableBinaryOperator;
import org.infinispan.util.function.SerializableComparator;
import org.infinispan.util.function.SerializableConsumer;
import org.infinispan.util.function.SerializableFunction;
import org.infinispan.util.function.SerializableIntFunction;
import org.infinispan.util.function.SerializablePredicate;
import org.infinispan.util.function.SerializableSupplier;
import org.infinispan.util.function.SerializableToDoubleFunction;
import org.infinispan.util.function.SerializableToIntFunction;
import org.infinispan.util.function.SerializableToLongFunction;

import static org.infinispan.util.Casting.toSerialSupplierCollect;
import static org.infinispan.util.Casting.toSupplierCollect;

/**
 * A {@link Stream} that has additional operations to monitor or control behavior when used from a {@link Cache}.
 *
 * <p>Whenever the iterator or spliterator methods are used, the user must close the {@link Stream}
 * that the method was invoked on after completion of its operation. Failure to do so may cause a thread leak if
 * the iterator or spliterator is not fully consumed.</p>
 *
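The closing requirement above can be sketched with a plain JDK stream. This is only an illustration under stated assumptions: it uses `java.util.stream.Stream` rather than a real `CacheStream`, and the class and method names (`CloseAfterIteratorSketch`, `firstThenClose`) are hypothetical. The `onClose` handler stands in for whatever resources a cache-backed stream would release.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

final class CloseAfterIteratorSketch {
   // Reads only the first element and relies on try-with-resources to close the stream.
   static String firstThenClose(List<String> source, AtomicBoolean closed) {
      try (Stream<String> stream = source.stream().onClose(() -> closed.set(true))) {
         Iterator<String> it = stream.iterator();
         return it.hasNext() ? it.next() : null;
      } // close() runs here even though the iterator was not fully consumed
   }

   public static void main(String[] args) {
      AtomicBoolean closed = new AtomicBoolean();
      String first = firstThenClose(Arrays.asList("a", "b", "c"), closed);
      System.out.println(first + " closed=" + closed.get());
   }
}
```

The same try-with-resources shape should apply to a stream obtained from a cache collection, since `Stream` extends `AutoCloseable`.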

 * <p>When using a stream that is backed by a distributed cache, these operations will be performed using remote
 * distribution controlled by the segments that each key maps to. All intermediate operations are lazy, even the
 * special cases described in later paragraphs, and are not evaluated until a final terminal operation is invoked on
 * the stream. Essentially each set of intermediate operations is shipped to each remote node where they are applied
 * to a local stream there and finally the terminal operation is completed. If this stream is parallel, the
 * processing on remote nodes is also done using a parallel stream.</p>
 *
 * <p>Parallel distribution is enabled by default for all operations except for {@link CacheStream#iterator()} and
 * {@link CacheStream#spliterator()}. Please see {@link CacheStream#sequentialDistribution()} and
 * {@link CacheStream#parallelDistribution()}. With this disabled, only a single node will process the operation
 * at a time (this includes the local node).</p>
 *
 * <p>Rehash aware is enabled by default for all operations. Any intermediate or terminal operation may be invoked
 * multiple times during a rehash and thus you should ensure they are idempotent. This can be problematic for
 * {@link CacheStream#forEach(Consumer)} as it may be difficult to implement with such requirements; please see it
 * for more information. If you wish to disable rehash aware operations you can do so by calling
 * {@link CacheStream#disableRehashAware()}, which should provide better performance for some operations. The
 * performance is most affected for the key aware operations {@link CacheStream#iterator()},
 * {@link CacheStream#spliterator()}, {@link CacheStream#forEach(Consumer)}. Disabling rehash can cause
 * incorrect results if the terminal operation is invoked and a rehash occurs before the operation completes. If
 * incorrect results do occur it is guaranteed that it will only be that entries were missed and no entries are
 * duplicated.</p>
 *
 * <p>Any stateful intermediate operation requires pulling all data up to that point to the local node to operate
 * properly. Each of these methods may have slightly different behavior, so make sure you check the method you are
 * utilizing.</p>
 *
 * <p>An example of such an operation is the distinct intermediate operation. Upon calling the terminal operation,
 * a remote retrieval operation is run using all of the intermediate operations up to the distinct operation
 * remotely. This retrieval is then used to fuel a local stream where all of the remaining intermediate operations
 * are performed and then finally the terminal operation is applied as normal. Note in this case the intermediate
 * iterator still obeys the {@link CacheStream#distributedBatchSize(int)} setting irrespective of the terminal
 * operator.</p>
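The two-phase flow described for distinct can be approximated with plain JDK streams. In this sketch each inner list plays the role of one node's data, which is purely an assumption for illustration; no Infinispan classes are involved and `DistinctTwoPhaseSketch` is a hypothetical name. Operations up to and including `distinct` run per "node", and the remaining operations run on the merged local stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class DistinctTwoPhaseSketch {
   static List<Integer> run(List<List<String>> nodes) {
      // "Remote" phase: each node applies the operations up to and including distinct.
      List<Integer> retrieved = nodes.stream()
            .flatMap(node -> node.stream().map(String::length).distinct())
            .collect(Collectors.toList());
      // "Local" phase: distinct is applied again across nodes, followed by the
      // remaining intermediate operation and the terminal operation.
      return retrieved.stream()
            .distinct()
            .filter(length -> length > 1)
            .collect(Collectors.toList());
   }

   public static void main(String[] args) {
      List<List<String>> nodes = Arrays.asList(
            Arrays.asList("a", "bb", "cc"),
            Arrays.asList("dd", "eee"));
      System.out.println(run(nodes));
   }
}
```

Per-node deduplication only reduces what is transferred; the local `distinct` is still needed because two nodes can produce the same value.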

 *
 * @param <R> The type of the stream
 * @since 8.0
 */
public interface CacheStream<R> extends Stream<R>, BaseCacheStream<R, Stream<R>> {
   /**
    * {@inheritDoc}
    * @return a stream with parallel distribution disabled.
    */
   CacheStream<R> sequentialDistribution();

   /**
    * {@inheritDoc}
    * @return a stream with parallel distribution enabled.
    */
   CacheStream<R> parallelDistribution();

   /**
    * {@inheritDoc}
    * @return a stream with the segments filtered.
    * @deprecated This is to be replaced by {@link #filterKeySegments(IntSet)}
    */
   CacheStream<R> filterKeySegments(Set<Integer> segments);

   /**
    * {@inheritDoc}
    * @return a stream with the segments filtered.
    */
   CacheStream<R> filterKeySegments(IntSet segments);

   /**
    * {@inheritDoc}
    * @return a stream with the keys filtered.
    */
   CacheStream<R> filterKeys(Set<?> keys);

   /**
    * {@inheritDoc}
    * @return a stream with the batch size updated
    */
   CacheStream<R> distributedBatchSize(int batchSize);

   /**
    * {@inheritDoc}
    * @return a stream with the listener registered.
    */
   CacheStream<R> segmentCompletionListener(SegmentCompletionListener listener);

   /**
    * {@inheritDoc}
    * @return a stream with rehash awareness disabled.
    */
   CacheStream<R> disableRehashAware();

   /**
    * {@inheritDoc}
    * @return a stream with the timeout set
    */
   CacheStream<R> timeout(long timeout, TimeUnit unit);

   /**
    * {@inheritDoc}

    * <p>This operation is performed remotely on the node that is the primary owner for the key tied to the
    * entry(s) in this stream.</p>
    * <p>NOTE: This method, while being rehash aware, has the lowest consistency of all of the operators. This
    * operation will be performed on every entry at least once in the cluster, as long as the originator doesn't go
    * down while it is being performed. This is due to how the distributed action is performed. Essentially the
    * {@link CacheStream#distributedBatchSize} value controls how many elements are processed per node at a time
    * when rehash is enabled. After those are complete, the keys are sent to the originator to confirm that those
    * were processed. If that node goes down during or before the response, those keys will be processed a second
    * time.</p>
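Because an entry may be processed more than once, the consumer is safest when repeating it leaves the same state. The sketch below simulates a replayed batch with a plain list; the delivery order, the class name `IdempotentForEachSketch`, and both consumers are assumptions made for illustration and do not reflect how Infinispan actually schedules batches.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotentForEachSketch {
   // Keys as they might be delivered if one batch is replayed: k1 and k2 appear twice.
   static final List<String> DELIVERIES = Arrays.asList("k1", "k2", "k1", "k2", "k3");

   // Not idempotent: every delivery changes the result, so a replay inflates the count.
   static int countingConsumer() {
      AtomicInteger counter = new AtomicInteger();
      DELIVERIES.forEach(key -> counter.incrementAndGet());
      return counter.get();
   }

   // Idempotent: writing the same value under the same key twice leaves the same state.
   static Map<String, String> putConsumer() {
      Map<String, String> sink = new HashMap<>();
      DELIVERIES.forEach(key -> sink.put(key, key.toUpperCase()));
      return sink;
   }

   public static void main(String[] args) {
      System.out.println("counted deliveries: " + countingConsumer());
      System.out.println("distinct keys written: " + putConsumer().size());
   }
}
```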

*

    * <p>It is possible to have the cache local to each node injected into this instance if the provided
    * Consumer also implements the {@link org.infinispan.stream.CacheAware} interface. The cache is injected
    * before the consumer's accept() method is invoked.</p>
    * <p>This method is run distributed by default with a distributed backing cache. However, if you wish for this
    * operation to run locally you can use {@code stream().iterator().forEachRemaining(action)} for a
    * single-threaded variant. If you wish to have a parallel variant you can use
    * {@link java.util.stream.StreamSupport#stream(Spliterator, boolean)}, passing in the spliterator from the
    * stream. In either case remember you must close the stream after you are done processing the iterator or
    * spliterator.</p>
    * @param action consumer to be run for each element in the stream
    */
   @Override
   void forEach(Consumer<? super R> action);

   /**
    * Same as {@link CacheStream#forEach(Consumer)} except that the Consumer must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param action consumer to be run for each element in the stream
    */
   default void forEach(SerializableConsumer<? super R> action) {
      forEach((Consumer<? super R>) action);
   }

   /**
    * Same as {@link CacheStream#forEach(Consumer)} except that it takes a {@link BiConsumer} that provides access
    * to the underlying {@link Cache} that is backing this stream.
    * <p>
    * Note that the CacheAware interface is not supported for injection using this method as the cache
    * is provided in the consumer directly.
    * @param action consumer to be run for each element in the stream
    * @param <K> key type of the cache
    * @param <V> value type of the cache
    */
   <K, V> void forEach(BiConsumer<Cache<K, V>, ? super R> action);

   /**
    * Same as {@link CacheStream#forEach(BiConsumer)} except that the BiConsumer must also implement
    * Serializable
    * @param action consumer to be run for each element in the stream
    * @param <K> key type of the cache
    * @param <V> value type of the cache
    */
   default <K, V> void forEach(SerializableBiConsumer<Cache<K, V>, ? super R> action) {
      forEach((BiConsumer<Cache<K, V>, ? super R>) action);
   }

   /**
    * {@inheritDoc}

    * <p>Usage of this operator requires closing this stream after you are done with the iterator. The preferred
    * usage is to use a try-with-resources block on the stream.</p>
    * <p>This method has special usage with the {@link org.infinispan.CacheStream.SegmentCompletionListener} in
    * that as entries are retrieved from the next method it will complete segments.</p>
    * <p>This method obeys the {@link CacheStream#distributedBatchSize(int)}. Note that when using methods such as
    * {@link CacheStream#flatMap(Function)} you may have more than one element mapped to a given key, so this
    * does not guarantee that exactly that many entries are returned per batch.</p>
    * <p>Note that the {@link Iterator#remove()} method is only supported if no intermediate operations have been
    * applied to the stream and this is not a stream created from a {@link Cache#values()} collection.</p>
    * @return the element iterator for this stream
    */
   @Override
   Iterator<R> iterator();

   /**
    * {@inheritDoc}
    * <p>Usage of this operator requires closing this stream after you are done with the spliterator. The preferred
    * usage is to use a try-with-resources block on the stream.</p>
    * @return the element spliterator for this stream
    */
   @Override
   Spliterator<R> spliterator();

   /**
    * {@inheritDoc}

    * <p>This operation is performed entirely on the local node irrespective of the backing cache. This
    * operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior.
    * Beware this means it will require having all entries of this cache in memory at one time. This is described in
    * more detail at {@link CacheStream}.</p>
    * <p>Any subsequent intermediate operations and the terminal operation are also performed locally.</p>
    * @return the new stream
    */
   @Override
   CacheStream<R> sorted();

   /**
    * {@inheritDoc}
    * <p>This operation is performed entirely on the local node irrespective of the backing cache. This
    * operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior.
    * Beware this means it will require having all entries of this cache in memory at one time. This is described in
    * more detail at {@link CacheStream}.</p>
    * <p>Any subsequent intermediate operations and the terminal operation are then performed locally.</p>
    * @param comparator the comparator to be used for sorting the elements
    * @return the new stream
    */
   @Override
   CacheStream<R> sorted(Comparator<? super R> comparator);

   /**
    * Same as {@link CacheStream#sorted(Comparator)} except that the Comparator must
    * also implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param comparator a non-interfering, stateless
    *                   {@code Comparator} to be used to compare stream elements
    * @return the new stream
    */
   default CacheStream<R> sorted(SerializableComparator<? super R> comparator) {
      return sorted((Comparator<? super R>) comparator);
   }

   /**
    * {@inheritDoc}
    * <p>This intermediate operation will be performed both remotely and locally to reduce how many elements
    * are sent back from each node. More specifically, this operation is applied remotely on each node to only return
    * up to the maxSize value and then the aggregated results are limited once again on the local node.</p>
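The remote-then-local limiting described above can be sketched with plain JDK streams. Each inner list stands in for one node's entries, which is an assumption made only for illustration, and the names `LimitTwoPhaseSketch`, `transferred` and `limited` are hypothetical. The first method shows what each "node" would send back, the second applies the limit again on the aggregated results.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LimitTwoPhaseSketch {
   // "Remote" phase: every node returns at most maxSize elements.
   static List<Integer> transferred(List<List<Integer>> nodes, long maxSize) {
      return nodes.stream()
            .flatMap(node -> node.stream().limit(maxSize))
            .collect(Collectors.toList());
   }

   // "Local" phase: the aggregated results are limited once more.
   static List<Integer> limited(List<List<Integer>> nodes, long maxSize) {
      return transferred(nodes, maxSize).stream()
            .limit(maxSize)
            .collect(Collectors.toList());
   }

   public static void main(String[] args) {
      List<List<Integer>> nodes = Arrays.asList(
            Arrays.asList(1, 2, 3, 4),
            Arrays.asList(5, 6, 7),
            Arrays.asList(8));
      System.out.println("sent back: " + transferred(nodes, 2).size());
      System.out.println("result: " + limited(nodes, 2));
   }
}
```

With three nodes and a limit of 2, at most 2 elements per node cross the wire instead of every entry, which is the saving the Javadoc describes.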

*

    * <p>This operation will act as an intermediate iterator operation requiring data be brought locally for proper
    * behavior. This is described in more detail in the {@link CacheStream} documentation.</p>
    * <p>Any subsequent intermediate operations and the terminal operation are then performed locally.</p>
    * @param maxSize how many elements to limit this stream to.
    * @return the new stream
    */
   @Override
   CacheStream<R> limit(long maxSize);

   /**
    * {@inheritDoc}
    * <p>This operation is performed entirely on the local node irrespective of the backing cache. This
    * operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior.
    * This is described in more detail in the {@link CacheStream} documentation.</p>
    * <p>Depending on the terminal operator, this may or may not require all entries or a subset after skip is
    * applied to be in memory all at once.</p>
    * <p>Any subsequent intermediate operations and the terminal operation are then performed locally.</p>
    * @param n how many elements to skip from this stream
    * @return the new stream
    */
   @Override
   CacheStream<R> skip(long n);

   /**
    * {@inheritDoc}
    * @param action the action to perform on the stream
    * @return the new stream
    */
   @Override
   CacheStream<R> peek(Consumer<? super R> action);

   /**
    * Same as {@link CacheStream#peek(Consumer)} except that the Consumer must also implement Serializable

    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param action a non-interfering action to perform on the elements as
    *               they are consumed from the stream
    * @return the new stream
    */
   default CacheStream<R> peek(SerializableConsumer<? super R> action) {
      return peek((Consumer<? super R>) action);
   }

   /**
    * {@inheritDoc}
    * <p>This operation will be invoked both remotely and locally when used with a distributed cache backing this
    * stream. This operation will act as an intermediate iterator operation requiring data be brought locally for
    * proper behavior. This is described in more detail in the {@link CacheStream} documentation.</p>
    * <p>This intermediate iterator operation will be performed locally and remotely, requiring possibly a subset of
    * all elements to be in memory.</p>
    * <p>Any subsequent intermediate operations and the terminal operation are then performed locally.</p>
    * @return the new stream
    */
   @Override
   CacheStream<R> distinct();

   /**
    * {@inheritDoc}
    * <p>Note when using a distributed backing cache for this stream the collector must be marshallable. This
    * prevents the usage of the {@link java.util.stream.Collectors} class. However, you can use the
    * {@link org.infinispan.stream.CacheCollectors} static factory methods to create a serializable wrapper, which
    * then creates the actual collector lazily after being deserialized. This is useful to use any method from the
    * {@link java.util.stream.Collectors} class as you would normally.
    * Alternatively, you can call {@link #collect(SerializableSupplier)} too.</p>
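The idea of shipping a supplier instead of the collector itself can be sketched with JDK serialization alone. Everything here is an assumption for illustration: `SerializableSupplier` is a local stand-in declared inside the example (not the Infinispan interface), `roundTrip` merely simulates sending the supplier to another node, and Infinispan's own marshalling may work differently. The collector is only created after the supplier has been deserialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class SerializableCollectorSupplierSketch {
   // Hypothetical local stand-in for a serializable supplier type.
   interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

   // Serializes and deserializes a value to mimic shipping it to another node.
   static <T> T roundTrip(T value) {
      try {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
         }
         try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
         }
      } catch (IOException | ClassNotFoundException e) {
         throw new IllegalStateException("round trip failed", e);
      }
   }

   static List<String> collectViaShippedSupplier(List<String> data) {
      // The lambda is serializable because its target type extends Serializable;
      // the collector returned by Collectors.toList() is never serialized itself.
      SerializableSupplier<Collector<String, ?, List<String>>> supplier = () -> Collectors.toList();
      SerializableSupplier<Collector<String, ?, List<String>>> shipped = roundTrip(supplier);
      return data.stream().collect(shipped.get());
   }

   public static void main(String[] args) {
      System.out.println(collectViaShippedSupplier(Arrays.asList("x", "y")));
   }
}
```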

    * @param collector
    * @param <R1> collected type
    * @param <A> intermediate collected type if applicable
    * @return the collected value
    * @see org.infinispan.stream.CacheCollectors
    */
   @Override
   <R1, A> R1 collect(Collector<? super R, A, R1> collector);

   /**
    * Performs a mutable reduction operation on the elements of this stream using a
    * {@code Collector} that is lazily created from the {@code SerializableSupplier}
    * provided.
    *
    * This method behaves exactly the same as {@link #collect(Collector)} with
    * the enhanced capability of working even when the mutable reduction
    * operation has to run in a remote node and the operation is not
    * {@link Serializable} or otherwise marshallable.
    *
    * So, this method is specially designed for situations when the user
    * wants to use a {@link Collector} instance that has been created by
    * {@link java.util.stream.Collectors} static factory methods.
    *
    * In this particular case, the function that instantiates the
    * {@link Collector} will be marshalled according to the
    * {@link Serializable} rules.
    *
    * @param supplier The supplier to create the collector that is specifically serializable
    * @param <R1> The resulting type of the collector
    * @return the collected value
    * @since 9.2
    */
   default <R1> R1 collect(SerializableSupplier<Collector<? super R, ?, R1>> supplier) {
      return collect(CacheCollectors.serializableCollector(toSerialSupplierCollect(supplier)));
   }

   /**
    * Performs a mutable reduction operation on the elements of this stream using a
    * {@code Collector} that is lazily created from the {@code Supplier}
    * provided.
    *
    * This method behaves exactly the same as {@link #collect(Collector)} with
    * the enhanced capability of working even when the mutable reduction
    * operation has to run in a remote node and the operation is not
    * {@link Serializable} or otherwise marshallable.
    *
    * So, this method is specially designed for situations when the user
    * wants to use a {@link Collector} instance that has been created by
    * {@link java.util.stream.Collectors} static factory methods.
    *
    * In this particular case, the function that instantiates the
    * {@link Collector} will be marshalled using Infinispan
    * {@link org.infinispan.commons.marshall.Externalizer} class or one of its
    * subtypes.
    *
    * @param supplier The supplier to create the collector
    * @param <R1> The resulting type of the collector
    * @return the collected value
    * @since 9.2
    */
   default <R1> R1 collect(Supplier<Collector<? super R, ?, R1>> supplier) {
      return collect(CacheCollectors.collector(toSupplierCollect(supplier)));
   }

   /**
    * Same as {@link CacheStream#collect(Supplier, BiConsumer, BiConsumer)} except that the various arguments must
    * also implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param <R1> type of the result
    * @param supplier a function that creates a new result container. For a
    *                 parallel execution, this function may be called
    *                 multiple times and must return a fresh value each time.
    *                 Must be serializable
    * @param accumulator an associative, non-interfering, stateless
    *                    function for incorporating an additional element into a result and
    *                    must be serializable
    * @param combiner an associative, non-interfering, stateless
    *                 function for combining two values, which must be
    *                 compatible with the accumulator function and serializable
    * @return the result of the reduction
    */
   default <R1> R1 collect(SerializableSupplier<R1> supplier, SerializableBiConsumer<R1, ? super R> accumulator,
         SerializableBiConsumer<R1, R1> combiner) {
      return collect((Supplier<R1>) supplier, accumulator, combiner);
   }

   /**
    * Same as {@link CacheStream#allMatch(Predicate)} except that the Predicate must also
    * implement Serializable

    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param predicate a non-interfering, stateless
    *                  predicate to apply to elements of this stream that is serializable
    * @return {@code true} if either all elements of the stream match the
    * provided predicate or the stream is empty, otherwise {@code false}
    */
   default boolean allMatch(SerializablePredicate<? super R> predicate) {
      return allMatch((Predicate<? super R>) predicate);
   }

   /**
    * Same as {@link CacheStream#noneMatch(Predicate)} except that the Predicate must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param predicate a non-interfering, stateless
    *                  predicate to apply to elements of this stream that is serializable
    * @return {@code true} if either no elements of the stream match the
    * provided predicate or the stream is empty, otherwise {@code false}
    */
   default boolean noneMatch(SerializablePredicate<? super R> predicate) {
      return noneMatch((Predicate<? super R>) predicate);
   }

   /**
    * Same as {@link CacheStream#anyMatch(Predicate)} except that the Predicate must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param predicate a non-interfering, stateless
    *                  predicate to apply to elements of this stream that is serializable
    * @return {@code true} if any elements of the stream match the provided
    * predicate, otherwise {@code false}
    */
   default boolean anyMatch(SerializablePredicate<? super R> predicate) {
      return anyMatch((Predicate<? super R>) predicate);
   }

   /**
    * Same as {@link CacheStream#max(Comparator)} except that the Comparator must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param comparator a non-interfering, stateless
    *                   {@code Comparator} to compare elements of this stream that is also serializable
    * @return an {@code Optional} describing the maximum element of this stream,
    * or an empty {@code Optional} if the stream is empty
    */
   default Optional<R> max(SerializableComparator<? super R> comparator) {
      return max((Comparator<? super R>) comparator);
   }

   /**
    * Same as {@link CacheStream#min(Comparator)} except that the Comparator must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param comparator a non-interfering, stateless
    *                   {@code Comparator} to compare elements of this stream that is also serializable
    * @return an {@code Optional} describing the minimum element of this stream,
    * or an empty {@code Optional} if the stream is empty
    */
   default Optional<R> min(SerializableComparator<? super R> comparator) {
      return min((Comparator<? super R>) comparator);
   }

   /**
    * Same as {@link CacheStream#reduce(BinaryOperator)} except that the BinaryOperator must also
    * implement Serializable

    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param accumulator an associative, non-interfering, stateless
    *                    function for combining two values that is also serializable
    * @return an {@link Optional} describing the result of the reduction
    */
   default Optional<R> reduce(SerializableBinaryOperator<R> accumulator) {
      return reduce((BinaryOperator<R>) accumulator);
   }

   /**
    * Same as {@link CacheStream#reduce(Object, BinaryOperator)} except that the BinaryOperator must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param identity the identity value for the accumulating function
    * @param accumulator an associative, non-interfering, stateless
    *                    function for combining two values that is also serializable
    * @return the result of the reduction
    */
   default R reduce(R identity, SerializableBinaryOperator<R> accumulator) {
      return reduce(identity, (BinaryOperator<R>) accumulator);
   }

   /**
    * Same as {@link CacheStream#reduce(Object, BiFunction, BinaryOperator)} except that the BiFunction and
    * BinaryOperator must also implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param <U> The type of the result
    * @param identity the identity value for the combiner function
    * @param accumulator an associative, non-interfering, stateless
    *                    function for incorporating an additional element into a result that is also serializable
    * @param combiner an associative, non-interfering, stateless
    *                 function for combining two values, which must be
    *                 compatible with the accumulator function that is also serializable
    * @return the result of the reduction
    */
   default <U> U reduce(U identity, SerializableBiFunction<U, ? super R, U> accumulator,
         SerializableBinaryOperator<U> combiner) {
      return reduce(identity, (BiFunction<U, ? super R, U>) accumulator, combiner);
   }

   /**
    * Same as {@link CacheStream#toArray(IntFunction)} except that the IntFunction must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param <A> the element type of the resulting array
    * @param generator a function which produces a new array of the desired
    *                  type and the provided length that is also serializable
    * @return an array containing the elements in this stream
    */
   default <A> A[] toArray(SerializableIntFunction<A[]> generator) {
      return toArray((IntFunction<A[]>) generator);
   }

   /**
    * {@inheritDoc}
    * @return the new cache stream
    */
   @Override
   CacheStream<R> filter(Predicate<? super R> predicate);

   /**
    * Same as {@link CacheStream#filter(Predicate)} except that the Predicate must also
    * implement Serializable

    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param predicate a non-interfering, stateless
    *                  predicate to apply to each element to determine if it
    *                  should be included
    * @return the new cache stream
    */
   default CacheStream<R> filter(SerializablePredicate<? super R> predicate) {
      return filter((Predicate<? super R>) predicate);
   }

   /**
    * {@inheritDoc}
    * @return the new cache stream
    */
   @Override
   <R1> CacheStream<R1> map(Function<? super R, ? extends R1> mapper);

   /**
    * Same as {@link CacheStream#map(Function)} except that the Function must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param <R1> The element type of the new stream
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new cache stream
    */
   default <R1> CacheStream<R1> map(SerializableFunction<? super R, ? extends R1> mapper) {
      return map((Function<? super R, ? extends R1>) mapper);
   }

   /**
    * {@inheritDoc}
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new double cache stream
    */
   @Override
   DoubleCacheStream mapToDouble(ToDoubleFunction<? super R> mapper);

   /**
    * Same as {@link CacheStream#mapToDouble(ToDoubleFunction)} except that the ToDoubleFunction must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new stream
    */
   default DoubleCacheStream mapToDouble(SerializableToDoubleFunction<? super R> mapper) {
      return mapToDouble((ToDoubleFunction<? super R>) mapper);
   }

   /**
    * {@inheritDoc}
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new int cache stream
    */
   @Override
   IntCacheStream mapToInt(ToIntFunction<? super R> mapper);

   /**
    * Same as {@link CacheStream#mapToInt(ToIntFunction)} except that the ToIntFunction must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new stream
    */
   default IntCacheStream mapToInt(SerializableToIntFunction<? super R> mapper) {
      return mapToInt((ToIntFunction<? super R>) mapper);
   }

   /**
    * {@inheritDoc}
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new long cache stream
    */
   @Override
   LongCacheStream mapToLong(ToLongFunction<? super R> mapper);

   /**
    * Same as {@link CacheStream#mapToLong(ToLongFunction)} except that the ToLongFunction must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param mapper a non-interfering, stateless
    *               function to apply to each element
    * @return the new stream
    */
   default LongCacheStream mapToLong(SerializableToLongFunction<? super R> mapper) {
      return mapToLong((ToLongFunction<? super R>) mapper);
   }

   /**
    * {@inheritDoc}
    * @return the new cache stream
    */
   @Override
   <R1> CacheStream<R1> flatMap(Function<? super R, ? extends Stream<? extends R1>> mapper);

   /**
    * Same as {@link CacheStream#flatMap(Function)} except that the Function must also
    * implement Serializable

    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param <R1> The element type of the new stream
    * @param mapper a non-interfering, stateless
    *               function to apply to each element which produces a stream
    *               of new values
    * @return the new cache stream
    */
   default <R1> CacheStream<R1> flatMap(SerializableFunction<? super R, ? extends Stream<? extends R1>> mapper) {
      return flatMap((Function<? super R, ? extends Stream<? extends R1>>) mapper);
   }

   /**
    * {@inheritDoc}
    * @return the new cache stream
    */
   @Override
   DoubleCacheStream flatMapToDouble(Function<? super R, ? extends DoubleStream> mapper);

   /**
    * Same as {@link CacheStream#flatMapToDouble(Function)} except that the Function must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param mapper a non-interfering, stateless
    *               function to apply to each element which produces a stream
    *               of new values
    * @return the new stream
    */
   default DoubleCacheStream flatMapToDouble(SerializableFunction<? super R, ? extends DoubleStream> mapper) {
      return flatMapToDouble((Function<? super R, ? extends DoubleStream>) mapper);
   }

   /**
    * {@inheritDoc}
    * @return the new cache stream
    */
   @Override
   IntCacheStream flatMapToInt(Function<? super R, ? extends IntStream> mapper);

   /**
    * Same as {@link CacheStream#flatMapToInt(Function)} except that the Function must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param mapper a non-interfering, stateless
    *               function to apply to each element which produces a stream
    *               of new values
    * @return the new stream
    */
   default IntCacheStream flatMapToInt(SerializableFunction<? super R, ? extends IntStream> mapper) {
      return flatMapToInt((Function<? super R, ? extends IntStream>) mapper);
   }

   /**
    * {@inheritDoc}
    * @return the new cache stream
    */
   @Override
   LongCacheStream flatMapToLong(Function<? super R, ? extends LongStream> mapper);

   /**
    * Same as {@link CacheStream#flatMapToLong(Function)} except that the Function must also
    * implement Serializable
    * <p>
    * The compiler will pick this overload for lambda parameters, making them Serializable
    * @param mapper a non-interfering, stateless
    *               function to apply to each element which produces a stream
    *               of new values
    * @return the new stream
    */
   default LongCacheStream flatMapToLong(SerializableFunction<? super R, ? extends LongStream> mapper) {
      return flatMapToLong((Function<? super R, ? extends LongStream>) mapper);
   }

   /**
    * {@inheritDoc}
    * @return a parallel cache stream
    */
   @Override
   CacheStream<R> parallel();

   /**
    * {@inheritDoc}
    * @return a sequential cache stream
    */
   @Override
   CacheStream<R> sequential();

   /**
    * {@inheritDoc}
    * @return an unordered cache stream
    */
   @Override
   CacheStream<R> unordered();

   /**
    * {@inheritDoc}
    * @param closeHandler
    * @return a cache stream with the handler applied
    */
   @Override
   CacheStream<R> onClose(Runnable closeHandler);
}
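The repeated note that "the compiler will pick this overload for lambda parameters" follows from Java's most-specific-method rule: when two overloads accept a lambda and one parameter type is a subtype of the other, the subtype wins. The sketch below demonstrates that rule with JDK types only. `SerializablePredicate` here is a local stand-in declared for the example, and the two `filter` methods are hypothetical; they only report which overload was chosen.

```java
import java.io.Serializable;
import java.util.function.Predicate;

final class OverloadSelectionSketch {
   // Hypothetical local stand-in: a Predicate that is also Serializable.
   interface SerializablePredicate<T> extends Predicate<T>, Serializable {}

   static String filter(Predicate<String> predicate) {
      return "plain";
   }

   static String filter(SerializablePredicate<String> predicate) {
      return "serializable";
   }

   // A bare lambda fits both overloads; the more specific subtype is selected.
   static String withLambda() {
      return filter(s -> s.isEmpty());
   }

   // An explicit cast to Predicate leaves only the plain overload applicable.
   static String withCast() {
      return filter((Predicate<String>) s -> s.isEmpty());
   }

   public static void main(String[] args) {
      System.out.println(withLambda());
      System.out.println(withCast());
   }
}
```

This is also why the default methods in the interface cast their argument back to the plain functional type before delegating: without the cast the call would resolve to the same serializable overload again.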