src.overview.html Maven / Gradle / Ivy
Show all versions of fastutil-core Show documentation
fastutil
Extends the the Java™ Collections Framework
by providing type-specific maps, sets, lists and priority queues with a small memory
footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and
fast, practical I/O classes for binary and text files. It is
free software
distributed under the Apache License 2.0.
Package Specification
fastutil
is formed by three cores:
- type-specific classes that extend naturally
the Java™ Collections Framework;
- classes that support very large collections;
- classes for fast and practical access to binary and text files.
The three cores are briefly introduced in the next sections, and then discussed at length in the rest of this overview.
The following features were added in version 8.5.0:
- type-specific {@linkplain java.util.Spliterator spliterators};
- primitive {@linkplain java.util.stream.Stream stream} methods;
- even more default methods overridden for performance or avoiding boxing/unboxing.
We strongly suggest to activate deprecation warnings in your development environment, as fastutil
systematically deprecates all JDK methods for which there is a type-specific alternative.
Type-specific classes
fastutil
specializes the most useful {@link
java.util.HashSet}, {@link java.util.HashMap}, {@link
java.util.LinkedHashSet}, {@link java.util.LinkedHashMap}, {@link
java.util.TreeSet}, {@link java.util.TreeMap}, {@link
java.util.IdentityHashMap}, {@link java.util.ArrayList} and {@link
java.util.Stack} classes to versions that accept a specific kind of
key or value (e.g., {@linkplain it.unimi.dsi.fastutil.ints.IntSet integers}). Besides, there are also
several types of {@linkplain it.unimi.dsi.fastutil.PriorityQueue priority
queues} and a large collection of static objects and
methods (such as {@linkplain it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET
immutable empty containers}, {@linkplain
it.unimi.dsi.fastutil.ints.IntComparators#OPPOSITE_COMPARATOR
comparators implementing the opposite of the natural order},
{@linkplain it.unimi.dsi.fastutil.ints.IntIterators#wrap(int[])
iterators obtained by wrapping an array} and so on).
To understand what's going on at a glance, the best thing is to look at
the examples provided below. If you already used the
Collections Framework, everything should look
rather natural. If, in particular, you use an IDE such as Eclipse, which can suggest you the
method names, all you need to know is the right name for
the class you need.
Support for very large collections
A set of classes makes it possible
to handle very large collections: in particular, collections whose size exceeds
231. {@linkplain it.unimi.dsi.fastutil.BigArrays Big arrays}
are arrays-of-arrays handled by a wealth of static methods that act on them
as if they were monodimensional arrays with 64-bit indices;
{@linkplain it.unimi.dsi.fastutil.BigList big lists} provide 64-bit list access;
{@linkplain it.unimi.dsi.fastutil.ints.IntOpenHashBigSet big hash sets} provide support for sets whose
size is only limited by the amount of core memory.
The usual methods from {@link java.util.Arrays} and similar classes have
been extended to big arrays: have a look at the Javadoc documentation of
{@link it.unimi.dsi.fastutil.BigArrays} and {@link it.unimi.dsi.fastutil.ints.IntBigArrays}
to get an idea of the generic and type-specific methods available.
Limited support is available for big atomic arrays of integers and longs.
Fast and practical I/O
fastutil
provides replacements for some standard classes of {@link java.io}
that are plagued by a number of problems (see, e.g., {@link it.unimi.dsi.fastutil.io.FastBufferedInputStream}).
The {@link it.unimi.dsi.fastutil.io.BinIO} and {@link it.unimi.dsi.fastutil.io.TextIO} static
containers contain dozens of methods that make it possible to load and save quickly
(big) arrays to disks, to adapt binary and text file to iterators, and so on.
More on type-specific classes
All data structures in fastutil
extend their standard
counterpart interface whenever possible (e.g., {@link java.util.Map} for maps). Thus, they
can be just plugged into existing code, using the standard access methods
(of course, any attempt to use the wrong type for keys or values will
produce a {@link java.lang.ClassCastException}). However, they also provide
(whenever possible) many polymorphic versions of the most used methods that
avoid boxing/unboxing. In doing so, they specify more stringent interfaces that
extend and strengthen the standard ones (e.g., {@link
it.unimi.dsi.fastutil.ints.Int2IntSortedMap} or {@link
it.unimi.dsi.fastutil.ints.IntListIterator}). All generic methods
in type-specific interfaces have been deprecated.
Warning: automatic boxing and unboxing can lead you
to choose the wrong method when using fastutil
. It is also extremely inefficient.
We suggest that your programming environment is set so to mark boxing/unboxing as
a warning, or even better, as an error.
Of course, the main point of type-specific data structures is that the
absence of wrappers around primitive types can increase speed and reduce
space occupancy by several times. The presence of generics in Java
does not change this fact, since there is no genericity for primitive
types.
The implementation techniques used in fastutil
are quite
different than those of {@link java.util}: for instance, open-addressing
hash tables, threaded AVL trees, threaded red-black trees and exclusive-or
lists. An effort has also been made to provide powerful derived objects and
to expose them by overriding covariantly return types:
for instance, the {@linkplain it.unimi.dsi.fastutil.objects.Object2IntSortedMap#keySet() keys of sorted maps
are sorted} and iterators on sorted containers are always {@linkplain
it.unimi.dsi.fastutil.BidirectionalIterator bidirectional}.
More generally, the rationale behing fastutil
is that
you should never need to code explicitly natural
transformations. You do to not need to define an anonymous class to
iterate over an array of integers—just {@linkplain
it.unimi.dsi.fastutil.ints.IntIterators#wrap(int[]) wrap it}. You do not
need to write a loop to put the characters returned by an iterator into a
set—just {@linkplain
it.unimi.dsi.fastutil.chars.CharOpenHashSet#CharOpenHashSet(CharIterator)
use the right constructor}. And so on.
The Names
In general, class names adhere to the general pattern
valuetype collectiontype
for collections, and
keytype 2 valuetype maptype
for maps.
The word "type" here referes to a capitalized primitive type, {@link
java.lang.Object} or Reference
. In the latter case, we
are treating objects, but their equality is established by reference
equality (that is, without invoking equals()
), similarly
to {@link java.util.IdentityHashMap}. Of course, reference-based
classes are significantly faster.
Thus, an {@link it.unimi.dsi.fastutil.ints.IntOpenHashSet} stores
integers efficiently and implements {@link
it.unimi.dsi.fastutil.ints.IntSet}, whereas a {@link
it.unimi.dsi.fastutil.longs.Long2IntAVLTreeMap} does the same for maps from
longs to integers (but the map will be sorted, tree based, and balanced
using the AVL criterion), implementing {@link
it.unimi.dsi.fastutil.longs.Long2IntMap}. If you need additional
flexibility in choosing your {@linkplain
it.unimi.dsi.fastutil.Hash.Strategy hash strategy}, you can put, say, arrays
of integers in a {@link it.unimi.dsi.fastutil.objects.ObjectOpenCustomHashSet},
maybe using the ready-made {@linkplain
it.unimi.dsi.fastutil.ints.IntArrays#HASH_STRATEGY hash strategy for
arrays}. A {@link it.unimi.dsi.fastutil.longs.LongLinkedOpenHashSet}
stores longs in a hash table, but provides a predictable iteration order
(the insertion order) and access to first/last elements of the order. A
{@link it.unimi.dsi.fastutil.objects.Reference2ReferenceOpenHashMap} is
similar to an {@link java.util.IdentityHashMap}. You can manage a priority
queue of characters in a heap using a {@link
it.unimi.dsi.fastutil.chars.CharHeapPriorityQueue}, which implements {@link
it.unimi.dsi.fastutil.chars.CharPriorityQueue}. {@linkplain
it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList Front-coded lists} are
highly specialized immutable data structures that store compactly a large
number of arrays: if you don't know them you probably don't need them.
For a number of data structures that were not available in the
Java™ Collections Framework
when fastutil
was created, an object-based version is
contained in {@link it.unimi.dsi.fastutil}, and in that case the prefix
Object
is not used (see, e.g., {@link it.unimi.dsi.fastutil.PriorityQueue}).
Since there are eight primitive types in Java, and we support
reference-based containers, we get around 2000 (!) classes (some nonsensical
classes, such as Boolean2BooleanAVLTreeMap
, are not
generated). Many classes are generated just to mimic the hierarchy of
{@link java.util} so to redistribute common code in a similar way. There
are also several abstract classes that ease significantly the creation of
new type-specific classes by providing automatically generic methods based
on the type-specific ones.
The huge number of classes required a suitable division in subpackages.
Each subpackage is characterized by the type of elements
or keys: thus, for instance, {@link it.unimi.dsi.fastutil.ints.IntSet}
belongs to {@link it.unimi.dsi.fastutil.ints} (the plural is required, as
int
is a keyword and cannot be used in a package name), as
well as {@link it.unimi.dsi.fastutil.ints.Int2ReferenceRBTreeMap}. Note
that all classes for non-primitive elements and keys are gathered in {@link
it.unimi.dsi.fastutil.objects}. Finally, a number of non-type-specific
classes have been gathered in {@link it.unimi.dsi.fastutil}.
An In–Depth Look
The following table summarizes the available interfaces and
implementations. To get more information, you can look at a specific
implementation in {@link
it.unimi.dsi.fastutil} or, for instance, {@link it.unimi.dsi.fastutil.ints}. Note that a few existing
abstract classes have been deprecated and are not listed here (they are no longer necessary due to
default methods in the corresponding interfaces).
Interfaces Abstract Implementations Implementations
Iterable
Collection AbstractCollection
Set AbstractSet OpenHashSet, OpenCustomHashSet, ArraySet, OpenHashBigSet
SortedSet AbstractSortedSet RBTreeSet, AVLTreeSet, LinkedOpenHashSet
Function AbstractFunction
Map AbstractMap OpenHashMap, OpenCustomHashMap, ArrayMap
SortedMap AbstractSortedMap RBTreeMap, AVLTreeMap, LinkedOpenHashMap
List, BigList† AbstractList, AbstractBigList ArrayList, BigArrayBigList, ArrayFrontCodedList, ImmutableList
PriorityQueue† HeapPriorityQueue, ArrayPriorityQueue, ArrayFIFOQueue
IndirectPriorityQueue† HeapSemiIndirectPriorityQueue, HeapIndirectPriorityQueue, ArrayIndirectPriorityQueue
Stack† ArrayList
Pair† MutablePair, ImmutablePair
Iterator, BigListIterator† Iterators.AbstractIndexBasedIterator
Spliterator Spliterators.AbstractIndexBasedSpliterator
Comparator
BidirectionalIterator†
ListIterator
Consumer
Predicate
BinaryOperator
Size64‡
†: this class has also a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
‡: this class has only a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
Note that abstract implementations are named by prefixing the interface
name with Abstract. Thus, if you want to define a
type-specific structure holding a set of integers without the hassle of
defining object-based methods, you should inherit from {@link it.unimi.dsi.fastutil.ints.AbstractIntSet}.
The following table summarizes static containers, which usually give rise
both to a type-specific and to a generic class:
Static Containers
Collections
Sets
SortedSets
Functions
Maps†
SortedMaps
Lists
BigLists
Arrays†
BigArrays†
Heaps
SemiIndirectHeaps
IndirectHeaps
PriorityQueues†
IndirectPriorityQueues†
Iterators
BigListIterators
Spliterators
BigSpliterators
Comparators
Hash‡
HashCommon‡
SafeMath‡
†: this class has also a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
‡: this class has only a non-type-specific implementation in {@link it.unimi.dsi.fastutil}.
The static containers provide also special-purpose implementations for
all kinds of {@linkplain it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET
empty structures} (including {@linkplain
it.unimi.dsi.fastutil.objects.ObjectArrays#EMPTY_ARRAY arrays}) and
{@linkplain it.unimi.dsi.fastutil.ints.Int2IntMaps#singleton(int,int) singletons}.
Warnings
All classes are not synchronized. If multiple threads
access one of these classes concurrently, and at least one of the threads
modifies it, it must be synchronized externally. Iterators will behave
unpredictably in the presence of concurrent modifications. Reads, however,
can be carried out concurrently.
Reference-based classes violate the {@link java.util.Map}
contract. They intentionally compare objects by reference, and do
not use the equals()
method. They should be used only
when reference-based equality is desired (for instance, if all
objects involved are canonized, as it happens with interned strings).
Setting the default return value for maps with both non-primitive
keys and non-primitive values violates the {@link java.util.Map} contract, as you
might not get null
on missing keys.
Linked classes do not implement wholly the {@link
java.util.SortedMap} interface. They provide methods to get the
first and last element in iteration order, and to start a bidirectional
iterator from any element, but any submap or subset
method will cause an {@link java.lang.UnsupportedOperationException}
(this may change in future versions).
Substructures in sorted classes allow the creation of
arbitrary substructures. In {@link java.util}, instead, you
can only create contained sub-substructures (BTW, why?). For instance,
(new TreeSet()).tailSet(1).tailSet(0)
will throw an exception,
but {@link it.unimi.dsi.fastutil.ints.IntRBTreeSet (new
IntRBTreeSet()).tailSet(1).tailSet(0)} won't.
Additional Features and Methods
The new interfaces add some very natural methods and strengthen many of
the old ones. Moreover, whenever possible, the object returned is
type-specific, or implements a more powerful interface.
More in detail:
- Keys and values of a map are of the
fastutil
type you
would expect (e.g., the keys of an {@link
it.unimi.dsi.fastutil.ints.Int2LongSortedMap} are an {@link
it.unimi.dsi.fastutil.ints.IntSortedSet} and the values are a {@link
it.unimi.dsi.fastutil.longs.LongCollection}).
- Hash-based and tree-based maps that return primitive numeric values have an
addTo()
method (see, e.g., {@link it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap#addTo(int,int)})
that adds an increment to the current value of a key; it is
most useful to avoid the inefficient procedure of getting a value,
incrementing it and then putting it back into the map (typically, when
counting the number of occurrences of elements in a sequence).
- Hash-set implementations have an additional {@link it.unimi.dsi.fastutil.objects.ObjectOpenHashSet#get(Object) get()}
method that returns the actual object in the collection that is equal to the query key.
- Linked hash-based maps and sets have a wealth of additional methods that make it easy
to use them as caches. See, for instance, {@link it.unimi.dsi.fastutil.ints.Int2IntLinkedOpenHashMap#putAndMoveToLast(int,int)},
{@link it.unimi.dsi.fastutil.ints.IntLinkedOpenHashSet#addAndMoveToLast(int)}
or {@link it.unimi.dsi.fastutil.ints.Int2IntLinkedOpenHashMap#removeFirstInt()}.
- Submaps of a sorted map and subsets of a sorted sets are of the
fastutil
type you would expect, too.
- Iterators returned by
iterator()
are type-specific.
- Spliterators returned by
spliterator()
are type-specific.
- Collection has a method returning primitive streams, for example {@link it.unimi.dsi.fastutil.ints.IntCollection#intStream()}.
- Sorted structures in
fastutil
return
type-specific {@linkplain
it.unimi.dsi.fastutil.BidirectionalIterator bidirectional
iterators}. This means that you can move back and forth among
entries, keys or values.
- Some classes for maps (check the specification) return a fast entry set
(see, e.g., {@link it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap#int2IntEntrySet()});
fast entry sets can, in turn, provide a {@linkplain it.unimi.dsi.fastutil.ints.Int2IntMap.FastEntrySet#fastIterator() fast iterator}
that is guaranteed not to create a large number of objects, possibly by returning always the same entry (of course, mutated).
There's a also companion {@link it.unimi.dsi.fastutil.ints.Int2IntMap.FastEntrySet#fastForEach(java.util.function.Consumer) fastForEach()}.
To make it possible to access these fast iterators in
for
loops, there are static helper methods available
(e.g., {@link it.unimi.dsi.fastutil.ints.Int2IntMaps#fastIterable(it.unimi.dsi.fastutil.ints.Int2IntMap)}).
- The type-specific sorted set interfaces, moreover, feature an optional
method
iterator(from)
which creates a type-specific {@link
it.unimi.dsi.fastutil.BidirectionalIterator} starting from a given
element of the domain (not necessarily in the set). See, for instance,
{@link it.unimi.dsi.fastutil.ints.IntSortedSet#iterator(int)}. The method is
implemented by all type-specific sorted sets and subsets.
- Finally, there are constructors that allow you to build easily sets
using array and iterators. This means, for instance, that you can create
quickly a set of strings with a statement like
new ObjectOpenHashSet( new String[] { "foo", "bar" } )
or just "unroll" the integers returned by an iterator into a list with
new IntArrayList( iterator )
There are a few quirks, however, that you should be aware of:
- The versions of the {@link java.util.Map#get(Object)
get()}, {@link java.util.Map#put(Object,Object) put()} and
{@link java.util.Map#remove(Object) remove()} methods that
return a primitive type cannot, of course, rely on returning
null
to denote the absence of a certain
pair. Rather, they return a {@linkplain
it.unimi.dsi.fastutil.ints.Int2LongMap#defaultReturnValue(long) default
return value}, which is set to 0 cast to the
return type (false
for booleans) at creation, but
can be changed using the defaultReturnValue()
method (see, e.g., {@link
it.unimi.dsi.fastutil.ints.Int2IntMap}). Note that changing the
default return value does not change anything about the data
structure; it is just a way to return a reasonably meaningful
result—it can be changed at any time. For uniformity reasons,
even maps returning objects can use
defaultReturnValue()
(of course, in this case the
default return value is initialized to null
). A
submap or subset has an independent default return value (which
however is initialized to the default return value of the
originator).
- For all maps that have objects as keys, the {@link
java.util.Map#get(Object) get()} and {@link
java.util.Map#remove(Object) remove()} methods do not admit
polymorphic versions, as Java does not allow return-value
polymorphism. Rather, the extended interfaces introduce new
methods of the form {@link
it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap#getInt(Object)
getvaluetype()} and {@link
it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap#removeInt(Object)
removevaluetype()}. Similar problems occur with
{@link it.unimi.dsi.fastutil.chars.CharSortedSet#firstChar()
first()}, {@link
it.unimi.dsi.fastutil.chars.CharSortedSet#lastChar() last()},
and so on.
- Similarly, all iterators have a suitable method {@link
it.unimi.dsi.fastutil.ints.IntIterator#nextInt()
nexttype()} returning directly a primitive type.
And, of course, you have a type-specific version of {@link
java.util.ListIterator#previous() previous()}. Note that the “for each”
style of iteration can hide boxing and unboxing: even if you iterate
using a primitive variable (as in {@code for(long x: s)}, where {@code s}
is a {@link it.unimi.dsi.fastutil.longs.LongSet}), the actual iterator used will be
object-based, as Java knows nothing about
fastutil
's
type-specific {@code next()}/{@code previous()} methods.
- For the same reason, the method {@link java.util.Collection#toArray}
as a polymorphic version accepting a type-specific array, but there is
also an explicitly typed method
{@link it.unimi.dsi.fastutil.bytes.ByteCollection#toByteArray()
tokeytypeArray()}.
- The standard entry-set iterators for hash-based maps use an entry object
that refers to the data contained in the hash table. If you retrieve an
entry and delete it, the entry object will become invalid and will throw
an {@link java.lang.ArrayIndexOutOfBoundsException} exception. This does not
apply to fast iterators (see above).
- A name clash between the list and collection interfaces
forces the deletion method of a collection to be named {@link
it.unimi.dsi.fastutil.doubles.DoubleCollection#rem(double)
rem()}. At the risk of creating some confusion, {@link
it.unimi.dsi.fastutil.doubles.DoubleSet#remove(double) remove()}
reappears in the type-specific set interfaces, so the only
really unpleasant effect is that you must use
rem()
on variables that are collections, but not
sets—for instance, {@linkplain
it.unimi.dsi.fastutil.ints.IntList type-specific lists}, and
that a subclass of a type-specific abstract collection must
override rem()
, rather than remove()
, to
make all inherited methods work properly.
- Stacks are interfaces implemented by array-based
lists: the interface, moreover, is slightly different from the
implementation contained in {@link java.util.Stack}.
- In JDK 8 a number of classes appeared specifying primitive-type
interfaces: examples are {@link java.util.PrimitiveIterator} and
{@link java.util.function.IntToLongFunction}. Whenever possible,
fastutil
tries to extend such interfaces (e.g.,
{@link it.unimi.dsi.fastutil.ints.IntIterator} extends
{@link java.util.PrimitiveIterator.OfInt}).
- In the specification of new methods such as
{@link java.util.Map#computeIfAbsent(Object,java.util.function.Function)
Map.computeIfAbsent()} we do not always have a suitable type-specific
version of {@link java.util.function.Function} in the JDK: in this case,
we approximate the best we can. For example, a
{@link it.unimi.dsi.fastutil.bytes.Byte2ByteMap} expects an
{@link java.util.function.IntUnaryOperator}. The default implementation
will check that the returned argument actually fits a byte, throwing an
exception otherwise. Note that we always specify also a version
accepting a function returning an object, as it is useful in case
one needs to return {@code null}.
- In the specification of new methods requiring a type-specific
{@link java.util.function.Consumer} such as {@link java.lang.Iterable#forEach(java.util.function.Consumer)}
we try to use the {@link java.util.function} correct type, if available. Otherwise, we use
the
fastutil
correct type-specific version, but, as in the previous case,
the fastutil
version extends the closest {@link java.util.function.Consumer}.
For example, {@link it.unimi.dsi.fastutil.chars.CharConsumer} extends
{@link java.util.function.IntConsumer}.
Functions
{@link it.unimi.dsi.fastutil.Function} (and its type-specific versions) is an
interface geared towards mathematical functions (e.g., hashes) which associates
values to keys, but in which enumerating keys or values is not possible. It is essentially
a {@link java.util.Map} that does not provide access to set representations. It is of course
unfortunate that Java 8 introduced an identically named interface with a different signature:
differences and interoperability issues are discussed in the
{@linkplain it.unimi.dsi.fastutil.Function class documentation}.
fastutil
provides interfaces, abstract implementations and the usual array of wrappers
in the suitable static container (e.g., {@link it.unimi.dsi.fastutil.ints.Int2IntFunctions}).
Implementations will be provided by other projects (e.g., Sux4J).
Type-specific functions require just to define their {@code get()} methods: thus, they can be defined
by lambda expressions.
All fastutil
type-specific maps extend their respective type-specific
functions: but, alas, we cannot have {@link java.util.Map} extending
{@link it.unimi.dsi.fastutil.Function}. Moreover, type-specific
functions extend the existing type-specific function from {@link
java.util.function} that can accommodate the argument and the result with the least
widening: for example, {@link
it.unimi.dsi.fastutil.shorts.Short2CharFunction} extends {@link
java.util.function.IntUnaryOperator}.
Static Container Classes
fastutil
provides a number of static methods and
singletons, much like {@link java.util.Collections}. To avoid creating
classes with hundreds of methods, there are separate containers for
sets, lists, maps and so on. Generic containers are placed in {@link
it.unimi.dsi.fastutil}, whereas type-specific containers are in the
appropriate package. You should look at the documentation of the
static classes contained in {@link it.unimi.dsi.fastutil}, and in
type-specific static classes such as {@link
it.unimi.dsi.fastutil.chars.CharSets}, {@link
it.unimi.dsi.fastutil.floats.Float2ByteSortedMaps}, {@link
it.unimi.dsi.fastutil.longs.LongArrays}, {@link
it.unimi.dsi.fastutil.floats.FloatHeaps}. Presently, you can easily
obtain {@linkplain it.unimi.dsi.fastutil.objects.ObjectSets#EMPTY_SET empty collections},
{@linkplain it.unimi.dsi.fastutil.longs.Long2IntMaps#EMPTY_MAP empty
type-specific collections}, {@linkplain
it.unimi.dsi.fastutil.ints.IntLists#singleton(int) singletons},
{@linkplain
it.unimi.dsi.fastutil.objects.Object2ReferenceSortedMaps#synchronize(Object2ReferenceSortedMap)
synchronized versions} of any type-specific container and
unmodifiable versions of {@linkplain
it.unimi.dsi.fastutil.objects.ObjectLists#unmodifiable(ObjectList)
containers} and {@linkplain
it.unimi.dsi.fastutil.ints.IntIterators#unmodifiable(IntBidirectionalIterator) iterators} (of course,
unmodifiable containers always return unmodifiable iterators).
On a completely different side, the {@linkplain
it.unimi.dsi.fastutil.ints.IntArrays type-specific static container
classes for arrays} provide several useful methods that allow to treat
an array much like an array-based list, hiding completely the growth
logic. In many cases, using this methods and an array is even simpler
then using a full-blown {@linkplain
it.unimi.dsi.fastutil.doubles.DoubleArrayList type-specific
array-based list} because elements access is syntactically much
simpler. The version for objects uses reflection to return arrays of
the same type of the argument.
For the same reason, fastutil
provides a full
implementation of methods that manipulate arrays as type-specific
{@linkplain it.unimi.dsi.fastutil.ints.IntHeaps heaps}, {@linkplain
it.unimi.dsi.fastutil.ints.IntSemiIndirectHeaps semi-indirect heaps} and
{@linkplain it.unimi.dsi.fastutil.ints.IntIndirectHeaps indirect heaps}. There are
also quicksort and mergesort implementations that use arbitrary type-specific comparators.
fastutil
offers also a less common choice—a very tuned
implementation of {@linkplain it.unimi.dsi.fastutil.ints.IntArrays#radixSort(int[],int,int) radix sort} for
all primitive types. It is significantly faster than quicksort already at small sizes (say, more than 10000 elements), and should
be considered the sorting algorithm of choice if you do not need a generic comparator.
There are several variants provided. First of all you can radix sort in parallel
{@linkplain it.unimi.dsi.fastutil.ints.IntArrays#radixSort(int[],int[], int, int) two} or
{@linkplain it.unimi.dsi.fastutil.ints.IntArrays#radixSort(int[][],int,int) even more} arrays. You
can also perform {@linkplain it.unimi.dsi.fastutil.ints.IntArrays#radixSortIndirect(int[],int[],int,int,boolean) indirect} sorts,
for instance if you want to compute the sorting permutation of an array.
The sorting algorithm is a tuned radix sort adapted from Peter M. McIlroy, Keith Bostic and M. Douglas
McIlroy, “Engineering radix sort”, Computing Systems, 6(1), pages 5−27 (1993),
and further improved using the digit-oracle idea described by
Juha Kärkkäinen and Tommi Rantala in “Engineering radix sort for strings”,
String Processing and Information Retrieval, 15th International Symposium, volume 5280 of
Lecture Notes in Computer Science, pages 3−14, Springer (2008). The basic algorithm is not
stable, but this is immaterial for arrays of primitive types. For the indirect case, there is a parameter
specifying whether the algorithm should be stable.
Iterators and Comparators
fastutil
provides type-specific iterators and
comparators. The interface of a fastutil
iterator is
slightly more powerful than that of a {@link java.util} iterator, as
it contains a {@link it.unimi.dsi.fastutil.objects.ObjectIterator#skip(int)
skip()} method that allows to skip over a list of elements (an
{@linkplain
it.unimi.dsi.fastutil.objects.ObjectBidirectionalIterator#back(int) analogous
method} is provided for bidirectional iterators). For objects (even
those managed by reference), the extended interface is named {@link
it.unimi.dsi.fastutil.objects.ObjectIterator}; it is the return type, for
instance, of {@link
it.unimi.dsi.fastutil.objects.ObjectCollection#iterator()}.
A plethora of useful static methods is also provided by various
type-specific static containers (e.g., {@link
it.unimi.dsi.fastutil.ints.IntIterators}) and {@link
it.unimi.dsi.fastutil.ints.IntComparators}: among other things, you can
{@linkplain it.unimi.dsi.fastutil.ints.IntIterators#wrap(int[]) wrap
arrays} and {@linkplain
it.unimi.dsi.fastutil.ints.IntIterators#asIntIterator(java.util.Iterator)
standard iterators} in type-specific iterators, {@linkplain
it.unimi.dsi.fastutil.ints.IntIterators#fromTo(int,int) generate them}
giving an interval of elements to be returned, {@linkplain
it.unimi.dsi.fastutil.objects.ObjectIterators#concat(ObjectIterator[])
concatenate them} or {@linkplain
it.unimi.dsi.fastutil.objects.ObjectIterators#pour(Iterator,ObjectCollection)
pour them} into a set.
Spliterators and Streams
fastutil
provides type-specific {@linkplain java.util.Spliterator spliterators}. These are
used to implement fast streams.
Due to Java only supporting primitive Streams of int
, long
,
and double
,
types for primitives narrower then that (such as {@linkplain
it.unimi.dsi.fastutil.bytes.ByteCollection ByteCollection}) have methods
that expose a spliterator widening them to the type supported by the
Stream API. For example, {@linkplain it.unimi.dsi.fastutil.bytes.ByteCollection ByteCollection}
will have an {@link it.unimi.dsi.fastutil.bytes.ByteCollection#intSpliterator intSpliterator()}
method in addition to a spliterator()
method that works in terms of
bytes.
Due to the incompatibility of object based {@linkplain java.util.stream.Stream streams} and the
primitive streams (such as {@link java.util.stream.IntStream}), the stream()
methods of the primitive containers could not be updated to return their
primitive equivalents. Instead, for primitive collections, a new method is
provided that exposes a primitive stream, and the object based stream method
is marked deprecated like the other boxing methods do.
For example, {@linkplain it.unimi.dsi.fastutil.bytes.ByteCollection},
{@linkplain it.unimi.dsi.fastutil.shorts.ShortCollection}, and
{@linkplain it.unimi.dsi.fastutil.ints.IntCollection} exposes an
{@link it.unimi.dsi.fastutil.ints.IntCollection#intStream() intStream()} method, which should
be used in preference to
stream()
as it will not box and unbox.
Since type-specific {@link java.util.stream.Collector collectors} would be
more than 700, fastutil
implements basic collection
methods in each container (e.g., {@link
it.unimi.dsi.fastutil.ints.IntImmutableList#toList(java.util.stream.IntStream)
IntImmutableList.toList()}). These methods must be applied to a stream
to obtain a collection containing the output of the stream.
Similar to iterators, a raft of utility methods for type-specific spliterators are provided
(e.g. {@link it.unimi.dsi.fastutil.ints.IntSpliterators}). Most of them are analogues of the
utilities in the type-specific Iterators
class.
Currently no utility methods are provided for type-specific streams, as the
Java library's API already has good coverage for that (e.g. the static methods of
{@link java.util.stream.Stream} and their primitive equivalents, or classes such as
{@link java.util.stream.StreamSupport}).
Functions, Consumers, Predicates, and Operators
Classes such as {@link java.util.function.IntBinaryOperator} were introduced in Java 8
to obtain performance improvements similar to those obtained by fastutil
. Unfortunately,
the design of all these classes follows an opposite choice with respect to fastutil
's:
type-specific classes do not extend the generic ones. Moreover, type specificity is rather limited
(e.g., there are no short-specific classes). The fact that type-specific classes do not extend
generic ones makes it difficult to use lambdas, as often they are (rightly) considered
ambiguous by the compiler. The solution implemented in fastutil
8.5.0 is to have
an internal type-specific version of these interfaces which extends both the generic one
and the JDK type-specific one, allowing also for type widening (e.g., {@link it.unimi.dsi.fastutil.chars.CharConsumer}
extends {@link java.util.function.IntConsumer}). When definining lambdas, the compiler now
has a more specific type to choose (the type-specific fastutil
interface), and
no ambiguity arises.
Queues
fastutil
offers two types of queues: direct
queues and indirect queues. A direct queue offers type-specific method to {@linkplain
it.unimi.dsi.fastutil.longs.LongPriorityQueue#enqueue(long) enqueue} and
{@linkplain it.unimi.dsi.fastutil.longs.LongPriorityQueue#dequeueLong()
dequeue} elements. An indirect queue needs a reference array,
specified at construction time: {@linkplain
it.unimi.dsi.fastutil.IndirectPriorityQueue#enqueue(int) enqueue} and
{@linkplain it.unimi.dsi.fastutil.IndirectPriorityQueue#dequeue()
dequeue} operations refer to indices in the reference array. The advantage
is that it may be possible to {@linkplain
it.unimi.dsi.fastutil.IndirectPriorityQueue#changed(int) notify the change}
of any element of the reference array, or even to {@linkplain
it.unimi.dsi.fastutil.IndirectPriorityQueue#remove(int) remove an arbitrary
element}.
Queues have two kind of implementations: array-based
implementations, and heap-based implementations. In particular, heap-based
indirect queues may be {@linkplain
it.unimi.dsi.fastutil.objects.ObjectHeapIndirectPriorityQueue fully
indirect} or just {@linkplain
it.unimi.dsi.fastutil.objects.ObjectHeapSemiIndirectPriorityQueue
semi-indirect}: in the latter case, there is no need for an explicit
indirection array (which saves one integer per queue entry), but not all
operations will be available. Note there there are also
{@linkplain it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue FIFO queues}.
Custom Hashing
Sometimes, the behaviour of the built-in equality and hashing methods is
not what you want. In particular, this happens if you store in a hash-based
collection arrays, and you would like to compare them by equality. For this kind of applications,
fastutil
provides {@linkplain it.unimi.dsi.fastutil.Hash.Strategy custom hash strategies},
which define new equality and hashing methods to be used inside the collection. There are even
{@linkplain it.unimi.dsi.fastutil.ints.IntArrays#HASH_STRATEGY ready-made strategies} for arrays. Note, however,
that fastutil
containers do not cache hash codes, so custom hash strategies must be efficient.
Abstract Classes
fastutil
provides a wide range of abstract classes, to
help in implementing its interfaces. They take care, for instance, of
providing wrappers for non-type-specific method calls, so that you have to
write just the (usually simpler) type-specific version. When possible,
such wrappers are actually default methods, so no abstract class is
necessary.
More on the support for very large collections
With the continuous increase in core memory available, Java arrays are starting to show
their size limitation (indices cannot be larger than 231). fastutil
proposes to store big arrays using arrays-of-arrays subject to certain
size restrictions and a number of supporting static methods. Please read the documentation
of {@link it.unimi.dsi.fastutil.BigArrays} to understand how big arrays work.
Correspondingly, fastutil
proposes a new interface, called
{@link it.unimi.dsi.fastutil.Size64}, that should be implemented by very large
collections. {@link it.unimi.dsi.fastutil.Size64} contains a method
{@link it.unimi.dsi.fastutil.Size64#size64()} which returns the collection
size as a long integer.
fastutil
provides {@linkplain it.unimi.dsi.fastutil.BigList big lists},
which are lists with 64-bit indices; of course, they implement {@link it.unimi.dsi.fastutil.Size64}.
An implementation based on big arrays is provided (see, e.g., {@link it.unimi.dsi.fastutil.ints.IntBigArrayBigList}),
as well as static containers (see, e.g., {@link it.unimi.dsi.fastutil.ints.IntBigLists}).
Whereas it is unlikely that such collection will be in main memory as big arrays, there
are number of situations, such as exposing large files through a list interface or
storing a large amount of data using succinct data structures,
in which a big list interface is natural.
Unfortunately, {@linkplain java.util.List lists} and {@linkplain it.unimi.dsi.fastutil.BigList big lists},
as well as {@linkplain java.util.ListIterator list iterators} and {@linkplain it.unimi.dsi.fastutil.BigListIterator big-list iterators},
cannot be made compatible: we thus provide adapters (see, e.g., {@link it.unimi.dsi.fastutil.ints.IntBigLists#asBigList(it.unimi.dsi.fastutil.ints.IntList)}).
Finally, fastutil
provides {@linkplain it.unimi.dsi.fastutil.longs.LongOpenHashBigSet big hash sets}, which
are based on big arrays. They are about 30% slower than non-big sets, but their size is limited only by
the amount core memory.
More on fast and practical I/O
fastutil
includes an {@linkplain
it.unimi.dsi.fastutil.io I/O package} that provides, for instance, {@linkplain
it.unimi.dsi.fastutil.io.FastBufferedInputStream fast, unsynchronized
buffered input streams}, {@linkplain
it.unimi.dsi.fastutil.io.FastBufferedOutputStream fast, unsynchronized
buffered output streams}, and a wealth of static methods to store and
retrieve data in {@linkplain it.unimi.dsi.fastutil.io.TextIO textual} and
{@linkplain it.unimi.dsi.fastutil.io.BinIO binary} form. The latter, in particular,
contain methods that load and store big arrays.
Performance
The main reason behind fastutil
is performance, both in
time and in space. The relevant methods of type-specific hash maps and sets
are something like 2 to 10 times faster than those of the standard
classes. Note that performance of hash-based classes on object keys is
usually slightly worse than that of
{@link java.util}, because fastutil
classes do not cache hash
codes (albeit it will not be that bad if keys cache internally hash codes,
as in the case of {@link java.lang.String}). Of course, you can try to get
more speed from hash tables using a small load factor: to this purpose,
alternative load factors are proposed in {@link it.unimi.dsi.fastutil.Hash#FAST_LOAD_FACTOR}
and {@link it.unimi.dsi.fastutil.Hash#VERY_FAST_LOAD_FACTOR}.
For tree-based classes you have two choices: AVL and red-black
trees. The essential difference is that AVL trees are more balanced (their
height is at most 1.44 log n), whereas red-black trees have
faster deletions (but their height is at most 2 log n). So on
small trees red-black trees could be faster, but on very large sets AVL
trees will shine. In general, AVL trees have slightly slower updates but
faster searches; however, on very large collections the smaller height may
lead in fact to faster updates, too.
fastutil
reduces enormously the creation and collection of
objects. First of all, if you use the polymorphic methods and iterators no
wrapper objects have to be created. Moreover, since fastutil
uses open-addressing hashing techniques, creation and garbage collection of
hash-table entries are avoided (but tables have to be rehashed whenever
they are filled beyond the load factor). The major reduction of the number
of objects around has a definite (but very difficult to measure) impact on
the whole application (as garbage collection runs proportionally to the
number of alive objects).
Maps whose iteration is very expensive in terms of object creation (e.g., hash-based classes) usually
return a type-specific {@link it.unimi.dsi.fastutil.ints.Int2IntMap.FastEntrySet FastEntrySet}
whose {@link it.unimi.dsi.fastutil.ints.Int2IntMap.FastEntrySet#fastIterator() fastIterator()}
method significantly reduces object creation by returning always
the same entry object, suitably mutated.
Memory Usage
The absence of wrappers makes data structures in fastutil
much smaller: even in the case of objects, however, data structures in
fastutil
try to be space-efficient.
Hash Tables
To avoid memory waste, (unlinked) hash tables in
fastutil
keep no additional information about elements
(such as a list of keys). In particular, this means that enumerations
are always linear in the size of the table (rather than in the number
of keys). Usually, this would imply slower iterators. Nonetheless, the
iterator code includes a single, tight loop; moreover, it is possible
to avoid the creation of wrappers. These two facts make in practice
fastutil
iterators faster than {@link
java.util}'s.
The memory footprint for a table of length ℓ is exactly the
memory required for the related types times ℓ. The
absence of wrappers around primitive types can reduce space occupancy by
several times (this applies even more to serialized data, e.g., when you
save such a data structure in a file). These figures can greatly vary with
your virtual machine, JVM versions, CPU etc.
More precisely, when you ask for a map that will hold n
elements with load factor 0 < f < 1,
2⌈log n / f⌉
entries are allocated. When the table is filled up beyond the load factor, it is rehashed
doubling its size. When it is emptied below one fourth of the load factor, it
is rehashed halving its size; however, a map is never reduced to a
size smaller than that at creation time: this approach makes it
possible to create maps with a large capacity in which insertions and
deletions do not cause immediately rehashing.
In the case of linked hash tables, there is an additional vector of
2⌈log n / f⌉ longs that is used to store link information. Each
element records the next and previous element (packed together so to be more cache friendly).
Balanced Trees
The balanced trees implementation is also very parsimonious.
fastutil
is based on the excellent (and unbelievably well
documented) code contained in Ben Pfaff's GNU libavl, which describes in
detail how to handle balanced trees with threads. Thus, the
overhead per entry is two pointers and one integer, which compares well to
three pointers plus one boolean of the standard tree maps. The trick is
that we use the integer bit by bit, so we consume two bits to store thread
information, plus one or two bits to handle balancing. As a result, we get
bidirectional iterators in constant space and amortized constant time
without having to store references to parent nodes.
It should be mentioned that all tree-based classes have a fixed overhead
for some arrays that are used as stacks to simulate recursion; in
particular, we need 48 booleans for AVL trees and 64 pointers plus 64
booleans for red-black trees.
Examples
To store a set of longs, we can use a hash-based container:
LongSet s = new LongOpenHashSet();
Access methods avoid boxing and unboxing:
s.add(1);
s.contains(2);
We can obtain a type-specific iterator on the elements of the set:
// Sum all elements
long t = 0;
for(LongIterator i = s.iterator(); i.hasNext();) t += i.nextLong();
Note that “for each” iteration must be avoided:
long t = 0;
for(long x: s) t += x;
In the loop above, boxing and unboxing is happening (even if your IDE does not report it).
In some cases, a solution is to use a type-specific {@link java.util.Collection#forEach(java.util.function.Consumer) forEach()}:
// Print all elements
s.forEach(x -> System.out.println(x));
Or we can use fastutil
's type-specific version of Java 8's streams:
long t = m.longStream().sum();
Suppose instead you want to store a sorted map from longs to integers. We will use a tree of AVL type:
Long2IntSortedMap m = new Long2IntAVLTreeMap();
Now we can easily modify and access its content:
m.put(1, 5);
m.put(2, 6);
m.put(3, 7);
m.put(1000000000L, 10);
m.get(1); // This method call will return 5
m.get(4); // This method call will return 0
We can also try to change the default return value:
m.defaultReturnValue(-1);
m.get(4); // This method call will return -1
We can obtain a type-specific iterator on the key set:
LongBidirectionalIterator i = m.keySet().iterator();
// Now we sum all keys
long s = 0;
while(i.hasNext()) s += i.nextLong();
Or we can use again fastutil
's type-specific version of Java 8's streams:
long s = m.longStream().sum();
We now generate a head map, and iterate bidirectionally over it starting
from a given point:
// This map contains only keys smaller than 4
Long2IntSortedMap m1 = m.headMap(4);
// This iterator is positioned between 2 and 3
LongBidirectionalIterator t = m1.keySet().iterator(2);
t.previous(); // This method call will return 2 (t.next() would return 3)
Should we need to access the map concurrently, we can wrap it:
// This map can be safely accessed by many threads
Long2IntSortedMap m2 = Long2IntSortedMaps.synchronize(m1);
Linked maps are very flexible data structures which can be used to implement, for
instance, queues whose content can be probed efficiently:
// This map remembers insertion order.
IntSortedSet s = new IntLinkedOpenHashSet.of(4, 3, 2, 1);
s.firstInt(); // This method call will return 4
s.lastInt(); // This method call will return 1
s.contains(5); // This method will return false
IntBidirectionalIterator i = s.iterator(s.lastInt()); // We could even cast it to a list iterator
i.previous(); // This method call will return 1
i.previous(); // This method call will return 2
s.remove(s.lastInt()); // This will remove the last element in constant time
Now, we play with iterators. It is easy to create iterators over
intervals or over arrays, and combine them:
IntIterator i = IntIterators.fromTo(0, 10); // This iterator will return 0, 1, ..., 9
int[] a = new int[] { 5, 1, 9 };
IntIterator j = IntIterators.wrap(a); // This iterator will return 5, 1, 9.
IntIterator k = IntIterators.concat(new IntIterator[] { i , j }); // This iterator will return 0, 1, ..., 9, 5, 1, 9
It is easy to build lists and sets on the fly using the {@code of} static factory methods.
For maps you can use the constructors that take key and value arrays (array based constructors for list and set exist too):
IntSet s = IntOpenHashSet.of(1, 2, 3); // This set will contain 1, 2, and 3
Char2IntMap m = new Char2IntRBTreeMap(new char[] { '@', '-' }, new int[] { 0, 1 }); // This map will map '@' to 0 and '-' to 1
Whenever you have some data structure, it is easy to serialize it in an
efficient (buffered) way, or to dump their content in textual form:
BinIO.storeObject(s, "foo"); // This method call will save s in the file named "foo"
TextIO.storeInts(s.intIterator(), "foo.txt"); // This method call will save the content of s in ASCII
i = TextIO.asIntIterator("foo.txt"); // This iterator will parse the file and return the integers therein
Support for Java 8 primitive streams is included for primitive collections (e.g. {@code intStream}),
which will work in terms of primitives instead of boxing to wrapper types like the regular {@code stream} would do:
IntList l = IntList.of(2, 380, 1297);
int lSum = l.intStream().sum(); // Will be 1679
IntList lTransformed = IntArrayList.toList(l.intStream().map(i -> i + 40)); // Will be 42, 420, 1337
You can sort arrays using type-specific comparators specified by lambda expressions (no boxing/unboxing here):
IntArrays.quickSort(a, (x, y) -> Integer.compare(y, x)); // Sorts in reverse order
You can also easily specify complex generic sorting, like sorting indirectly on {@code a} while swapping elements in {@code a} and {@code b} in parallel:
Arrays.quickSort(0, a.length, (i, j) -> Integer.compare(a[i], a[j]), (i, j) -> { IntArrays.swap(a, i, j); IntArrays.swap(b, i, j); }));
If you have several cores, you can do it in parallel:
IntArrays.parallelQuickSort(a, (x, y) -> y - x); // Sorts in reverse order
Arrays.parallelQuickSort(0, a.length, (i, j) -> Integer.compare(a[i], a[j]), (i, j) -> { IntArrays.swap(a, i, j); IntArrays.swap(b, i, j); }));
Some maps provide a fast iterator on their entry set: such iterators are allowed to reuse the {@link java.util.Map.Entry} instance
they return, resulting is highly reduced garbage collection (e.g., for large hash maps). To easily access such iterators, we can use
a helper static method:
Int2IntOpenHashMap m = new Int2IntOpenHashMap();
// Fill the map
for (Int2IntMap.Entry e : Int2IntMaps.fastIterable(m)) {
// do things with e.getIntKey() and e.getIntValue();
}
The code above will not generate an object per enumerated item, as for(Entry<Integer,Integer> e: m.entrySet())
(or even for(Int2IntMap.Entry e: m.int2IntKeySet())
) would do.