cern.colt.list.package.html
Resizable lists holding objects or primitive data types such as int,
double, etc. For non-resizable lists (1-dimensional matrices) see
package {@link cern.colt.matrix}.
Getting Started
1. Overview
The list package offers flexible object-oriented abstractions that model dynamically
resizable lists holding objects or primitive data types such as int,
double, etc. It is designed to scale in terms of performance
and memory requirements.
Features include:
- Lists operating on objects as well as all primitive data types such as int,
double, etc.
- Compact representations
- A number of general purpose list operations including: adding, inserting,
removing, iterating, searching, sorting, extracting ranges and copying. All
operations are designed to perform well on mass data.
- Support for quick access to list elements. This is achieved by bounds-checking
and non-bounds-checking accessor methods as well as zero-copy transformations
to primitive arrays such as int[], double[], etc.
- High-level algorithms can be used on primitive data types without any
space or time overhead. Operations on primitive arrays, Colt lists and JAL
algorithms can be freely mixed at zero-copy overhead.
File-based I/O can be achieved through the standard Java built-in serialization
mechanism. All classes implement the {@link java.io.Serializable} interface.
However, the toolkit is entirely decoupled from advanced I/O. It provides data
structures and algorithms only.
This toolkit borrows concepts and terminology from the Javasoft
Collections framework written by Josh Bloch and introduced in JDK 1.2.
2. Introduction
Lists are fundamental to virtually any application. Large-scale resizable lists
are used, for example, in scientific computations, simulations and database
management systems, to name just a few.
A list is a container holding elements that can be accessed via zero-based
indexes. Lists may be implemented in different ways (most commonly with arrays).
A resizable list automatically grows as elements are added. The lists of this
package do not automatically shrink. Shrinking needs to be triggered by explicitly
calling trimToSize() methods.
Growing policy: A list implemented with arrays initially has a certain
initialCapacity, 10 elements by default, but customizable upon instance
construction. As elements are added, this capacity may no longer be sufficient.
When a list is automatically grown, its capacity is expanded to 1.5*currentCapacity.
Because the capacity grows geometrically, excessive resizing (which involves
copying) is avoided.
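The growing and shrinking behaviour described above can be pictured with a small stand-alone sketch. GrowableDoubleList and its resizeCount() helper are hypothetical names invented for this illustration; they are not part of the package, and the exact rounding of the 1.5 factor (including the "+ 1") is an assumption of the sketch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for an array-backed list; not the Colt implementation. */
class GrowableDoubleList {
    private double[] elements;
    private int size = 0;
    private int resizeCount = 0; // how often the backing array was reallocated

    GrowableDoubleList() {
        this(10); // default initial capacity, as described in the text
    }

    GrowableDoubleList(int initialCapacity) {
        if (initialCapacity < 0) throw new IllegalArgumentException("negative capacity");
        elements = new double[initialCapacity];
    }

    void add(double value) {
        if (size == elements.length) {
            // Grow to roughly 1.5 * currentCapacity. The "+ 1" is an assumption
            // of this sketch so that a capacity of 0 or 1 can still grow.
            int newCapacity = (elements.length * 3) / 2 + 1;
            elements = Arrays.copyOf(elements, newCapacity);
            resizeCount++;
        }
        elements[size++] = value;
    }

    /** Shrinks the backing array to exactly size(); never called automatically. */
    void trimToSize() {
        if (elements.length > size) elements = Arrays.copyOf(elements, size);
    }

    int size() { return size; }
    int capacity() { return elements.length; }
    int resizeCount() { return resizeCount; }

    public static void main(String[] args) {
        GrowableDoubleList list = new GrowableDoubleList();
        for (int i = 0; i < 1_000_000; i++) list.add(i);
        System.out.println("size=" + list.size() + " capacity=" + list.capacity()
                + " resizes=" + list.resizeCount());
        list.trimToSize(); // explicit shrink
        System.out.println("capacity after trimToSize=" + list.capacity());
    }
}
```

In this sketch a million additions trigger only a few dozen reallocations, and the spare capacity stays allocated until trimToSize() is called.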
Copying
Any list can be copied. A copy is equal to the original but entirely
independent of the original. So changes in the copy are not reflected in the
original, and vice-versa.
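A minimal sketch of these copy semantics, assuming a hypothetical CopyableDoubleList class written only for this illustration (it is not a class of this package): the copy compares equal to the original until one of the two is modified.

```java
import java.util.Arrays;

/** Hypothetical minimal list used only to illustrate copy semantics. */
class CopyableDoubleList {
    private final double[] elements;

    CopyableDoubleList(double... values) {
        this.elements = values.clone(); // the list owns its own array
    }

    int size() { return elements.length; }

    double get(int index) { return elements[index]; }

    void set(int index, double value) { elements[index] = value; }

    /** Returns an independent copy: same contents, separate backing array. */
    CopyableDoubleList copy() {
        return new CopyableDoubleList(elements); // constructor clones the array
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CopyableDoubleList
                && Arrays.equals(elements, ((CopyableDoubleList) other).elements);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(elements); }

    public static void main(String[] args) {
        CopyableDoubleList original = new CopyableDoubleList(1.0, 2.0, 3.0);
        CopyableDoubleList copy = original.copy();
        System.out.println(copy.equals(original)); // equal right after copying
        copy.set(0, 99.0);                         // change the copy only
        System.out.println(original.get(0));       // original is unaffected
    }
}
```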
3. Organization of this package
Class naming follows the schema <ElementType><ImplementationTechnique>List.
For example, we have a {@link cern.colt.list.tdouble.DoubleArrayList}, which is a list
holding double elements implemented with double[] arrays.
The classes for lists of a given value type are derived from a common abstract
base class tagged Abstract<ElementType>List. For example,
all lists operating on double elements are derived from {@link cern.colt.list.tdouble.AbstractDoubleList},
which in turn is derived from an abstract base class tying together all lists
regardless of value type, {@link cern.colt.list.AbstractList}, which finally
is rooted in grandmother {@link cern.colt.list.AbstractCollection}. The abstract
base classes provide skeleton implementations for all but a few methods. Experimental
data layouts (such as compressed, sparse, linked, etc.) can easily be implemented
and inherit a rich set of functionality. Have a look at the javadoc tree
view to get the broad picture.
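The skeleton idea can be sketched as follows. All class and method names here (SketchAbstractDoubleList, SketchDoubleArrayList, SketchSparseDoubleList, getQuick, setQuick) are hypothetical and chosen for this illustration; the sketch does not reproduce the actual base classes. It only shows how a new data layout supplies a handful of primitives and inherits the rest, including the bounds-checking accessors.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical skeleton base class; illustrative only. */
abstract class SketchAbstractDoubleList {
    // The few primitives a concrete data layout must supply.
    abstract int size();
    abstract double getQuick(int index);             // no bounds check
    abstract void setQuick(int index, double value); // no bounds check

    // Everything below is inherited unchanged by every layout.
    double get(int index) {
        checkIndex(index);
        return getQuick(index);
    }

    void set(int index, double value) {
        checkIndex(index);
        setQuick(index, value);
    }

    /** Returns the first index holding the value, or -1 if absent. */
    int indexOf(double value) {
        for (int i = 0; i < size(); i++) {
            if (getQuick(i) == value) return i;
        }
        return -1;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size());
        }
    }
}

/** Dense layout backed by a plain double[] array. */
class SketchDoubleArrayList extends SketchAbstractDoubleList {
    private final double[] data;
    SketchDoubleArrayList(double... values) { data = values.clone(); }
    int size() { return data.length; }
    double getQuick(int index) { return data[index]; }
    void setQuick(int index, double value) { data[index] = value; }
}

/** Experimental sparse layout: only non-zero cells are stored. */
class SketchSparseDoubleList extends SketchAbstractDoubleList {
    private final Map<Integer, Double> nonZero = new HashMap<>();
    private final int size;
    SketchSparseDoubleList(int size) { this.size = size; }
    int size() { return size; }
    double getQuick(int index) { return nonZero.getOrDefault(index, 0.0); }
    void setQuick(int index, double value) {
        if (value == 0.0) nonZero.remove(index); else nonZero.put(index, value);
    }
}
```

Both layouts can then be used through the common base type, for example `SketchAbstractDoubleList list = new SketchSparseDoubleList(1000);` followed by `list.set(4, 7.5)` and `list.indexOf(7.5)`.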
4. Example usage
The following snippet fills a list, randomizes it, extracts the first half
of the elements, sums them up and prints the result. It is implemented entirely
with accessor methods.
int s = 1000000;
AbstractDoubleList list = new DoubleArrayList();
for (int i=0; i<s; i++) { list.add((double)i); }
list.shuffle();
AbstractDoubleList part = list.partFromTo(0,list.size()/2 - 1);
double sum = 0.0;
for (int i=0; i<part.size(); i++) { sum += part.get(i); }
System.out.println(sum);
For efficiency, all classes provide back doors to enable getting/setting the
backing array directly. In this way, the high level operations of these classes
can be used where appropriate, and one can switch to []-array index
notations where necessary. The key methods for this are public <ElementType>[]
elements() and public void elements(<ElementType>[]). The
former trustingly returns the array it internally keeps to store the elements.
Holding this array in hand, we can use the []-array operator to
perform iteration over large lists without needing to copy the array or paying
the performance penalty introduced by accessor methods. Alternatively any JAL
algorithm (or other algorithm) can operate on the returned primitive array.
The latter method forces a list to internally hold a user provided array. Using
this approach one can avoid needing to copy the elements into the list.
As a consequence, operations on primitive arrays, Colt lists and JAL algorithms
can freely be mixed at zero-copy overhead.
Note that such special treatment certainly breaks encapsulation. This functionality
is provided for performance reasons only and should only be used when absolutely
necessary. Here is the above example in mixed notation:
int s = 1000000;
DoubleArrayList list = new DoubleArrayList(s); // list.size()==0, capacity==s
list.setSize(s); // list.size()==s
double[] values = list.elements(); // zero copy, values.length==s
for (int i=0; i<s; i++) { values[i]=(double)i; }
list.shuffle();
double sum = 0.0;
int limit = values.length/2;
for (int i=0; i<limit; i++) { sum += values[i]; }
System.out.println(sum);
Or even more compact using lists as algorithm objects:
int s = 1000000;
double[] values = new double[s];
for (int i=0; i<s; i++) { values[i]=(double)i; }
new DoubleArrayList(values).shuffle(); // zero-copy, shuffle via back door
double sum = 0.0;
int limit = values.length/2;
for (int i=0; i<limit; i++) { sum += values[i]; }
System.out.println(sum);
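Why the back-door methods are zero-copy can be seen in a stand-alone sketch. BackDoorDoubleList is a hypothetical class written for this illustration; only the two elements methods are modelled on the description above, and everything else is an assumption of the sketch. The list and the caller share one array, so a write through either side is visible through the other.

```java
/** Hypothetical list that exposes its backing array; a sketch, not Colt code. */
class BackDoorDoubleList {
    private double[] elements;
    private int size;

    BackDoorDoubleList(double[] backing) {
        this.elements = backing; // adopt the caller's array, no copy
        this.size = backing.length;
    }

    /** Returns the internal array itself, without copying. */
    double[] elements() { return elements; }

    /** Makes the list hold the given array, without copying. */
    void elements(double[] newElements) {
        this.elements = newElements;
        this.size = newElements.length;
    }

    int size() { return size; }

    double get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return elements[index];
    }

    /** A high-level operation that works on whatever array is currently held. */
    void reverse() {
        for (int i = 0, j = size - 1; i < j; i++, j--) {
            double tmp = elements[i];
            elements[i] = elements[j];
            elements[j] = tmp;
        }
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0};
        BackDoorDoubleList list = new BackDoorDoubleList(values);
        list.reverse();   // high-level operation acts on the shared array
        values[0] = 42.0; // plain array write is visible through the list
        System.out.println(list.get(0) + " " + values[3]); // prints "42.0 1.0"
    }
}
```

The same sharing is what breaks encapsulation: any code holding the array can change the list's contents without going through the list.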
5. Notes
The quicksorts and mergesorts are the JDK 1.2 V1.26 algorithms, modified as
necessary to operate on the given data types.