package com.firefly.utils.collection;
/***************************************************************************
* File: HashedArrayTree.java
* Author: Keith Schwarz ([email protected])
*
* An implementation of the List abstraction backed by a hashed array tree
* (HAT), a data structure supporting amortized O(1) lookup, append, and
* last-element removal. In this sense it is akin to a standard dynamic
* array implementation. However, a hashed array tree also has the advantage
* that its memory overhead is only O(sqrt(n)) rather than the typical O(n)
* found in dynamic arrays and their variants.
*
* Internally, the hashed array tree is implemented as an array of pointers,
* each of which is either null or points to a subarray of elements. The
* topmost array and every subarray always have the same size, which is
* always a power of two. In this sense, the hashed array tree is essentially
* a two-dimensional array of elements. However, the advantage of the hashed
* array tree is that the topmost array's pointers are all initially null and
* are only filled in when space is needed. This means that the maximum
* overhead of the structure is the size of the topmost array, plus the number
* of unused elements in the last allocated subarray. Here is one sample HAT:
*
* [ ] [ ] [ ] [ ]
*  |   |   |
*  v   v   v
* [0] [4] [8]
* [1] [5] [9]
* [2] [6] [ ]
* [3] [7] [ ]
*
* Here, the topmost array of pointers has three of its four pointers in use.
* Each points to a subarray of four slots; the first two subarrays are full
* and the third holds two elements.
*
* Whenever an element is added to a hashed array tree, one of three cases
* must hold:
*
* 1. There is extra space at the end of the final subarray (as in the
* picture above). In that case, the element is added to that position.
* 2. The final subarray is full, but a free slot for another subarray exists
* in the topmost array. In that case, a new subarray is allocated and the
* element is added to that subarray.
* 3. The final subarray is full and the topmost array has no free slots. In
* that case, since the topmost array has size 2^n and each subarray has
* size 2^n, there must be a total of 2^(2n) elements in the hashed array
* tree. We next double the size of the topmost array to 2^(n + 1), then
* allocate 2^(n - 1) subarrays of size 2^(n + 1) elements for a total of
* 2^(2n) elements' worth of space. The elements from the old HAT are then
* copied over, a new subarray is allocated for the new element, and it is
* added as the first element of that subarray.
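*
* As a rough illustration of the three cases above, the sketch below
* classifies an append using only the element count and the topmost array
* size. The names HatAppendCaseSketch and appendCase are hypothetical and
* are not part of this file.
*
```java
final class HatAppendCaseSketch {
    // Returns 1, 2, or 3 to name the case an append falls into.
    // size is the current element count; k is the topmost array size
    // (assumed to be a power of two, and also the size of each subarray).
    static int appendCase(int size, int k) {
        if (size == k * k) {
            return 3; // every subarray is full and no slot is free: grow first
        }
        if (size % k == 0) {
            return 2; // the last subarray is full (or none exists yet): allocate one
        }
        return 1;     // the last subarray still has room
    }

    public static void main(String[] args) {
        // The sample picture: 10 elements, topmost array of size 4.
        System.out.println(appendCase(10, 4)); // case 1
        System.out.println(appendCase(8, 4));  // case 2
        System.out.println(appendCase(16, 4)); // case 3
    }
}
```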
*
* Let's now talk about the performance and memory usage of this structure.
* First, we note that we can perform lookups in O(1), assuming that the
* machine is transdichotomous (meaning that a single machine word can hold
* the size of any array). This can be done by breaking the input index in
* half, then using the first half to choose which array to look into (the
* "hashed" part of HAT) and the second half to choose which index to select.
* This trick is similar to the trick used to represent a two-dimensional
* array using a linear structure.
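*
* The index split described above can be sketched in isolation as follows.
* The class HatIndexSketch and its parameter lgSize (the base-2 logarithm
* of the topmost array size) are illustrative assumptions; the class below
* keeps the same quantity in mLgSize.
*
```java
final class HatIndexSketch {
    // High bits of the index: which slot of the topmost array to follow.
    static int offset(int index, int lgSize) {
        return index >> lgSize;
    }

    // Low lgSize bits of the index: the position inside that subarray.
    static int indexInSubarray(int index, int lgSize) {
        return index & ((1 << lgSize) - 1);
    }

    public static void main(String[] args) {
        int lgSize = 2; // subarrays of 4 elements, as in the sample picture
        for (int i : new int[] {0, 5, 9}) {
            System.out.println(i + " -> (" + offset(i, lgSize) + ", "
                    + indexInSubarray(i, lgSize) + ")");
        }
    }
}
```
*
* With lgSize = 2, index 9 maps to slot 2 of the topmost array and position
* 1 of that subarray, which is where 9 sits in the sample picture.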
*
* Next, let's think about how much time it takes to do an append operation.
* Each append when space remains takes O(1) to look up the proper position
* in the hashed array tree for writing, but appends can be much more expensive
* when the HAT needs to be resized. Fortunately, this does not happen very
* often. Whenever the topmost array doubles in size, the capacity grows from
* 2^(2n) to 2^(2n + 2), meaning that the HAT must hold four times as many
* elements before the next copy operation. If we define the potential to be
* twice the number of elements stored in the second half of the HAT's
* capacity (i.e. in the subarrays reachable from the second half of the
* topmost array), then we can prove an amortized O(1) time for append.
* Consider any series of appends. If an append does not expand the HAT, then
* it takes O(1) time and increases the potential by at most 2. If an append
* does expand the HAT, and the HAT's topmost array has size 2^n, then the
* HAT is full, so there must be 2^(2n - 1) elements in the latter half of
* the HAT and the potential is 2^(2n). The time required to move the
* 2^(2n) elements is 2^(2n), and the new HAT is only just over one-quarter
* full, so it has no elements in its latter half. Consequently, the new
* potential is zero. The actual time required to perform the append is thus
* 2^(2n) + O(1), and the change in potential is -2^(2n), so the amortized
* cost is O(1) as expected.
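*
* The amortized bound can also be checked empirically. The sketch below
* does not touch real storage; it only counts how many element copies the
* growth rule "resize when size == k * k, then double k" would cause. The
* names HatAmortizedSketch and copiesForAppends are hypothetical.
*
```java
final class HatAmortizedSketch {
    // Counts the element copies caused by resizing over a run of appends
    // into an initially empty HAT.
    static long copiesForAppends(long appends) {
        long k = 2;      // topmost array size; 2 matches kMinArraySize
        long size = 0;   // elements currently stored
        long copies = 0; // elements moved by resizes so far
        for (long i = 0; i < appends; i++) {
            if (size == k * k) {
                copies += size; // a resize moves every stored element once
                k *= 2;
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(copiesForAppends(n) + " copies for " + n + " appends");
    }
}
```
*
* For 1,000,000 appends this counts 349,524 copies, about 0.35 per append.
* Under this growth rule the total always stays below 4/3 of the number of
* appends, which agrees with the O(1) amortized bound.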
*
* Last, let's talk about the cost to do a remove. This is similar to the
* append case - we delete the last element of the last array, removing the
* array from the topmost array if it becomes empty. We also compact the
* HAT if it becomes too sparse by shrinking from a HAT of size 2^(2n + 2) to
* a HAT of size 2^(2n) if the HAT becomes one-eighth full. A similar
* potential method can be used to show that this operation runs in amortized
* O(1).
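*
* To see why growing when full but shrinking only at one-eighth full avoids
* repeated resizing, the sketch below tracks just the topmost array size
* through a run of appends followed by removals of the last element. It
* mirrors the thresholds described above; the names HatShrinkSketch and
* topSizeAfter are hypothetical, and removes is assumed not to exceed
* appends.
*
```java
final class HatShrinkSketch {
    // Returns the topmost array size after the given number of appends
    // followed by the given number of last-element removals.
    static long topSizeAfter(long appends, long removes) {
        long k = 2;    // topmost array size; never drops below 2
        long size = 0;
        for (long i = 0; i < appends; i++) {
            if (size == k * k) {
                k *= 2; // grow only when completely full
            }
            size++;
        }
        for (long i = 0; i < removes; i++) {
            size--;
            if (size * 8 <= k * k && k > 2) {
                k /= 2; // shrink only at one-eighth full
            }
        }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(topSizeAfter(17, 8)); // still 8
        System.out.println(topSizeAfter(17, 9)); // shrinks to 4
    }
}
```
*
* After 17 appends the topmost array has size 8. Eight removals leave it
* untouched, and the ninth (reaching 8 elements, one-eighth of 64) shrinks
* it to 4, at which point the HAT is half full and well away from both
* thresholds.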
*
* Finally, let's consider the memory overhead of the HAT. Let n be the
* number of elements stored and k the size of the topmost array. For any
* HAT of topmost array size 8 or more, since the HAT is always at least
* one-eighth full, there must be at least one full subarray. Every subarray
* has size k, and so if every subarray were to be filled to capacity, there
* would be a total of k subarrays of size k, for a total of k^2 elements.
* At least an eighth of this capacity is filled in, so k^2 must be at most
* 8n, and so k = O(sqrt(n)). To finish the analysis, the overhead of the
* structure is at most the size of the top-level array, plus potentially
* k - 1 unused elements in the last subarray. This is a total of
* O(k) = O(sqrt(n)) overhead, which is what we originally desired.
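*
* As a concrete instance of this bound, the sketch below computes the
* overhead (top-level slots plus unused slots in the last subarray) for a
* HAT built by appends only. The append-only history is an assumption made
* to keep the arithmetic simple, and the names HatOverheadSketch and
* overhead are hypothetical.
*
```java
final class HatOverheadSketch {
    // Overhead in slots after n appends into an initially empty HAT:
    // the topmost array itself plus the empty tail of the last subarray.
    static long overhead(long n) {
        long k = 2; // topmost array size; 2 matches kMinArraySize
        while (k * k < n) {
            k *= 2; // the HAT grows each time an append finds it full
        }
        long unusedInLast = (k - n % k) % k;
        return k + unusedInLast;
    }

    public static void main(String[] args) {
        System.out.println(overhead(10));        // the sample picture
        System.out.println(overhead(1_000_000));
    }
}
```
*
* For the sample picture (n = 10) this gives 4 pointers plus 2 unused slots,
* or 6. For n = 1,000,000 it gives 1024 + 448 = 1472 slots, on the order of
* sqrt(n) rather than n.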
*/
import java.util.AbstractList;
@SuppressWarnings("unchecked")
public final class HashedArrayTree<T> extends AbstractList<T> {
/* To simplify the implementation, we enforce that the size of the topmost
* array never drops below 2. This prevents weirdness when we try to
* allocate 2^(n-1) arrays during a doubling and find that n = 0.
*/
private static final int kMinArraySize = 2;
/* The topmost array of elements; initially of size two. */
private T[][] mArrays = (T[][]) new Object[kMinArraySize][];
/* Number of elements, initially zero since the HAT is created empty. */
private int mSize = 0;
/* The lg2 of the topmost array size, kept in sync by grow() and shrink().
* This enables some cute bit-twiddling tricks to improve efficiency.
*/
private int mLgSize = 1;
/**
* Returns the number of elements in the HashedArrayTree.
*
* @return The number of elements in the HashedArrayTree.
*/
@Override
public int size() {
return mSize;
}
/**
* Adds a new element to the HashedArrayTree.
*
* @param elem The element to add.
* @return true
*/
@Override
public boolean add(T elem) {
/* First, check if we're completely out of space. If so, do a resize
* to ensure we do indeed have room.
*/
if (size() == mArrays.length * mArrays.length)
grow();
/* Compute the (arr, index) pair for the next position. The next
* position is at the location indicated by size(), but we know that
* space exists from the previous call.
*/
final int offset = computeOffset(size());
final int index = computeIndex(size());
/* Check if an array exists here. If not, make one up. */
if (mArrays[offset] == null)
mArrays[offset] = (T[]) new Object[mArrays.length];
/* Write the element to its location. */
mArrays[offset][index] = elem;
/* Update the element count. */
++mSize;
/* Per the Collections contract, return true to signal a successful
* add.
*/
return true;
}
/**
* Sets the element at the specified position to the indicated value.
* If the index is out of bounds, throws an IndexOutOfBoundsException.
*
* @param index The index at which to set the value.
* @param elem The element to store at that position.
* @return The value previously at that location.
* @throws IndexOutOfBoundsException If index is invalid.
*/
@Override
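public T set(int index, T elem) {
/* Check that this is a valid index, as the documented contract promises. */
if (index < 0 || index >= size())
throw new IndexOutOfBoundsException("Index " + index + ", size " + size());
/* Find out where to look. */
final int offset = computeOffset(index);
final int arrIndex = computeIndex(index);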
/* Cache the value there and write the new one. */
T result = mArrays[offset][arrIndex];
mArrays[offset][arrIndex] = elem;
/* Hand back the old value. */
return result;
}
/**
* Returns the value of the element at the specified position.
*
* @param index The index at which to query.
* @return The value of the element at that position.
* @throws IndexOutOfBoundsException If the index is invalid.
*/
@Override
public T get(int index) {
/* Check that this is a valid index. */
if (index < 0 || index >= size())
throw new IndexOutOfBoundsException("Index " + index + ", size " + size());
/* Look up the element. */
return mArrays[computeOffset(index)][computeIndex(index)];
}
/**
* Adds the specified element at the position just before the specified
* index. An index equal to size() appends the element.
*
* @param index The index just before which to insert.
* @param elem The value to insert
* @throws IndexOutOfBoundsException if the index is negative or greater
* than size().
*/
@Override
public void add(int index, T elem) {
/* Confirm the validity of the index. Inserting at index == size() is
* legal and is equivalent to an append.
*/
if (index < 0 || index > size())
throw new IndexOutOfBoundsException("Index " + index + ", size " + size());
/* Add a dummy element to ensure that everything resizes correctly.
* There's no reason to repeat the logic.
*/
add(null);
/* Next, we need to shift every element at or after the insertion point
* up by one position. We work from the back so that nothing is
* overwritten before it is moved, and we do this using our own public
* interface.
*/
for (int i = size() - 1; i > index; --i)
set(i, get(i - 1));
/* Finally, write the element. */
set(index, elem);
}
/**
* Removes the element at the specified position from the HashedArrayTree.
*
* @param index The index of the element to remove.
* @return The value of the element at that position.
* @throws IndexOutOfBoundsException If the index is invalid.
*/
@Override
public T remove(int index) {
/* Cache the value at the indicated position; this also does the bounds
* check.
*/
T result = get(index);
/* Use a naive shuffle-down algorithm to reposition elements after
* the removed one.
*/
for (int i = index + 1; i < size(); ++i)
set(i - 1, get(i));
/* Clobber the last element to play nicely with the garbage collector. */
set(size() - 1, null);
/* Decrement our size. */
--mSize;
/* If we are now at 1/8 total capacity and the topmost array is still
* above its minimum size, shrink the structure.
*/
if (mArrays.length > kMinArraySize && size() * 8 <= mArrays.length * mArrays.length)
shrink();
/* Otherwise, if the size is now an exact multiple of the array size,
* we can drop the very last array. This is the array whose offset
* is one after the end of the elements.
*/
else if (size() % mArrays.length == 0)
mArrays[computeOffset(size())] = null;
return result;
}
/**
* Given an index, returns the offset into the master array at which the
* element with that index can be found.
*
* @return The index into the topmost array where the given element can
* be found.
*/
private int computeOffset(int index) {
/* This can be computed by dividing the index by the size of the
* topmost array. However, if we want to be very clever, we can do
* this efficiently by bit-shifting downward by the lg2 of the size
* of the topmost array.
*/
return index >> mLgSize;
}
/**
* Given an index, returns the offset into the appropriate subarray in
* which the element with that index can be found.
*
* @return The index into the subarray where the given element can
* be found.
*/
private int computeIndex(int index) {
/* This can be computed by modding the index by the size of the
* topmost array. But we can do this more efficiently with a different
* tactic. Since the array size is a perfect power of two, it must
* look like this:
*
* 00..010..0
*
* Subtracting one yields
*
* 00..001..1
*
* ANDing this with the index produces the value we're looking for.
*/
return index & (mArrays.length - 1);
}
/**
* Grows the internal representation by doubling the size of the topmost
* array and copying the appropriate number of elements over.
*/
private void grow() {
/* Double the size of the topmost array. */
T[][] newArrays = (T[][]) new Object[mArrays.length * 2][];
/* The new arrays each have size 2^(n + 1). We need 2^(n - 1) of them
* to hold the old elements. Allocate those here and copy everything
* over.
*/
for (int i = 0; i < mArrays.length; i += 2) {
/* Allocate the array. */
newArrays[i / 2] = (T[]) new Object[newArrays.length];
/* Use System.arraycopy to move everything over. */
System.arraycopy(mArrays[i], 0, newArrays[i / 2], 0, mArrays.length);
System.arraycopy(mArrays[i + 1], 0, newArrays[i / 2], mArrays.length, mArrays.length);
/* Null out the old arrays to be nice to the GC during this
* potentially stressful time.
*/
mArrays[i] = mArrays[i + 1] = null;
}
/* Switch out this new array for the old. */
mArrays = newArrays;
/* Bump up lg2 of the size. */
++mLgSize;
}
/**
* Decreases the size of the HAT by shrinking into a better fit.
*/
private void shrink() {
/* If the size of the topmost array is at its minimum, don't do
* anything. This doesn't change the asymptotic memory usage because
* we only do this for small arrays.
*/
if (mArrays.length == kMinArraySize) return;
/* Otherwise, we currently have 2^(2n) / 8 = 2^(2n - 3) elements.
* We're about to shrink into a grid of 2^(2n - 2) elements, and so
* we'll fill in half of the elements.
*/
T[][] newArrays = (T[][]) new Object[mArrays.length / 2][];
/* Copy everything over. We'll need half as many arrays as before. */
for (int i = 0; i < newArrays.length / 2; ++i) {
/* Create the arrays. */
newArrays[i] = (T[]) new Object[newArrays.length];
/* Move everything into it. If this is an odd array, it comes
* from the upper half of the old array; otherwise it comes from
* the lower half.
*/
System.arraycopy(mArrays[i / 2], (i % 2 == 0)? 0 : newArrays.length,
newArrays[i], 0, newArrays.length);
/* Play nice with the GC. If this is an odd-numbered array, we
* just copied over everything we needed and can clear out the
* old array.
*/
if (i % 2 == 1)
mArrays[i / 2] = null;
}
/* Copy the arrays over. */
mArrays = newArrays;
/* Drop the lg2 of the size. */
--mLgSize;
}
}