package com.firefly.utils.collection;

/***************************************************************************
 * File: HashedArrayTree.java
 * Author: Keith Schwarz ([email protected])
 *
 * An implementation of the List abstraction backed by a hashed array tree
 * (HAT), a data structure supporting amortized O(1) lookup, append, and
 * last-element removal.  In this sense it is akin to a standard dynamic
 * array implementation.  However, a hashed array tree also has the advantage
 * that its memory overhead is only O(sqrt(n)) rather than the typical O(n)
 * found in dynamic arrays and their variants.
 *
 * Internally, the hashed array tree is implemented as an array of pointers,
 * each optionally pointing to an array of elements.  The topmost array and
 * each subarray always have the same size, which is always a power of two.  In
 * this sense, the hashed array tree is essentially a two-dimensional array
 * of elements.  However, the advantage of the hashed array tree is that the
 * topmost array pointers are all initially null and only filled in when space
 * is needed.  This means that the maximum overhead of the structure is the
 * size of the topmost array, plus the number of unused elements in the current
 * block.  Here is one sample HAT:
 *
 *           [ ] [ ] [ ] [ ]
 *            |   |   |
 *            v   v   v
 *           [0] [4] [8]
 *           [1] [5] [9]
 *           [2] [6] [ ]
 *           [3] [7] [ ]
 *
 * Here, the topmost array of pointers has three pointers in use, each of which
 * points to a subarray with the same number of slots as the topmost array
 * (four).  The first two subarrays are full and the third holds two elements.
 *
 * Whenever an element is added to a hashed array tree, one of three cases
 * must hold:
 *
 * 1. There is extra space at the end of the final subarray (for example, in
 *    the picture above).  In that case, the element is added to that position.
 * 2. The final subarray is full, but space for another subarray exists in the
 *    topmost array.  In that case, a new array is allocated and the element
 *    is added to that array.
 * 3. The final subarray is full and no new arrays remain open.  In that case,
 *    since the topmost array has size 2^n and each array has size 2^n, there
 *    must be a total of 2^(2n) elements in the hashed array tree.  We
 *    next double the size of the topmost array to 2^(n + 1), then allocate
 *    2^(n - 1) subarrays of size 2^(n + 1) elements for a total of 2^(2n)
 *    elements' worth of space.  The elements from the old HAT are then copied
 *    over, a new array is allocated for the new element, and it is added as
 *    the first element of that array.
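 *
 * As a concrete instance of case 3 (numbers chosen purely for illustration):
 * with n = 1 the topmost array has 2 slots, each subarray has 2 slots, and the
 * HAT is full at 4 elements.  Appending a fifth element doubles the topmost
 * array to 4 slots, allocates 2^0 = 1 subarray of 4 slots to receive the old
 * 4 elements, and then allocates a second 4-slot subarray whose first slot
 * holds the new element.  The capacity has grown from 4 to 16.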
 *
 * Let's now talk about the performance and memory usage of this structure.
 * First, we note that we can perform lookups in O(1), assuming that the
 * machine is transdichotomous (meaning that a single machine word can hold
 * the size of any array).  This can be done by breaking the input index in
 * half, then using the first half to choose which array to look into (the
 * "hashed" part of HAT) and the second half to choose which index to select.
 * This trick is similar to the trick used to represent a two-dimensional
 * array using a linear structure.
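 *
 * As a small worked example (using the sample HAT pictured earlier, and the
 * shift-and-mask split that this implementation uses): the topmost array has
 * size 4 = 2^2, so an index is split at bit 2.  Index 9 is 1001 in binary;
 * the high bits 10 select subarray 2 and the low bits 01 select slot 1, which
 * is exactly where 9 sits in the picture.  In code this is 9 >> 2 == 2 and
 * (9 & 3) == 1.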
 *
 * Next, let's think about how much time it takes to do an append operation.
 * Each append when space remains takes O(1) to look up the proper position
 * in the hashed array tree for writing, but appends can be much more expensive
 * when the HAT needs to be resized.  Fortunately, this does not happen very
 * often.  Whenever the HAT doubles in size, its capacity grows from 2^(2n) to
 * 2^(2n + 2), meaning that four times as many elements must be inserted before
 * the next copy operation.  If we define a potential function as twice the
 * number of filled-in elements in the HAT above half capacity (i.e. the
 * number of elements in the arrays in the second half), then we can prove an
 * amortized O(1) time for append.  Consider any series of appends.  If the
 * append does not expand the HAT, then it takes O(1) time and increases the
 * potential by at most 2.  If the append does expand the HAT, and the HAT's
 * topmost array has size 2^n, then there must be 2^(2n - 1) elements in the
 * latter half of the HAT, so the potential is 2^(2n).  The time required to
 * move each of the 2^(2n) elements is 2^(2n), and in the new HAT there are no
 * elements in the latter half of the array.  Consequently, the new potential
 * is zero.  The actual time required to perform the append is thus
 * 2^(2n) + O(1), and the change in potential is -2^(2n), so the amortized
 * cost is O(1) as expected.
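 *
 * For a concrete feel (numbers chosen for illustration): take n = 2, so the
 * topmost array has 4 slots and the capacity is 16.  The potential stays at
 * zero while the first 8 elements are appended, and reaches 2 * 8 = 16 once
 * the remaining 8 slots are filled.  The seventeenth append must copy all 16
 * elements, and the 16 units of stored potential pay for exactly that copy.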
 *
 * Last, let's talk about the cost to do a remove.  This is similar to the 
 * append case - we delete the last element of the last array, removing the
 * array from the topmost array if it becomes empty.  We also compact the
 * HAT if it becomes too sparse by shrinking from a HAT of size 2^(2n + 2) to
 * a HAT of size 2^(2n) if the HAT becomes one-eighth full.  A similar
 * potential method can be used to show that this operation runs in amortized
 * O(1).
 *
 * Finally, let's consider the memory overhead of the HAT.  For any HAT of
 * topmost array size 8 or more, since the HAT is always at least one-eighth
 * full, there must be at least one full array.  This array has size equal to
 * the size of the topmost array (call this k), and so if every array were to
 * be filled in to capacity, there would be a total of k arrays of size k,
 * for a total of k^2 elements.  Of this capacity, we know that at least an
 * eighth of them are filled in, so k^2 must be at most 8n, and so 
 * k = O(sqrt(n)).  To finish the analysis, the overhead of the structure is
 * at most the overhead of this top-level array, plus potentially k - 1
 * unused elements in some array.  This is a total of O(k) = O(sqrt(n))
 * overhead, which is what we originally desired.
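 *
 * To put rough numbers on this (an illustrative case, not a measurement): a
 * HAT whose topmost array has k = 1024 slots holds somewhere between about
 * k^2 / 8 = 131,072 and k^2 = 1,048,576 elements.  Its overhead is at most
 * the 1024 top-level pointers plus up to 1023 unused slots in the last
 * subarray, i.e. fewer than 2048 slots regardless of where in that range the
 * element count falls.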
 */
import java.util.*; // For AbstractList

@SuppressWarnings("unchecked")
public final class HashedArrayTree<T> extends AbstractList<T> {
    /* To simplify the implementation, we enforce that the size of the topmost
     * array never drops below 2.  This prevents weirdness when we try to
     * allocate 2^(n-1) arrays during a doubling and find that n = 0.
     */
    private static final int kMinArraySize = 2;

    /* The topmost array of elements; initially of size two. */
    private T[][] mArrays = (T[][]) new Object[kMinArraySize][];

    /* Number of elements, initially zero since the HAT is created empty. */
    private int mSize = 0;

    /* The lg2 of the topmost array size, kept in sync by grow() and shrink().
     * This enables some cute bit-twiddling tricks to improve efficiency.
     */
    private int mLgSize = 1;

    /**
     * Returns the number of elements in the HashedArrayTree.
     *
     * @return The number of elements in the HashedArrayTree.
     */
    @Override 
    public int size() {
        return mSize;
    }

    /**
     * Adds a new element to the HashedArrayTree.
     *
     * @param elem The element to add.
     * @return true
     */
    @Override 
    public boolean add(T elem) {
        /* First, check if we're completely out of space.  If so, do a resize
         * to ensure we do indeed have room.
         */
        if (size() == mArrays.length * mArrays.length)
            grow();

        /* Compute the (arr, index) pair for the next position.  The next
         * position is at the location indicated by size(), but we know that
         * space exists from the previous call.
         */
        final int offset = computeOffset(size());
        final int index  = computeIndex(size());

        /* Check if an array exists here.  If not, make one up. */
        if (mArrays[offset] == null)
            mArrays[offset] = (T[]) new Object[mArrays.length];

        /* Write the element to its location. */
        mArrays[offset][index] = elem;

        /* Update the element count. */
        ++mSize;

        /* Per the Collections contract, return true to signal a successful
         * add.
         */
        return true;
    }

    /**
     * Sets the element at the specified position to the indicated value.
     * If the index is out of bounds, throws an IndexOutOfBoundsException.
     *
     * @param index The index at which to set the value.
     * @param elem The element to store at that position.
     * @return The value initially at that location.
     * @throws IndexOutOfBoundsException If index is invalid.
     */
    @Override 
    public T set(int index, T elem) {
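        /* Check that this is a valid index, mirroring get(). */
        if (index < 0 || index >= size())
            throw new IndexOutOfBoundsException("Index " + index + ", size " + size());

        /* Find out where to look. */
        final int offset   = computeOffset(index);
        final int arrIndex = computeIndex(index);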

        /* Cache the value there and write the new one. */
        T result = mArrays[offset][arrIndex];
        mArrays[offset][arrIndex] = elem;

        /* Hand back the old value. */
        return result;
    }

    /**
     * Returns the value of the element at the specified position.
     *
     * @param index The index at which to query.
     * @return The value of the element at that position.
     * @throws IndexOutOfBoundsException If the index is invalid.
     */
    @Override 
    public T get(int index) {
        /* Check that this is a valid index. */
        if (index < 0 || index >= size())
            throw new IndexOutOfBoundsException("Index " + index + ", size " + size());

        /* Look up the element. */
        return mArrays[computeOffset(index)][computeIndex(index)];
    }

    /**
     * Adds the specified element at the position just before the specified
     * index.
     *
     * @param index The index just before which to insert.
     * @param elem The value to insert
     * @throws IndexOutOfBoundsException if the index is invalid.
     */
    @Override 
    public void add(int index, T elem) {
        /* Confirm the validity of the index.  Inserting at index == size()
         * is legal and is equivalent to an append.
         */
        if (index < 0 || index > size())
            throw new IndexOutOfBoundsException("Index " + index + ", size " + size());

        /* Add a dummy element to ensure that everything resizes correctly.
         * There's no reason to repeat the logic.
         */
        add(null);

        /* Next, we need to shift every element at or after the insertion
         * point up by one slot, working from the back so that nothing is
         * overwritten.  We'll do this using our own public interface.
         */
        for (int i = size() - 1; i > index; --i)
            set(i, get(i - 1));

        /* Finally, write the element. */
        set(index, elem);
    }

    /**
     * Removes the element at the specified position from the HashedArrayTree.
     *
     * @param index The index of the element to remove.
     * @return The value of the element at that position.
     * @throws IndexOutOfBoundsException If the index is invalid.
     */
    @Override 
    public T remove(int index) {
        /* Cache the value at the indicated position; this also does the bounds
         * check.
         */
        T result = get(index);

        /* Use a naive shuffle-down algorithm to reposition elements after
         * the removed one.
         */
        for (int i = index + 1; i < size(); ++i)
            set(i - 1, get(i));

        /* Clobber the last element to play nicely with the garbage collector. */
        set(size() - 1, null);

        /* Decrement our size. */
        --mSize;

        /* If we are now at 1/8 total capacity, shrink the structure. */
        if (size() * 8 <= mArrays.length * mArrays.length)
            shrink();
        /* Otherwise, if the size is now an even multiple of the array size,
         * we can drop the very last array.  This is the array whose offset
         * is one after the end of the elements.
         */
        else if (size() % mArrays.length == 0)
            mArrays[computeOffset(size())] = null;

        return result;
    }

    /**
     * Given an index, returns the offset into the master array at which the
     * element with that index can be found.
     *
     * @return The index into the topmost array where the given element can
     *         be found.
     */
    private int computeOffset(int index) {
        /* This can be computed by dividing the index by the size of the
         * topmost array.  However, if we want to be very clever, we can do
         * this efficiently by bit-shifting downward by the lg2 of the size
         * of the topmost array.
         */
        return index >> mLgSize;
    }

    /**
     * Given an index, returns the offset into the appropriate subarray in
     * which the element with that index can be found.
     *
     * @return The index into the subarray where the given element can
     *         be found.
     */
    private int computeIndex(int index) {
        /* This can be computed by modding the index by the size of the
         * topmost array.  But we can do this more efficiently with a different
         * tactic.  Since the array size is a perfect power of two, it must
         * look like this:
         *
         * 00..010..0
         *
         * Subtracting one yields
         *
         * 00..001..1
         *
         * ANDing this with the index produces the value we're looking for.
         */
        return index & (mArrays.length - 1);
    }

    /**
     * Grows the internal representation by doubling the size of the topmost
     * array and copying the appropriate number of elements over.
     */
    private void grow() {
        /* Double the size of the topmost array. */
        T[][] newArrays = (T[][]) new Object[mArrays.length * 2][];

        /* The new arrays each have size 2^(n + 1).  We need 2^(n - 1) of them
         * to hold the old elements.  Allocate those here and copy everything
         * over.
         */
        for (int i = 0; i < mArrays.length; i += 2) {
            /* Allocate the array. */
            newArrays[i / 2] = (T[]) new Object[newArrays.length];

            /* Use System.arraycopy to move everything over. */
            System.arraycopy(mArrays[i], 0, newArrays[i / 2], 0, mArrays.length);
            System.arraycopy(mArrays[i + 1], 0, newArrays[i / 2], mArrays.length, mArrays.length);

            /* Null out the old arrays to be nice to the GC during this 
             * potentially stressful time.
             */
            mArrays[i] = mArrays[i + 1] = null;
        }

        /* Switch out this new array for the old. */
        mArrays = newArrays;

        /* Bump up lg2 of the size. */
        ++mLgSize;
    }

    /**
     * Decreases the size of the HAT by shrinking into a better fit.
     */
    private void shrink() {
        /* If the size of the topmost array is at its minimum, don't do
         * anything.  This doesn't change the asymptotic memory usage because
         * we only do this for small arrays.
         */
        if (mArrays.length == kMinArraySize) return;

        /* Otherwise, we currently have 2^(2n) / 8 = 2^(2n - 3) elements.
         * We're about to shrink into a grid of 2^(2n - 2) elements, and so
         * we'll fill in half of the elements.
         */
        T[][] newArrays = (T[][]) new Object[mArrays.length / 2][];

        /* Copy everything over.  We'll need half as many arrays as before. */
        for (int i = 0; i < newArrays.length / 2; ++i) {
            /* Create the arrays. */
            newArrays[i] = (T[]) new Object[newArrays.length];

            /* Move everything into it.  If this is an odd array, it comes 
             * from the upper half of the old array; otherwise it comes from
             * the lower half.
             */
            System.arraycopy(mArrays[i / 2], (i % 2 == 0)? 0 : newArrays.length,
                             newArrays[i], 0, newArrays.length);

            /* Play nice with the GC.  If this is an odd-numbered array, we
             * just copied over everything we needed and can clear out the
             * old array.
             */
            if (i % 2 == 1)
                mArrays[i / 2] = null;
        }

        /* Copy the arrays over. */
        mArrays = newArrays;

        /* Drop the lg2 of the size. */
        --mLgSize;
    }
}