All Downloads are FREE. Search and download functionalities are using the official Maven repository.

org.apache.commons.numbers.arrays.Selection Maven / Gradle / Ivy

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.commons.numbers.arrays;

/**
 * Select indices in array data.
 *
 * 

Arranges elements such that indices {@code k} correspond to their correctly * sorted value in the equivalent fully sorted array. For all indices {@code k} * and any index {@code i}: * *

{@code
 * data[i < k] <= data[k] <= data[k < i]
 * }
* *

Examples: * *

 * data    [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7]
 *
 *
 * k=4   : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7]
 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7]
 * 
* *

This implementation can select on multiple indices and will handle duplicate and * unordered indices. The method detects ordered indices (with or without duplicates) and * uses this during processing. Passing ordered indices is recommended if the order is already * known; for example using uniform spacing through the array data, or to select the top and * bottom {@code n} values from the data. * *

A quickselect adaptive method is used for single indices. This uses analysis of the * partition sizes after each division to update the algorithm mode. If the partition * containing the target does not sufficiently reduce in size then the algorithm is * progressively changed to use partitions with guaranteed margins. This ensures a set fraction * of data is eliminated each step and worse-case linear run time performance. This method can * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at * the edge of the partition bounded by {@code ka}. * *

Multiple keys are partitioned collectively using an introsort method which only recurses * into partitions containing indices. Excess recursion will trigger use of a heapselect * on the remaining range of indices ensuring non-quadratic worse case performance. Any * partition containing a single index, adjacent pair of indices, or range of indices with a * small separation will use the quickselect adaptive method for single keys. Note that the * maximum number of times that {@code n} indices can be split is {@code n - 1} before all * indices are handled as singles. * *

Floating-point order * *

The {@code <} relation does not impose a total order on all floating-point values. * This class respects the ordering imposed by {@link Double#compare(double, double)}. * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is * considered greater than any other value; and all {@code Double.NaN} values are * considered equal. * *

References * *

Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n} * using repeat division of the data around a partition element, recursing into the * partition that contains {@code k}. * *

Introsort/select is introduced in Musser [2]. This detects excess recursion in * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst * case bound. * *

Use of sampling to identify a pivot that places {@code k} in the smaller partition is * performed in the SELECT algorithm of Floyd and Rivest [3, 4]. * *

A worst-case linear time algorithm PICK is described in Blum et al [5]. This uses * the median of medians as a partition element for selection which ensures a minimum fraction of * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of * Alexandrescu [6]. * *

    *
  1. Hoare (1961) * Algorithm 65: Find * Comm. ACM. 4 (7): 321–322 *
  2. Musser (1999) * Introspective Sorting and Selection Algorithms * * Software: Practice and Experience 27, 983-993. *
  3. Floyd and Rivest (1975) * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements. * Comm. ACM. 18 (3): 173. *
  4. Kiwiel (2005) * On Floyd and Rivest's SELECT algorithm. * Theoretical Computer Science 347, 214-238. *
  5. Blum, Floyd, Pratt, Rivest, and Tarjan (1973) * Time bounds for selection. * * Journal of Computer and System Sciences. 7 (4): 448–461. *
  6. Alexandrescu (2016) * Fast Deterministic Selection * arXiv:1606.00484. *
  7. Quickselect (Wikipedia) *
  8. Introsort (Wikipedia) *
  9. Introselect (Wikipedia) *
  10. Floyd-Rivest algorithm (Wikipedia) *
  11. Median of medians (Wikipedia) *
* * @since 1.2 */ public final class Selection { /** No instances. */ private Selection() {} /** * Partition the array such that index {@code k} corresponds to its correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param k Index. * @throws IndexOutOfBoundsException if index {@code k} is not within the * sub-range {@code [0, a.length)} */ public static void select(double[] a, int k) { IndexSupport.checkIndex(0, a.length, k); doSelect(a, 0, a.length, k); } /** * Partition the array such that indices {@code k} correspond to their correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param k Indices (may be destructively modified). * @throws IndexOutOfBoundsException if any index {@code k} is not within the * sub-range {@code [0, a.length)} */ public static void select(double[] a, int[] k) { IndexSupport.checkIndices(0, a.length, k); doSelect(a, 0, a.length, k); } /** * Partition the array such that index {@code k} corresponds to its correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param fromIndex Index of the first element (inclusive). * @param toIndex Index of the last element (exclusive). * @param k Index. * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the * sub-range {@code [fromIndex, toIndex)} */ public static void select(double[] a, int fromIndex, int toIndex, int k) { IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); IndexSupport.checkIndex(fromIndex, toIndex, k); doSelect(a, fromIndex, toIndex, k); } /** * Partition the array such that indices {@code k} correspond to their correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param fromIndex Index of the first element (inclusive). * @param toIndex Index of the last element (exclusive). * @param k Indices (may be destructively modified). * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the * sub-range {@code [fromIndex, toIndex)} */ public static void select(double[] a, int fromIndex, int toIndex, int[] k) { IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); IndexSupport.checkIndices(fromIndex, toIndex, k); doSelect(a, fromIndex, toIndex, k); } /** * Partition the array such that index {@code k} corresponds to its correctly * sorted value in the equivalent fully sorted array. * *

This method pre/post-processes the data and indices to respect the ordering * imposed by {@link Double#compare(double, double)}. * * @param fromIndex Index of the first element (inclusive). * @param toIndex Index of the last element (exclusive). * @param a Values. * @param k Index. */ private static void doSelect(double[] a, int fromIndex, int toIndex, int k) { if (toIndex - fromIndex <= 1) { return; } // Sort NaN / count signed zeros. // Caution: This loop contributes significantly to the runtime. int cn = 0; int end = toIndex; for (int i = toIndex; --i >= fromIndex;) { final double v = a[i]; // Count negative zeros using a sign bit check if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { cn++; // Change to positive zero. // Data must be repaired after selection. a[i] = 0.0; } else if (v != v) { // Move NaN to end a[i] = a[--end]; a[end] = v; } } // Partition if (end - fromIndex > 1 && k < end) { QuickSelect.select(a, fromIndex, end - 1, k); } // Restore signed zeros if (cn != 0) { // Use partition index below zero to fast-forward to zero as much as possible for (int j = a[k] < 0 ? k : -1;;) { if (a[++j] == 0) { a[j] = -0.0; if (--cn == 0) { break; } } } } } /** * Partition the array such that indices {@code k} correspond to their correctly * sorted value in the equivalent fully sorted array. * *

This method pre/post-processes the data and indices to respect the ordering * imposed by {@link Double#compare(double, double)}. * * @param fromIndex Index of the first element (inclusive). * @param toIndex Index of the last element (exclusive). * @param a Values. * @param k Indices (may be destructively modified). */ private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) { if (k.length == 0 || toIndex - fromIndex <= 1) { return; } // Sort NaN / count signed zeros. // Caution: This loop contributes significantly to the runtime for single indices. int cn = 0; int end = toIndex; for (int i = toIndex; --i >= fromIndex;) { final double v = a[i]; // Count negative zeros using a sign bit check if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) { cn++; // Change to positive zero. // Data must be repaired after selection. a[i] = 0.0; } else if (v != v) { // Move NaN to end a[i] = a[--end]; a[end] = v; } } // Partition int n = 0; if (end - fromIndex > 1) { n = k.length; // Filter indices invalidated by NaN check if (end < toIndex) { for (int i = n; --i >= 0;) { final int index = k[i]; if (index >= end) { // Move to end k[i] = k[--n]; k[n] = index; } } } // Return n, the count of used indices in k. // Use this to post-process zeros. n = QuickSelect.select(a, fromIndex, end - 1, k, n); } // Restore signed zeros if (cn != 0) { // Use partition indices below zero to fast-forward to zero as much as possible int j = -1; if (n < 0) { // Binary search on -n sorted indices: hi = (-n) - 1 int lo = 0; int hi = ~n; while (lo <= hi) { final int mid = (lo + hi) >>> 1; if (a[k[mid]] < 0) { j = mid; lo = mid + 1; } else { hi = mid - 1; } } } else { // Unsorted, process all indices for (int i = n; --i >= 0;) { if (a[k[i]] < 0) { j = k[i]; } } } for (;;) { if (a[++j] == 0) { a[j] = -0.0; if (--cn == 0) { break; } } } } } /** * Partition the array such that index {@code k} corresponds to its correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param k Index. * @throws IndexOutOfBoundsException if index {@code k} is not within the * sub-range {@code [0, a.length)} */ public static void select(int[] a, int k) { IndexSupport.checkIndex(0, a.length, k); if (a.length <= 1) { return; } QuickSelect.select(a, 0, a.length - 1, k); } /** * Partition the array such that indices {@code k} correspond to their correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param k Indices (may be destructively modified). * @throws IndexOutOfBoundsException if any index {@code k} is not within the * sub-range {@code [0, a.length)} */ public static void select(int[] a, int[] k) { IndexSupport.checkIndices(0, a.length, k); if (k.length == 0 || a.length <= 1) { return; } QuickSelect.select(a, 0, a.length - 1, k, k.length); } /** * Partition the array such that index {@code k} corresponds to its correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param fromIndex Index of the first element (inclusive). * @param toIndex Index of the last element (exclusive). * @param k Index. * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the * sub-range {@code [fromIndex, toIndex)} */ public static void select(int[] a, int fromIndex, int toIndex, int k) { IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); IndexSupport.checkIndex(fromIndex, toIndex, k); if (toIndex - fromIndex <= 1) { return; } QuickSelect.select(a, fromIndex, toIndex - 1, k); } /** * Partition the array such that indices {@code k} correspond to their correctly * sorted value in the equivalent fully sorted array. * * @param a Values. * @param fromIndex Index of the first element (inclusive). * @param toIndex Index of the last element (exclusive). * @param k Indices (may be destructively modified). * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the * sub-range {@code [fromIndex, toIndex)} */ public static void select(int[] a, int fromIndex, int toIndex, int[] k) { IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length); IndexSupport.checkIndices(fromIndex, toIndex, k); if (k.length == 0 || toIndex - fromIndex <= 1) { return; } QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length); } }





© 2015 - 2024 Weber Informatics LLC | Privacy Policy