Function Objects
Example 4: Sorting by user specified order
Assume we would like to sort the rows of a 2d matrix by the last column (representing "age"). This can be done with:
// sort by last column
sorted = matrix.viewSorted(matrix.columns()-1);
Or assume we would like to sort the columns of a 2d matrix by the last row.
Unfortunately, there is no convenience method to sort directly by row. So we need to view columns as rows and rows as columns, then sort, then flip the view back.
// sort by last row
int lastRow = matrix.rows()-1;
sorted = matrix.viewDice().viewSorted(lastRow).viewDice();
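The flip, sort, flip-back idea can also be traced on plain arrays. The following is a minimal sketch that does not use Colt at all: the class and method names are hypothetical, and unlike viewDice(), which returns a view, transpose() here copies the data.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortColumnsByRow {
    /** Returns the transpose of a rectangular matrix as a new array. */
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int r = 0; r < m.length; r++)
            for (int c = 0; c < m[0].length; c++)
                t[c][r] = m[r][c];
        return t;
    }

    /** Returns the rows of m ordered by ascending value in the given column. */
    static double[][] sortRowsByColumn(double[][] m, int column) {
        double[][] sorted = m.clone(); // shallow copy: the row order of m stays untouched
        Arrays.sort(sorted, Comparator.comparingDouble(row -> row[column]));
        return sorted;
    }

    /** Flip, sort rows, flip back: the same three steps as the viewDice() chain. */
    static double[][] sortColumnsByRow(double[][] m, int row) {
        return transpose(sortRowsByColumn(transpose(m), row));
    }

    public static void main(String[] args) {
        double[][] m = { {7, 8, 9}, {30, 10, 20} };
        // Sort the columns by the last row (10 < 20 < 30).
        System.out.println(Arrays.deepToString(sortColumnsByRow(m, m.length - 1)));
        // prints [[8.0, 9.0, 7.0], [10.0, 20.0, 30.0]]
    }
}
```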
Next, we would like to sort the rows of a 2d matrix by the aggregate sum
of values in a row. A comparator object is used to do the job:
// sort by sum of values in a row
DoubleMatrix1DComparator comp = new DoubleMatrix1DComparator() {
    public int compare(DoubleMatrix1D a, DoubleMatrix1D b) {
        double as = a.zSum(); double bs = b.zSum();
        return as < bs ? -1 : as == bs ? 0 : 1;
    }
};
sorted = cern.colt.matrix.tdouble.algo.DoubleSorting.quickSort.sort(matrix, comp);
Further, we would like to sort the rows of a 2d matrix by the aggregate sum of
logarithms in a row. Since all rows have the same length and the logarithm is
monotonic, this yields the same order as sorting by geometric mean
when viewing a row as a series of positive samples. A slightly more complex comparator
object is needed:
// sort by sum of logarithms in a row
DoubleMatrix1DComparator comp = new DoubleMatrix1DComparator() {
    public int compare(DoubleMatrix1D a, DoubleMatrix1D b) {
        double as = a.aggregate(cern.jet.math.tdouble.DoubleFunctions.plus, cern.jet.math.tdouble.DoubleFunctions.log);
        double bs = b.aggregate(cern.jet.math.tdouble.DoubleFunctions.plus, cern.jet.math.tdouble.DoubleFunctions.log);
        return as < bs ? -1 : as == bs ? 0 : 1;
    }
};
sorted = cern.colt.matrix.tdouble.algo.DoubleSorting.quickSort.sort(matrix, comp);
This is certainly not the most efficient approach, since row sums are recomputed many times
(about 2*rows*log(rows) times, on average), but it will suffice as an example.
An efficient application will precompute the sums and use cern.colt.GenericSorting
to sort the matrix. In general, if comparisons are expensive, precomputation boosts
performance by a factor of roughly 2*log(rows).
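To make the cost argument concrete, here is a minimal sketch that counts how often the row aggregate is evaluated with and without precomputation. It uses plain Java arrays and java.util.Arrays.sort instead of Colt, and all class and method names are hypothetical. Note that Arrays.sort on objects is a merge sort variant, so the exact count for the comparator-based variant will differ from the quicksort estimate above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

class KeyEvaluationCount {
    static long evaluations = 0;

    /** Sum of natural logarithms of one row; counts every call. */
    static double sumLog(double[] row) {
        evaluations++;
        double s = 0;
        for (double v : row) s += Math.log(v);
        return s;
    }

    /** Comparator-based sort: both keys are recomputed on every comparison. */
    static long countNaive(double[][] m) {
        evaluations = 0;
        double[][] copy = m.clone();
        Arrays.sort(copy, (a, b) -> Double.compare(sumLog(a), sumLog(b)));
        return evaluations;
    }

    /** Precomputed keys: one evaluation per row, then sort row indices by key. */
    static long countPrecomputed(double[][] m) {
        evaluations = 0;
        double[] keys = new double[m.length];
        for (int r = 0; r < m.length; r++) keys[r] = sumLog(m[r]);
        Integer[] order = new Integer[m.length];
        for (int r = 0; r < m.length; r++) order[r] = r;
        Arrays.sort(order, Comparator.comparingDouble(idx -> keys[idx]));
        return evaluations;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] m = new double[1000][5];
        for (double[] row : m)
            for (int j = 0; j < row.length; j++) row[j] = 1 + rnd.nextDouble();
        System.out.println("comparator-based: " + countNaive(m) + " evaluations");
        System.out.println("precomputed:      " + countPrecomputed(m) + " evaluations");
    }
}
```

The precomputed variant always evaluates the aggregate exactly once per row; the comparator-based variant evaluates it twice per comparison.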
Two methods that do exactly this kind of precomputation were recently added to
cern.colt.matrix.tdouble.algo.DoubleSorting.
One of them works by filling a row into a so-called "bin", which is a multiset
with statistics operations defined on it. Aggregate measures over the row are
then computed via a DoubleBinFunction1D.
Some prefabricated functions are contained in DoubleBinFunctions1D.
Here is how to solve the problem efficiently:
// sort by sum of logarithms in a row
sorted = cern.colt.matrix.tdouble.algo.DoubleSorting.quickSort.sort(matrix, hep.aida.tdouble.bin.DoubleBinFunctions1D.sumLog);
// sort by median in a row
sorted = cern.colt.matrix.tdouble.algo.DoubleSorting.quickSort.sort(matrix, hep.aida.tdouble.bin.DoubleBinFunctions1D.median);
// sort by maximum in a row
sorted = cern.colt.matrix.tdouble.algo.DoubleSorting.quickSort.sort(matrix, hep.aida.tdouble.bin.DoubleBinFunctions1D.max);
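For readers without the library at hand, the idea of sorting by a pluggable row aggregate can be sketched in plain Java. This is only an analogy under stated assumptions: java.util.function.ToDoubleFunction stands in for DoubleBinFunction1D, the class name and the constants MAX and MEDIAN are hypothetical, and no bin object is involved.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

class SortRowsByAggregate {
    /** Largest value in a row (rows are assumed to be non-empty). */
    static final ToDoubleFunction<double[]> MAX =
        row -> Arrays.stream(row).max().getAsDouble();

    /** Median of a row, computed on a sorted copy so the row itself is not modified. */
    static final ToDoubleFunction<double[]> MEDIAN = row -> {
        double[] s = row.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    };

    /** Returns the rows of m ordered by ascending aggregate; each aggregate is computed once. */
    static double[][] sortRows(double[][] m, ToDoubleFunction<double[]> aggregate) {
        double[] keys = new double[m.length];
        for (int r = 0; r < m.length; r++) keys[r] = aggregate.applyAsDouble(m[r]);
        Integer[] order = new Integer[m.length];
        for (int r = 0; r < m.length; r++) order[r] = r;
        Arrays.sort(order, Comparator.comparingDouble(idx -> keys[idx]));
        double[][] sorted = new double[m.length][];
        for (int r = 0; r < m.length; r++) sorted[r] = m[order[r]];
        return sorted;
    }

    public static void main(String[] args) {
        double[][] m = { {1, 9, 2}, {5, 5, 5}, {0, 3, 4} };
        System.out.println(Arrays.deepToString(sortRows(m, MAX)));
        // prints [[0.0, 3.0, 4.0], [5.0, 5.0, 5.0], [1.0, 9.0, 2.0]]
        System.out.println(Arrays.deepToString(sortRows(m, MEDIAN)));
        // prints [[1.0, 9.0, 2.0], [0.0, 3.0, 4.0], [5.0, 5.0, 5.0]]
    }
}
```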