/*
 * LingPipe v. 4.1.0
 * Copyright (C) 2003-2011 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.matrix;

import com.aliasi.io.LogLevel;
import com.aliasi.io.Reporter;
import com.aliasi.io.Reporters;

import java.io.PrintWriter;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import java.util.Random;

/**
 * An {@code SvdMatrix} provides a means of storing a matrix that
 * has been factored via a singular-value decomposition (SVD).  This
 * class also provides a static method for computing regularized
 * singular-value decompositions of partial matrices.
 *
 * <h3>Singular Value Decomposition</h3>
 *
 * <p>Singular value decomposition (SVD) factors an m×n matrix A into
 * a product of three matrices:
 *
 * <pre>
 *     A = U * S * V^T</pre>
 *
 * where U is an m×k matrix, V is an n×k matrix, and S is a k×k
 * matrix, where k is the rank of the matrix A.  The multiplication
 * (*) is matrix multiplication and the superscript T indicates
 * matrix transposition.
 *
 * <p>The m-dimensional vectors making up the columns of U are called
 * left singular vectors, whereas the n-dimensional vectors making up
 * the columns of V are called right singular vectors.  The values on
 * the diagonal of S are called the singular values.  The combination
 * of the q-th left singular vector, right singular vector, and
 * singular value is known as a factor.
 *
 * <p>The singular value matrix S is a diagonal matrix with positive,
 * non-increasing values, so that S[i][i] >= S[i+1][i+1] for i+1 < k.
 * The set of left and the set of right singular vectors are
 * orthonormal.  Normality means that each singular vector is of unit
 * length (length 1).  Orthogonality means that any pair of distinct
 * left singular vectors is orthogonal and any pair of distinct right
 * singular vectors is orthogonal (meaning their dot product is 0).
 *
 * <p>Matrices have unique singular-value decompositions up to the
 * re-ordering of columns with equal singular values and up to
 * cancelling sign changes in the singular vectors.
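 *
 * <p>For illustration only (a hypothetical sketch, not part of this
 * class's API), orthonormality of the left singular vectors may be
 * checked with dot products, where {@code u} is an m×k array indexed
 * {@code u[row][factor]}:
 *
 * <pre>
 *     // dot product of left singular vectors p and q; approximately
 *     // 1.0 if p == q, approximately 0.0 otherwise
 *     static double dot(double[][] u, int p, int q) {
 *         double sum = 0.0;
 *         for (int i = 0; i < u.length; ++i)
 *             sum += u[i][p] * u[i][q];
 *         return sum;
 *     }</pre>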

 * <h3>Construction and Value Computation</h3>
 *
 * <p>An {@code SvdMatrix} may be constructed out of the singular
 * vectors and singular values, or out of the vectors with the
 * singular values scaled in.
 *
 * <p>Given that S is diagonal, the value of a particular entry in A
 * works out to:
 *
 * <pre>
 *     A[i][j] = Σ_k  U[i][k] * S[k][k] * V[j][k]</pre>
 *
 * To save time in the application and space in the class, we factor
 * S into U and V to produce a new pair of matrices U' and V' defined
 * by:
 *
 * <pre>
 *     U' = U * sqrt(S)
 *     V'^T = sqrt(S) * V^T</pre>
 *
 * with the square root performed component-wise:
 *
 * <pre>
 *     sqrt(S)[i][j] = sqrt(S[i][j])</pre>
 *
 * By the associativity of matrix multiplication, we have:
 *
 * <pre>
 *     U * S * V^T
 *     = (U * sqrt(S)) * (sqrt(S) * V^T)
 *     = U' * V'^T</pre>
 *
 * Thus the class implementation is able to store U' and V', reducing
 * the amount of computation involved in returning a value (using
 * column vectors as the default vector orientation):
 *
 * <pre>
 *     A[i][j] = U'[i]^T * V'[j]</pre>
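 *
 * <p>As a sketch (assuming row vectors {@code uPrime} and column
 * vectors {@code vPrime} stored as in this class, indexed
 * {@code [row][factor]} and {@code [column][factor]}), the value
 * computation reduces to a dot product:
 *
 * <pre>
 *     // A[i][j] as dot product of scaled singular vectors
 *     static double value(double[][] uPrime, double[][] vPrime,
 *                         int i, int j) {
 *         double sum = 0.0;
 *         for (int k = 0; k < uPrime[i].length; ++k)
 *             sum += uPrime[i][k] * vPrime[j][k];
 *         return sum;
 *     }</pre>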

 * <h3>Square Error and the Frobenius Norm</h3>
 *
 * <p>Suppose A and B are m×n matrices.  The square error between
 * them is defined by the square of the so-called Frobenius norm,
 * which extends the standard vector L2 or Euclidean norm to
 * matrices:
 *
 * <pre>
 *     squareError(A,B)
 *     = frobeniusNorm(A-B)²
 *     = Σ_{i < m}  Σ_{j < n}  (A[i][j] - B[i][j])²</pre>
 *
 * The square error is sometimes referred to as the residual sum of
 * squares (RSS), because A[i][j] - B[i][j] is the residual
 * (difference between actual and predicted value).
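 *
 * <p>A minimal sketch of the square error computation over complete
 * matrices (a hypothetical helper, not part of this class):
 *
 * <pre>
 *     static double squareError(double[][] a, double[][] b) {
 *         double sum = 0.0;
 *         for (int i = 0; i < a.length; ++i)
 *             for (int j = 0; j < a[i].length; ++j) {
 *                 double diff = a[i][j] - b[i][j];
 *                 sum += diff * diff;  // squared residual
 *             }
 *         return sum;
 *     }</pre>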

 * <h3>Lower-order Approximations</h3>
 *
 * <p>Consider factoring a matrix A of dimension m×n into the product
 * of two matrices X * Y^T, where X is of dimension m×k and Y is of
 * dimension n×k.  We may then measure the square error
 * squareError(A, X * Y^T) to determine how well the factorization
 * matches A.  We know that if we take U', V' from the singular value
 * decomposition, then:
 *
 * <pre>
 *     squareError(A, U' * V'^T) = 0.0</pre>
 *
 * Here U' and V' have a number of columns (called factors) equal to
 * the rank of the matrix A.  The singular value decomposition is
 * such that the first q columns of U' and V' provide the best
 * order-q approximation of A of any X and Y of dimensions m×q and
 * n×q.  In symbols:
 *
 * <pre>
 *     U'_q, V'_q = argmin_{X is m×q, Y is n×q}  squareError(A, X * Y^T)</pre>
 *
 * where U'_q is the restriction of U' to its first q columns.

 * <p>Often errors are reported as means, where the mean square error
 * (MSE) is defined by:
 *
 * <pre>
 *     meanSquareError(A,B) = squareError(A,B) / (m×n)</pre>
 *
 * <p>To adjust to a linear scale, the square root of mean square
 * error (RMSE) is often used:
 *
 * <pre>
 *     rootMeanSquareError(A,B) = sqrt(meanSquareError(A,B))</pre>
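 *
 * <p>Continuing the sketch above (with {@code a}, {@code b} the two
 * matrices and {@code m}, {@code n} their dimensions), MSE and RMSE
 * follow directly from {@code squareError}:
 *
 * <pre>
 *     double mse = squareError(a, b) / (m * n);  // mean square error
 *     double rmse = Math.sqrt(mse);              // root mean square error</pre>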

 * <h3>Partial Matrices</h3>
 *
 * <p>A partial matrix is one in which some of the values are
 * unknown.  This is in contrast with a sparse matrix, in which most
 * of the values are zero.  A variant of singular value decomposition
 * may be used to impute the unknown values by minimizing square
 * error with respect to the known values only.  Unknown values are
 * then simply derived from the factorization U' * V'^T.  Typically,
 * the approximation is of lower order than the rank of the matrix.

 * <h3>Regularized SVD via Shrinkage</h3>
 *
 * <p>Linear regression techniques such as SVD often overfit their
 * training data in order to derive exact answers.  This problem is
 * mitigated somewhat by choosing low-order approximations to the
 * full-rank SVD.  Another option is to penalize large values in the
 * singular vectors, thus favoring smaller values.  Because of its
 * practicality, the most common way to do this is via parameter
 * shrinkage.
 *
 * <p>Shrinkage is a general technique in least squares fitting that
 * adds a penalty term proportional to the square of the size of the
 * parameters.  Thus the square error objective function is replaced
 * with a regularized version:
 *
 * <pre>
 *     regularizedError(A, U' * V'^T)
 *     = squareError(A, U' * V'^T)
 *       + parameterCost(U') + parameterCost(V')</pre>
 *
 * where the parameter cost of a matrix is the sum of the costs of
 * its vectors, and the parameter cost for a vector X of
 * dimensionality q is the sum of the squared parameters:
 *
 * <pre>
 *     parameterCost(X)
 *     = λ * Σ_{i < q}  X[i]²
 *     = λ * length(X)²</pre>
 *
 * Note that the hyperparameter λ scales the parameter cost term in
 * the overall error.
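 *
 * <p>A sketch of the penalty term for a single parameter vector
 * (hypothetical helper; {@code lambda} is the regularization
 * hyperparameter λ):
 *
 * <pre>
 *     static double parameterCost(double[] x, double lambda) {
 *         double sumOfSquares = 0.0;
 *         for (double xi : x)
 *             sumOfSquares += xi * xi;
 *         return lambda * sumOfSquares;  // λ * length(x)²
 *     }</pre>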

 * <p>Setting the regularization parameter λ to zero sets the
 * parameter costs to zero, resulting in an unregularized SVD
 * computation.
 *
 * <p>In Bayesian terms, regularization is equivalent to a normal
 * prior on parameter values centered on zero with variance
 * controlled by λ.  The resulting minimizer of regularized error is
 * the maximum a posteriori (MAP) solution.  The Bayesian approach
 * also allows covariances to be estimated (including simple
 * parameter variance estimates), but these are not implemented in
 * this class.

 * <h3>Regularized Stochastic Gradient Descent</h3>
 *
 * <p>Singular value decomposition may be computed "exactly" (modulo
 * arithmetic precision and convergence) using an algorithm whose
 * time complexity is O(m³ + n³) for an m×n matrix (equal to
 * O(max(m,n)³)).  Arithmetic precision is especially vexing at
 * singular values near zero and with highly correlated rows or
 * columns in the input matrix.
 *
 * <p>For large partial matrices, we use a form of stochastic
 * gradient descent which computes a single row and column singular
 * vector (a single factor, that is) at a time.  Each factor is
 * estimated by iteratively visiting the data points and adjusting
 * the unnormalized singular vectors making up the current factor.
 * Each adjustment is a least-squares fitting step, where we
 * simultaneously adjust the left singular vectors given the right
 * singular vectors and vice-versa.
 *
 * <p>The least-squares adjustments take the following form.  For
 * each value, we compute the current error (using the order k
 * approximation and the current order k+1 values) and then move the
 * vectors to reduce error.  We use U' and V' for the incremental
 * values of the singular vectors scaled by the singular values:
 *
 * <pre>
 *     for (k = 0; k < maxOrder; ++k)
 *         for (i < m) U'[i][k] = random.nextGaussian() * initValue;
 *         for (j < n) V'[j][k] = random.nextGaussian() * initValue;
 *         for (epoch = 0; epoch < maxEpochs && not converged; ++epoch)
 *             for (i,j such that M[i][j] defined)
 *                 error = M[i][j] - U'_k[i] * V'_k[j]  // * is vector dot product
 *                 uTemp = U'[i][k]
 *                 vTemp = V'[j][k]
 *                 U'[i][k] += learningRate[epoch] * (error * vTemp - regularization * uTemp)
 *                 V'[j][k] += learningRate[epoch] * (error * uTemp - regularization * vTemp)</pre>
 *
 * where initValue, maxEpochs, learningRate (see below), and
 * regularization are algorithm hyperparameters.  Note that the
 * initial values of the singular vectors are set randomly to draws
 * from a Gaussian (normal) distribution with mean 0.0 and variance
 * 1.0, scaled by initValue.
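 *
 * <p>A minimal Java rendering of the inner update for one defined
 * entry (a sketch under the naming assumptions above; {@code mij},
 * {@code uPrime}, {@code vPrime}, and {@code cachedPrediction} are
 * hypothetical local names, mirroring the pseudo-code rather than
 * the exact implementation below):
 *
 * <pre>
 *     // one stochastic gradient step for entry (i,j) at factor k;
 *     // cachedPrediction holds the contribution of factors 0..k-1
 *     double prediction = cachedPrediction + uPrime[i][k] * vPrime[j][k];
 *     double error = mij - prediction;
 *     double uTemp = uPrime[i][k];
 *     double vTemp = vPrime[j][k];
 *     uPrime[i][k] += learningRate * (error * vTemp - regularization * uTemp);
 *     vPrime[j][k] += learningRate * (error * uTemp - regularization * vTemp);</pre>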

 * <p>Because we use the entire set of factors in the error
 * computation, the current factor is guaranteed to have singular
 * vectors orthogonal to the singular vectors already computed.
 *
 * <p>Note that in the actual implementation, the contribution to the
 * error of the first k-1 factors is cached to reduce time in the
 * inner loop.  This requires a double-length floating point value
 * for each defined entry in the matrix.

 * <h3>Gradient Interpretation</h3>
 *
 * <p>Like most linear learners, this algorithm merely moves the
 * parameter vectors U'[i] and V'[j] in the direction of the gradient
 * of the error.  The gradient is the multivariate derivative of the
 * objective function being minimized.  Because our objective is
 * squared error, the gradient is just its derivative, which is just
 * (two times) the (linear) error itself.  We roll the factor of 2
 * into the learning rate to derive the update in the algorithm
 * pseudo-code above.
 *
 * <p>The term (error * vTemp) is the component of the error gradient
 * due to the current value of V'[j][k], and the term (regularization
 * * uTemp) is the component of the gradient due to the size of the
 * parameter U'[i][k].  The updates thus move the parameter vectors
 * in the direction of the gradient.

 * <h3>Convergence Conditions</h3>
 *
 * <p>The convergence conditions for a given factor require either
 * hitting the maximum number of allowable epochs, or finding that
 * the improvement from one epoch to the next is below some relative
 * threshold:
 *
 * <pre>
 *     regError(epoch) = regError(M, U'_k(epoch) * V'_k(epoch)^T)
 *     relativeImprovement(epoch+1) = relativeDiff(regError(epoch+1), regError(epoch))
 *     relativeDiff(x,y) = abs(x-y) / (abs(x) + abs(y))</pre>
 *
 * When the relative difference in square error is less than a
 * hyperparameter threshold minImprovement, the epoch terminates and
 * the algorithm moves on to the next factor k+1.
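 *
 * <p>The relative difference is symmetric and scale-free; in Java it
 * is a one-liner (this mirrors the {@code relativeDifference()}
 * helper used in the implementation below):
 *
 * <pre>
 *     static double relativeDiff(double x, double y) {
 *         return Math.abs(x - y) / (Math.abs(x) + Math.abs(y));
 *     }</pre>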

 * <p>Note that a complete matrix is a degenerate kind of partial
 * matrix.  The gradient descent computation still works in this
 * situation, but is not as efficient or as accurate as an algebraic
 * SVD solver for small matrices.

 * <h3>Annealing Schedule</h3>
 *
 * <p>Learning rates that are too high are unstable, whereas learning
 * rates that are too low never reach their targets.  To get around
 * this problem, the learning rate, learningRate[epoch], is lowered
 * as the number of epochs increases.  This lowering of the learning
 * rate has a thermodynamic interpretation in terms of free energy,
 * hence the name "annealing".  Larger moves are made in earlier
 * epochs, then the temperature is gradually lowered so that the
 * learner may settle into a stable fixed point.  The function
 * learningRate[epoch] is called the annealing schedule.
 *
 * <p>There are theoretical requirements on the annealing schedule
 * that guarantee convergence (up to arithmetic precision and with no
 * upper bound on the number of epochs):
 *
 * <pre>
 *     Σ_epoch  learningRate[epoch] = infinity
 *     Σ_epoch  learningRate[epoch]² < infinity</pre>
 *
 * The schedule we use is the one commonly chosen to meet the above
 * requirements:
 *
 * <pre>
 *     learningRate[epoch] = initialLearningRate / (1 + epoch/annealingRate)</pre>
 *
 * where initialLearningRate is the initial learning rate and
 * annealingRate determines how quickly it shrinks.  The learning
 * rate moves from its initial size (initialLearningRate) to one half
 * (1/2) of its initial size after annealingRate epochs, to one tenth
 * (1/10) of its initial size after 9 * annealingRate epochs, and to
 * one hundredth (1/100) of its initial size after 99 * annealingRate
 * epochs.
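 *
 * <p>In Java, the schedule is a one-liner; for example, with
 * initialLearningRate = 0.01 and annealingRate = 100, epoch 100
 * yields rate 0.005 and epoch 900 yields rate 0.001:
 *
 * <pre>
 *     double rate = initialLearningRate / (1.0 + epoch / annealingRate);</pre>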

 * <h3>Parameter Choices</h3>
 *
 * <p>The previous discussion has introduced a daunting list of
 * parameters required for gradient descent for singular value
 * decomposition.  Unfortunately, the stochastic gradient descent
 * solver requires a fair amount of tuning to recover a low mean
 * square error factorization.  The optimal settings will also depend
 * on the input matrix; for example, very sparse partial matrices are
 * much easier to fit than dense matrices.
 *
 * <h4>Maximum Order</h4>
 *
 * <p>Determining the order of the decomposition is mostly a matter
 * of determining how many orders are needed for the amount of
 * smoothing required.  Low order reconstructions are useful for most
 * applications.  One way to determine maximum order is using held
 * out data for an application.  Another is to look for a point where
 * the singular values become (relatively) insignificant.

 * <h4>Maximum Epochs and Early Stopping</h4>
 *
 * <p>For low mean square error factorizations, many epochs may be
 * necessary.  Lowering the maximum number of epochs leads to what is
 * known as early stopping.  Sometimes, early stopping leads to more
 * accurate held-out predictions, but it will always raise the
 * factorization error (training data predictions).  In general, the
 * maximum number of epochs needs to be set empirically.  Initially,
 * try fewer epochs and gradually raise the number of epochs until
 * the desired accuracy is achieved.
 *
 * <h4>Minimum Improvement</h4>
 *
 * <p>By setting the minimum improvement to 0.0, the algorithm is
 * forced to use the maximum number of epochs.  By setting it above
 * this level, a form of early stopping is achieved when the benefit
 * of another epoch of refitting falls below a given improvement
 * threshold.  This value may also be set on an application basis
 * using held out data, or it may be fit to achieve a given level of
 * mean square error on the training (input) data.  The minimum
 * improvement needs to be set fairly low (less than 0.000001) to
 * achieve reasonably precise factorizations.  Note that minimum
 * improvement is defined relatively, as noted above.

 * <h4>Initial Parameter Values</h4>
 *
 * <p>Initial values of the singular vectors are not particularly
 * sensitive, because we are using multiplicative updates.  A good
 * rule of thumb for starting values is the inverse square root of
 * the maximum order:
 *
 * <pre>
 *     initVal ~ 1 / sqrt(maxOrder)</pre>
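 *
 * <p>For instance, in Java, with maxOrder the number of factors:
 *
 * <pre>
 *     double featureInit = 1.0 / Math.sqrt(maxOrder);  // rule-of-thumb init</pre>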

 * <h4>Learning Rate, Maximum Epochs and Annealing</h4>
 *
 * <p>A good starting point for the learning rate is 0.01.  The
 * annealing parameter should be set so that the learning rate is cut
 * by at least a factor of 10 in the final rounds.  This calls for a
 * value that's roughly one tenth of the maximum number of epochs.
 * If the initial learning rate is set higher, then the annealing
 * schedule should be more aggressive so that it spends a fair amount
 * of time in the 0.001 to 0.0001 learning rate range.
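 *
 * <p>Putting these recommendations together, a hypothetical
 * invocation of the complete-matrix solver might look as follows
 * (all values are illustrative, not prescriptions):
 *
 * <pre>
 *     double[][] values = ...;          // m×n input matrix
 *     int maxOrder = 2;                 // number of factors
 *     double featureInit = 1.0 / Math.sqrt(maxOrder);
 *     double initialLearningRate = 0.01;
 *     int maxEpochs = 10000;
 *     double annealingRate = maxEpochs / 10.0;
 *     double regularization = 0.00;     // no shrinkage
 *     double minImprovement = 0.000001; // relative convergence threshold
 *     int minEpochs = 10;
 *     SvdMatrix svd
 *         = SvdMatrix.svd(values, maxOrder, featureInit,
 *                         initialLearningRate, annealingRate,
 *                         regularization, null, minImprovement,
 *                         minEpochs, maxEpochs);
 *     double approx = svd.value(0, 0);  // reconstructed A[0][0]</pre>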

 * <h3>References</h3>
 *
 * <p>There are thorough Wikipedia articles on singular value
 * decomposition and gradient descent, although the SVD entry focuses
 * entirely on complete (non-partial) matrices:
 *
 * <ul>
 * <li>Wikipedia: Singular Value Decomposition</li>
 * <li>Wikipedia: Gradient Descent</li>
 * </ul>
 *
 * <p>Both of the standard machine learning textbooks have good
 * theoretical explanations of least squares, regularization,
 * gradient descent, and singular value decomposition, but not all
 * together:
 *
 * <ul>
 * <li>Chris Bishop. 2007. Pattern Recognition and Machine Learning.
 * Springer.</li>
 * <li>Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001.
 * The Elements of Statistical Learning. Springer.</li>
 * </ul>

 * <p>The following is a particularly clear explanation of many of
 * the issues involved in the context of neural networks:
 *
 * <ul>
 * <li>Genevieve Orr, Nici Schraudolph, and Fred Cummins. 1999.
 * CS-449: Neural Networks. Willamette University course notes.</li>
 * </ul>

 * <p>Our partial SVD solver is based on C code from Timely
 * Development (see license below).  Timely based their code on Simon
 * Funk's blog entry.  Simon Funk's approach was itself based on his
 * and Genevieve Gorrell's 2005 EACL paper.

 * <h3>Acknowledgements</h3>
 *
 * <p>The singular value decomposition code is rather loosely based
 * on a C program developed by Timely Development.  Here is the
 * copyright notice for the original code:
 *
 * <pre>
 * // SVD Sample Code
 * //
 * // Copyright (C) 2007 Timely Development (www.timelydevelopment.com)
 * //
 * // Special thanks to Simon Funk and others from the Netflix Prize contest
 * // for providing pseudo-code and tuning hints.
 * //
 * // Feel free to use this code as you wish as long as you include
 * // these notices and attribution.
 * //
 * // Also, if you have alternative types of algorithms for accomplishing
 * // the same goal and would like to contribute, please share them as well :)
 * //
 * // STANDARD DISCLAIMER:
 * //
 * // - THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
 * // - OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
 * // - LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
 * // - FITNESS FOR A PARTICULAR PURPOSE.
 * </pre>
 *
 * @author Bob Carpenter
 * @version 4.0.0
 * @since LingPipe3.2
 */
public class SvdMatrix extends AbstractMatrix {

    private final double[][] mRowVectors;
    private final double[][] mColumnVectors;
    private final int mOrder;

    /**
     * Construct a factored matrix using the specified row and column
     * vectors at the specified order.  Each vector in the arrays of
     * row and column vectors must be of the same dimension as the
     * order.
     *

     * <p>See the class documentation above for more information on
     * singular value decomposition.
     *
     * @param rowVectors Row vectors, indexed by row.
     * @param columnVectors Column vectors, indexed by column.
     * @param order Dimensionality of the row and column vectors.
     */
    public SvdMatrix(double[][] rowVectors,
                     double[][] columnVectors,
                     int order) {
        verifyDimensions("row",order,rowVectors);
        verifyDimensions("column",order,columnVectors);
        mRowVectors = rowVectors;
        mColumnVectors = columnVectors;
        mOrder = order;
    }

    /**
     * Construct an SVD matrix using the specified singular vectors
     * and singular values.  The order of the factorization is equal
     * to the length of the singular values.  Each singular vector
     * must be the same dimensionality as the array of singular
     * values.
     *

     * <p>See the class documentation above for more information on
     * singular value decomposition.
     *
     * @param rowSingularVectors Row singular vectors, indexed by row.
     * @param columnSingularVectors Column singular vectors, indexed by column.
     * @param singularValues Array of singular values, in decreasing order.
     */
    public SvdMatrix(double[][] rowSingularVectors,
                     double[][] columnSingularVectors,
                     double[] singularValues) {
        mOrder = singularValues.length;
        verifyDimensions("row",mOrder,rowSingularVectors);
        verifyDimensions("column",mOrder,columnSingularVectors);
        mRowVectors = new double[rowSingularVectors.length][mOrder];
        mColumnVectors = new double[columnSingularVectors.length][mOrder];
        double[] sqrtSingularValues = new double[singularValues.length];
        for (int i = 0; i < sqrtSingularValues.length; ++i)
            sqrtSingularValues[i] = Math.sqrt(singularValues[i]);
        scale(mRowVectors,rowSingularVectors,sqrtSingularValues);
        scale(mColumnVectors,columnSingularVectors,sqrtSingularValues);
    }

    /**
     * Returns the number of rows in this matrix.
     *
     * @return The number of rows in this matrix.
     */
    @Override
    public int numRows() {
        return mRowVectors.length;
    }

    /**
     * Returns the number of columns in this matrix.
     *
     * @return The number of columns in this matrix.
     */
    @Override
    public int numColumns() {
        return mColumnVectors.length;
    }

    /**
     * Returns the order of this factorization.  The order is the
     * number of dimensions in the singular vectors and the singular
     * values.
     *
     * @return The order of this decomposition.
     */
    public int order() {
        return mRowVectors[0].length;
    }

    /**
     * Returns the value of this matrix at the specified row and
     * column.
     *
     * @param row Row index.
     * @param column Column index.
     * @return The value of this matrix at the specified row and
     * column.
     */
    @Override
    public double value(int row, int column) {
        double[] rowVec = mRowVectors[row];
        double[] colVec = mColumnVectors[column];
        double result = 0.0;
        for (int i = 0; i < rowVec.length; ++i)
            result += rowVec[i] * colVec[i];
        return result;
    }

    /**
     * Returns the array of singular values for this decomposition.
     *
     * @return The singular values for this decomposition.
     */
    public double[] singularValues() {
        double[] singularValues = new double[mOrder];
        for (int i = 0; i < singularValues.length; ++i)
            singularValues[i] = singularValue(i);
        return singularValues;
    }

    /**
     * Returns the singular value for the specified order.
     *
     * @param order Dimension of the singular value.
     * @return The singular value at the specified order.
     */
    public double singularValue(int order) {
        if (order >= mOrder) {
            String msg = "Maximum order=" + (mOrder-1)
                + " found order=" + order;
            throw new IllegalArgumentException(msg);
        }
        return columnLength(mRowVectors,order)
            * columnLength(mColumnVectors,order);
    }

    /**
     * Returns a matrix in which the left singular vectors make up
     * the columns.  The first index is for the row of the original
     * matrix and the second index is for the order of the singular
     * vector.  Thus the returned matrix is m×k, if the original
     * input was an m×n matrix and SVD was performed at order k.
     *
     * @return The left singular vectors of this matrix.
     */
    public double[][] leftSingularVectors() {
        return normalizeColumns(mRowVectors);
    }

    /**
     * Returns a matrix in which the right singular vectors make up
     * the columns.  The first index is for the column of the
     * original matrix and the second index is for the order of the
     * singular vector.  Thus the returned matrix is n×k, if the
     * original input was an m×n matrix and SVD was performed at
     * order k.
     *
     * @return The right singular vectors.
     */
    public double[][] rightSingularVectors() {
        return normalizeColumns(mColumnVectors);
    }

    /**
     * Returns the singular value decomposition of the specified
     * complete matrix of values.
     *

     * <p>For a full description of the arguments and values, see the
     * method documentation for
     * {@link #partialSvd(int[][],double[][],int,double,double,double,double,Reporter,double,int,int)}
     * and the class documentation above.
     *

     * <p>The two-dimensional array of values must be an m×n matrix.
     * In particular, each row must be of the same length.  If this
     * is not the case, an illegal argument exception will be raised.
     *

     * <p>This is a utility method that delegates to the partial SVD
     * method linked above after filling in the complete set of
     * column identifiers for each row.
     *
     * @param values Array of values.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error
     * determining how fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param reporter Reporter to which to send progress and error
     * reports, or {@code null} if no output is desired.
     * @param minImprovement Minimum relative improvement in mean
     * square error required to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @return Singular value decomposition for the specified matrix
     * at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    public static SvdMatrix svd(double[][] values,
                                int maxOrder,
                                double featureInit,
                                double initialLearningRate,
                                double annealingRate,
                                double regularization,
                                Reporter reporter,
                                double minImprovement,
                                int minEpochs,
                                int maxEpochs) {
        if (reporter == null)
            reporter = Reporters.silent();
        int m = values.length;
        int n = values[0].length;
        reporter.info("Calculating SVD");
        reporter.info("#Rows=" + m + " #Cols=" + n);
        // check all rows same length
        for (int i = 1; i < m; ++i) {
            if (values[i].length != n) {
                String msg = "All rows must be of same length."
                    + " Found row[0].length=" + n
                    + " row[" + i + "]=" + values[i].length;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
        }
        // complete matrix: every row shares the full set of column ids
        int[] sharedRow = new int[n];
        for (int j = 0; j < n; ++j)
            sharedRow[j] = j;
        int[][] columnIds = new int[m][];
        for (int j = 0; j < m; ++j)
            columnIds[j] = sharedRow;
        return partialSvd(columnIds,
                          values,
                          maxOrder,
                          featureInit,
                          initialLearningRate,
                          annealingRate,
                          regularization,
                          reporter,
                          minImprovement,
                          minEpochs,
                          maxEpochs);
    }

    /**
     * Return the singular value decomposition of the specified
     * partial matrix, using the specified search parameters.
     *

     * <p>The reporter parameter may be supplied to receive
     * incremental progress reports during training.  These report
     * RMSE per epoch.
     *
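     * <p>For example, the following assumes the {@code Reporters}
     * factory and {@code setLevel()} chaining available elsewhere in
     * LingPipe's {@code com.aliasi.io} package (an assumption for
     * illustration, not part of this method's contract):
     *
     * <pre>
     *     // report progress to standard output at debug level
     *     Reporter reporter = Reporters.stdOut().setLevel(LogLevel.DEBUG);
     *     SvdMatrix svd
     *         = SvdMatrix.partialSvd(columnIds, values, maxOrder,
     *                                featureInit, initialLearningRate,
     *                                annealingRate, regularization,
     *                                reporter, minImprovement,
     *                                minEpochs, maxEpochs);</pre>
     *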

     * <p>See the class documentation above for a description of the
     * algorithm.
     *

     * <p>There are a number of constraints on the input, any
     * violation of which will raise an illegal argument exception.
     * The conditions are:

     * <ul>
     *
     * <li>The maximum order must be greater than zero.</li>
     *
     * <li>The minimum relative improvement in mean square error must
     * be non-negative and finite.</li>
     *
     * <li>The minimum number of epochs must be greater than zero and
     * less than or equal to the maximum number of epochs.</li>
     *
     * <li>The feature initialization value must be non-zero and
     * finite.</li>
     *
     * <li>The learning rate must be positive and finite.</li>
     *
     * <li>The regularization parameter must be non-negative and
     * finite.</li>
     *
     * <li>The column identifier and value arrays must be the same
     * length.</li>
     *
     * <li>The elements of the column identifier array and the value
     * array must all be of the same length.</li>
     *
     * <li>All column identifiers must be non-negative.</li>
     *
     * <li>Each row of the column identifier matrix must contain
     * columns in strictly ascending order (an example encoding is
     * shown below).</li>
     *
     * </ul>
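     *
     * <p>For example (an illustrative encoding, with values invented
     * for the example), a 3×3 partial matrix with known entries at
     * (0,0), (0,2), (1,1), and (2,0) is encoded as:
     *
     * <pre>
     *     int[][] columnIds = { { 0, 2 },     // row 0: columns 0 and 2
     *                           { 1 },        // row 1: column 1
     *                           { 0 } };      // row 2: column 0
     *     double[][] values = { { 1.0, 3.0 }, // values at (0,0), (0,2)
     *                           { 5.0 },      // value at (1,1)
     *                           { 7.0 } };    // value at (2,0)</pre>
     *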
     * @param columnIds Identifiers of column index for given row and entry.
     * @param values Values at row and column index for given entry.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error
     * determining how fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param reporter Reporter to which progress reports are written,
     * or {@code null} if no reporting is required.
     * @param minImprovement Minimum relative improvement in mean square
     * error required to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @return Singular value decomposition for the specified partial
     * matrix at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    public static SvdMatrix partialSvd(int[][] columnIds,
                                       double[][] values,
                                       int maxOrder,
                                       double featureInit,
                                       double initialLearningRate,
                                       double annealingRate,
                                       double regularization,
                                       Reporter reporter,
                                       double minImprovement,
                                       int minEpochs,
                                       int maxEpochs) {
        return partialSvd(columnIds, values,
                          maxOrder,
                          featureInit, initialLearningRate,
                          annealingRate, regularization,
                          new Random(),
                          reporter,
                          minImprovement, minEpochs, maxEpochs);
    }

    /**
     * This method is identical to the other singular-value
     * decomposition method, only with a specified randomizer.
     *
     * @param columnIds Identifiers of column index for given row and entry.
     * @param values Values at row and column index for given entry.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error
     * determining how fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param random Randomizer to use for initialization.
     * @param reporter Reporter to which progress reports are written,
     * or {@code null} for no reporting.
     * @param minImprovement Minimum relative improvement in mean square
     * error required to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @return Singular value decomposition for the specified partial
     * matrix at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    static SvdMatrix partialSvd(int[][] columnIds,
                                double[][] values,
                                int maxOrder,
                                double featureInit,
                                double initialLearningRate,
                                double annealingRate,
                                double regularization,
                                Random random,
                                Reporter reporter,
                                double minImprovement,
                                int minEpochs,
                                int maxEpochs) {
        if (reporter == null)
            reporter = Reporters.silent();
        reporter.info("Start");
        if (maxOrder < 1) {
            String msg = "Max order must be >= 1."
                + " Found maxOrder=" + maxOrder;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (minImprovement < 0 || notFinite(minImprovement)) {
            String msg = "Min improvement must be finite and non-negative."
                + " Found minImprovement=" + minImprovement;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (minEpochs <= 0 || maxEpochs < minEpochs) {
            String msg = "Min epochs must be positive and less than or equal to max epochs."
                + " Found minEpochs=" + minEpochs
                + " maxEpochs=" + maxEpochs;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (notFinite(featureInit) || featureInit == 0.0) {
            String msg = "Feature inits must be finite and non-zero."
                + " Found featureInit=" + featureInit;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (notFinite(initialLearningRate) || initialLearningRate < 0) {
            String msg = "Initial learning rate must be finite and non-negative."
                + " Found initialLearningRate=" + initialLearningRate;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (notFinite(regularization) || regularization < 0) {
            String msg = "Regularization must be finite and non-negative."
                + " Found regularization=" + regularization;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (annealingRate < 0 || notFinite(annealingRate)) {
            String msg = "Annealing rate must be finite and non-negative."
                + " Found rate=" + annealingRate;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        // null checks precede the row loop so the arrays are never
        // dereferenced before being validated
        if (columnIds == null) {
            String msg = "ColumnIds must not be null.";
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (values == null) {
            String msg = "Values must not be null.";
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        for (int row = 0; row < columnIds.length; ++row) {
            if (columnIds[row] == null) {
                String msg = "All column Ids must be non-null."
                    + " Found null in row=" + row;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
            if (values[row] == null) {
                String msg = "All values must be non-null."
                    + " Found null row=" + row;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
            if (columnIds[row].length != values[row].length) {
                String msg = "Column Ids and values must be same length."
                    + " For row=" + row
                    + " Found columnIds[row].length=" + columnIds[row].length
                    + " Found values[row].length=" + values[row].length;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
            for (int i = 0; i < columnIds[row].length; ++i) {
                if (columnIds[row][i] < 0) {
                    String msg = "Column ids must be non-negative."
                        + " Found columnIds[" + row + "][" + i + "]="
                        + columnIds[row][i];
                    reporter.fatal(msg);
                    throw new IllegalArgumentException(msg);
                }
                if (i > 0 && columnIds[row][i-1] >= columnIds[row][i]) {
                    String msg = "Column ids must be in strictly ascending order."
                        + " At row=" + row
                        + " Violation at entries " + (i-1) + " and " + i;
                    reporter.fatal(msg);
                    throw new IllegalArgumentException(msg);
                }
            }
        }
        int numRows = columnIds.length;
        int numEntries = 0;
        for (double[] xs : values)
            numEntries += xs.length;
        int maxColumnIndex = 0;
        for (int[] xs : columnIds)
            for (int i = 0; i < xs.length; ++i)
                if (xs[i] > maxColumnIndex)
                    maxColumnIndex = xs[i];
        int numColumns = maxColumnIndex + 1;
        maxOrder = Math.min(maxOrder, Math.min(numRows,numColumns));
        // cache of the prediction from factors computed so far,
        // one double per defined entry
        double[][] cache = new double[values.length][];
        for (int row = 0; row < numRows; ++row) {
            cache[row] = new double[values[row].length];
            Arrays.fill(cache[row],0.0);
        }
        List<double[]> rowVectorList = new ArrayList<double[]>(maxOrder);
        List<double[]> columnVectorList = new ArrayList<double[]>(maxOrder);
        for (int order = 0; order < maxOrder; ++order) {
            reporter.info(" Factor=" + order);
            double[] rowVector = initArray(numRows,featureInit,random);
            double[] columnVector = initArray(numColumns,featureInit,random);
            double rmseLast = Double.POSITIVE_INFINITY;
            for (int epoch = 0; epoch < maxEpochs; ++epoch) {
                double learningRateForEpoch
                    = initialLearningRate / (1.0 + epoch / annealingRate);
                double sumOfSquareErrors = 0.0;
                for (int row = 0; row < numRows; ++row) {
                    int[] columnIdsForRow = columnIds[row];
                    double[] valuesForRow = values[row];
                    double[] cacheForRow = cache[row];
                    for (int i = 0; i < columnIdsForRow.length; ++i) {
                        int column = columnIdsForRow[i];
                        double prediction = predict(row,column,
                                                    rowVector,columnVector,
                                                    cacheForRow[i]);
                        double error = valuesForRow[i] - prediction;
                        sumOfSquareErrors += error * error;
                        double rowCurrent = rowVector[row];
                        double columnCurrent = columnVector[column];
                        rowVector[row]
                            += learningRateForEpoch
                            * (error * columnCurrent - regularization * rowCurrent);
                        columnVector[column]
                            += learningRateForEpoch
                            * (error * rowCurrent - regularization * columnCurrent);
                    }
                }
                double rmse = Math.sqrt(sumOfSquareErrors/numEntries);
                reporter.info(" epoch=" + epoch + " rmse=" + rmse);
                if ((epoch >= minEpochs)
                    && (relativeDifference(rmse,rmseLast) < minImprovement)) {
                    reporter.info("Converged in epoch=" + epoch
                                  + " rmse=" + rmse
                                  + " relDiff=" + relativeDifference(rmse,rmseLast));
                    break;
                }
                rmseLast = rmse;
            }
            reporter.info("Order=" + order + " RMSE=" + rmseLast);
            rowVectorList.add(rowVector);
            columnVectorList.add(columnVector);
            // fold the newly computed factor into the prediction cache
            for (int row = 0; row < cache.length; ++row) {
                double[] cacheRow = cache[row];
                for (int i = 0; i < cacheRow.length; ++i) {
                    cacheRow[i] = predict(row,columnIds[row][i],
                                          rowVector,columnVector,
                                          cacheRow[i]);
                }
            }
        }
        double[][] rowVectors
            = rowVectorList.toArray(EMPTY_DOUBLE_2D_ARRAY);
        double[][] columnVectors
            = columnVectorList.toArray(EMPTY_DOUBLE_2D_ARRAY);
        return new SvdMatrix(transpose(rowVectors),
                             transpose(columnVectors),
                             maxOrder);
    }

    static double relativeDifference(double x, double y) {
        return Math.abs(x - y) / (Math.abs(x) + Math.abs(y));
    }

    static double[][] transpose(double[][] xs) {
        double[][] ys = new double[xs[0].length][xs.length];
        for (int i = 0; i < xs.length; ++i)
            for (int j = 0; j < xs[i].length; ++j)
                ys[j][i] = xs[i][j];
        return ys;
    }

    static double predict(int row, int column,
                          double[] rowVector, double[] columnVector,
                          double cache) {
        return cache + rowVector[row] * columnVector[column];
    }

    static double[] initArray(int size, double val, Random random) {
        double[] xs = new double[size];
        // random init: Gaussian draws scaled by val
        for (int i = 0; i < xs.length; ++i)
            xs[i] = random.nextGaussian() * val;
        return xs;
    }

    static boolean notFinite(double x) {
        return Double.isNaN(x) || Double.isInfinite(x);
    }

    static double columnLength(double[][] xs, int col) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < xs.length; ++i)
            sumOfSquares += xs[i][col] * xs[i][col]; // subopt array mem order
        return Math.sqrt(sumOfSquares);
    }

    static void scale(double[][] vecs,
                      double[][] singularVecs,
                      double[] singularVals) {
        for (int i = 0; i < vecs.length; ++i)
            for (int k = 0; k < vecs[i].length; ++k)
                vecs[i][k] = singularVecs[i][k] * singularVals[k];
    }

    static void verifyDimensions(String prefix,
                                 int order,
                                 double[][] vectors) {
        for (int i = 0; i < vectors.length; ++i) {
            if (vectors[i].length != order) {
                String msg = "All vectors must have length equal to order."
                    + " order=" + order
                    + " " + prefix + "Vectors[" + i + "].length="
                    + vectors[i].length;
                throw new IllegalArgumentException(msg);
            }
        }
    }

    // normalize columns to unit length
    static double[][] normalizeColumns(double[][] xs) {
        int numDims = xs.length;
        int order = xs[0].length;
        double[][] result = new double[numDims][order];
        for (int j = 0; j < order; ++j) {
            double sumOfSquares = 0.0;
            for (int i = 0; i < numDims; ++i) {
                double valIJ = xs[i][j];
                result[i][j] = valIJ;
                sumOfSquares += valIJ * valIJ;
            }
            double length = Math.sqrt(sumOfSquares);
            for (int i = 0; i < numDims; ++i)
                result[i][j] /= length;
        }
        return result;
    }

    static final double[][] EMPTY_DOUBLE_2D_ARRAY = new double[0][];

}



