/*
 * LingPipe v. 4.1.0
 * Copyright (C) 2003-2011 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.matrix;

import com.aliasi.io.LogLevel;
import com.aliasi.io.Reporter;
import com.aliasi.io.Reporters;

import java.io.PrintWriter;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import java.util.Random;

/**
 * An {@code SvdMatrix} provides a means of storing a matrix that
 * has been factored via a singular-value decomposition (SVD).  This
 * class also provides a static method for computing regularized
 * singular-value decompositions of partial matrices.
 *
 * <h3>Singular Value Decomposition</h3>
 *
 * <p>Singular value decomposition (SVD) factors an m×n matrix A into
 * a product of three matrices:
 *
 * <pre>
 *     A = U * S * V^T</pre>
 *
 * where U is an m×k matrix, V is an n×k matrix, and S is a k×k
 * matrix, where k is the rank of the matrix A.  The multiplication
 * (*) is matrix multiplication and the superscript T indicates
 * matrix transposition.
 *
 * <p>The m-dimensional vectors making up the columns of U are called
 * left singular vectors, whereas the n-dimensional vectors making up
 * the columns of V are called right singular vectors.  The values on
 * the diagonal of S are called the singular values.  The combination
 * of the q-th left singular vector, right singular vector, and
 * singular value is known as a factor.
 *
 * <p>The singular value matrix S is a diagonal matrix with positive,
 * non-increasing values, so that S[i][i] >= S[i+1][i+1] for i+1 < k.
 * The set of left and the set of right singular vectors are
 * orthonormal.  Normality means that each singular vector is of unit
 * length (length 1).  Orthogonality means that any pair of distinct
 * left singular vectors is orthogonal and any pair of distinct right
 * singular vectors is orthogonal (meaning their dot product is 0).
 *
 * <p>Matrices have unique singular-value decompositions up to the
 * re-ordering of columns with equal singular values and up to
 * cancelling sign changes in the singular vectors.
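 *
 * <p>For illustration only (a hypothetical sketch, not part of this
 * class's API), orthonormality of the left singular vectors may be
 * checked with dot products, where {@code u} is an m×k array indexed
 * {@code u[row][factor]}:
 *
 * <pre>
 *     // dot product of left singular vectors p and q; approximately
 *     // 1.0 if p == q, approximately 0.0 otherwise
 *     static double dot(double[][] u, int p, int q) {
 *         double sum = 0.0;
 *         for (int i = 0; i < u.length; ++i)
 *             sum += u[i][p] * u[i][q];
 *         return sum;
 *     }</pre>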

 * <h3>Construction and Value Computation</h3>
 *
 * <p>An {@code SvdMatrix} may be constructed out of the singular
 * vectors and singular values, or out of the vectors with the
 * singular values scaled in.
 *
 * <p>Given that S is diagonal, the value of a particular entry in A
 * works out to:
 *
 * <pre>
 *     A[i][j] = Σ_k  U[i][k] * S[k][k] * V[j][k]</pre>
 *
 * To save time in the application and space in the class, we factor
 * S into U and V to produce a new pair of matrices U' and V' defined
 * by:
 *
 * <pre>
 *     U' = U * sqrt(S)
 *     V'^T = sqrt(S) * V^T</pre>
 *
 * with the square root performed component-wise:
 *
 * <pre>
 *     sqrt(S)[i][j] = sqrt(S[i][j])</pre>
 *
 * By the associativity of matrix multiplication, we have:
 *
 * <pre>
 *     U * S * V^T
 *     = (U * sqrt(S)) * (sqrt(S) * V^T)
 *     = U' * V'^T</pre>
 *
 * Thus the class implementation is able to store U' and V', reducing
 * the amount of computation involved in returning a value (using
 * column vectors as the default vector orientation):
 *
 * <pre>
 *     A[i][j] = U'[i]^T * V'[j]</pre>
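 *
 * <p>As a sketch (assuming row vectors {@code uPrime} and column
 * vectors {@code vPrime} stored as in this class, indexed
 * {@code [row][factor]} and {@code [column][factor]}), the value
 * computation reduces to a dot product:
 *
 * <pre>
 *     // A[i][j] as dot product of scaled singular vectors
 *     static double value(double[][] uPrime, double[][] vPrime,
 *                         int i, int j) {
 *         double sum = 0.0;
 *         for (int k = 0; k < uPrime[i].length; ++k)
 *             sum += uPrime[i][k] * vPrime[j][k];
 *         return sum;
 *     }</pre>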

 * <h3>Square Error and the Frobenius Norm</h3>
 *
 * <p>Suppose A and B are m×n matrices.  The square error between
 * them is defined by the square of the so-called Frobenius norm,
 * which extends the standard vector L2 or Euclidean norm to
 * matrices:
 *
 * <pre>
 *     squareError(A,B)
 *     = frobeniusNorm(A-B)²
 *     = Σ_{i < m}  Σ_{j < n}  (A[i][j] - B[i][j])²</pre>
 *
 * The square error is sometimes referred to as the residual sum of
 * squares (RSS), because A[i][j] - B[i][j] is the residual
 * (difference between actual and predicted value).
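 *
 * <p>A minimal sketch of the square error computation over complete
 * matrices (a hypothetical helper, not part of this class):
 *
 * <pre>
 *     static double squareError(double[][] a, double[][] b) {
 *         double sum = 0.0;
 *         for (int i = 0; i < a.length; ++i)
 *             for (int j = 0; j < a[i].length; ++j) {
 *                 double diff = a[i][j] - b[i][j];
 *                 sum += diff * diff;  // squared residual
 *             }
 *         return sum;
 *     }</pre>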

 * <h3>Lower-order Approximations</h3>
 *
 * <p>Consider factoring a matrix A of dimension m×n into the product
 * of two matrices X * Y^T, where X is of dimension m×k and Y is of
 * dimension n×k.  We may then measure the square error
 * squareError(A, X * Y^T) to determine how well the factorization
 * matches A.  We know that if we take U', V' from the singular value
 * decomposition, then:
 *
 * <pre>
 *     squareError(A, U' * V'^T) = 0.0</pre>
 *
 * Here U' and V' have a number of columns (called factors) equal to
 * the rank of the matrix A.  The singular value decomposition is
 * such that the first q columns of U' and V' provide the best
 * order-q approximation of A of any X and Y of dimensions m×q and
 * n×q.  In symbols:
 *
 * <pre>
 *     U'_q, V'_q = argmin_{X is m×q, Y is n×q}  squareError(A, X * Y^T)</pre>
 *
 * where U'_q is the restriction of U' to its first q columns.

 * <p>Often errors are reported as means, where the mean square error
 * (MSE) is defined by:
 *
 * <pre>
 *     meanSquareError(A,B) = squareError(A,B) / (m×n)</pre>
 *
 * <p>To adjust to a linear scale, the square root of mean square
 * error (RMSE) is often used:
 *
 * <pre>
 *     rootMeanSquareError(A,B) = sqrt(meanSquareError(A,B))</pre>
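 *
 * <p>Continuing the sketch above (with {@code a}, {@code b} the two
 * matrices and {@code m}, {@code n} their dimensions), MSE and RMSE
 * follow directly from {@code squareError}:
 *
 * <pre>
 *     double mse = squareError(a, b) / (m * n);  // mean square error
 *     double rmse = Math.sqrt(mse);              // root mean square error</pre>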

 * <h3>Partial Matrices</h3>
 *
 * <p>A partial matrix is one in which some of the values are
 * unknown.  This is in contrast with a sparse matrix, in which most
 * of the values are zero.  A variant of singular value decomposition
 * may be used to impute the unknown values by minimizing square
 * error with respect to the known values only.  Unknown values are
 * then simply derived from the factorization U' * V'^T.  Typically,
 * the approximation is of lower order than the rank of the matrix.

 * <h3>Regularized SVD via Shrinkage</h3>
 *
 * <p>Linear regression techniques such as SVD often overfit their
 * training data in order to derive exact answers.  This problem is
 * mitigated somewhat by choosing low-order approximations to the
 * full-rank SVD.  Another option is to penalize large values in the
 * singular vectors, thus favoring smaller values.  Because of its
 * practicality, the most common way to do this is via parameter
 * shrinkage.
 *
 * <p>Shrinkage is a general technique in least squares fitting that
 * adds a penalty term proportional to the square of the size of the
 * parameters.  Thus the square error objective function is replaced
 * with a regularized version:
 *
 * <pre>
 *     regularizedError(A, U' * V'^T)
 *     = squareError(A, U' * V'^T)
 *       + parameterCost(U') + parameterCost(V')</pre>
 *
 * where the parameter cost of a matrix is the sum of the costs of
 * its vectors, and the parameter cost for a vector X of
 * dimensionality q is the sum of the squared parameters:
 *
 * <pre>
 *     parameterCost(X)
 *     = λ * Σ_{i < q}  X[i]²
 *     = λ * length(X)²</pre>
 *
 * Note that the hyperparameter λ scales the parameter cost term in
 * the overall error.
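 *
 * <p>A sketch of the penalty term for a single parameter vector
 * (hypothetical helper; {@code lambda} is the regularization
 * hyperparameter λ):
 *
 * <pre>
 *     static double parameterCost(double[] x, double lambda) {
 *         double sumOfSquares = 0.0;
 *         for (double xi : x)
 *             sumOfSquares += xi * xi;
 *         return lambda * sumOfSquares;  // λ * length(x)²
 *     }</pre>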

 * <p>Setting the regularization parameter λ to zero sets the
 * parameter costs to zero, resulting in an unregularized SVD
 * computation.
 *
 * <p>In Bayesian terms, regularization is equivalent to a normal
 * prior on parameter values centered on zero with variance
 * controlled by λ.  The resulting minimizer of regularized error is
 * the maximum a posteriori (MAP) solution.  The Bayesian approach
 * also allows covariances to be estimated (including simple
 * parameter variance estimates), but these are not implemented in
 * this class.

 * <h3>Regularized Stochastic Gradient Descent</h3>
 *
 * <p>Singular value decomposition may be computed "exactly" (modulo
 * arithmetic precision and convergence) using an algorithm whose
 * time complexity is O(m³ + n³) for an m×n matrix (equal to
 * O(max(m,n)³)).  Arithmetic precision is especially vexing at
 * singular values near zero and with highly correlated rows or
 * columns in the input matrix.
 *
 * <p>For large partial matrices, we use a form of stochastic
 * gradient descent which computes a single row and column singular
 * vector (a single factor, that is) at a time.  Each factor is
 * estimated by iteratively visiting the data points and adjusting
 * the unnormalized singular vectors making up the current factor.
 * Each adjustment is a least-squares fitting step, where we
 * simultaneously adjust the left singular vectors given the right
 * singular vectors and vice-versa.
 *
 * <p>The least-squares adjustments take the following form.  For
 * each value, we compute the current error (using the order k
 * approximation and the current order k+1 values) and then move the
 * vectors to reduce error.  We use U' and V' for the incremental
 * values of the singular vectors scaled by the singular values:
 *
 * <pre>
 *     for (k = 0; k < maxOrder; ++k)
 *         for (i < m) U'[i][k] = random.nextGaussian() * initValue;
 *         for (j < n) V'[j][k] = random.nextGaussian() * initValue;
 *         for (epoch = 0; epoch < maxEpochs && not converged; ++epoch)
 *             for (i,j such that M[i][j] defined)
 *                 error = M[i][j] - U'_k[i] * V'_k[j]  // * is vector dot product
 *                 uTemp = U'[i][k]
 *                 vTemp = V'[j][k]
 *                 U'[i][k] += learningRate[epoch] * (error * vTemp - regularization * uTemp)
 *                 V'[j][k] += learningRate[epoch] * (error * uTemp - regularization * vTemp)</pre>
 *
 * where initValue, maxEpochs, learningRate (see below), and
 * regularization are algorithm hyperparameters.  Note that the
 * initial values of the singular vectors are set randomly to draws
 * from a Gaussian (normal) distribution with mean 0.0 and variance
 * 1.0, scaled by initValue.
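 *
 * <p>A minimal Java rendering of the inner update for one defined
 * entry (a sketch under the naming assumptions above; {@code mij},
 * {@code uPrime}, {@code vPrime}, and {@code cachedPrediction} are
 * hypothetical local names, mirroring the pseudo-code rather than
 * the exact implementation below):
 *
 * <pre>
 *     // one stochastic gradient step for entry (i,j) at factor k;
 *     // cachedPrediction holds the contribution of factors 0..k-1
 *     double prediction = cachedPrediction + uPrime[i][k] * vPrime[j][k];
 *     double error = mij - prediction;
 *     double uTemp = uPrime[i][k];
 *     double vTemp = vPrime[j][k];
 *     uPrime[i][k] += learningRate * (error * vTemp - regularization * uTemp);
 *     vPrime[j][k] += learningRate * (error * uTemp - regularization * vTemp);</pre>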

 * <p>Because we use the entire set of factors in the error
 * computation, the current factor is guaranteed to have singular
 * vectors orthogonal to the singular vectors already computed.
 *
 * <p>Note that in the actual implementation, the contribution to the
 * error of the first k-1 factors is cached to reduce time in the
 * inner loop.  This requires a double-length floating point value
 * for each defined entry in the matrix.

 * <h3>Gradient Interpretation</h3>
 *
 * <p>Like most linear learners, this algorithm merely moves the
 * parameter vectors U'[i] and V'[j] in the direction of the gradient
 * of the error.  The gradient is the multivariate derivative of the
 * objective function being minimized.  Because our objective is
 * squared error, the gradient is just its derivative, which is just
 * (two times) the (linear) error itself.  We roll the factor of 2
 * into the learning rate to derive the update in the algorithm
 * pseudo-code above.
 *
 * <p>The term (error * vTemp) is the component of the error gradient
 * due to the current value of V'[j][k], and the term (regularization
 * * uTemp) is the component of the gradient due to the size of the
 * parameter U'[i][k].  The updates thus move the parameter vectors
 * in the direction of the gradient.

 * <h3>Convergence Conditions</h3>
 *
 * <p>The convergence conditions for a given factor require either
 * hitting the maximum number of allowable epochs, or finding that
 * the improvement from one epoch to the next is below some relative
 * threshold:
 *
 * <pre>
 *     regError(epoch) = regError(M, U'_k(epoch) * V'_k(epoch)^T)
 *     relativeImprovement(epoch+1) = relativeDiff(regError(epoch+1), regError(epoch))
 *     relativeDiff(x,y) = abs(x-y) / (abs(x) + abs(y))</pre>
 *
 * When the relative difference in square error is less than a
 * hyperparameter threshold minImprovement, the epoch terminates and
 * the algorithm moves on to the next factor k+1.
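 *
 * <p>The relative difference is symmetric and scale-free; in Java it
 * is a one-liner (this mirrors the {@code relativeDifference()}
 * helper used in the implementation below):
 *
 * <pre>
 *     static double relativeDiff(double x, double y) {
 *         return Math.abs(x - y) / (Math.abs(x) + Math.abs(y));
 *     }</pre>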

 * <p>Note that a complete matrix is a degenerate kind of partial
 * matrix.  The gradient descent computation still works in this
 * situation, but is not as efficient or as accurate as an algebraic
 * SVD solver for small matrices.

 * <h3>Annealing Schedule</h3>
 *
 * <p>Learning rates that are too high are unstable, whereas learning
 * rates that are too low never reach their targets.  To get around
 * this problem, the learning rate, learningRate[epoch], is lowered
 * as the number of epochs increases.  This lowering of the learning
 * rate has a thermodynamic interpretation in terms of free energy,
 * hence the name "annealing".  Larger moves are made in earlier
 * epochs, then the temperature is gradually lowered so that the
 * learner may settle into a stable fixed point.  The function
 * learningRate[epoch] is called the annealing schedule.
 *
 * <p>There are theoretical requirements on the annealing schedule
 * that guarantee convergence (up to arithmetic precision and with no
 * upper bound on the number of epochs):
 *
 * <pre>
 *     Σ_epoch  learningRate[epoch] = infinity
 *     Σ_epoch  learningRate[epoch]² < infinity</pre>
 *
 * The schedule we use is the one commonly chosen to meet the above
 * requirements:
 *
 * <pre>
 *     learningRate[epoch] = initialLearningRate / (1 + epoch/annealingRate)</pre>
 *
 * where initialLearningRate is the initial learning rate and
 * annealingRate determines how quickly it shrinks.  The learning
 * rate moves from its initial size (initialLearningRate) to one half
 * (1/2) of its initial size after annealingRate epochs, to one tenth
 * (1/10) of its initial size after 9 * annealingRate epochs, and to
 * one hundredth (1/100) of its initial size after 99 * annealingRate
 * epochs.
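 *
 * <p>In Java, the schedule is a one-liner; for example, with
 * initialLearningRate = 0.01 and annealingRate = 100, epoch 100
 * yields rate 0.005 and epoch 900 yields rate 0.001:
 *
 * <pre>
 *     double rate = initialLearningRate / (1.0 + epoch / annealingRate);</pre>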

 * <h3>Parameter Choices</h3>
 *
 * <p>The previous discussion has introduced a daunting list of
 * parameters required for gradient descent for singular value
 * decomposition.  Unfortunately, the stochastic gradient descent
 * solver requires a fair amount of tuning to recover a low mean
 * square error factorization.  The optimal settings will also depend
 * on the input matrix; for example, very sparse partial matrices are
 * much easier to fit than dense matrices.
 *
 * <h4>Maximum Order</h4>
 *
 * <p>Determining the order of the decomposition is mostly a matter
 * of determining how many orders are needed for the amount of
 * smoothing required.  Low order reconstructions are useful for most
 * applications.  One way to determine maximum order is using held
 * out data for an application.  Another is to look for a point where
 * the singular values become (relatively) insignificant.

 * <h4>Maximum Epochs and Early Stopping</h4>
 *
 * <p>For low mean square error factorizations, many epochs may be
 * necessary.  Lowering the maximum number of epochs leads to what is
 * known as early stopping.  Sometimes, early stopping leads to more
 * accurate held-out predictions, but it will always raise the
 * factorization error (training data predictions).  In general, the
 * maximum number of epochs needs to be set empirically.  Initially,
 * try fewer epochs and gradually raise the number of epochs until
 * the desired accuracy is achieved.
 *
 * <h4>Minimum Improvement</h4>
 *
 * <p>By setting the minimum improvement to 0.0, the algorithm is
 * forced to use the maximum number of epochs.  By setting it above
 * this level, a form of early stopping is achieved when the benefit
 * of another epoch of refitting falls below a given improvement
 * threshold.  This value may also be set on an application basis
 * using held out data, or it may be fit to achieve a given level of
 * mean square error on the training (input) data.  The minimum
 * improvement needs to be set fairly low (less than 0.000001) to
 * achieve reasonably precise factorizations.  Note that minimum
 * improvement is defined relatively, as noted above.

 * <h4>Initial Parameter Values</h4>
 *
 * <p>Initial values of the singular vectors are not particularly
 * sensitive, because we are using multiplicative updates.  A good
 * rule of thumb for starting values is the inverse square root of
 * the maximum order:
 *
 * <pre>
 *     initVal ~ 1 / sqrt(maxOrder)</pre>
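 *
 * <p>For instance, in Java, with maxOrder the number of factors:
 *
 * <pre>
 *     double featureInit = 1.0 / Math.sqrt(maxOrder);  // rule-of-thumb init</pre>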

 * <h4>Learning Rate, Maximum Epochs and Annealing</h4>
 *
 * <p>A good starting point for the learning rate is 0.01.  The
 * annealing parameter should be set so that the learning rate is cut
 * by at least a factor of 10 in the final rounds.  This calls for a
 * value that's roughly one tenth of the maximum number of epochs.
 * If the initial learning rate is set higher, then the annealing
 * schedule should be more aggressive so that it spends a fair amount
 * of time in the 0.001 to 0.0001 learning rate range.
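 *
 * <p>Putting these recommendations together, a hypothetical
 * invocation of the complete-matrix solver might look as follows
 * (all values are illustrative, not prescriptions):
 *
 * <pre>
 *     double[][] values = ...;          // m×n input matrix
 *     int maxOrder = 2;                 // number of factors
 *     double featureInit = 1.0 / Math.sqrt(maxOrder);
 *     double initialLearningRate = 0.01;
 *     int maxEpochs = 10000;
 *     double annealingRate = maxEpochs / 10.0;
 *     double regularization = 0.00;     // no shrinkage
 *     double minImprovement = 0.000001; // relative convergence threshold
 *     int minEpochs = 10;
 *     SvdMatrix svd
 *         = SvdMatrix.svd(values, maxOrder, featureInit,
 *                         initialLearningRate, annealingRate,
 *                         regularization, null, minImprovement,
 *                         minEpochs, maxEpochs);
 *     double approx = svd.value(0, 0);  // reconstructed A[0][0]</pre>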

 * <h3>References</h3>
 *
 * <p>There are thorough Wikipedia articles on singular value
 * decomposition and gradient descent, although the SVD entry focuses
 * entirely on complete (non-partial) matrices:
 *
 * <ul>
 * <li>Wikipedia: Singular Value Decomposition</li>
 * <li>Wikipedia: Gradient Descent</li>
 * </ul>
 *
 * <p>Both of the standard machine learning textbooks have good
 * theoretical explanations of least squares, regularization,
 * gradient descent, and singular value decomposition, but not all
 * together:
 *
 * <ul>
 * <li>Chris Bishop. 2007. Pattern Recognition and Machine Learning.
 * Springer.</li>
 * <li>Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001.
 * The Elements of Statistical Learning. Springer.</li>
 * </ul>

 * <p>The following is a particularly clear explanation of many of
 * the issues involved in the context of neural networks:
 *
 * <ul>
 * <li>Genevieve Orr, Nici Schraudolph, and Fred Cummins. 1999.
 * CS-449: Neural Networks. Willamette University course notes.</li>
 * </ul>

 * <p>Our partial SVD solver is based on C code from Timely
 * Development (see license below).  Timely based their code on Simon
 * Funk's blog entry.  Simon Funk's approach was itself based on his
 * and Genevieve Gorrell's 2005 EACL paper.

 * <h3>Acknowledgements</h3>
 *
 * <p>The singular value decomposition code is rather loosely based
 * on a C program developed by Timely Development.  Here is the
 * copyright notice for the original code:
 *
 * <pre>
 * // SVD Sample Code
 * //
 * // Copyright (C) 2007 Timely Development (www.timelydevelopment.com)
 * //
 * // Special thanks to Simon Funk and others from the Netflix Prize contest
 * // for providing pseudo-code and tuning hints.
 * //
 * // Feel free to use this code as you wish as long as you include
 * // these notices and attribution.
 * //
 * // Also, if you have alternative types of algorithms for accomplishing
 * // the same goal and would like to contribute, please share them as well :)
 * //
 * // STANDARD DISCLAIMER:
 * //
 * // - THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
 * // - OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
 * // - LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
 * // - FITNESS FOR A PARTICULAR PURPOSE.
 * </pre>
 *
 * @author Bob Carpenter
 * @version 4.0.0
 * @since LingPipe3.2
 */
public class SvdMatrix extends AbstractMatrix {

    private final double[][] mRowVectors;
    private final double[][] mColumnVectors;
    private final int mOrder;

    /**
     * Construct a factored matrix using the specified row and column
     * vectors at the specified order.  Each vector in the arrays of
     * row and column vectors must be of the same dimension as the
     * order.
     *

     * <p>See the class documentation above for more information on
     * singular value decomposition.
     *
     * @param rowVectors Row vectors, indexed by row.
     * @param columnVectors Column vectors, indexed by column.
     * @param order Dimensionality of the row and column vectors.
     */
    public SvdMatrix(double[][] rowVectors,
                     double[][] columnVectors,
                     int order) {
        verifyDimensions("row",order,rowVectors);
        verifyDimensions("column",order,columnVectors);
        mRowVectors = rowVectors;
        mColumnVectors = columnVectors;
        mOrder = order;
    }

    /**
     * Construct an SVD matrix using the specified singular vectors
     * and singular values.  The order of the factorization is equal
     * to the length of the singular values.  Each singular vector
     * must be the same dimensionality as the array of singular
     * values.
     *

     * <p>See the class documentation above for more information on
     * singular value decomposition.
     *
     * @param rowSingularVectors Row singular vectors, indexed by row.
     * @param columnSingularVectors Column singular vectors, indexed by column.
     * @param singularValues Array of singular values, in decreasing order.
     */
    public SvdMatrix(double[][] rowSingularVectors,
                     double[][] columnSingularVectors,
                     double[] singularValues) {
        mOrder = singularValues.length;
        verifyDimensions("row",mOrder,rowSingularVectors);
        verifyDimensions("column",mOrder,columnSingularVectors);
        mRowVectors = new double[rowSingularVectors.length][mOrder];
        mColumnVectors = new double[columnSingularVectors.length][mOrder];
        double[] sqrtSingularValues = new double[singularValues.length];
        for (int i = 0; i < sqrtSingularValues.length; ++i)
            sqrtSingularValues[i] = Math.sqrt(singularValues[i]);
        scale(mRowVectors,rowSingularVectors,sqrtSingularValues);
        scale(mColumnVectors,columnSingularVectors,sqrtSingularValues);
    }

    /**
     * Returns the number of rows in this matrix.
     *
     * @return The number of rows in this matrix.
     */
    @Override
    public int numRows() {
        return mRowVectors.length;
    }

    /**
     * Returns the number of columns in this matrix.
     *
     * @return The number of columns in this matrix.
     */
    @Override
    public int numColumns() {
        return mColumnVectors.length;
    }

    /**
     * Returns the order of this factorization.  The order is the
     * number of dimensions in the singular vectors and the singular
     * values.
     *
     * @return The order of this decomposition.
     */
    public int order() {
        return mRowVectors[0].length;
    }

    /**
     * Returns the value of this matrix at the specified row and
     * column.
     *
     * @param row Row index.
     * @param column Column index.
     * @return The value of this matrix at the specified row and
     * column.
     */
    @Override
    public double value(int row, int column) {
        double[] rowVec = mRowVectors[row];
        double[] colVec = mColumnVectors[column];
        double result = 0.0;
        for (int i = 0; i < rowVec.length; ++i)
            result += rowVec[i] * colVec[i];
        return result;
    }

    /**
     * Returns the array of singular values for this decomposition.
     *
     * @return The singular values for this decomposition.
     */
    public double[] singularValues() {
        double[] singularValues = new double[mOrder];
        for (int i = 0; i < singularValues.length; ++i)
            singularValues[i] = singularValue(i);
        return singularValues;
    }

    /**
     * Returns the singular value for the specified order.
     *
     * @param order Dimension of the singular value.
     * @return The singular value at the specified order.
     */
    public double singularValue(int order) {
        if (order >= mOrder) {
            String msg = "Maximum order=" + (mOrder-1)
                + " found order=" + order;
            throw new IllegalArgumentException(msg);
        }
        return columnLength(mRowVectors,order)
            * columnLength(mColumnVectors,order);
    }

    /**
     * Returns a matrix in which the left singular vectors make up
     * the columns.  The first index is for the row of the original
     * matrix and the second index is for the order of the singular
     * vector.  Thus the returned matrix is m×k, if the original
     * input was an m×n matrix and SVD was performed at order k.
     *
     * @return The left singular vectors of this matrix.
     */
    public double[][] leftSingularVectors() {
        return normalizeColumns(mRowVectors);
    }

    /**
     * Returns a matrix in which the right singular vectors make up
     * the columns.  The first index is for the column of the
     * original matrix and the second index is for the order of the
     * singular vector.  Thus the returned matrix is n×k, if the
     * original input was an m×n matrix and SVD was performed at
     * order k.
     *
     * @return The right singular vectors.
     */
    public double[][] rightSingularVectors() {
        return normalizeColumns(mColumnVectors);
    }

    /**
     * Returns the singular value decomposition of the specified
     * complete matrix of values.
     *

     * <p>For a full description of the arguments and values, see the
     * method documentation for
     * {@link #partialSvd(int[][],double[][],int,double,double,double,double,Reporter,double,int,int)}
     * and the class documentation above.
     *

     * <p>The two-dimensional array of values must be an m×n matrix.
     * In particular, each row must be of the same length.  If this
     * is not the case, an illegal argument exception will be raised.
     *

     * <p>This is a utility method that delegates to the partial SVD
     * method linked above after filling in the complete set of
     * column identifiers for each row.
     *
     * @param values Array of values.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error
     * determining how fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param reporter Reporter to which to send progress and error
     * reports, or {@code null} if no output is desired.
     * @param minImprovement Minimum relative improvement in mean
     * square error required to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @return Singular value decomposition for the specified matrix
     * at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    public static SvdMatrix svd(double[][] values,
                                int maxOrder,
                                double featureInit,
                                double initialLearningRate,
                                double annealingRate,
                                double regularization,
                                Reporter reporter,
                                double minImprovement,
                                int minEpochs,
                                int maxEpochs) {
        if (reporter == null)
            reporter = Reporters.silent();
        int m = values.length;
        int n = values[0].length;
        reporter.info("Calculating SVD");
        reporter.info("#Rows=" + m + " #Cols=" + n);
        // check all rows same length
        for (int i = 1; i < m; ++i) {
            if (values[i].length != n) {
                String msg = "All rows must be of same length."
                    + " Found row[0].length=" + n
                    + " row[" + i + "]=" + values[i].length;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
        }
        // complete matrix: every row shares the full set of column ids
        int[] sharedRow = new int[n];
        for (int j = 0; j < n; ++j)
            sharedRow[j] = j;
        int[][] columnIds = new int[m][];
        for (int j = 0; j < m; ++j)
            columnIds[j] = sharedRow;
        return partialSvd(columnIds,
                          values,
                          maxOrder,
                          featureInit,
                          initialLearningRate,
                          annealingRate,
                          regularization,
                          reporter,
                          minImprovement,
                          minEpochs,
                          maxEpochs);
    }

    /**
     * Return the singular value decomposition of the specified
     * partial matrix, using the specified search parameters.
     *

     * <p>The reporter parameter may be supplied to receive
     * incremental progress reports during training.  These report
     * RMSE per epoch.
     *
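     * <p>For example, the following assumes the {@code Reporters}
     * factory and {@code setLevel()} chaining available elsewhere in
     * LingPipe's {@code com.aliasi.io} package (an assumption for
     * illustration, not part of this method's contract):
     *
     * <pre>
     *     // report progress to standard output at debug level
     *     Reporter reporter = Reporters.stdOut().setLevel(LogLevel.DEBUG);
     *     SvdMatrix svd
     *         = SvdMatrix.partialSvd(columnIds, values, maxOrder,
     *                                featureInit, initialLearningRate,
     *                                annealingRate, regularization,
     *                                reporter, minImprovement,
     *                                minEpochs, maxEpochs);</pre>
     *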

     * <p>See the class documentation above for a description of the
     * algorithm.
     *

     * <p>There are a number of constraints on the input, any
     * violation of which will raise an illegal argument exception.
     * The conditions are:

     * <ul>
     *
     * <li>The maximum order must be greater than zero.</li>
     *
     * <li>The minimum relative improvement in mean square error must
     * be non-negative and finite.</li>
     *
     * <li>The minimum number of epochs must be greater than zero and
     * less than or equal to the maximum number of epochs.</li>
     *
     * <li>The feature initialization value must be non-zero and
     * finite.</li>
     *
     * <li>The learning rate must be positive and finite.</li>
     *
     * <li>The regularization parameter must be non-negative and
     * finite.</li>
     *
     * <li>The column identifier and value arrays must be the same
     * length.</li>
     *
     * <li>The elements of the column identifier array and the value
     * array must all be of the same length.</li>
     *
     * <li>All column identifiers must be non-negative.</li>
     *
     * <li>Each row of the column identifier matrix must contain
     * columns in strictly ascending order (an example encoding is
     * shown below).</li>
     *
     * </ul>
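     *
     * <p>For example (an illustrative encoding, with values invented
     * for the example), a 3×3 partial matrix with known entries at
     * (0,0), (0,2), (1,1), and (2,0) is encoded as:
     *
     * <pre>
     *     int[][] columnIds = { { 0, 2 },     // row 0: columns 0 and 2
     *                           { 1 },        // row 1: column 1
     *                           { 0 } };      // row 2: column 0
     *     double[][] values = { { 1.0, 3.0 }, // values at (0,0), (0,2)
     *                           { 5.0 },      // value at (1,1)
     *                           { 7.0 } };    // value at (2,0)</pre>
     *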
     * @param columnIds Identifiers of column index for given row and entry.
     * @param values Values at row and column index for given entry.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error
     * determining how fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param reporter Reporter to which progress reports are written,
     * or {@code null} if no reporting is required.
     * @param minImprovement Minimum relative improvement in mean square
     * error required to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @return Singular value decomposition for the specified partial
     * matrix at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    public static SvdMatrix partialSvd(int[][] columnIds,
                                       double[][] values,
                                       int maxOrder,
                                       double featureInit,
                                       double initialLearningRate,
                                       double annealingRate,
                                       double regularization,
                                       Reporter reporter,
                                       double minImprovement,
                                       int minEpochs,
                                       int maxEpochs) {
        return partialSvd(columnIds, values,
                          maxOrder,
                          featureInit, initialLearningRate,
                          annealingRate, regularization,
                          new Random(),
                          reporter,
                          minImprovement, minEpochs, maxEpochs);
    }

    /**
     * This method is identical to the other singular-value
     * decomposition method, only with a specified randomizer.
     *
     * @param columnIds Identifiers of column index for given row and entry.
     * @param values Values at row and column index for given entry.
     * @param maxOrder Maximum order of the decomposition.
     * @param featureInit Initial value for singular vectors.
     * @param initialLearningRate Incremental multiplier of error
     * determining how fast learning occurs.
     * @param annealingRate Rate at which annealing occurs; higher values
     * provide more gradual annealing.
     * @param regularization A regularization constant to damp learning.
     * @param random Randomizer to use for initialization.
     * @param reporter Reporter to which progress reports are written,
     * or {@code null} for no reporting.
     * @param minImprovement Minimum relative improvement in mean square
     * error required to finish an epoch.
     * @param minEpochs Minimum number of epochs for training.
     * @param maxEpochs Maximum number of epochs for training.
     * @return Singular value decomposition for the specified partial
     * matrix at the specified order.
     * @throws IllegalArgumentException Under conditions listed in the
     * method documentation above.
     */
    static SvdMatrix partialSvd(int[][] columnIds,
                                double[][] values,
                                int maxOrder,
                                double featureInit,
                                double initialLearningRate,
                                double annealingRate,
                                double regularization,
                                Random random,
                                Reporter reporter,
                                double minImprovement,
                                int minEpochs,
                                int maxEpochs) {
        if (reporter == null)
            reporter = Reporters.silent();
        reporter.info("Start");
        if (maxOrder < 1) {
            String msg = "Max order must be >= 1."
                + " Found maxOrder=" + maxOrder;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (minImprovement < 0 || notFinite(minImprovement)) {
            String msg = "Min improvement must be finite and non-negative."
                + " Found minImprovement=" + minImprovement;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (minEpochs <= 0 || maxEpochs < minEpochs) {
            String msg = "Min epochs must be positive and less than or equal to max epochs."
                + " Found minEpochs=" + minEpochs
                + " maxEpochs=" + maxEpochs;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (notFinite(featureInit) || featureInit == 0.0) {
            String msg = "Feature inits must be finite and non-zero."
                + " Found featureInit=" + featureInit;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (notFinite(initialLearningRate) || initialLearningRate < 0) {
            String msg = "Initial learning rate must be finite and non-negative."
                + " Found initialLearningRate=" + initialLearningRate;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (notFinite(regularization) || regularization < 0) {
            String msg = "Regularization must be finite and non-negative."
                + " Found regularization=" + regularization;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (annealingRate < 0 || notFinite(annealingRate)) {
            String msg = "Annealing rate must be finite and non-negative."
                + " Found rate=" + annealingRate;
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        // null checks precede the row loop so the arrays are never
        // dereferenced before being validated
        if (columnIds == null) {
            String msg = "ColumnIds must not be null.";
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        if (values == null) {
            String msg = "Values must not be null.";
            reporter.fatal(msg);
            throw new IllegalArgumentException(msg);
        }
        for (int row = 0; row < columnIds.length; ++row) {
            if (columnIds[row] == null) {
                String msg = "All column Ids must be non-null."
                    + " Found null in row=" + row;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
            if (values[row] == null) {
                String msg = "All values must be non-null."
                    + " Found null row=" + row;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
            if (columnIds[row].length != values[row].length) {
                String msg = "Column Ids and values must be same length."
                    + " For row=" + row
                    + " Found columnIds[row].length=" + columnIds[row].length
                    + " Found values[row].length=" + values[row].length;
                reporter.fatal(msg);
                throw new IllegalArgumentException(msg);
            }
            for (int i = 0; i < columnIds[row].length; ++i) {
                if (columnIds[row][i] < 0) {
                    String msg = "Column ids must be non-negative."
                        + " Found columnIds[" + row + "][" + i + "]="
                        + columnIds[row][i];
                    reporter.fatal(msg);
                    throw new IllegalArgumentException(msg);
                }
                if (i > 0 && columnIds[row][i-1] >= columnIds[row][i]) {
                    String msg = "Column ids must be in strictly ascending order."
                        + " At row=" + row
                        + " Violation at entries " + (i-1) + " and " + i;
                    reporter.fatal(msg);
                    throw new IllegalArgumentException(msg);
                }
            }
        }
        int numRows = columnIds.length;
        int numEntries = 0;
        for (double[] xs : values)
            numEntries += xs.length;
        int maxColumnIndex = 0;
        for (int[] xs : columnIds)
            for (int i = 0; i < xs.length; ++i)
                if (xs[i] > maxColumnIndex)
                    maxColumnIndex = xs[i];
        int numColumns = maxColumnIndex + 1;
        maxOrder = Math.min(maxOrder, Math.min(numRows,numColumns));
        // cache of the prediction from factors computed so far,
        // one double per defined entry
        double[][] cache = new double[values.length][];
        for (int row = 0; row < numRows; ++row) {
            cache[row] = new double[values[row].length];
            Arrays.fill(cache[row],0.0);
        }
        List<double[]> rowVectorList = new ArrayList<double[]>(maxOrder);
        List<double[]> columnVectorList = new ArrayList<double[]>(maxOrder);
        for (int order = 0; order < maxOrder; ++order) {
            reporter.info(" Factor=" + order);
            double[] rowVector = initArray(numRows,featureInit,random);
            double[] columnVector = initArray(numColumns,featureInit,random);
            double rmseLast = Double.POSITIVE_INFINITY;
            for (int epoch = 0; epoch < maxEpochs; ++epoch) {
                double learningRateForEpoch
                    = initialLearningRate / (1.0 + epoch / annealingRate);
                double sumOfSquareErrors = 0.0;
                for (int row = 0; row < numRows; ++row) {
                    int[] columnIdsForRow = columnIds[row];
                    double[] valuesForRow = values[row];
                    double[] cacheForRow = cache[row];
                    for (int i = 0; i < columnIdsForRow.length; ++i) {
                        int column = columnIdsForRow[i];
                        double prediction = predict(row,column,
                                                    rowVector,columnVector,
                                                    cacheForRow[i]);
                        double error = valuesForRow[i] - prediction;
                        sumOfSquareErrors += error * error;
                        double rowCurrent = rowVector[row];
                        double columnCurrent = columnVector[column];
                        rowVector[row]
                            += learningRateForEpoch
                            * (error * columnCurrent - regularization * rowCurrent);
                        columnVector[column]
                            += learningRateForEpoch
                            * (error * rowCurrent - regularization * columnCurrent);
                    }
                }
                double rmse = Math.sqrt(sumOfSquareErrors/numEntries);
                reporter.info(" epoch=" + epoch + " rmse=" + rmse);
                if ((epoch >= minEpochs)
                    && (relativeDifference(rmse,rmseLast) < minImprovement)) {
                    reporter.info("Converged in epoch=" + epoch
                                  + " rmse=" + rmse
                                  + " relDiff=" + relativeDifference(rmse,rmseLast));
                    break;
                }
                rmseLast = rmse;
            }
            reporter.info("Order=" + order + " RMSE=" + rmseLast);
            rowVectorList.add(rowVector);
            columnVectorList.add(columnVector);
            // fold the newly computed factor into the prediction cache
            for (int row = 0; row < cache.length; ++row) {
                double[] cacheRow = cache[row];
                for (int i = 0; i < cacheRow.length; ++i) {
                    cacheRow[i] = predict(row,columnIds[row][i],
                                          rowVector,columnVector,
                                          cacheRow[i]);
                }
            }
        }
        double[][] rowVectors
            = rowVectorList.toArray(EMPTY_DOUBLE_2D_ARRAY);
        double[][] columnVectors
            = columnVectorList.toArray(EMPTY_DOUBLE_2D_ARRAY);
        return new SvdMatrix(transpose(rowVectors),
                             transpose(columnVectors),
                             maxOrder);
    }

    static double relativeDifference(double x, double y) {
        return Math.abs(x - y) / (Math.abs(x) + Math.abs(y));
    }

    static double[][] transpose(double[][] xs) {
        double[][] ys = new double[xs[0].length][xs.length];
        for (int i = 0; i < xs.length; ++i)
            for (int j = 0; j < xs[i].length; ++j)
                ys[j][i] = xs[i][j];
        return ys;
    }

    static double predict(int row, int column,
                          double[] rowVector, double[] columnVector,
                          double cache) {
        return cache + rowVector[row] * columnVector[column];
    }

    static double[] initArray(int size, double val, Random random) {
        double[] xs = new double[size];
        // random init: Gaussian draws scaled by val
        for (int i = 0; i < xs.length; ++i)
            xs[i] = random.nextGaussian() * val;
        return xs;
    }

    static boolean notFinite(double x) {
        return Double.isNaN(x) || Double.isInfinite(x);
    }

    static double columnLength(double[][] xs, int col) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < xs.length; ++i)
            sumOfSquares += xs[i][col] * xs[i][col]; // subopt array mem order
        return Math.sqrt(sumOfSquares);
    }

    static void scale(double[][] vecs,
                      double[][] singularVecs,
                      double[] singularVals) {
        for (int i = 0; i < vecs.length; ++i)
            for (int k = 0; k < vecs[i].length; ++k)
                vecs[i][k] = singularVecs[i][k] * singularVals[k];
    }

    static void verifyDimensions(String prefix,
                                 int order,
                                 double[][] vectors) {
        for (int i = 0; i < vectors.length; ++i) {
            if (vectors[i].length != order) {
                String msg = "All vectors must have length equal to order."
                    + " order=" + order
                    + " " + prefix + "Vectors[" + i + "].length="
                    + vectors[i].length;
                throw new IllegalArgumentException(msg);
            }
        }
    }

    // normalize columns to unit length
    static double[][] normalizeColumns(double[][] xs) {
        int numDims = xs.length;
        int order = xs[0].length;
        double[][] result = new double[numDims][order];
        for (int j = 0; j < order; ++j) {
            double sumOfSquares = 0.0;
            for (int i = 0; i < numDims; ++i) {
                double valIJ = xs[i][j];
                result[i][j] = valIJ;
                sumOfSquares += valIJ * valIJ;
            }
            double length = Math.sqrt(sumOfSquares);
            for (int i = 0; i < numDims; ++i)
                result[i][j] /= length;
        }
        return result;
    }

    static final double[][] EMPTY_DOUBLE_2D_ARRAY = new double[0][];

}



