/*
* LingPipe v. 4.1.0
* Copyright (C) 2003-2011 Alias-i
*
* This program is licensed under the Alias-i Royalty Free License
* Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Alias-i
* Royalty Free License Version 1 for more details.
*
* You should have received a copy of the Alias-i Royalty Free License
* Version 1 along with this program; if not, visit
* http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
* Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
* +1 (718) 290-9170.
*/
package com.aliasi.matrix;
import com.aliasi.io.LogLevel;
import com.aliasi.io.Reporter;
import com.aliasi.io.Reporters;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
/**
* An SvdMatrix provides a means of storing a matrix that
* has been factored via a singular-value decomposition (SVD). This
* class also provides a static method for computing regularized
* singular-value decompositions of partial matrices.
*
* Singular Value Decomposition
*
* Singular value decomposition (SVD) factors an m×n matrix A
* into a product of three matrices:
*
*     A = U * S * V^T
*
* where U is an m×k matrix, V is an n×k matrix, and S is a
* k×k matrix, where k is the rank of the matrix A. The
* multiplication (*) is matrix multiplication and the
* superscripted T indicates matrix transposition.
*
* The m-dimensional vectors making up the columns of U are
* called left singular vectors, whereas the n-dimensional vectors
* making up the columns of V are called right singular vectors.
* The values on the diagonal of S are called the singular values.
* The combination of the q-th left singular vector, right singular
* vector, and singular value is known as a factor.
*
* The singular value matrix S is a diagonal matrix with positive,
* strictly non-increasing values, so that
* S[i][i] >= S[i+1][i+1] for i+1 < k.
* The set of left and the set of right singular vectors are
* orthonormal. Normality means that each singular vector is of unit
* length (length 1). Orthogonality means that any pair of distinct
* left singular vectors is orthogonal and any pair of distinct
* right singular vectors is orthogonal (meaning their dot product
* is 0).
*
* Matrices have unique singular-value decompositions up to the
* re-ordering of columns with equal singular values and up to
* cancelling sign changes in the singular vectors.
*
* Construction and Value Computation
*
* An SvdMatrix may be constructed out of the singular vectors and
* singular values, or out of the vectors with the singular values
* scaled in.
*
* Given that S is diagonal, the value of a particular entry in A
* works out to:
*
*     A[i][j] = Σ_k U[i][k] * S[k][k] * V[j][k]
*
* To save time in the application and space in the class, we
* factor S into U and V to produce a new pair of matrices U' and
* V' defined by:
*
*     U' = U * sqrt(S)
*     V'^T = sqrt(S) * V^T
*
* with the square root performed component-wise:
*
*     sqrt(S)[i][j] = sqrt(S[i][j])
*
* By the associativity of matrix multiplication, we have:
*
*     U * S * V^T = U * sqrt(S) * sqrt(S) * V^T = U' * V'^T
*
* Thus the class implementation is able to store U' and V',
* reducing the amount of computation involved in returning a value
* (using column vectors as the default vector orientation):
*
*     A[i][j] = U'[i]^T * V'[j]
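*
* As a minimal sketch of what this storage scheme means in
* practice, the following hypothetical rank-1 example constructs an
* SvdMatrix directly from scaled row and column vectors and reads a
* value back out as a dot product (all values illustrative):
*
*     double[][] rowVecs = { { 2.0 }, { 4.0 } };  // U' for a 2x2 matrix
*     double[][] colVecs = { { 1.0 }, { 3.0 } };  // V'
*     SvdMatrix a = new SvdMatrix(rowVecs, colVecs, 1);
*     // a.value(1,1) = U'[1] . V'[1] = 4.0 * 3.0 = 12.0
*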
* Square Error and the Frobenius Norm
*
* Suppose A and B are m×n matrices. The square error between them
* is defined by the so-called Frobenius norm, which extends the
* standard vector L2, or Euclidean, norm to matrices:
*
*     squareError(A,B)
*     = frobeniusNorm(A-B)
*     = Σ_{i < m} Σ_{j < n} (A[i][j] - B[i][j])^2
*
* The square error is sometimes referred to as the residual sum of
* squares (RSS), because A[i][j] - B[i][j] is the residual (the
* difference between the actual and predicted values).
*
* Lower-order Approximations
*
* Consider factoring a matrix A of dimension m×n into the product
* of two matrices X * Y^T, where X is of dimension m×k and Y is of
* dimension n×k. We may then measure the square error
* squareError(A, X * Y^T) to determine how well the factorization
* matches A. We know that
*
*     squareError(A, U' * V'^T) = 0.0
*
* if we take U', V' from the singular value decomposition, where U'
* and V' have a number of columns (called factors) equal to the
* rank of the matrix A. The singular value decomposition is such
* that the first q columns of U' and V' provide the best order-q
* approximation of A of any X and Y of dimensions m×q and n×q.
* In symbols:
*
*     U'_q, V'_q = argmin_{X is m×q, Y is n×q} squareError(A, X * Y^T)
*
* where U'_q is the restriction of U' to its first q columns.
*
* Often errors are reported as means, where the mean square error
* (MSE) is defined by:
*
*     meanSquareError(A,B) = squareError(A,B) / (m×n)
*
* To adjust to a linear scale, the square root of the mean square
* error (RMSE) is often used:
*
*     rootMeanSquareError(A,B) = sqrt(meanSquareError(A,B))
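*
* For instance, given an SvdMatrix m approximating a complete
* rectangular matrix values, the RMSE of the factorization can be
* computed as follows (a minimal sketch; m and values are assumed,
* not part of this class):
*
*     double sum = 0.0;
*     for (int i = 0; i < values.length; ++i) {
*         for (int j = 0; j < values[i].length; ++j) {
*             double residual = values[i][j] - m.value(i,j);
*             sum += residual * residual;  // squared residual
*         }
*     }
*     double rmse = Math.sqrt(sum / (values.length * values[0].length));
*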
* Partial Matrices
*
* A partial matrix is one in which some of the values are unknown.
* This is in contrast with a sparse matrix, in which most of the
* values are zero. A variant of singular value decomposition may be
* used to impute the unknown values by minimizing the square error
* with respect to the known values only. Unknown values are then
* simply derived from the factorization U' * V'^T. Typically, the
* approximation is of lower order than the rank of the matrix.
*
* Regularized SVD via Shrinkage
*
* Linear regression techniques such as SVD often overfit their
* training data in order to derive exact answers. This problem is
* mitigated somewhat by choosing low-order approximations to the
* full-rank SVD. Another option is to penalize large values in the
* singular vectors, thus favoring smaller values. The most common
* way to do this, because of its practicality, is via parameter
* shrinkage.
*
* Shrinkage is a general technique in least squares fitting that
* adds a penalty term proportional to the square of the size of the
* parameters. Thus the square error objective function is replaced
* with a regularized version:
*
*     regularizedError(A, U' * V'^T)
*     = squareError(A, U' * V'^T)
*     + parameterCost(U') + parameterCost(V')
*
* where the parameter cost for a vector X of dimensionality q is
* the sum of the squared parameters:
*
*     parameterCost(X)
*     = λ * Σ_{i < q} X[i]^2
*     = λ * length(X)^2
*
* Note that the hyperparameter λ scales the parameter cost term in
* the overall error.
*
* Setting the regularization parameter λ to zero sets the parameter
* costs to zero, resulting in an unregularized SVD computation. In
* Bayesian terms, regularization is equivalent to a normal prior on
* parameter values centered on zero with variance controlled by λ.
* The resulting minimizer of the regularized error is the maximum a
* posteriori (MAP) solution. The Bayesian approach also allows
* covariances to be estimated (including simple parameter variance
* estimates), but these are not implemented in this class.
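*
* A sketch of the parameter cost term above, written out as a
* hypothetical helper over a whole parameter matrix (lambda is the
* regularization hyperparameter; none of this is part of the class
* API):
*
*     static double parameterCost(double[][] x, double lambda) {
*         double sum = 0.0;
*         for (double[] row : x)
*             for (double v : row)
*                 sum += v * v;  // squared parameter values
*         return lambda * sum;
*     }
*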
* Regularized Stochastic Gradient Descent
*
* Singular value decomposition may be computed "exactly" (modulo
* arithmetic precision and convergence) using an algorithm whose
* time complexity is O(m^3 + n^3), equal to O(max(m,n)^3), for an
* m×n matrix. Arithmetic precision is especially vexing at singular
* values near zero and with highly correlated rows or columns in
* the input matrix.
*
* For large partial matrices, we use a form of stochastic gradient
* descent which computes a single row and column singular vector (a
* single factor, that is) at a time. Each factor is estimated by
* iteratively visiting the data points and adjusting the
* unnormalized singular vectors making up the current factor. Each
* adjustment is a least-squares fitting step, where we
* simultaneously adjust the left singular vectors given the right
* singular vectors and vice-versa.
*
* The least-squares adjustments take the following form. For each
* value, we compute the current error (using the order-k
* approximation and the current order k+1 values) and then move the
* vectors to reduce error. We use U' and V' for the incremental
* values of the singular vectors scaled by the singular values:
*
*     for (k = 0; k < maxOrder; ++k)
*         for (i < m) U'[i][k] = random.nextGaussian()*initValue;
*         for (j < n) V'[j][k] = random.nextGaussian()*initValue;
*         for (epoch = 0; epoch < maxEpochs && not converged; ++epoch)
*             for (i,j such that M[i][j] defined)
*                 error = M[i][j] - U'_k[i] * V'_k[j]  // * is vector dot product
*                 uTemp = U'[i][k]
*                 vTemp = V'[j][k]
*                 U'[i][k] += learningRate[epoch] * ( error * vTemp - regularization * uTemp )
*                 V'[j][k] += learningRate[epoch] * ( error * uTemp - regularization * vTemp )
*
* where initValue, maxEpochs, learningRate (see below), and
* regularization are algorithm hyperparameters. Note that the
* initial values of the singular vectors are set randomly to the
* result of a draw from a Gaussian (normal) distribution with mean
* 0.0 and variance 1.0, scaled by initValue. Because we use the
* entire set of factors in the error computation, the current
* factor is guaranteed to have singular vectors orthogonal to the
* singular vectors already computed.
*
* Note that in the actual implementation, the contribution to the
* error of the first k-1 factors is cached to reduce time in the
* inner loop. This requires a double-length floating point value
* for each defined entry in the matrix.
*
* Gradient Interpretation
*
* Like most linear learners, this algorithm merely moves the
* parameter vectors U'[i] and V'[j] in the direction of the
* gradient of the error. The gradient is the multivariate
* derivative of the objective function being minimized. Because our
* objective is squared error, the gradient is just its derivative,
* which is just (two times) the (linear) error itself. We roll the
* factor of 2 into the learning rate to derive the update in the
* algorithm pseudo-code above.
*
* The term (error * vTemp) is the component of the error gradient
* due to the current value of V'[j][k], and the term
* (regularization * uTemp) is the component of the gradient due to
* the size of the parameter U'[i][k]. The updates thus move the
* parameter vectors in the direction of the gradient.
*
* Convergence Conditions
*
* The convergence conditions for a given factor require either
* hitting the maximum number of allowable epochs, or finding the
* improvement from one epoch to the next is below some relative
* threshold:
*
*     regError(epoch) = regularizedError(M, U'_k(epoch) * V'_k(epoch)^T)
*     relativeImprovement(epoch+1) = relativeDiff(regError(epoch+1), regError(epoch))
*     relativeDiff(x,y) = abs(x-y) / (abs(x) + abs(y))
*
* When the relative difference in square error is less than the
* hyperparameter threshold minImprovement, the epoch terminates and
* the algorithm moves on to the next factor k+1.
*
* Note that a complete matrix is a degenerate kind of partial
* matrix. The gradient descent computation still works in this
* situation, but is not as efficient or as accurate as an algebraic
* SVD solver for small matrices.
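*
* As a concrete illustration of the update rule above, the
* following hypothetical sketch runs one epoch for factor k over a
* complete matrix m (uPrime, vPrime, rate, and regularization are
* assumed local variables, not part of this class's API):
*
*     for (int i = 0; i < numRows; ++i) {
*         for (int j = 0; j < numCols; ++j) {
*             double predicted = 0.0;
*             for (int d = 0; d <= k; ++d)
*                 predicted += uPrime[i][d] * vPrime[j][d];
*             double error = m[i][j] - predicted;
*             double uTemp = uPrime[i][k];
*             double vTemp = vPrime[j][k];
*             uPrime[i][k] += rate * (error * vTemp - regularization * uTemp);
*             vPrime[j][k] += rate * (error * uTemp - regularization * vTemp);
*         }
*     }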
*
* Annealing Schedule
*
* Learning rates that are too high are unstable, whereas learning
* rates that are too low never reach their targets. To get around
* this problem, the learning rate, learningRate[epoch], is lowered
* as the number of epochs increases. This lowering of the learning
* rate has a thermodynamic interpretation in terms of free energy,
* hence the name "annealing". Larger moves are made in earlier
* epochs, then the temperature is gradually lowered so that the
* learner may settle into a stable fixed point. The function
* learningRate[epoch] is called the annealing schedule.
*
* There are theoretical requirements on the annealing schedule that
* guarantee convergence (up to arithmetic precision and with no
* upper bound on the number of epochs):
*
*     Σ_epoch learningRate[epoch] = infinity
*     Σ_epoch learningRate[epoch]^2 < infinity
*
* The schedule we use is the one commonly chosen to meet the above
* requirements:
*
*     learningRate[epoch] = initialLearningRate / (1 + epoch/annealingRate)
*
* where initialLearningRate is the initial learning rate and
* annealingRate determines how quickly it shrinks. The learning
* rate moves from its initial size (initialLearningRate) to one
* half (1/2) of its original size after annealingRate epochs, to
* one tenth (1/10) of its initial size after 9 * annealingRate
* epochs, and to one hundredth after 99 * annealingRate epochs.
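*
* The decay behavior is easy to verify directly; this sketch (with
* illustrative values) evaluates the schedule at a few epochs:
*
*     double initialLearningRate = 0.01;
*     double annealingRate = 100.0;
*     for (int epoch : new int[] { 0, 100, 900, 9900 }) {
*         double rate = initialLearningRate / (1.0 + epoch / annealingRate);
*         // epoch=0 -> 0.01, 100 -> 0.005, 900 -> 0.001, 9900 -> 0.0001
*     }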
*
* Parameter Choices
*
* The previous discussion has introduced a daunting list of
* parameters required for gradient descent for singular value
* decomposition. Unfortunately, the stochastic gradient descent
* solver requires a fair amount of tuning to recover a low mean
* square error factorization. The optimal settings will also depend
* on the input matrix; for example, very sparse partial matrices are
* much easier to fit than dense matrices.
*
* Maximum Order
*
* Determining the order of the decomposition is mostly a matter of
* determining how many orders are needed for the amount of
* smoothing required. Low order reconstructions are useful for most
* applications. One way to determine the maximum order is using
* held out data for an application. Another is to look for a point
* where the singular values become (relatively) insignificant.
*
* Maximum Epochs and Early Stopping
*
* For low mean square error factorizations, many epochs may be
* necessary. Lowering the maximum number of epochs leads to what is
* known as early stopping. Sometimes, early stopping leads to more
* accurate held-out predictions, but it will always raise the
* factorization error (training data predictions). In general, the
* maximum number of epochs needs to be set empirically. Initially,
* try fewer epochs and gradually raise the number of epochs until
* the desired accuracy is achieved.
*
* Minimum Improvement
*
* By setting the minimum improvement to 0.0, the algorithm is forced
* to use the maximum number of epochs. By setting it above this
* level, a form of early stopping is achieved when the benefit of
* another epoch of refitting falls below a given improvement
* threshold. This value may also be set on an application basis
* using held out data, or it may be fit to achieve a given level of
* mean square error on the training (input) data. The minimum
* improvement needs to be set fairly low (less than 0.000001) to
* achieve reasonably precise factorizations. Note that minimum
* improvement is defined relatively, as noted above.
*
* Initial Parameter Values
*
* Initial values of the singular vectors are not particularly
* sensitive, because we are using multiplicative updates. A good
* rule of thumb for starting values is the inverse square root of
* the maximum order:
*
*     initVal ~ 1 / sqrt(maxOrder)
*
* Learning Rate, Maximum Epochs and Annealing
*
* A good starting point for the learning rate is 0.01. The
* annealing parameter should be set so that the learning rate is cut
* by at least a factor of 10 in the final rounds. This calls for a
* value that's roughly one tenth of the maximum number of epochs. If
* the initial learning rate is set higher, then the annealing
* schedule should be more aggressive so that it spends a fair amount
* of time in the 0.001 to 0.0001 learning rate range.
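*
* Putting these rules of thumb together, a hypothetical starting
* configuration for a small, complete matrix might look as follows
* (all values are illustrative, not prescriptive):
*
*     int maxOrder = 3;
*     double featureInit = 1.0 / Math.sqrt(maxOrder);
*     SvdMatrix svd
*         = SvdMatrix.svd(values,     // complete matrix, double[][]
*                         maxOrder,
*                         featureInit,
*                         0.01,       // initialLearningRate
*                         1000,       // annealingRate ~ maxEpochs/10
*                         0.00,       // regularization
*                         null,       // reporter (silent)
*                         0.000001,   // minImprovement
*                         10,         // minEpochs
*                         10000);     // maxEpochs
*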
* References
*
* There are thorough Wikipedia articles on singular value
* decomposition and gradient descent, although the SVD entry
* focuses entirely on complete (non-partial) matrices:
*
*     http://en.wikipedia.org/wiki/Singular_value_decomposition
*     http://en.wikipedia.org/wiki/Gradient_descent
*
* A particularly clear explanation of many of the issues involved
* is available in the context of neural networks.
*
* Acknowledgements
*
* Our partial SVD solver is based on C code from Timely Development
* (see license below). Timely based their code on Simon Funk's blog
* entry. Simon Funk's approach was itself based on his and
* Genevieve Gorrell's 2005 EACL paper.
*
* The singular value decomposition code is rather loosely based on
* a C program developed by Timely Development. Here is the
* copyright notice for the original code:
*
* // SVD Sample Code
* //
* // Copyright (C) 2007 Timely Development (www.timelydevelopment.com)
* //
* // Special thanks to Simon Funk and others from the Netflix Prize contest
* // for providing pseudo-code and tuning hints.
* //
* // Feel free to use this code as you wish as long as you include
* // these notices and attribution.
* //
* // Also, if you have alternative types of algorithms for accomplishing
* // the same goal and would like to contribute, please share them as well :)
* //
* // STANDARD DISCLAIMER:
* //
* // - THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY
* // - OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
* // - LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR
* // - FITNESS FOR A PARTICULAR PURPOSE.
*
* @author Bob Carpenter
* @version 4.0.0
* @since LingPipe3.2
*/
public class SvdMatrix extends AbstractMatrix {
private final double[][] mRowVectors;
private final double[][] mColumnVectors;
private final int mOrder;
/**
* Construct a factored matrix using the specified row and column
* vectors at the specified order. Each vector in the arrays of
* row and column vectors must be of the same dimension as the
* order.
*
* See the class documentation above for more information
* on singular value decomposition.
*
* @param rowVectors Row vectors, indexed by row.
* @param columnVectors Column vectors, indexed by column.
* @param order Dimensionality of the row and column vectors.
*/
public SvdMatrix(double[][] rowVectors,
double[][] columnVectors,
int order) {
verifyDimensions("row",order,rowVectors);
verifyDimensions("column",order,columnVectors);
mRowVectors = rowVectors;
mColumnVectors = columnVectors;
mOrder = order;
}
/**
* Construct an SVD matrix using the specified singular vectors
* and singular values. The order of the factorization is equal to
* the length of the singular values. Each singular vector must
* be the same dimensionality as the array of singular values.
*
* See the class documentation above for more information
* on singular value decomposition.
*
* @param rowSingularVectors Row singular vectors, indexed by row.
* @param columnSingularVectors Column singular vectors, indexed by column.
* @param singularValues Array of singular values, in decreasing order.
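*
* For example, a rank-2 factorization of a 2×2 matrix might be
* assembled as follows (hypothetical values; each singular vector
* has one entry per singular value):
*
*     double[][] rowVecs = { { 0.8, 0.6 }, { 0.6, -0.8 } };
*     double[][] colVecs = { { 0.6, 0.8 }, { 0.8, -0.6 } };
*     double[] singularValues = { 5.0, 2.0 };
*     SvdMatrix m = new SvdMatrix(rowVecs, colVecs, singularValues);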
*/
public SvdMatrix(double[][] rowSingularVectors,
double[][] columnSingularVectors,
double[] singularValues) {
mOrder = singularValues.length;
verifyDimensions("row",mOrder,rowSingularVectors);
verifyDimensions("column",mOrder,columnSingularVectors);
mRowVectors = new double[rowSingularVectors.length][mOrder];
mColumnVectors = new double[columnSingularVectors.length][mOrder];
double[] sqrtSingularValues = new double[singularValues.length];
for (int i = 0; i < sqrtSingularValues.length; ++i)
sqrtSingularValues[i] = Math.sqrt(singularValues[i]);
scale(mRowVectors,rowSingularVectors,sqrtSingularValues);
scale(mColumnVectors,columnSingularVectors,sqrtSingularValues);
}
/**
* Returns the number of rows in this matrix.
*
* @return The number of rows in this matrix.
*/
@Override
public int numRows() {
return mRowVectors.length;
}
/**
* Returns the number of columns in this matrix.
*
* @return The number of columns in this matrix.
*/
@Override
public int numColumns() {
return mColumnVectors.length;
}
/**
* Returns the order of this factorization. The order
* is the number of dimensions in the singular vectors
* and the singular values.
*
* @return The order of this decomposition.
*/
public int order() {
return mRowVectors[0].length;
}
/**
* Returns the value of this matrix at the specified row and column.
*
* @param row Row index.
* @param column Column index.
* @return The value of this matrix at the specified row and
* column.
*/
@Override
public double value(int row, int column) {
double[] rowVec = mRowVectors[row];
double[] colVec = mColumnVectors[column];
double result = 0.0;
for (int i = 0; i < rowVec.length; ++i)
result += rowVec[i] * colVec[i];
return result;
}
/**
* Returns the array of singular values for this decomposition.
*
* @return The singular values for this decomposition.
*/
public double[] singularValues() {
double[] singularValues = new double[mOrder];
for (int i = 0; i < singularValues.length; ++i)
singularValues[i] = singularValue(i);
return singularValues;
}
/**
* Returns the singular value for the specified order.
*
* @param order Dimension of the singular value.
* @return The singular value at the specified order.
*/
public double singularValue(int order) {
if (order >= mOrder) {
String msg = "Maximum order=" + (mOrder-1)
+ " found order=" + order;
throw new IllegalArgumentException(msg);
}
return columnLength(mRowVectors,order)
* columnLength(mColumnVectors,order);
}
/**
* Returns the singular value decomposition of the specified
* complete matrix of values.
*
* The two-dimensional array of values must be an m×n matrix. In
* particular, each row must be of the same length. If this is not
* the case, an illegal argument exception will be raised.
*
* For a full description of the arguments and values, see
* the method documentation for
* {@link #partialSvd(int[][],double[][],int,double,double,double,double,Reporter,double,int,int)}
* and the class documentation above.
*
* @param values Array of values.
* @param maxOrder Maximum order of the decomposition.
* @param featureInit Initial value for singular vectors.
* @param initialLearningRate Incremental multiplier of error
* determining how fast learning occurs.
* @param annealingRate Rate at which annealing occurs; higher values
* provide more gradual annealing.
* @param regularization A regularization constant to damp learning.
* @param reporter Reporter to which to send progress and error
* reports during training, or {@code null} if no output is desired.
* @param minImprovement Minimum relative improvement in mean
* square error required to finish an epoch.
* @param minEpochs Minimum number of epochs for training.
* @param maxEpochs Maximum number of epochs for training.
* @return Singular value decomposition for the specified matrix
* at the specified order.
* @throws IllegalArgumentException Under conditions listed in the
* method documentation above.
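*
* A minimal hypothetical invocation on a tiny complete matrix
* might look like the following (hyperparameters illustrative):
*
*     double[][] values = { { 1.0, 2.0 },
*                           { 3.0, 4.0 } };
*     SvdMatrix m = SvdMatrix.svd(values, 2, 0.7, 0.01, 100,
*                                 0.0, null, 1e-6, 10, 1000);
*     double[] singularValues = m.singularValues();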
*/
public static SvdMatrix svd(double[][] values,
int maxOrder,
double featureInit,
double initialLearningRate,
double annealingRate,
double regularization,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs) {
if (reporter == null)
reporter = Reporters.silent();
int m = values.length;
int n = values[0].length;
reporter.info("Calculating SVD");
reporter.info("#Rows=" + m + " #Cols=" + n);
// check all rows same length
for (int i = 1; i < m; ++i) {
if (values[i].length != n) {
String msg = "All rows must be of same length."
+ " Found row[0].length=" + n
+ " row[" + i + "]=" + values[i].length;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
}
// all rows of a complete matrix share the same column ids 0..(n-1)
int[] sharedRow = new int[n];
for (int j = 0; j < n; ++j)
sharedRow[j] = j;
int[][] columnIds = new int[m][];
for (int j = 0; j < m; ++j)
columnIds[j] = sharedRow;
return partialSvd(columnIds,
values,
maxOrder,
featureInit,
initialLearningRate,
annealingRate,
regularization,
reporter,
minImprovement,
minEpochs,
maxEpochs);
}
/**
* Returns a matrix in which the left singular vectors make up the
* columns. The first index is for the row of the original matrix
* and the second index is for the order of the singular vector.
* Thus the returned matrix is m×k, if the original input was an
* m×n matrix and SVD was performed at order k.
*
* @return The left singular vectors of this matrix.
*/
public double[][] leftSingularVectors() {
return normalizeColumns(mRowVectors);
}
/**
* Returns a matrix in which the right singular vectors make up
* the columns. The first index is for the column of the original
* matrix and the second index is for the order of the singular
* vector. Thus the returned matrix is n×k, if the original input
* was an m×n matrix and SVD was performed at order k.
*
* @return The right singular vectors.
*/
public double[][] rightSingularVectors() {
return normalizeColumns(mColumnVectors);
}
/**
* Returns the singular value decomposition of the specified
* partial matrix, using the specified search parameters.
*
* The reporter parameter may be set to allow incremental progress
* reports during training. These report on RMSE per epoch.
*
* See the class documentation above for a description of the
* algorithm.
*
* There are a number of constraints on the input, any violation of
* which will raise an illegal argument exception. The conditions
* are:
*
* - maxOrder must be at least 1
* - minImprovement must be finite and non-negative
* - minEpochs must be positive and no greater than maxEpochs
* - featureInit must be finite and non-zero
* - initialLearningRate, annealingRate, and regularization must be
*   finite and non-negative
* - columnIds and values must be non-null, with non-null rows of
*   matching lengths, and the column ids in each row must be
*   non-negative and in strictly ascending order
*
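* As a hypothetical illustration of the input encoding, a 3-row
* partial matrix with five known entries is represented by parallel
* ragged arrays, with each row's column ids in ascending order (the
* hyperparameter values below are illustrative only):
*
*     int[][] columnIds = { { 0, 2 },    // row 0: columns 0 and 2 known
*                           { 1 },       // row 1: column 1 known
*                           { 0, 2 } };  // row 2: columns 0 and 2 known
*     double[][] values = { { 1.0, 5.0 },
*                           { 2.0 },
*                           { 3.0, 4.0 } };
*     SvdMatrix m
*         = SvdMatrix.partialSvd(columnIds, values, 2, 0.7,
*                                0.01, 100, 0.00, null,
*                                1e-6, 10, 1000);
*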
* @param columnIds Identifiers of column index for given row and entry.
* @param values Values at row and column index for given entry.
* @param maxOrder Maximum order of the decomposition.
* @param featureInit Initial value for singular vectors.
* @param initialLearningRate Incremental multiplier of error determining how
* fast learning occurs.
* @param annealingRate Rate at which annealing occurs; higher values
* provide more gradual annealing.
* @param regularization A regularization constant to damp learning.
* @param reporter Reporter to which progress reports are written, or
* {@code null} if no reporting is required.
* @param minImprovement Minimum relative improvement in mean square error required
* to finish an epoch.
* @param minEpochs Minimum number of epochs for training.
* @param maxEpochs Maximum number of epochs for training.
* @return Singular value decomposition for the specified partial matrix
* at the specified order.
* @throws IllegalArgumentException Under conditions listed in the
* method documentation above.
*/
public static SvdMatrix partialSvd(int[][] columnIds,
double[][] values,
int maxOrder,
double featureInit,
double initialLearningRate,
double annealingRate,
double regularization,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs) {
return partialSvd(columnIds,values,
maxOrder,
featureInit,initialLearningRate,annealingRate,regularization,
new Random(),
reporter,
minImprovement,minEpochs,maxEpochs);
}
/**
* This method is identical to the other singular-value decomposition
* method, only with a specified randomizer.
*
* @param columnIds Identifiers of column index for given row and entry.
* @param values Values at row and column index for given entry.
* @param maxOrder Maximum order of the decomposition.
* @param featureInit Initial value for singular vectors.
* @param initialLearningRate Incremental multiplier of error determining how
* fast learning occurs.
* @param annealingRate Rate at which annealing occurs; higher values
* provide more gradual annealing.
* @param regularization A regularization constant to damp learning.
* @param random Randomizer to use for initialization.
* @param reporter Reporter to which progress reports are written, or {@code null}
* for no reporting.
* @param minImprovement Minimum relative improvement in mean square error required
* to finish an epoch.
* @param minEpochs Minimum number of epochs for training.
* @param maxEpochs Maximum number of epochs for training.
* @return Singular value decomposition for the specified partial matrix
* at the specified order.
* @throws IllegalArgumentException Under conditions listed in the
* method documentation above.
*/
static SvdMatrix partialSvd(int[][] columnIds,
double[][] values,
int maxOrder,
double featureInit,
double initialLearningRate,
double annealingRate,
double regularization,
Random random,
Reporter reporter,
double minImprovement,
int minEpochs,
int maxEpochs) {
if (reporter == null)
reporter = Reporters.silent();
reporter.info("Start");
if (maxOrder < 1) {
String msg = "Max order must be >= 1."
+ " Found maxOrder=" + maxOrder;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (minImprovement < 0 || notFinite(minImprovement)) {
String msg = "Min improvement must be finite and non-negative."
+ " Found minImprovement=" + minImprovement;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (minEpochs <= 0 || maxEpochs < minEpochs) {
String msg = "Min epochs must be positive and less than or equal to max epochs."
+ " Found minEpochs=" + minEpochs
+ " maxEpochs=" + maxEpochs;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (notFinite(featureInit) || featureInit == 0.0) {
String msg = "Feature inits must be finite and non-zero."
+ " Found featureInit=" + featureInit;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (notFinite(initialLearningRate) || initialLearningRate < 0) {
String msg = "Initial learning rate must be finite and non-negative."
+ " Found initialLearningRate=" + initialLearningRate;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (notFinite(regularization) || regularization < 0) {
String msg = "Regularization must be finite and non-negative."
+ " Found regularization=" + regularization;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
// null checks must precede the loop, which dereferences both arrays
if (columnIds == null) {
String msg = "ColumnIds must not be null.";
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (values == null) {
String msg = "Values must not be null.";
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
for (int row = 0; row < columnIds.length; ++row) {
if (columnIds[row] == null) {
String msg = "All column Ids must be non-null."
+ " Found null in row=" + row;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (values[row] == null) {
String msg = "All values must be non-null."
+ " Found null row=" + row;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (columnIds[row].length != values[row].length) {
String msg = "column Ids and values must be same length."
+ " For row=" + row
+ " Found columnIds[row].length=" + columnIds[row].length
+ " Found values[row].length=" + values[row].length;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
for (int i = 0; i < columnIds[row].length; ++i) {
if (columnIds[row][i] < 0) {
String msg = "Column ids must be non-negative."
+ " Found columnIds[" + row + "][" + i + "]=" + columnIds[row][i];
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
if (i > 0 && columnIds[row][i-1] >= columnIds[row][i]) {
String msg = "Column ids must be in strictly ascending order within a row."
+ " At row=" + row
+ " Mismatch at positions " + (i-1) + " and " + i;
reporter.fatal(msg);
throw new IllegalArgumentException(msg);
}
}
}
if (annealingRate < 0 || notFinite(annealingRate)) {
String msg = "Annealing rate must be finite and non-negative."
+ " Found rate=" + annealingRate;
reporter.fatal(msg);
throw new IllegalArgumentException("14");
}
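// derive the matrix dimensions and total entry count from the ragged inputs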
int numRows = columnIds.length;
int numEntries = 0;
for (double[] xs : values)
numEntries += xs.length;
int maxColumnIndex = 0;
for (int[] xs : columnIds)
for (int i = 0; i < xs.length; ++i)
if (xs[i] > maxColumnIndex)
maxColumnIndex = xs[i];
int numColumns = maxColumnIndex + 1;
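// the order can't exceed min(#rows,#columns), the maximum possible rank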
maxOrder = Math.min(maxOrder,Math.min(numRows,numColumns));
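// per-entry cache of the contribution of previously completed factors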
double[][] cache = new double[values.length][];
for (int row = 0; row < numRows; ++row) {
cache[row] = new double[values[row].length];
Arrays.fill(cache[row],0.0);
}
List