/*
* LingPipe v. 4.1.0
* Copyright (C) 2003-2011 Alias-i
*
* This program is licensed under the Alias-i Royalty Free License
* Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Alias-i
* Royalty Free License Version 1 for more details.
*
* You should have received a copy of the Alias-i Royalty Free License
* Version 1 along with this program; if not, visit
* http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
* Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
* +1 (718) 290-9170.
*/
package com.aliasi.stats;
import com.aliasi.matrix.Vector;
import com.aliasi.util.AbstractExternalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;
/**
 * A RegressionPrior instance represents a prior
 * distribution on parameters for linear or logistic regression.
 * It has methods to return the log probabilities of input
 * parameters and compute the gradient of the log probability
 * for estimation.
*
* Instances of this class are used as parameters in the {@link
* LogisticRegression} class to control the regularization or lack
* thereof used by the stochastic gradient descent optimizers. The
* priors typically assume a zero mode (maximal value) for each
* dimension, but allow variances (or scales) to vary by input
* dimension. The method {@link #shiftMeans(double[],RegressionPrior)}
* may be used to shift the means (and hence modes) of priors.
*
*
 * The behavior of a prior under stochastic gradient fitting is
 * determined by its gradient: the partial derivative of the prior's
 * error function (its negative log likelihood) with respect to each
 * coefficient βi:
*
*
* gradient(β,i) = - ∂ log p(β) / ∂ βi
*
* See the class documentation for {@link LogisticRegression}
* for more information.
*
*
 * Priors also implement a log (base 2) probability density for a
 * given parameter value in a given dimension. The total log
* prior probability is defined as the sum of the log probabilities
* for the dimensions,
*
*
* log p(β) = Σi log p(βi)
*
* Priors affect gradient descent fitting of regression through
* their contribution to the gradient of the error function with
* respect to the parameter vector. The contribution of the prior to
* the error function is the negative log probability of the parameter
* vector(s) with respect to the prior distribution. The gradient of
* the error function is the collection of partial derivatives of the
* error function with respect to the components of the parameter
* vector. The regression prior abstract base class is defined in
* terms of a single method {@link #gradient(double,int)}, which
* specifies the value of the gradient of the error function for a
* specified dimension with a specified value in that dimension.
*
*
 * This class implements static factory methods to construct
 * noninformative, Gaussian, Laplace, and Cauchy priors, as well as
 * log-interpolated, elastic net, and mean-shifted priors. The
 * Gaussian and Laplace priors may specify a different variance for
 * each dimension, but assume all the prior means (which are
 * equivalent to the modes) are zero. The priors also assume the
 * dimensions are independent, so that the full covariance matrix is
 * diagonal (that is, there is zero covariance between different
 * dimensions).
*
*
*
 * Noninformative Prior & Maximum Likelihood Estimation
*
* Using a noninformative prior for regression results in standard
* maximum likelihood estimation.
*
*
 * The noninformative prior assumes an improper uniform
* distribution over parameter vectors:
*
*
* p(βi) = Uniform(βi) = constant
*
 * and thus the log probability is constant,
*
*
* log p(βi) = log constant
*
* and therefore contributes nothing to the gradient:
*
*
* gradient(β,i) = 0.0
*
* A noninformative prior is constructed using the static method
* {@link #noninformative()}.
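 *
 * For example, a minimal sketch (the values here are illustrative
 * only):
 *
 * RegressionPrior prior = RegressionPrior.noninformative();
 * prior.gradient(2.0,1);   // == 0.0 for every value and dimension
 * prior.log2Prior(2.0,1);  // == 0.0, a constant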
*
*
* Gaussian Prior, L2 Regularization & Ridge Regression
*
* The Gaussian prior assumes a Gaussian (also known as normal) density over
* parameter vectors which results in L2-regularized
* regression, also known as ridge regression. Specifically, the
* prior allows a variance to be specified per dimension, but
* assumes dimensions are independent in that all off-diagonal
* covariances are zero. The Gaussian prior has a single mode that
* is the same as its mean.
*
*
 * The Gaussian density with variance σi2 is defined by:
*
*
 * p(βi) = 1/sqrt(2 * π * σi2) * exp(-βi2/(2 * σi2))
*
* which on a log scale is
*
*
 * log p(βi) = log (1/sqrt(2 * π * σi2)) - βi2/(2 * σi2)
*
 * The Gaussian prior leads to the following contribution to the
 * gradient for a dimension i with parameter βi and
 * variance σi2:
 *
 * gradient(β,i) = βi / σi2
*
 * As usual, the lower the variance, the steeper the gradient, and the
 * stronger the effect of the prior on the maximum a posteriori (MAP)
 * estimate.
*
* Gaussian priors are constructed using one of the static factory
* methods, {@link #gaussian(double[])} or {@link
* #gaussian(double,boolean)}.
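 *
 * For example, a minimal sketch (variance values illustrative only):
 *
 * // one shared prior variance, regularizing the intercept too
 * RegressionPrior prior1 = RegressionPrior.gaussian(2.0,false);
 * // per-dimension prior variances for a three-dimensional problem
 * RegressionPrior prior2
 *     = RegressionPrior.gaussian(new double[] { 1.0, 2.0, 0.5 });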
*
*
 * Laplace Prior, L1 Regularization & the Lasso
*
 * The Laplace prior assumes a Laplace density over parameter
 * vectors, which results in L1-regularized regression, also
 * known as the lasso. The Laplace prior is called a
 * double-exponential distribution because it looks like an
 * exponential distribution for positive values joined with the
 * reflection of this exponential distribution around zero (or more
 * generally, around its mean parameter). The Laplace prior has its
 * mode in the same location as its mean.
*
*
 * A Laplace prior allows a variance to be specified per dimension,
* but like the Gaussian prior, assumes means are zero and that the
* dimensions are independent in that all off-diagonal covariances are
* zero.
*
*
 * The Laplace density is defined by:
*
*
* p(βi) = (sqrt(2)/(2 * σi)) * exp(- sqrt(2) * abs(βi) / σi)
*
* which on the log scale is
*
*
* log p(βi) = log (sqrt(2)/(2 * σi)) - sqrt(2) * abs(βi) / σi
 * The Laplace prior leads to the following contribution to the
 * gradient for a dimension i with parameter βi,
 * mean zero and variance σi2:
 *
 * gradient(β,i) = sqrt(2) * signum(βi) / σi
 *
 * where the derivative of the absolute value function is the
 * signum function, as defined by {@link Math#signum(double)}:
 *
 * signum(x) = x > 0 ? 1 : (x < 0 ? -1 : 0)
 *
 * Laplace priors are constructed using one of the static factory
 * methods, {@link #laplace(double[])} or {@link #laplace(double,boolean)}.
 *
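 *
 * For example (variance illustrative only):
 *
 * // Laplace prior with variance 1.0; intercept fit by maximum likelihood
 * RegressionPrior prior = RegressionPrior.laplace(1.0,true);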
 * Cauchy Prior
 *
 * The Cauchy prior assumes a Cauchy density (also known as a
 * Lorentz density) over parameter vectors. The Cauchy density
 * allows a scale to be specified for each dimension. Its mean and
 * variance are undefined, as the defining integrals diverge. The
 * Cauchy distribution is symmetric, and for regression priors we
 * assume a mode of zero for the base distribution; like the other
 * priors, the Cauchy prior has a single mode.
 *
 * The Cauchy density with a scale of 1 is a Student-t density with
 * one degree of freedom.
 *
 * The Cauchy density with scale λi is defined by:
 *
 * p(βi) = (1 / π) * (λi / (βi2 + λi2))
 *
 * which on a log scale is
 *
 * log p(βi) = log (1 / π) + log (λi) - log (βi2 + λi2)
 *
 * The Cauchy prior leads to the following contribution to the
 * gradient for dimension i with parameter βi
 * and scale λi:
 *
 * gradient(β,i) = 2 * βi / (βi2 + λi2)
 *
 * Cauchy priors are constructed using one of the static factory
 * methods, {@link #cauchy(double[])} or {@link #cauchy(double,boolean)}.
 *
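 *
 * For example (squared scale illustrative only):
 *
 * // Cauchy prior with scale 2.5, hence squared scale 6.25
 * RegressionPrior prior = RegressionPrior.cauchy(6.25,true);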
 * Log Interpolated Priors
 *
 * For use in gradient-based algorithms, the gradients of two
 * different priors may be interpolated. A special case is the
 * elastic net, discussed in the next section. Given two priors
 * p1 and p2, and an interpolation ratio
 * α between 0 and 1, the interpolated prior is
 * defined by
 *
 * log p(βi) = α * log p1(βi) + (1 - α) * log p2(βi) - Z
 *
 * where Z is the normalization constant, not depending on
 * β, that normalizes the density,
 *
 * p(βi) = exp(log p(βi))
 *       = exp(α * log p1(βi)) * exp((1 - α) * log p2(βi)) / exp(Z)
 *       = p1(βi)α * p2(βi)(1 - α) / exp(Z)
 *
 * The gradient, being a derivative, will be the weighted sum of the
 * underlying gradients gradient1 and gradient2,
 *
 * gradient(β,i) = α * gradient1(β,i) + (1 - α) * gradient2(β,i)
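 *
 * For example, a sketch giving weight 0.3 to a Laplace prior and 0.7
 * to a Gaussian prior (all values illustrative only):
 *
 * RegressionPrior prior
 *     = RegressionPrior.logInterpolated(0.3,
 *                                       RegressionPrior.laplace(1.0,true),
 *                                       RegressionPrior.gaussian(2.0,true));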
 * Elastic Net Prior
 *
 * The elastic net prior interpolates between a Laplace prior and a
 * Gaussian prior on the log scale, uniformly for all dimensions.
 * There are two parameters, a scale parameter for the prior
 * variances and an interpolation parameter that determines the
 * weight given to the Laplace prior versus the Gaussian prior. The
 * elastic net prior with Laplace weight α and scale
 * λ is defined by
 *
 * log p(β,i) = α * log Laplace(βi|1/sqrt(λ))
 *              + (1 - α) * log Gaussian(βi|sqrt(2)/λ)
 *
 * where Laplace(βi|1/sqrt(λ)) is the density of the
 * (zero-mean) Laplace distribution with variance 1/sqrt(λ),
 * and Gaussian(βi|sqrt(2)/λ) is the (zero-mean)
 * Gaussian density function with variance sqrt(2)/λ.
*
 * Thus the gradient is an interpolation of the gradients of the
 * Laplace with variance σ2 = 1/sqrt(λ) and the
 * Gaussian with variance σ2 = sqrt(2)/λ,
 * leading to a simple gradient form,
 *
 * gradient(β,i) = α * λ * signum(βi) + (1 - α) * λ * βi
 *
 * The basic elastic net prior has zero means and modes in all
 * dimensions, but may be shifted like other priors.
 *
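 *
 * For example, a sketch with weight 0.9 on the Laplace (L1)
 * component and scale 2.0 (values illustrative only):
 *
 * RegressionPrior prior = RegressionPrior.elasticNet(0.9,2.0,true);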
 * Non-Zero Means and Modes
 *
 * Priors with non-zero means or modes typically arise in
 * hierarchical or multilevel regression models, or in models for
 * which informative priors are available on a
 * dimension-by-dimension basis.
 *
 * Through the method {@link #shiftMeans(double[],RegressionPrior)}
 * it is possible to shift the means of a prior by the specified
 * amounts. This allows any prior to be used with non-zero means.
 * Probabilities are computed by shifting back. Suppose
 * p2 is the density and gradient2 the
 * gradient of the specified prior, and shifts the
 * specified array of doubles specifying the mean shifts.
 * Probabilities and gradients are computed by shifting back,
 *
 * p(β) = p2(β - shifts)
 *
 * and
 *
 * gradient(β,i) = gradient2(β - shifts,i)
 *
 * Dimension by dimension, the value is computed by subtracting the
 * shift from the value and plugging the result into the underlying
 * prior.
 *
 * For example, to specify a Gaussian prior with means mus
 * and variances vars, use
 *
 * double[] mus = ...
 * double[] vars = ...
 * RegressionPrior prior = shiftMeans(mus,gaussian(vars));
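 *
 * or, filling in the ellipses with concrete (purely illustrative)
 * values,
 *
 * double[] mus = new double[] { 0.0, -1.5, 3.0 };
 * double[] vars = new double[] { 1.0, 2.0, 2.0 };
 * RegressionPrior prior
 *     = RegressionPrior.shiftMeans(mus,RegressionPrior.gaussian(vars));
 * prior.mode(1);  // == -1.5 rather than 0.0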
 * Special Treatment of Intercept
 *
 * By convention, input dimension zero (0) may be
 * reserved for the intercept and set to value 1.0 in all input
 * vectors. For regularized regression, the regularization is
 * typically not applied to the intercept term. To match this
 * convention, the factory methods allow a boolean parameter
 * indicating whether the intercept parameter has a
 * noninformative/uniform prior. If the intercept flag indicates it
 * is noninformative, then dimension 0 will have an infinite prior
 * variance or scale, and hence a zero gradient. The result is
 * that the intercept will be fit by maximum likelihood.
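 *
 * For example (values illustrative only):
 *
 * RegressionPrior prior = RegressionPrior.gaussian(2.0,true);
 * prior.gradient(5.0,0);  // == 0.0: intercept is unregularized
 * prior.gradient(5.0,1);  // == 5.0/2.0 = 2.5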
*
*
 * Serialization
 *
 * All of the regression priors may be serialized.
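 *
 * A minimal round-trip sketch using standard Java serialization
 * (stream handling is illustrative; exception handling omitted):
 *
 * RegressionPrior prior = RegressionPrior.laplace(1.0,true);
 * ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
 * new ObjectOutputStream(bytesOut).writeObject(prior);
 * ObjectInputStream objIn
 *     = new ObjectInputStream(new ByteArrayInputStream(bytesOut.toByteArray()));
 * RegressionPrior deserialized = (RegressionPrior) objIn.readObject();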
 * References
 *
 * For full details on the Gaussian, Cauchy, and Laplace
 * distributions, see:
 *
 * - Wikipedia: Normal (Gaussian) Distribution
 * - Wikipedia: Laplace (Double Exponential) Distribution
 * - Wikipedia: Cauchy Distribution
 *
 * For explanations of how the priors are used with regression,
 * including logistic regression, see the following three textbooks:
 *
 * - Gelman, Andrew and Jennifer Hill. 2006. Data Analysis Using
 *   Regression and Multilevel/Hierarchical Models. Cambridge
 *   University Press.
 * - Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2001.
 *   Elements of Statistical Learning. Springer.
 * - Bishop, Christopher M. 2006. Pattern Recognition and Machine
 *   Learning. Springer.
 *
 * See also the following technical reports and papers:
 *
 * - Genkin, Alexander, David D. Lewis, and David Madigan. 2004.
 *   Large-Scale Bayesian Logistic Regression for Text
 *   Categorization. Rutgers University Technical Report.
 * - Carpenter, Bob. 2008. Lazy Sparse Stochastic Gradient Descent
 *   for Regularized Multinomial Logistic Regression. Technical
 *   Report. Alias-i.
 * - Gelman, Andrew, Aleks Jakulin, Yu-Sung Su, and Maria Grazia
 *   Pittau. 2007. A Default Prior Distribution for Logistic and
 *   Other Regression Models.
 *
 * For details of the elastic net prior, see:
 *
 * - Zou and Hastie. 2005. Regularization and variable selection via
 *   the elastic net. Journal of the Royal Statistical Society,
 *   Series B.
 * - Friedman, Hastie and Tibshirani. 2010. Regularization paths for
 *   generalized linear models via coordinate descent. Journal of
 *   Statistical Software 33:1.
 */
public abstract class RegressionPrior implements Serializable {
// NOTE: this declaration and the base members down to the first
// factory method were missing from this copy of the source; they are
// minimal reconstructions inferred from the Javadoc above and the
// overrides in the nested classes below.
RegressionPrior() {
}
/**
 * Returns the contribution of this prior to the gradient of the
 * error function for the specified parameter value in the specified
 * dimension.
 *
 * @param beta Parameter value.
 * @param dimension Dimension of the parameter.
 * @return Gradient contribution for the value and dimension.
 */
public abstract double gradient(double beta, int dimension);
/**
 * Returns the log (base 2) prior probability density of the
 * specified parameter value in the specified dimension.
 *
 * @param beta Parameter value.
 * @param dimension Dimension of the parameter.
 * @return Log (base 2) prior probability.
 */
public abstract double log2Prior(double beta, int dimension);
/**
 * Returns the log (base 2) prior probability of the specified
 * parameter vector, defined as the sum of the per-dimension log
 * probabilities.
 */
public double log2Prior(Vector beta) {
verifyNumberOfDimensions(beta.numDimensions());
double sum = 0.0;
for (int i = 0; i < beta.numDimensions(); ++i)
sum += log2Prior(beta.value(i),i);
return sum;
}
/**
 * Returns the log (base 2) prior probability of the specified
 * parameter vectors, defined as the sum of the per-vector log
 * probabilities.
 */
public double log2Prior(Vector[] betas) {
double sum = 0.0;
for (Vector beta : betas)
sum += log2Prior(beta);
return sum;
}
/**
 * Returns the mode of this prior in the specified dimension; zero
 * for all of the basic priors.
 */
public double mode(int dimension) {
return 0.0;
}
/**
 * Returns true if this prior is the uniform (noninformative) prior.
 */
public boolean isUniform() {
return false;
}
void verifyNumberOfDimensions(int numDimensions) {
// no-op; overridden by the array-based priors
}
/**
 * Returns the noninformative (improper uniform) prior, with which
 * regression reduces to maximum likelihood estimation.
 *
 * @return The noninformative prior.
 */
public static RegressionPrior noninformative() {
return NONINFORMATIVE_PRIOR;
}
static final RegressionPrior NONINFORMATIVE_PRIOR
= new NoninformativeRegressionPrior();
/**
 * Returns the Gaussian prior with the specified prior variance for
 * each dimension and an indication of whether the intercept
 * dimension is given a noninformative prior.
 *
 * If the noninformative-intercept flag is set to true, the prior
 * variance for dimension zero (0) is set to
 * {@link Double#POSITIVE_INFINITY}.
 *
 * See the class documentation above for more information on
 * Gaussian priors.
 *
 * @param priorVariance Variance of the Gaussian prior for each
 * dimension.
 * @param noninformativeIntercept Flag indicating if intercept is
 * given a noninformative (uniform) prior.
 * @return The Gaussian prior with the specified parameters.
 * @throws IllegalArgumentException If the prior variance is not
 * a non-negative number.
 */
public static RegressionPrior gaussian(double priorVariance,
boolean noninformativeIntercept) {
verifyPriorVariance(priorVariance);
return new VariableGaussianRegressionPrior(priorVariance,noninformativeIntercept);
}
/**
 * Returns the Gaussian prior with the specified prior variances for
 * each dimension. The number of dimensions is taken to be the
 * length of the variance array.
 *
 * See the class documentation above for more information on
 * Gaussian priors.
 *
 * @param priorVariances Array of prior variances for dimensions.
 * @return The Gaussian prior with the specified variances.
 * @throws IllegalArgumentException If any of the variances are not
 * non-negative numbers.
 */
public static RegressionPrior gaussian(double[] priorVariances) {
verifyPriorVariances(priorVariances);
return new GaussianRegressionPrior(priorVariances);
}
/**
 * Returns the Laplace prior with the specified prior variance and
 * an indication of whether the intercept dimension is given a
 * noninformative prior.
 *
 * If the noninformative-intercept flag is set to true, the prior
 * variance for dimension zero (0) is set to
 * {@link Double#POSITIVE_INFINITY}.
 *
 * See the class documentation above for more information on
 * Laplace priors.
 *
 * @param priorVariance Variance of the Laplace prior for each
 * dimension.
 * @param noninformativeIntercept Flag indicating if intercept is
 * given a noninformative (uniform) prior.
 * @return The Laplace prior with the specified parameters.
 * @throws IllegalArgumentException If the variance is not a
 * non-negative number.
 */
public static RegressionPrior laplace(double priorVariance,
boolean noninformativeIntercept) {
verifyPriorVariance(priorVariance);
return new VariableLaplaceRegressionPrior(priorVariance,noninformativeIntercept);
}
/**
 * Returns the Laplace prior with the specified prior variances for
 * the dimensions.
 *
 * See the class documentation above for more information on
 * Laplace priors.
 *
 * @param priorVariances Array of prior variances for dimensions.
 * @return The Laplace prior for the specified variances.
 * @throws IllegalArgumentException If any of the variances is not
 * a non-negative number.
 */
public static RegressionPrior laplace(double[] priorVariances) {
verifyPriorVariances(priorVariances);
return new LaplaceRegressionPrior(priorVariances);
}
/**
 * Returns the Cauchy prior with the specified squared scale for
 * each dimension and an indication of whether the intercept
 * dimension is given a noninformative prior.
 *
 * See the class documentation above for more information
 * on Cauchy priors.
 *
 * @param priorSquaredScale The square of the prior scale parameter.
 * @param noninformativeIntercept Flag indicating if intercept is
 * given a noninformative (uniform) prior.
 * @return The Cauchy prior for the specified squared scale and
 * intercept flag.
 * @throws IllegalArgumentException If the squared scale is not a
 * non-negative number.
 */
public static RegressionPrior cauchy(double priorSquaredScale,
boolean noninformativeIntercept) {
verifyPriorVariance(priorSquaredScale);
return new VariableCauchyRegressionPrior(priorSquaredScale,noninformativeIntercept);
}
/**
 * Returns the Cauchy prior for the specified squared scales.
 *
 * See the class documentation above for more information
 * on Cauchy priors.
 *
 * @param priorSquaredScales Prior squared scale parameters.
 * @return The Cauchy prior for the specified squared scales.
 * @throws IllegalArgumentException If any of the prior squared
 * scales is not a non-negative number.
 */
public static RegressionPrior cauchy(double[] priorSquaredScales) {
verifyPriorVariances(priorSquaredScales);
return new CauchyRegressionPrior(priorSquaredScales);
}
/**
 * Returns the prior that interpolates its log probability between
 * the two specified priors, with the specified weight going to the
 * first prior.
 *
 * See the class documentation above for more information on log
 * interpolated priors.
 *
 * @param alpha Weight of first prior.
 * @param prior1 First prior for interpolation.
 * @param prior2 Second prior for interpolation.
 * @return The interpolated prior.
 * @throws IllegalArgumentException If the interpolation ratio is
 * not a number between 0 and 1 inclusive.
 */
public static RegressionPrior logInterpolated(double alpha,
RegressionPrior prior1,
RegressionPrior prior2) {
if (Double.isNaN(alpha) || alpha < 0.0 || alpha > 1.0) {
String msg = "Weight of first prior must be between 0 and 1 inclusive."
+ " Found alpha=" + alpha;
throw new IllegalArgumentException(msg);
}
return new LogInterpolatedRegressionPrior(alpha,prior1,prior2);
}
/**
 * Returns the elastic net prior with the specified weight on the
 * Laplace prior, the specified scale parameter for the elastic net,
 * and a noninformative prior on the intercept (dimension 0) if the
 * specified flag is set.
 *
 * See the class documentation above for more information on
 * elastic net priors.
 *
 * This is a convenience method for
 *
 * logInterpolated(laplaceWeight,
 *                 laplace(1/sqrt(scale),noninformativeIntercept),
 *                 gaussian(sqrt(2)/scale,noninformativeIntercept))
 *
 * @param laplaceWeight Weight on the Laplace prior.
 * @param scale Scale parameter for the elastic net.
 * @param noninformativeIntercept A flag indicating whether or not
 * the intercept (dimension 0) should have a noninformative prior.
 * @return The elastic net prior with the specified parameters.
 * @throws IllegalArgumentException If the interpolation parameter
 * is not between 0 and 1 inclusive, or if the scale is not
 * positive and finite.
 */
public static RegressionPrior elasticNet(double laplaceWeight,
double scale,
boolean noninformativeIntercept) {
if (Double.isInfinite(scale) || !(scale > 0.0)) {
String msg = "Scale parameter must be finite and positive."
+ " Found scale=" + scale;
throw new IllegalArgumentException(msg);
}
return logInterpolated(laplaceWeight,
laplace(1.0/Math.sqrt(scale),noninformativeIntercept),
gaussian(sqrt2/scale,noninformativeIntercept));
}
/**
 * Returns the prior that shifts the means of the specified prior
 * by the specified values.
 *
 * See the class documentation above for more information.
*
* @param shifts Mean shifts indexed by dimension.
* @param prior Prior to apply to shifted values.
* @return Prior that shifts values before delegating to the
* specified prior.
*/
public static RegressionPrior shiftMeans(double[] shifts,
RegressionPrior prior) {
return new ShiftMeans(shifts,prior);
}
static void verifyPriorVariance(double priorVariance) {
if (priorVariance < 0
|| Double.isNaN(priorVariance)
|| priorVariance == Double.NEGATIVE_INFINITY) {
String msg = "Prior variance must be a non-negative number."
+ " Found priorVariance=" + priorVariance;
throw new IllegalArgumentException(msg);
}
}
static void verifyPriorVariances(double[] priorVariances) {
for (int i = 0; i < priorVariances.length; ++i) {
if (priorVariances[i] < 0
|| Double.isNaN(priorVariances[i])
|| priorVariances[i] == Double.NEGATIVE_INFINITY) {
String msg = "Prior variances must be non-negative numbers."
+ " Found priorVariances[" + i + "]=" + priorVariances[i];
throw new IllegalArgumentException(msg);
}
}
}
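// Improper uniform prior: contributes nothing to the gradient or to
// the log (base 2) probability (log2 1 = 0) in any dimension.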
static class NoninformativeRegressionPrior
extends RegressionPrior
implements Serializable {
static final long serialVersionUID = -582012445093979284L;
@Override
public double gradient(double beta, int dimension) {
return 0.0;
}
@Override
public double log2Prior(double beta, int dimension) {
return 0.0; // log2(1) = 0
}
@Override
public double log2Prior(Vector beta) {
return 0.0;
}
@Override
public double log2Prior(Vector[] betas) {
return 0.0;
}
@Override
public String toString() {
return "NoninformativeRegressionPrior";
}
@Override
public boolean isUniform() {
return true;
}
}
static abstract class ArrayRegressionPrior extends RegressionPrior {
static final long serialVersionUID = -1887383164794837169L;
final double[] mValues;
ArrayRegressionPrior(double[] values) {
mValues = values;
}
@Override
void verifyNumberOfDimensions(int numDimensions) {
if (mValues.length != numDimensions) {
String msg = "Prior and instances must match in number of dimensions."
+ " Found prior numDimensions=" + mValues.length
+ " instance numDimensions=" + numDimensions;
throw new IllegalArgumentException(msg);
}
}
public String toString(String priorName, String paramName) {
StringBuilder sb = new StringBuilder();
sb.append(priorName + "\n");
sb.append(" dimensionality=" + mValues.length);
for (int i = 0; i < mValues.length; ++i)
sb.append(" " + paramName + "[" + i + "]=" + mValues[i] + "\n");
return sb.toString();
}
}
static class GaussianRegressionPrior
extends ArrayRegressionPrior
implements Serializable {
static final long serialVersionUID = 8257747607648390037L;
GaussianRegressionPrior(double[] priorVariances) {
super(priorVariances);
}
@Override
public double gradient(double beta, int dimension) {
return beta / mValues[dimension];
}
@Override
public double log2Prior(double beta, int dimension) {
return -log2Sqrt2Pi
- 0.5 * com.aliasi.util.Math.log2(mValues[dimension])
- beta * beta / (2.0 * mValues[dimension]);
}
@Override
public String toString() {
return toString("GaussianRegressionPrior","Variance");
}
private Object writeReplace() {
return new Serializer(this);
}
private static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = -1129377549371296060L;
final GaussianRegressionPrior mPrior;
public Serializer(GaussianRegressionPrior prior) {
mPrior = prior;
}
public Serializer() {
this(null);
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeInt(mPrior.mValues.length);
for (int i = 0; i < mPrior.mValues.length; ++i)
out.writeDouble(mPrior.mValues[i]);
}
@Override
public Object read(ObjectInput in) throws IOException, ClassNotFoundException {
int numDimensions = in.readInt();
double[] priorVariances = new double[numDimensions];
for (int i = 0; i < numDimensions; ++i)
priorVariances[i] = in.readDouble();
return new GaussianRegressionPrior(priorVariances);
}
}
}
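// Constants precomputed for the log (base 2) density computations:
// sqrt(2), log2(sqrt(2)/2), log2(sqrt(2 * pi)), and log2(1/pi).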
static final double sqrt2 = Math.sqrt(2.0);
static final double log2Sqrt2Over2 = com.aliasi.util.Math.log2(sqrt2/2.0);
static final double log2Sqrt2Pi
= com.aliasi.util.Math.log2(Math.sqrt(2.0 * Math.PI));
static final double log21OverPi = -com.aliasi.util.Math.log2(Math.PI);
static class LaplaceRegressionPrior
extends ArrayRegressionPrior
implements Serializable {
static final long serialVersionUID = 9120480132502062861L;
LaplaceRegressionPrior(double[] priorVariances) {
super(priorVariances);
}
@Override
public double gradient(double beta, int dimension) {
if (beta == 0.0) return 0.0;
if (beta > 0)
return Math.sqrt(2.0/mValues[dimension]);
return -Math.sqrt(2.0/mValues[dimension]);
}
@Override
public double log2Prior(double beta, int dimension) {
return log2Sqrt2Over2
- 0.5 * com.aliasi.util.Math.log2(mValues[dimension])
- sqrt2 * Math.abs(beta) / Math.sqrt(mValues[dimension]);
}
@Override
public String toString() {
return toString("LaplaceRegressionPrior","Variance");
}
private Object writeReplace() {
return new Serializer(this);
}
private static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = 7844951573062416091L;
final LaplaceRegressionPrior mPrior;
public Serializer(LaplaceRegressionPrior prior) {
mPrior = prior;
}
public Serializer() {
this(null);
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeInt(mPrior.mValues.length);
for (int i = 0; i < mPrior.mValues.length; ++i)
out.writeDouble(mPrior.mValues[i]);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
int numDimensions = in.readInt();
double[] priorVariances = new double[numDimensions];
for (int i = 0; i < numDimensions; ++i)
priorVariances[i] = in.readDouble();
return new LaplaceRegressionPrior(priorVariances);
}
}
}
static class CauchyRegressionPrior
extends ArrayRegressionPrior
implements Serializable {
static final long serialVersionUID = 2351846943518745614L;
CauchyRegressionPrior(double[] priorSquaredScales) {
super(priorSquaredScales);
}
@Override
public double gradient(double beta, int dimension) {
return 2.0 * beta / (beta * beta + mValues[dimension]);
}
@Override
public double log2Prior(double beta, int dimension) {
// mValues stores squared scales, so beta * beta + mValues[dimension]
// is beta^2 + lambda^2, matching the gradient above
return log21OverPi
+ 0.5 * com.aliasi.util.Math.log2(mValues[dimension])
- com.aliasi.util.Math.log2(beta * beta + mValues[dimension]);
}
@Override
public String toString() {
return toString("CauchyRegressionPrior","Scale");
}
private Object writeReplace() {
return new Serializer(this);
}
private static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = 5202676106810759907L;
final CauchyRegressionPrior mPrior;
public Serializer(CauchyRegressionPrior prior) {
mPrior = prior;
}
public Serializer() {
this(null);
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeInt(mPrior.mValues.length);
for (int i = 0; i < mPrior.mValues.length; ++i)
out.writeDouble(mPrior.mValues[i]);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
int numDimensions = in.readInt();
double[] priorScales = new double[numDimensions];
for (int i = 0; i < numDimensions; ++i)
priorScales[i] = in.readDouble();
return new CauchyRegressionPrior(priorScales);
}
}
}
static abstract class VariableRegressionPrior extends RegressionPrior {
static final long serialVersionUID = -7527207309328127863L;
final double mPriorVariance;
final boolean mNoninformativeIntercept;
VariableRegressionPrior(double priorVariance,
boolean noninformativeIntercept) {
mPriorVariance = priorVariance;
mNoninformativeIntercept = noninformativeIntercept;
}
public String toString(String priorName, String paramName) {
return priorName + "(" + paramName + "=" + mPriorVariance
+ ", noninformativeIntercept=" + mNoninformativeIntercept + ")";
}
}
static class VariableGaussianRegressionPrior
extends VariableRegressionPrior
implements Serializable {
static final long serialVersionUID = -7527207309328127863L;
VariableGaussianRegressionPrior(double priorVariance,
boolean noninformativeIntercept) {
super(priorVariance,noninformativeIntercept);
}
@Override
public double gradient(double beta, int dimension) {
return (dimension == 0 && mNoninformativeIntercept)
? 0.0
: beta / mPriorVariance;
}
@Override
public double log2Prior(double beta, int dimension) {
if (dimension == 0 && mNoninformativeIntercept)
return 0.0; // log(1)=0.0
return -log2Sqrt2Pi
- 0.5 * com.aliasi.util.Math.log2(mPriorVariance)
- beta * beta / (2.0 * mPriorVariance);
}
@Override
public String toString() {
return toString("GaussianRegressionPrior","Variance");
}
private Object writeReplace() {
return new Serializer(this);
}
private static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = 5979483825025936160L;
final VariableGaussianRegressionPrior mPrior;
public Serializer(VariableGaussianRegressionPrior prior) {
mPrior = prior;
}
public Serializer() {
this(null);
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeDouble(mPrior.mPriorVariance);
out.writeBoolean(mPrior.mNoninformativeIntercept);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
double priorVariance = in.readDouble();
boolean noninformativeIntercept = in.readBoolean();
return new VariableGaussianRegressionPrior(priorVariance,
noninformativeIntercept);
}
}
}
static class VariableLaplaceRegressionPrior
extends VariableRegressionPrior
implements Serializable {
static final long serialVersionUID = -4286001162222250623L;
final double mPositiveGradient;
final double mNegativeGradient;
final double mPriorIntercept;
final double mPriorCoefficient;
VariableLaplaceRegressionPrior(double priorVariance,
boolean noninformativeIntercept) {
super(priorVariance,noninformativeIntercept);
mPositiveGradient = Math.sqrt(2.0/priorVariance);
mNegativeGradient = -mPositiveGradient;
mPriorIntercept
= log2Sqrt2Over2 - 0.5
* com.aliasi.util.Math.log2(priorVariance);
mPriorCoefficient = -sqrt2 / Math.sqrt(priorVariance);
}
@Override
public double gradient(double beta, int dimension) {
return ((dimension == 0 && mNoninformativeIntercept) || beta == 0.0)
? 0.0
: (beta > 0
? mPositiveGradient
: mNegativeGradient );
}
@Override
public double log2Prior(double beta, int dimension) {
if (dimension == 0 && mNoninformativeIntercept)
return 0.0;
return mPriorIntercept + mPriorCoefficient * Math.abs(beta);
// return log2Sqrt2Over2
// - 0.5 * com.aliasi.util.Math.log2(mPriorVariance)
// - sqrt2 * Math.abs(beta) / Math.sqrt(mPriorVariance);
}
@Override
public String toString() {
return toString("LaplaceRegressionPrior","Variance");
}
private Object writeReplace() {
return new Serializer(this);
}
private static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = 2321796089407881776L;
final VariableLaplaceRegressionPrior mPrior;
public Serializer(VariableLaplaceRegressionPrior prior) {
mPrior = prior;
}
public Serializer() {
this(null);
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeDouble(mPrior.mPriorVariance);
out.writeBoolean(mPrior.mNoninformativeIntercept);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
double priorVariance = in.readDouble();
boolean noninformativeIntercept = in.readBoolean();
return new VariableLaplaceRegressionPrior(priorVariance,
noninformativeIntercept);
}
}
}
static class VariableCauchyRegressionPrior
extends VariableRegressionPrior {
static final long serialVersionUID = 3368658136325392652L;
VariableCauchyRegressionPrior(double priorVariance,
boolean noninformativeIntercept) {
super(priorVariance,noninformativeIntercept);
}
@Override
public double gradient(double beta, int dimension) {
return (dimension == 0 && mNoninformativeIntercept)
? 0
: 2.0 * beta / (beta * beta + mPriorVariance);
}
@Override
public double log2Prior(double beta, int dimension) {
if (dimension == 0 && mNoninformativeIntercept)
return 0.0;
return log21OverPi
+ 0.5 * com.aliasi.util.Math.log2(mPriorVariance)
- com.aliasi.util.Math.log2(beta * beta + mPriorVariance);
}
@Override
public String toString() {
return toString("CauchyRegressionPrior","Scale");
}
public Object writeReplace() {
return new Serializer(this);
}
private static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = -7209096281888148303L;
final VariableCauchyRegressionPrior mPrior;
public Serializer(VariableCauchyRegressionPrior prior) {
mPrior = prior;
}
public Serializer() {
this(null);
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeDouble(mPrior.mPriorVariance);
out.writeBoolean(mPrior.mNoninformativeIntercept);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
double priorScale = in.readDouble();
boolean noninformativeIntercept = in.readBoolean();
return new VariableCauchyRegressionPrior(priorScale,
noninformativeIntercept);
}
}
}
static class LogInterpolatedRegressionPrior extends RegressionPrior {
static final long serialVersionUID = 1052451778773339516L;
private final double mAlpha;
private final RegressionPrior mPrior1;
private final RegressionPrior mPrior2;
LogInterpolatedRegressionPrior(double alpha,
RegressionPrior prior1,
RegressionPrior prior2) {
mAlpha = alpha;
mPrior1 = prior1;
mPrior2 = prior2;
}
@Override
public double gradient(double beta, int dimension) {
return mAlpha * mPrior1.gradient(beta,dimension)
+ (1 - mAlpha) * mPrior2.gradient(beta,dimension);
}
@Override
public double log2Prior(double beta, int dimension) {
return mAlpha * mPrior1.log2Prior(beta,dimension)
+ (1 - mAlpha) * mPrior2.log2Prior(beta,dimension);
}
@Override
public String toString() {
return "LogInterpolatedRegressionPrior("
+ "alpha=" + mAlpha
+ ", prior1=" + mPrior1
+ ", prior2=" + mPrior2 + ")";
}
Object writeReplace() {
return new Serializer(this);
}
static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = 1071183663202516816L;
final LogInterpolatedRegressionPrior mPrior;
public Serializer() {
this(null);
}
public Serializer(LogInterpolatedRegressionPrior prior) {
mPrior = prior;
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
out.writeDouble(mPrior.mAlpha);
out.writeObject(mPrior.mPrior1);
out.writeObject(mPrior.mPrior2);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
double alpha = in.readDouble();
@SuppressWarnings("unchecked")
RegressionPrior prior1 = (RegressionPrior) in.readObject();
@SuppressWarnings("unchecked")
RegressionPrior prior2 = (RegressionPrior) in.readObject();
return new LogInterpolatedRegressionPrior(alpha,prior1,prior2);
}
}
}
static class ShiftMeans extends RegressionPrior {
static final long serialVersionUID = 5159543505446681732L;
private final double[] mMeans;
private final RegressionPrior mPrior;
ShiftMeans(double[] means,
RegressionPrior prior) {
mPrior = prior;
mMeans = means;
}
@Override
public double mode(int i) {
return mMeans[i] + mPrior.mode(i);
}
@Override
public boolean isUniform() {
return mPrior.isUniform();
}
@Override
public double log2Prior(double betaI, int i) {
return mPrior.log2Prior(betaI - mMeans[i],i);
}
@Override
public double gradient(double betaI, int i) {
return mPrior.gradient(betaI - mMeans[i],i);
}
@Override
public String toString() {
return "ShiftMeans(means=...,prior=" + mPrior + ")";
}
// serialize via the Serializer below, as with the other priors
Object writeReplace() {
return new Serializer(this);
}
static class Serializer extends AbstractExternalizable {
static final long serialVersionUID = -777157399350907424L;
final ShiftMeans mPrior;
public Serializer() {
this(null);
}
public Serializer(ShiftMeans prior) {
mPrior = prior;
}
@Override
public void writeExternal(ObjectOutput out) throws IOException {
writeDoubles(mPrior.mMeans,out);
out.writeObject(mPrior.mPrior);
}
@Override
public Object read(ObjectInput in)
throws IOException, ClassNotFoundException {
double[] means = readDoubles(in);
@SuppressWarnings("unchecked")
RegressionPrior prior = (RegressionPrior) in.readObject();
return new ShiftMeans(means,prior);
}
}
}
}