
weka.core.Optimization

The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This is the stable version. Apart from bugfixes, this version does not receive any other updates.

/*
 *    This program is free software; you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation; either version 2 of the License, or
 *    (at your option) any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program; if not, write to the Free Software
 *    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*
 *    Optimization.java
 *    Copyright (C) 2003 University of Waikato, Hamilton, New Zealand
 *
 */

package weka.core;

import weka.core.TechnicalInformation.Field;
import weka.core.TechnicalInformation.Type;

/**
 * Implementation of the active-sets method with BFGS update to solve an
 * optimization problem with bound constraints only, in multiple dimensions.
 * In this implementation we consider both the lower and the upper bound
 * constraints.
 * <p>
 * Here is a sketch of our search strategy; a detailed description of the
 * algorithm can be found in the Appendix of Xin Xu's MSc thesis:

 * <p>
 * Initialize everything, incl. initial value, direction, etc.
 * <p>
 * LOOP (main algorithm):
 * <p>
 * 1. Perform the line search using the directions for the free variables:
 * <p>
 * 1.1 Check all the bounds that are not "active" (i.e. not yet binding
 *     their variables) and compute the feasible step length to the bound
 *     for each of them.
 * <p>
 * 1.2 Take the smallest of these feasible step lengths, say alpha, and set
 *     it as the upper bound of the current step length, i.e.
 *     0 &lt; lambda &lt;= alpha (a short sketch of this bound computation
 *     follows the loop outline below).
 * <p>
 * 1.3 Search for a step length lambda &lt;= alpha that results in a
 *     "sufficient function decrease" (alpha condition) AND a "positive
 *     definite inverse Hessian" (beta condition), if possible, using
 *     SAFEGUARDED polynomial interpolation. This step length is "safe" and
 *     thus is used to compute the next value of the free variables.
 * <p>
 * 1.4 Fix the variable(s) that are newly bound to their constraint(s).
 * <p>
 * 2. Check whether all variables or their gradients have converged. If they
 *    have, check whether it is possible to release any of the variables
 *    currently fixed at their bounds, based on the "reliable" second-order
 *    Lagrange multiplier estimates if available. If such an estimate is
 *    available and negative for a variable, release that variable; if it is
 *    not available, use the first-order Lagrange multiplier to test for
 *    release. If no variables can be released, STOP the loop (a minimum has
 *    been found). Otherwise update the inverse Hessian and gradient entries
 *    for the newly released variables and CONTINUE the LOOP.
 * <p>
 * 3. Use the BFGS formula to update the inverse of the Hessian matrix. Note
 *    that the already-fixed variables must have zeros in the corresponding
 *    entries of the inverse Hessian.
 * <p>
 * 4. Compute the new (Newton) search direction d = H^{-1}*g, where H^{-1} is
 *    the inverse Hessian and g is the Jacobian. Note that, again, the
 *    already-fixed variables will have a zero direction.
 * <p>
 * ENDLOOP
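 * <p>
 * To make steps 1.1-1.2 concrete, the feasible step cap alpha can be
 * computed along the following lines. This is only a simplified sketch
 * (x, d, lower and upper are illustrative local names, not members of
 * this class), not the exact internal code:
 * <p>
 *   // Illustrative sketch: cap the step length at the nearest bound
 *   // along direction d; Double.NaN marks a missing bound.
 *   double alpha = Double.POSITIVE_INFINITY;
 *   for (int i = 0; i &lt; x.length; i++) {
 *     if ((d[i] &lt; 0) && !Double.isNaN(lower[i]))      // heading to lower bound
 *       alpha = Math.min(alpha, (lower[i] - x[i]) / d[i]);
 *     else if ((d[i] &gt; 0) && !Double.isNaN(upper[i])) // heading to upper bound
 *       alpha = Math.min(alpha, (upper[i] - x[i]) / d[i]);
 *   }
 *   // Any trial step length lambda must now satisfy 0 &lt; lambda &lt;= alpha.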

 * <p>
 * A typical usage of this class is to create your own subclass of this
 * class and provide the objective function and gradients as follows:
 * class MyOpt extends Optimization {
 *   // Provide the objective function
 *   protected double objectiveFunction(double[] x) {
 *       // How to calculate your objective function...
 *       // ...
 *   }
 *
 *   // Provide the first derivatives
 *   protected double[] evaluateGradient(double[] x) {
 *       // How to calculate the gradient of the objective function...
 *       // ...
 *   }
 *
 *   // If possible, provide the index^{th} row of the Hessian matrix
 *   protected double[] evaluateHessian(double[] x, int index) {
 *      // How to calculate the index^th variable's second derivative
 *      // ... 
 *   }
 * }
 * 
 * <p>
 * When it is time to use it, in some routine(s) of another class:
 * MyOpt opt = new MyOpt();
 * 
 * // Set up initial variable values and bound constraints
 * double[] x = new double[numVariables];
 * // Lower and upper bounds: 1st row is lower bounds, 2nd is upper
 * double[][] constraints = new double[2][numVariables];
 * ...
 *
 * // Find the minimum, 200 iterations as default
 * x = opt.findArgmin(x, constraints); 
 * while(x == null){  // 200 iterations are not enough
 *    x = opt.getVarbValues();  // Try another 200 iterations
 *    x = opt.findArgmin(x, constraints);
 * }
 *
 * // The minimal function value
 * double minFunction = opt.getMinFunction();
 * ...
 * 
 * <p>
 * It is recommended that Hessian values be provided so that the
 * second-order Lagrangian multiplier estimate can be calculated. However,
 * if they are not provided, there is no need to override the
 * evaluateHessian() function.
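 * <p>
 * For concreteness, here is a minimal hypothetical subclass that minimizes
 * the separable quadratic f(x) = sum_i (x[i] - i)^2 subject to x[i] &gt;= 0.
 * The class name and target values are invented for illustration only:
 * <p>
 * // Hypothetical example subclass (not part of WEKA)
 * class QuadOpt extends Optimization {
 *   protected double objectiveFunction(double[] x) {
 *     double f = 0.0;
 *     for (int i = 0; i &lt; x.length; i++)
 *       f += (x[i] - i) * (x[i] - i);  // minimum at x[i] = i
 *     return f;
 *   }
 *
 *   protected double[] evaluateGradient(double[] x) {
 *     double[] g = new double[x.length];
 *     for (int i = 0; i &lt; x.length; i++)
 *       g[i] = 2.0 * (x[i] - i);
 *     return g;
 *   }
 *
 *   // Row #index of the (diagonal) Hessian: 2 on the diagonal, 0 elsewhere
 *   protected double[] evaluateHessian(double[] x, int index) {
 *     double[] row = new double[x.length];
 *     row[index] = 2.0;
 *     return row;
 *   }
 * }
 * <p>
 * Assuming, as WEKA's own callers of this class do, that Double.NaN marks
 * a missing bound, lower bounds of zero with no upper bounds would be set
 * up as:
 * <p>
 * double[][] constraints = new double[2][n];
 * for (int i = 0; i &lt; n; i++) {
 *   constraints[0][i] = 0.0;         // lower bound
 *   constraints[1][i] = Double.NaN;  // no upper bound
 * }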

 * <p>
 * REFERENCES (see also the getTechnicalInformation() method):
 * <p>
 * The whole model algorithm is adapted from Chapter 5 and other related
 * chapters in Gill, Murray and Wright (1981) "Practical Optimization",
 * Academic Press, and Gill and Murray (1976) "Minimization Subject to
 * Bounds on the Variables", NPL Report NAC72, while Chong and Zak (1996)
 * "An Introduction to Optimization", John Wiley & Sons, Inc. provides a
 * brief but helpful introduction to the method.

 * <p>
 * Dennis and Schnabel (1983) "Numerical Methods for Unconstrained
 * Optimization and Nonlinear Equations", Prentice-Hall Inc., and Press et
 * al. (1992) "Numerical Recipes in C", Second Edition, Cambridge
 * University Press, were consulted for the polynomial interpolation used
 * in the line search implementation.
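 * <p>
 * As an illustration of the backtracking used there: the first backtrack
 * is typically chosen by quadratic interpolation through f(0), f'(0) and
 * the rejected trial step, safeguarded to stay within [0.1, 0.5] times the
 * old step. A sketch under those assumptions, with f0, slope, lam and fnew
 * as illustrative local names:
 * <p>
 *   // Illustrative sketch of one safeguarded backtracking step:
 *   // minimizer of the quadratic q(t) with q(0) = f0, q'(0) = slope (&lt; 0)
 *   // and q(lam) = fnew, clamped into the safeguard interval.
 *   double tmplam = -0.5 * lam * slope / ((fnew - f0) / lam - slope);
 *   if (tmplam &gt; 0.5 * lam)
 *     tmplam = 0.5 * lam;               // new lambda &lt;= 0.5 * lambda_old
 *   lam = Math.max(tmplam, 0.1 * lam);  // new lambda &gt;= 0.1 * lambda_old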

 * <p>
 * The Hessian modification in the BFGS update uses Cholesky factorization
 * and two rank-one modifications:
 * <p>
 * Bk+1 = Bk + (Gk*Gk')/(Gk'*Dk) + (dGk*(dGk)')/[alpha*(dGk)'*Dk],
 * <p>
 * where Gk is the gradient vector, Dk is the direction vector and alpha is
 * the step rate.
 * <p>
 * This method is due to Gill, Golub, Murray and Saunders (1974) "Methods
 * for Modifying Matrix Factorizations", Mathematics of Computation,
 * Vol. 28, No. 126, pp. 505-535.
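 * <p>
 * In plain dense-matrix form, ignoring the factorized representation that
 * the implementation actually maintains, one application of this formula
 * would read as follows (B, g, dg, d, alpha and n are illustrative local
 * names for the Hessian approximation, gradient, gradient change,
 * direction, step rate and dimension):
 * <p>
 *   // Illustrative dense sketch (the real code updates the L and D
 *   // factors instead): B += (g*g')/(g'*d) + (dg*dg')/(alpha*(dg'*d))
 *   double gd = 0.0, dgd = 0.0;
 *   for (int i = 0; i &lt; n; i++) {
 *     gd += g[i] * d[i];
 *     dgd += dg[i] * d[i];
 *   }
 *   for (int i = 0; i &lt; n; i++)
 *     for (int j = 0; j &lt; n; j++)
 *       B[i][j] += g[i] * g[j] / gd + dg[i] * dg[j] / (alpha * dgd);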

 *
 * @author Xin Xu ([email protected])
 * @version $Revision: 1.9 $
 * @see #getTechnicalInformation()
 */
public abstract class Optimization
  implements TechnicalInformationHandler, RevisionHandler {

  protected double m_ALF = 1.0e-4;

  protected double m_BETA = 0.9;

  protected double m_TOLX = 1.0e-6;

  protected double m_STPMX = 100.0;

  protected int m_MAXITS = 200;

  protected static boolean m_Debug = false;

  /** function value */
  protected double m_f;

  /** G'*p */
  private double m_Slope;

  /** Test if zero step in lnsrch */
  private boolean m_IsZeroStep = false;

  /** Used when iteration overflow occurs */
  private double[] m_X;

  /** Compute machine precision */
  protected static double m_Epsilon, m_Zero;
  static {
    m_Epsilon = 1.0;
    while (1.0 + m_Epsilon > 1.0) {
      m_Epsilon /= 2.0;
    }
    m_Epsilon *= 2.0;
    m_Zero = Math.sqrt(m_Epsilon);
    if (m_Debug)
      System.err.print("Machine precision is " + m_Epsilon +
                       " and zero set to " + m_Zero);
  }

  /**
   * Returns an instance of a TechnicalInformation object, containing
   * detailed information about the technical background of this class,
   * e.g., paper reference or book this class is based on.
   *
   * @return the technical information about this class
   */
  public TechnicalInformation getTechnicalInformation() {
    TechnicalInformation result;
    TechnicalInformation additional;

    result = new TechnicalInformation(Type.MASTERSTHESIS);
    result.setValue(Field.AUTHOR, "Xin Xu");
    result.setValue(Field.YEAR, "2003");
    result.setValue(Field.TITLE, "Statistical learning in multiple instance problem");
    result.setValue(Field.SCHOOL, "University of Waikato");
    result.setValue(Field.ADDRESS, "Hamilton, NZ");
    result.setValue(Field.NOTE, "0657.594");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "P. E. Gill and W. Murray and M. H. Wright");
    additional.setValue(Field.YEAR, "1981");
    additional.setValue(Field.TITLE, "Practical Optimization");
    additional.setValue(Field.PUBLISHER, "Academic Press");
    additional.setValue(Field.ADDRESS, "London and New York");

    additional = result.add(Type.TECHREPORT);
    additional.setValue(Field.AUTHOR, "P. E. Gill and W. Murray");
    additional.setValue(Field.YEAR, "1976");
    additional.setValue(Field.TITLE, "Minimization subject to bounds on the variables");
    additional.setValue(Field.INSTITUTION, "National Physical Laboratory");
    additional.setValue(Field.NUMBER, "NAC 72");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "E. K. P. Chong and S. H. Zak");
    additional.setValue(Field.YEAR, "1996");
    additional.setValue(Field.TITLE, "An Introduction to Optimization");
    additional.setValue(Field.PUBLISHER, "John Wiley and Sons");
    additional.setValue(Field.ADDRESS, "New York");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "J. E. Dennis and R. B. Schnabel");
    additional.setValue(Field.YEAR, "1983");
    additional.setValue(Field.TITLE, "Numerical Methods for Unconstrained Optimization and Nonlinear Equations");
    additional.setValue(Field.PUBLISHER, "Prentice-Hall");

    additional = result.add(Type.BOOK);
    additional.setValue(Field.AUTHOR, "W. H. Press and B. P. Flannery and S. A. Teukolsky and W. T. Vetterling");
    additional.setValue(Field.YEAR, "1992");
    additional.setValue(Field.TITLE, "Numerical Recipes in C");
    additional.setValue(Field.PUBLISHER, "Cambridge University Press");
    additional.setValue(Field.EDITION, "Second");

    additional = result.add(Type.ARTICLE);
    additional.setValue(Field.AUTHOR, "P. E. Gill and G. H. Golub and W. Murray and M. A. Saunders");
    additional.setValue(Field.YEAR, "1974");
    additional.setValue(Field.TITLE, "Methods for modifying matrix factorizations");
    additional.setValue(Field.JOURNAL, "Mathematics of Computation");
    additional.setValue(Field.VOLUME, "28");
    additional.setValue(Field.NUMBER, "126");
    additional.setValue(Field.PAGES, "505-535");

    return result;
  }

  /**
   * Subclass should implement this procedure to evaluate the objective
   * function to be minimized.
   *
   * @param x the variable values
   * @return the objective function value
   * @throws Exception if something goes wrong
   */
  protected abstract double objectiveFunction(double[] x) throws Exception;

  /**
   * Subclass should implement this procedure to evaluate the gradient
   * of the objective function.
   *
   * @param x the variable values
   * @return the gradient vector
   * @throws Exception if something goes wrong
   */
  protected abstract double[] evaluateGradient(double[] x) throws Exception;

  /**
   * Subclass is recommended to override this procedure to evaluate the
   * second-order gradient of the objective function. If it's not
   * provided, it returns null.
   *
   * @param x the variables
   * @param index the row index in the Hessian matrix
   * @return one row (the row #index) of the Hessian matrix, null as default
   * @throws Exception if something goes wrong
   */
  protected double[] evaluateHessian(double[] x, int index) throws Exception {
    return null;
  }

  /**
   * Get the minimal function value.
   *
   * @return minimal function value found
   */
  public double getMinFunction() {
    return m_f;
  }

  /**
   * Set the maximal number of iterations in searching (default 200).
   *
   * @param it the maximal number of iterations
   */
  public void setMaxIteration(int it) {
    m_MAXITS = it;
  }

  /**
   * Set whether in debug mode.
   *
   * @param db use debug or not
   */
  public void setDebug(boolean db) {
    m_Debug = db;
  }

  /**
   * Get the variable values. Only needed when the number of iterations
   * exceeds the max threshold.
   *
   * @return the current variable values
   */
  public double[] getVarbValues() {
    return m_X;
  }

  /**
   * Find a new point x in the direction p from a point xold at which the
   * value of the function has decreased sufficiently, the positive
   * definiteness of the B matrix (approximation of the inverse of the
   * Hessian) is preserved and no bound constraints are violated. For
   * details see "Numerical Methods for Unconstrained Optimization and
   * Nonlinear Equations". "Numerical Recipes in C" was also consulted.
   *
   * @param xold old x value
   * @param gradient gradient at that point
   * @param direct direction vector
   * @param stpmax maximum step length
   * @param isFixed indicating whether a variable has been fixed
   * @param nwsBounds non-working set bounds. Means these variables are free
   *        and subject to the bound constraints in this step
   * @param wsBdsIndx index of variables that have working-set bounds. Means
   *        these variables are already fixed and no longer subject to
   *        the constraints
   * @return new value along direction p from xold, null if no step was taken
   * @throws Exception if an error occurs
   */
  public double[] lnsrch(double[] xold, double[] gradient, double[] direct,
                         double stpmax, boolean[] isFixed,
                         double[][] nwsBounds, DynamicIntArray wsBdsIndx)
    throws Exception {

    int i, k, len = xold.length,
      fixedOne = -1;     // idx of variable to be fixed
    double alam, alamin; // lambda to be found, and its lower bound

    // For convergence and bound test
    double temp, test, alpha = Double.POSITIVE_INFINITY, fold = m_f, sum;

    // For cubic interpolation
    double a, alam2 = 0, b, disc = 0, maxalam = 1.0, rhs1, rhs2, tmplam;

    double[] x = new double[len]; // New variable values

    // Scale the step
    for (sum = 0.0, i = 0; i < len; i++) {
      if (!isFixed[i]) // For fixed variables, direction = 0
        sum += direct[i] * direct[i];
    }
    sum = Math.sqrt(sum);

    if (m_Debug)
      System.err.println("fold: " + Utils.doubleToString(m_f, 10, 7) + "\n" +
                         "sum: " + Utils.doubleToString(sum, 10, 7) + "\n" +
                         "stpmax: " + Utils.doubleToString(stpmax, 10, 7));
    if (sum > stpmax) {
      for (i = 0; i < len; i++)
        if (!isFixed[i])
          direct[i] *= stpmax / sum;
    } else
      maxalam = stpmax / sum;

    // Compute initial rate of decrease, g'*p
    m_Slope = 0.0;
    for (i = 0; i < len; i++) {
      x[i] = xold[i];
      if (!isFixed[i])
        m_Slope += gradient[i] * direct[i];
    }

    if (m_Debug)
      System.err.print("slope: " + Utils.doubleToString(m_Slope, 10, 7) + "\n");

    // Slope too small
    if (Math.abs(m_Slope) <= m_Zero) {
      if (m_Debug)
        System.err.println("Gradient and direction orthogonal -- " +
                           "Min. found with current fixed variables" +
                           " (or all variables fixed). Try to release" +
                           " some variables now.");
      return x;
    }

    // Err: slope > 0
    if (m_Slope > m_Zero) {
      if (m_Debug)
        for (int h = 0; h < x.length; h++)
          System.err.println(h + ": isFixed=" + isFixed[h] + ", x=" + x[h] +
                             ", grad=" + gradient[h] + ", direct=" + direct[h]);
      throw new Exception("g'*p positive!");
    }

    // Compute LAMBDAmin and the upper bound of lambda -- alpha
    test = 0.0;
    for (i = 0; i < len; i++) {
      if (!isFixed[i]) { // No need for fixed variables
        temp = Math.abs(direct[i]) / Math.max(Math.abs(x[i]), 1.0);
        if (temp > test)
          test = temp;
      }
    }

    if (test > m_Zero) // Not converged
      alamin = m_TOLX / test;
    else {
      if (m_Debug)
        System.err.println("Zero directions for all free variables -- " +
                           "Min. found with current fixed variables" +
                           " (or all variables fixed). Try to release" +
                           " some variables now.");
      return x;
    }

    // Check whether any non-working-set bounds are "binding"
    for (i = 0; i < len; i++) {
      if (!isFixed[i]) { // No need for fixed variables
        double alpi;
        if ((direct[i] < -m_Epsilon) && !Double.isNaN(nwsBounds[0][i])) { // Not feasible
          alpi = (nwsBounds[0][i] - xold[i]) / direct[i];
          if (alpi <= m_Zero) { // Zero
            if (m_Debug)
              System.err.println("Fix variable " + i +
                                 " to lower bound " + nwsBounds[0][i] +
                                 " from value " + xold[i]);
            x[i] = nwsBounds[0][i];
            isFixed[i] = true; // Fix this variable
            alpha = 0.0;
            nwsBounds[0][i] = Double.NaN; // Add cons. to working set
            wsBdsIndx.addElement(i);
          } else if (alpha > alpi) { // Fix one variable in one iteration
            alpha = alpi;
            fixedOne = i;
          }
        } else if ((direct[i] > m_Epsilon) && !Double.isNaN(nwsBounds[1][i])) { // Not feasible
          alpi = (nwsBounds[1][i] - xold[i]) / direct[i];
          if (alpi <= m_Zero) { // Zero
            if (m_Debug)
              System.err.println("Fix variable " + i +
                                 " to upper bound " + nwsBounds[1][i] +
                                 " from value " + xold[i]);
            x[i] = nwsBounds[1][i];
            isFixed[i] = true; // Fix this variable
            alpha = 0.0;
            nwsBounds[1][i] = Double.NaN; // Add cons. to working set
            wsBdsIndx.addElement(i);
          } else if (alpha > alpi) {
            alpha = alpi;
            fixedOne = i;
          }
        }
      }
    }

    if (m_Debug) {
      System.err.println("alamin: " + Utils.doubleToString(alamin, 10, 7));
      System.err.println("alpha: " + Utils.doubleToString(alpha, 10, 7));
    }

    if (alpha <= m_Zero) { // Zero maximum feasible step
      m_IsZeroStep = true;
      if (m_Debug)
        System.err.println("Alpha too small, try again");
      return x;
    }

    alam = alpha; // Always try the full feasible newton step
    if (alam > 1.0)
      alam = 1.0;

    // Iteration of one newton step; if necessary, backtracking is done
    double initF = fold, // Initial function value
      hi = alam, lo = alam, newSlope = 0, fhi = m_f, flo = m_f; // For beta condition
    double[] newGrad; // Gradient at the new variable values

    kloop: for (k = 0;; k++) {
      if (m_Debug)
        System.err.println("\nLine search iteration: " + k);

      for (i = 0; i < len; i++) {
        if (!isFixed[i]) { // Update the free variables
          x[i] = xold[i] + alam * direct[i]; // Compute xnew
          if (!Double.isNaN(nwsBounds[0][i]) && (x[i] < nwsBounds[0][i])) {
            x[i] = nwsBounds[0][i]; // Rounding error
          } else if (!Double.isNaN(nwsBounds[1][i]) && (x[i] > nwsBounds[1][i])) {
            x[i] = nwsBounds[1][i]; // Rounding error
          }
        }
      }

      m_f = objectiveFunction(x); // Compute fnew
      if (Double.isNaN(m_f))
        throw new Exception("Objective function value is NaN!");

      while (Double.isInfinite(m_f)) { // Avoid infinity
        if (m_Debug)
          System.err.println("Too large m_f. Shrink step by half.");
        alam *= 0.5; // Shrink by half
        if (alam <= m_Epsilon) {
          if (m_Debug)
            System.err.println("Wrong starting points, change them!");
          return x;
        }

        for (i = 0; i < len; i++)
          if (!isFixed[i])
            x[i] = xold[i] + alam * direct[i];

        m_f = objectiveFunction(x);
        if (Double.isNaN(m_f))
          throw new Exception("Objective function value is NaN!");

        initF = Double.POSITIVE_INFINITY;
      }

      if (m_Debug) {
        System.err.println("obj. function: " +
                           Utils.doubleToString(m_f, 10, 7));
        System.err.println("threshold: " +
                           Utils.doubleToString(fold + m_ALF * alam * m_Slope, 10, 7));
      }

      if (m_f <= fold + m_ALF * alam * m_Slope) { // Alpha condition: sufficient function decrease
        newGrad = evaluateGradient(x);
        for (newSlope = 0.0, i = 0; i < len; i++)
          if (!isFixed[i])
            newSlope += newGrad[i] * direct[i];

        if (newSlope >= m_BETA * m_Slope) { // Beta condition: ensure pos. defnty.
          if (m_Debug)
            System.err.println("Increasing derivatives (beta condition): ");

          if ((fixedOne != -1) && (alam >= alpha)) { // Has bounds and over
            if (direct[fixedOne] > 0) {
              x[fixedOne] = nwsBounds[1][fixedOne]; // Avoid rounding error
              nwsBounds[1][fixedOne] = Double.NaN; // Add cons. to working set
            } else {
              x[fixedOne] = nwsBounds[0][fixedOne]; // Avoid rounding error
              nwsBounds[0][fixedOne] = Double.NaN; // Add cons. to working set
            }

            if (m_Debug)
              System.err.println("Fix variable " + fixedOne + " to bound " +
                                 x[fixedOne] + " from value " + xold[fixedOne]);
            isFixed[fixedOne] = true; // Fix the variable
            wsBdsIndx.addElement(fixedOne);
          }
          return x;
        } else if (k == 0) { // First time: increase alam
          // Search for the smallest value not complying with the alpha condition
          double upper = Math.min(alpha, maxalam);
          if (m_Debug)
            System.err.println("Alpha condition holds, increase alpha... ");
          while (!((alam >= upper) || (m_f > fold + m_ALF * alam * m_Slope))) {
            lo = alam;
            flo = m_f;
            alam *= 2.0;
            if (alam >= upper) // Avoid rounding errors
              alam = upper;

            for (i = 0; i < len; i++)
              if (!isFixed[i])
                x[i] = xold[i] + alam * direct[i];
            m_f = objectiveFunction(x);
            if (Double.isNaN(m_f))
              throw new Exception("Objective function value is NaN!");

            newGrad = evaluateGradient(x);
            for (newSlope = 0.0, i = 0; i < len; i++)
              if (!isFixed[i])
                newSlope += newGrad[i] * direct[i];

            if (newSlope >= m_BETA * m_Slope) {
              if (m_Debug)
                System.err.println("Increasing derivatives (beta condition): \n" +
                                   "newSlope = " +
                                   Utils.doubleToString(newSlope, 10, 7));

              if ((fixedOne != -1) && (alam >= alpha)) { // Has bounds and over
                if (direct[fixedOne] > 0) {
                  x[fixedOne] = nwsBounds[1][fixedOne]; // Avoid rounding error
                  nwsBounds[1][fixedOne] = Double.NaN; // Add cons. to working set
                } else {
                  x[fixedOne] = nwsBounds[0][fixedOne]; // Avoid rounding error
                  nwsBounds[0][fixedOne] = Double.NaN; // Add cons. to working set
                }

                if (m_Debug)
                  System.err.println("Fix variable " + fixedOne + " to bound " +
                                     x[fixedOne] + " from value " + xold[fixedOne]);
                isFixed[fixedOne] = true; // Fix the variable
                wsBdsIndx.addElement(fixedOne);
              }
              return x;
            }
          }
          hi = alam;
          fhi = m_f;
          break kloop;
        } else {
          if (m_Debug)
            System.err.println("Alpha condition holds.");
          hi = alam2;
          lo = alam;
          flo = m_f;
          break kloop;
        }
      } else if (alam < alamin) { // No feasible lambda found
        if (initF < fold) { // Function value was decreased at some point
          alam = Math.min(1.0, alpha);
          for (i = 0; i < len; i++)
            if (!isFixed[i])
              x[i] = xold[i] + alam * direct[i]; // Still take the step
          if (m_Debug)
            System.err.println("No feasible lambda: still take" +
                               " alpha = " + alam);
          m_f = objectiveFunction(x);
          if (Double.isNaN(m_f))
            throw new Exception("Objective function value is NaN!");
        } else { // Convergence on delta(x): no step taken
          for (i = 0; i < len; i++)
            x[i] = xold[i];
          m_f = fold;
          if (m_Debug)
            System.err.println("Cannot find feasible lambda");
        }
        return x;
      } else { // Backtracking by polynomial interpolation
        if (k == 0) { // First time backtrack: quadratic interpolation
          if (!Double.isInfinite(initF))
            initF = m_f;
          tmplam = -0.5 * alam * m_Slope / ((m_f - fold) / alam - m_Slope);
        } else { // Subsequent backtracks: cubic interpolation
          rhs1 = m_f - fold - alam * m_Slope;
          rhs2 = fhi - fold - alam2 * m_Slope;
          a = (rhs1 / (alam * alam) - rhs2 / (alam2 * alam2)) / (alam - alam2);
          b = (-alam2 * rhs1 / (alam * alam) + alam * rhs2 / (alam2 * alam2)) /
            (alam - alam2);
          if (a == 0.0)
            tmplam = -m_Slope / (2.0 * b);
          else {
            disc = b * b - 3.0 * a * m_Slope;
            if (disc < 0.0)
              disc = 0.0;
            double numerator = -b + Math.sqrt(disc);
            if (numerator >= Double.MAX_VALUE) {
              numerator = Double.MAX_VALUE;
              if (m_Debug)
                System.err.print("-b+sqrt(disc) too large! Set it to MAX_VALUE.");
            }
            tmplam = numerator / (3.0 * a);
          }
          if (m_Debug)
            System.err.print("Cubic interpolation: \n" +
                             "a:   " + Utils.doubleToString(a, 10, 7) + "\n" +
                             "b:   " + Utils.doubleToString(b, 10, 7) + "\n" +
                             "disc:   " + Utils.doubleToString(disc, 10, 7) + "\n" +
                             "tmplam:   " + tmplam + "\n" +
                             "alam:   " + Utils.doubleToString(alam, 10, 7) + "\n");
          if (tmplam > 0.5 * alam)
            tmplam = 0.5 * alam; // lambda <= 0.5*lambda_old
        }

        alam2 = alam;
        fhi = m_f;
        alam = Math.max(tmplam, 0.1 * alam); // lambda >= 0.1*lambda_old
        if (alam > alpha) {
          throw new Exception("Sth. wrong in lnsrch: " +
                              "Lambda infeasible! (lambda = " + alam +
                              ", alpha = " + alpha + ", upper = " + tmplam +
                              "|" + (-alpha * m_Slope /
                                     (2.0 * ((m_f - fold) / alpha - m_Slope))) +
                              ", m_f = " + m_f + ", fold = " + fold +
                              ", slope = " + m_Slope + ")");
        }
      }
    } // Endfor(k=0;;k++)

    // Quadratic interpolation between lambda values between lo and hi.
    // If we cannot find a value satisfying the beta condition, use lo.
    double ldiff = hi - lo, lincr;
    if (m_Debug)
      System.err.println("Last stage of searching for beta condition (alam between "
                         + Utils.doubleToString(lo, 10, 7) + " and "
                         + Utils.doubleToString(hi, 10, 7) + ")...\n"
                         + "Quadratic Interpolation(QI):\n"
                         + "Last newSlope = " + Utils.doubleToString(newSlope, 10, 7));

    while ((newSlope < m_BETA * m_Slope) && (ldiff >= alamin)) {
      lincr = -0.5 * newSlope * ldiff * ldiff / (fhi - flo - newSlope * ldiff);
      if (m_Debug)
        System.err.println("fhi = " + fhi + "\n" +
                           "flo = " + flo + "\n" +
                           "ldiff = " + ldiff + "\n" +
                           "lincr (using QI) = " + lincr + "\n");
      if (lincr < 0.2 * ldiff)
        lincr = 0.2 * ldiff;

      alam = lo + lincr;
      if (alam >= hi) { // We cannot go beyond the bounds, so the best we can try is hi
        alam = hi;
        lincr = ldiff;
      }

      for (i = 0; i < len; i++)
        if (!isFixed[i])
          x[i] = xold[i] + alam * direct[i];
      m_f = objectiveFunction(x);
      if (Double.isNaN(m_f))
        throw new Exception("Objective function value is NaN!");

      if (m_f > fold + m_ALF * alam * m_Slope) {
        // Alpha condition fails, shrink lambda_upper
        ldiff = lincr;
        fhi = m_f;
      } else { // Alpha condition holds
        newGrad = evaluateGradient(x);
        for (newSlope = 0.0, i = 0; i < len; i++)
          if (!isFixed[i])
            newSlope += newGrad[i] * direct[i];

        if (newSlope < m_BETA * m_Slope) {
          // Beta condition fails, shrink lambda_lower
          lo = alam;
          ldiff -= lincr;
          flo = m_f;
        }
      }
    }

    if (newSlope < m_BETA * m_Slope) { // Cannot satisfy the beta condition: take lo
      if (m_Debug)
        System.err.println("Beta condition cannot be satisfied; " +
                           "take the alpha-condition point");
      alam = lo;
      m_f = flo;
      for (i = 0; i < len; i++)
        if (!isFixed[i])
          x[i] = xold[i] + alam * direct[i];
    } else if (m_Debug)
      System.err.println("Both alpha and beta conditions are satisfied. alam = "
                         + Utils.doubleToString(alam, 10, 7));

    if ((fixedOne != -1) && (alam >= alpha)) { // Has bounds and over
      if (direct[fixedOne] > 0) {
        x[fixedOne] = nwsBounds[1][fixedOne]; // Avoid rounding error
        nwsBounds[1][fixedOne] = Double.NaN; // Add cons. to working set
      } else {
        x[fixedOne] = nwsBounds[0][fixedOne]; // Avoid rounding error
        nwsBounds[0][fixedOne] = Double.NaN; // Add cons. to working set
      }

      if (m_Debug)
        System.err.println("Fix variable " + fixedOne + " to bound " +
                           x[fixedOne] + " from value " + xold[fixedOne]);
      isFixed[fixedOne] = true; // Fix the variable
      wsBdsIndx.addElement(fixedOne);
    }

    return x;
  }

  /**
   * Main algorithm. For descriptions see "Practical Optimization".
   *
   * @param initX initial point of x, assuming no value is on the bounds!
   * @param constraints the bound constraints of each variable;
   *        constraints[0] holds the lower bounds and constraints[1] the
   *        upper bounds (Double.NaN for a missing bound)
   * @return the solution of x, null if the number of iterations is not enough
   * @throws Exception if an error occurs
   */
  public double[] findArgmin(double[] initX, double[][] constraints)
    throws Exception {

    int l = initX.length;

    // Initially all variables are free, and all bounds are in the
    // non-working set
    boolean[] isFixed = new boolean[l];
    double[][] nwsBounds = new double[2][l];
    // Record indices of fixed variables, simply for efficiency
    DynamicIntArray wsBdsIndx = new DynamicIntArray(constraints.length);
    // Vectors used to record the variable indices to be freed
    DynamicIntArray toFree = null, oldToFree = null;

    // Initial value of the obj. function, gradient and the Cholesky
    // factors of the inverse Hessian approximation
    m_f = objectiveFunction(initX);
    if (Double.isNaN(m_f))
      throw new Exception("Objective function value is NaN!");

    double sum = 0;
    double[] grad = evaluateGradient(initX), oldGrad, oldX,
      deltaGrad = new double[l], deltaX = new double[l],
      direct = new double[l], x = new double[l];
    Matrix L = new Matrix(l, l); // Lower triangle of the Cholesky factor
    double[] D = new double[l];  // Diagonal of the Cholesky factor
    for (int i = 0; i < l; i++) {
      L.setElement(i, i, 1.0);
      D[i] = 1.0;
      direct[i] = -grad[i];
      sum += grad[i] * grad[i];
      x[i] = initX[i];
      nwsBounds[0][i] = constraints[0][i];
      nwsBounds[1][i] = constraints[1][i];
      isFixed[i] = false;
    }
    double stpmax = m_STPMX * Math.max(Math.sqrt(sum), l);

    for (int step = 0; step < m_MAXITS; step++) {
      if (m_Debug)
        System.err.println("\nIteration # " + step + ":");

      // Try at most one feasible newton step, i.e. 0 < lambda <= alpha
      oldX = x;
      oldGrad = grad;

      // The line search also updates m_f and the working set
      if (m_Debug)
        System.err.println("Line search ... ");
      m_IsZeroStep = false;
      x = lnsrch(x, grad, direct, stpmax, isFixed, nwsBounds, wsBdsIndx);
      if (m_Debug)
        System.err.println("Line search finished.");

      if (m_IsZeroStep) { // Zero step: just delete the rows/cols of D and L
        for (int f = 0; f < wsBdsIndx.size(); f++) {
          int idx = wsBdsIndx.elementAt(f);
          L.setRow(idx, new double[l]);
          L.setColumn(idx, new double[l]);
          D[idx] = 0.0;
        }
        grad = evaluateGradient(x);
        step--;
      } else {
        // Check convergence on x
        boolean finish = false;
        double test = 0.0;
        for (int h = 0; h < l; h++) {
          deltaX[h] = x[h] - oldX[h];
          double tmp = Math.abs(deltaX[h]) / Math.max(Math.abs(x[h]), 1.0);
          if (tmp > test)
            test = tmp;
        }

        if (test < m_Zero) {
          if (m_Debug)
            System.err.println("\nDeltaX converge: " + test);
          finish = true;
        }

        // Check zero gradient
        grad = evaluateGradient(x);
        test = 0.0;
        double denom = 0.0, dxSq = 0.0, dgSq = 0.0, newlyBounded = 0.0;
        for (int g = 0; g < l; g++) {
          if (!isFixed[g]) {
            deltaGrad[g] = grad[g] - oldGrad[g];
            // Calculate the denominators
            denom += deltaX[g] * deltaGrad[g];
            dxSq += deltaX[g] * deltaX[g];
            dgSq += deltaGrad[g] * deltaGrad[g];
          } else // Only newly bounded variables will be non-zero
            newlyBounded += deltaX[g] * (grad[g] - oldGrad[g]);

          // Note: CANNOT use the projected gradient for testing
          // convergence because of newly bounded variables
          double tmp = Math.abs(grad[g]) *
            Math.max(Math.abs(direct[g]), 1.0) /
            Math.max(Math.abs(m_f), 1.0);
          if (tmp > test)
            test = tmp;
        }

        if (test < m_Zero) {
          if (m_Debug)
            System.err.println("Gradient converge: " + test);
          finish = true;
        }

        // dg'*dx could be < 0 using inexact lnsrch
        if (m_Debug)
          System.err.println("dg'*dx=" + (denom + newlyBounded));
        // dg'*dx = 0
        if (Math.abs(denom + newlyBounded) < m_Zero)
          finish = true;

        int size = wsBdsIndx.size();
        boolean isUpdate = true; // Whether to update the BFGS formula

        // Converged: check whether any current constraints can be released
        if (finish) {
          if (m_Debug)
            System.err.println("Test any release possible ...");

          if (toFree != null)
            oldToFree = (DynamicIntArray) toFree.copy();
          toFree = new DynamicIntArray(wsBdsIndx.size());

          for (int m = size - 1; m >= 0; m--) {
            int index = wsBdsIndx.elementAt(m);
            double[] hessian = evaluateHessian(x, index);
            double deltaL = 0.0;
            if (hessian != null) {
              for (int mm = 0; mm < hessian.length; mm++)
                if (!isFixed[mm]) // Free variable
                  deltaL += hessian[mm] * direct[mm];
            }

            // First- and second-order Lagrangian multiplier estimates;
            // if no Hessian is provided, use first-order only
            double L1, L2;
            if (x[index] >= constraints[1][index]) // Upper bound
              L1 = -grad[index];
            else if (x[index] <= constraints[0][index]) // Lower bound
              L1 = grad[index];
            else
              throw new Exception("x[" + index + "] not fixed on the" +
                                  " bounds where it should have been!");

            // L2 = L1 + deltaL
            L2 = L1 + deltaL;
            if (m_Debug)
              System.err.println("Variable " + index +
                                 ": Lagrangian=" + L1 + "|" + L2);

            // Check validity of the Lagrangian multiplier estimate
            boolean isConverge =
              (2.0 * Math.abs(deltaL)) < Math.min(Math.abs(L1), Math.abs(L2));
            if ((L1 * L2 > 0.0) && isConverge) { // Same sign and converged: valid
              if (L2 < 0.0) { // Negative Lagrangian: feasible to release
                toFree.addElement(index);
                wsBdsIndx.removeElementAt(m);
                finish = false; // Not optimal, cannot finish
              }
            }

            // Although this hardly ever happens, better check it:
            // if the first-order Lagrangian multiplier estimate is
            // wrong, avoid zigzagging
            if ((hessian == null) && (toFree != null) && toFree.equal(oldToFree))
              finish = true;
          }
          if (finish) { // Min. found
            if (m_Debug)
              System.err.println("Minimum found.");
            m_f = objectiveFunction(x);
            if (Double.isNaN(m_f))
              throw new Exception("Objective function value is NaN!");
            return x;
          }

          // Free some variables
          for (int mmm = 0; mmm < toFree.size(); mmm++) {
            int freeIndx = toFree.elementAt(mmm);
            isFixed[freeIndx] = false; // Free this variable
            if (x[freeIndx] <= constraints[0][freeIndx]) { // Lower bound
              nwsBounds[0][freeIndx] = constraints[0][freeIndx];
              if (m_Debug)
                System.err.println("Free variable " + freeIndx +
                                   " from bound " + nwsBounds[0][freeIndx]);
            } else { // Upper bound
              nwsBounds[1][freeIndx] = constraints[1][freeIndx];
              if (m_Debug)
                System.err.println("Free variable " + freeIndx +
                                   " from bound " + nwsBounds[1][freeIndx]);
            }

            // Reset the Hessian entries of the newly released variable
            L.setElement(freeIndx, freeIndx, 1.0);
            D[freeIndx] = 1.0;
            isUpdate = false;
          }
        }

        // Skip the update if the curvature condition is violated
        if (denom < Math.max(m_Zero * Math.sqrt(dxSq) * Math.sqrt(dgSq), m_Zero)) {
          if (m_Debug)
            System.err.println("dg'*dx negative!");
          isUpdate = false; // Do not update
        }

        // Use the BFGS formula to update the Hessian approximation via
        // two rank-one modifications of the Cholesky factors
        if (isUpdate) {
          // Modify once: dg*dg'/(dg'*dx)
          double coeff = 1.0 / denom; // 1/(dg'*dx)
          updateCholeskyFactor(L, D, deltaGrad, coeff, isFixed);

          // Modify twice: g*g'/(g'*p)
          coeff = 1.0 / m_Slope; // 1/(g'*p)
          updateCholeskyFactor(L, D, oldGrad, coeff, isFixed);
        }
      }

      // Find the new (Newton) direction: solve L*D*L'*direct = -g
      Matrix LD = new Matrix(l, l); // L*D
      double[] b = new double[l];

      for (int k = 0; k < l; k++) {
        if (!isFixed[k])
          b[k] = -grad[k];
        else
          b[k] = 0.0;

        for (int j = k; j < l; j++) { // Lower triangle
          if (!isFixed[j] && !isFixed[k])
            LD.setElement(j, k, L.getElement(j, k) * D[k]);
        }
      }

      // Solve (L*D)*y = -g, where y = L'*direct
      double[] LDIR = solveTriangle(LD, b, true, isFixed);
      for (int m = 0; m < LDIR.length; m++) {
        if (Double.isNaN(LDIR[m]))
          throw new Exception("L*direct[" + m + "] is NaN!");
      }

      // Solve L'*direct = y
      direct = solveTriangle(L, LDIR, false, isFixed);
      for (int m = 0; m < direct.length; m++) {
        if (Double.isNaN(direct[m]))
          throw new Exception("direct is NaN!");
      }
    }

    if (m_Debug)
      System.err.println("Cannot find minimum -- too many iterations!");
    m_X = x;
    return null;
  }

  /**
   * Solve the linear equation T*X = B, where T is a triangular matrix.
   * It can be solved using back/forward substitution, with O(N^2)
   * complexity.
   *
   * @param t the matrix T
   * @param b the vector B
   * @param isLower whether T is a lower or upper triangular matrix
   * @param isZero which row(s) of T are not used, can be null
   * @return the solution of X
   */
  protected double[] solveTriangle(Matrix t, double[] b,
                                   boolean isLower, boolean[] isZero) {
    int n = b.length;
    double[] result = new double[n];
    if (isZero == null)
      isZero = new boolean[n];

    if (isLower) { // Lower triangular matrix: forward substitution
      int j = 0;
      while ((j < n) && isZero[j]) { // go to the first row used
        result[j] = 0.0;
        j++;
      }
      if (j < n) {
        result[j] = b[j] / t.getElement(j, j);
        for (; j < n; j++) {
          if (!isZero[j]) {
            double numerator = b[j];
            for (int k = 0; k < j; k++)
              numerator -= t.getElement(j, k) * result[k];
            result[j] = numerator / t.getElement(j, j);
          } else
            result[j] = 0.0;
        }
      }
    } else { // Upper triangular matrix: backward substitution
      int j = n - 1;
      while ((j >= 0) && isZero[j]) { // go to the last row used
        result[j] = 0.0;
        j--;
      }
      if (j >= 0) {
        result[j] = b[j] / t.getElement(j, j);
        for (; j >= 0; j--) {
          if (!isZero[j]) {
            double numerator = b[j];
            for (int k = j + 1; k < n; k++)
              numerator -= t.getElement(k, j) * result[k];
            result[j] = numerator / t.getElement(j, j);
          } else
            result[j] = 0.0;
        }
      }
    }

    return result;
  }

  /**
   * One rank-one update of the Cholesky factorization of the B matrix:
   * B = L*D*L' + coeff*v*v'.
   * <p>
   * When coeff &gt; 0, we use the C1 algorithm, and otherwise the C2
   * algorithm described in "Methods for Modifying Matrix Factorizations".
   *
   * @param L the unit lower triangular matrix L
   * @param D the diagonal matrix D
   * @param v the update vector v
   * @param coeff the coefficient of the update
   * @param isFixed which variables are not to be updated
   * @throws Exception if something goes wrong
   */
  protected void updateCholeskyFactor(Matrix L, double[] D, double[] v,
                                      double coeff, boolean[] isFixed)
    throws Exception {

    double t, p, b;
    int n = v.length;
    double[] vp = new double[n];
    for (int i = 0; i < v.length; i++)
      if (!isFixed[i])
        vp[i] = v[i];
      else
        vp[i] = 0.0;

    if (coeff > 0.0) { // C1 algorithm
      t = coeff;
      for (int j = 0; j < n; j++) {
        if (isFixed[j])
          continue;

        p = vp[j];
        double d = D[j], dbarj = d + t * p * p;
        D[j] = dbarj;

        b = p * t / dbarj;
        t *= d / dbarj;
        for (int r = j + 1; r < n; r++) {
          if (!isFixed[r]) {
            double l = L.getElement(r, j);
            vp[r] -= p * l;
            L.setElement(r, j, l + b * vp[r]);
          } else
            L.setElement(r, j, 0.0);
        }
      }
    } else { // C2 algorithm
      double[] P = solveTriangle(L, v, true, isFixed);
      t = 0.0;
      for (int i = 0; i < n; i++)
        if (!isFixed[i])
          t += P[i] * P[i] / D[i];

      double sqrt = 1.0 + coeff * t;
      sqrt = (sqrt < 0.0) ? 0.0 : Math.sqrt(sqrt);

      double alpha = coeff, sigma = coeff / (1.0 + sqrt), rho, theta;

      for (int j = 0; j < n; j++) {
        if (isFixed[j])
          continue;

        double d = D[j];
        p = P[j] * P[j] / d;
        theta = 1.0 + sigma * p;
        t -= p;
        if (t < 0.0)
          t = 0.0; // Rounding error

        double plus = sigma * sigma * p * t;
        if ((j < n - 1) && (plus <= m_Zero))
          plus = m_Zero; // Avoid rounding error
        rho = theta * theta + plus;
        D[j] = rho * d;
        if (Double.isNaN(D[j]))
          throw new Exception("D[" + j + "] is NaN!");

        b = alpha * P[j] / D[j];
        alpha /= rho;
        rho = Math.sqrt(rho);
        double sigmaOld = sigma;
        sigma *= (1.0 + rho) / (rho * (theta + rho));
        if ((j < n - 1) &&
            (Double.isNaN(sigma) || Double.isInfinite(sigma)))
          throw new Exception("sigma NaN/Inf at j=" + j + ": sigmaOld=" +
                              sigmaOld + ", rho=" + rho + ", theta=" + theta);

        for (int r = j + 1; r < n; r++) {
          if (!isFixed[r]) {
            double l = L.getElement(r, j);
            vp[r] -= P[j] * l;
            L.setElement(r, j, l + b * vp[r]);
          } else
            L.setElement(r, j, 0.0);
        }
      }
    }
  }

  /**
   * Returns the revision string.
   *
   * @return the revision
   */
  public String getRevision() {
    return RevisionUtils.extract("$Revision: 1.9 $");
  }

  /**
   * A dynamic vector of integers, used to keep track of the indices of the
   * variables whose bound constraints are in the working set.
   */
  protected class DynamicIntArray {

    /** The underlying int array */
    private int[] m_Objects;

    /** The current size of the vector */
    private int m_Size = 0;

    /**
     * Constructs a vector with the given capacity.
     *
     * @param capacity the vector's initial capacity
     */
    public DynamicIntArray(int capacity) {
      m_Objects = new int[capacity];
    }

    /**
     * Adds an element to this vector, growing the underlying array if
     * necessary.
     *
     * @param element the element to add
     */
    public final void addElement(int element) {
      if (m_Size == m_Objects.length) {
        int[] newObjects = new int[2 * m_Objects.length + 1];
        System.arraycopy(m_Objects, 0, newObjects, 0, m_Size);
        m_Objects = newObjects;
      }
      m_Objects[m_Size++] = element;
    }

    /**
     * Returns the element at the given index.
     *
     * @param index the element's index
     * @return the element at the given index
     */
    public final int elementAt(int index) {
      return m_Objects[index];
    }

    /**
     * Removes the element at the given index, shifting later elements
     * down by one position.
     *
     * @param index the element's index
     */
    public final void removeElementAt(int index) {
      System.arraycopy(m_Objects, index + 1, m_Objects, index,
                       m_Size - index - 1);
      m_Size--;
    }

    /**
     * Returns the current size of this vector.
     *
     * @return the number of elements
     */
    public final int size() {
      return m_Size;
    }

    /**
     * Produces a copy of this vector.
     *
     * @return the copy
     */
    public final Object copy() {
      DynamicIntArray copy = new DynamicIntArray(m_Objects.length);
      copy.m_Size = m_Size;
      System.arraycopy(m_Objects, 0, copy.m_Objects, 0, m_Size);
      return copy;
    }

    /**
     * Checks whether this vector holds the same elements, in the same
     * order, as the given one.
     *
     * @param b the vector to compare with
     * @return true if both vectors are equal
     */
    public final boolean equal(DynamicIntArray b) {
      if ((b == null) || (m_Size != b.m_Size))
        return false;
      for (int i = 0; i < m_Size; i++)
        if (m_Objects[i] != b.m_Objects[i])
          return false;
      return true;
    }
  }
}




