burlap.behavior.singleagent.shaping.potential.PotentialShapedRF
The Brown-UMBC Reinforcement Learning and Planning (BURLAP) Java code library is for the use and
development of single- or multi-agent planning and learning algorithms and the domains to accompany them. The library
uses a highly flexible state/observation representation in which you define states with your own Java classes, enabling
support for domains that are discrete, continuous, relational, or anything else. Planning and learning algorithms range from classic forward search
planning to value-function-based stochastic planning and learning algorithms.
package burlap.behavior.singleagent.shaping.potential;
import burlap.behavior.singleagent.shaping.ShapedRewardFunction;
import burlap.mdp.core.action.Action;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.model.RewardFunction;
/**
* This class is used to implement potential-based reward shaping [1], which is guaranteed to preserve the optimal policy. This class
* requires a {@link PotentialFunction} and the discount factor being used by the MDP. The additive reward is defined as:
* d * p(s') - p(s)
* where d is the discount factor, s' is the most recent state, s is the previous state, and p(s) is the potential of state s.
*
*
* 1. Ng, Andrew Y., Daishi Harada, and Stuart Russell. "Policy invariance under reward transformations: Theory and application to reward shaping." ICML. 1999.
*
* @author James MacGlashan
*
*/
public class PotentialShapedRF extends ShapedRewardFunction {
/**
* The potential function used to return the potential of input states.
*/
protected PotentialFunction potentialFunction;
/**
* The discount factor of the MDP (required for this shaping to preserve policy optimality)
*/
protected double discount;
/**
* Initializes the shaping with the objective reward function, the potential function, and the discount of the MDP.
* @param baseRF the objective task reward function.
* @param potentialFunction the potential function to use.
* @param discount the discount factor of the MDP.
*/
public PotentialShapedRF(RewardFunction baseRF, PotentialFunction potentialFunction, double discount) {
super(baseRF);
this.potentialFunction = potentialFunction;
this.discount = discount;
}
@Override
public double additiveReward(State s, Action a, State sprime) {
return (this.discount * this.potentialFunction.potentialValue(sprime)) - this.potentialFunction.potentialValue(s);
}
}
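
Example usage (a minimal sketch, not taken from BURLAP's documentation): the GoalDistancePotential class and the -1-per-step base reward below are hypothetical placeholders, and the example assumes BURLAP's RewardFunction interface with a reward(State, Action, State) method.

import burlap.behavior.singleagent.shaping.potential.PotentialFunction;
import burlap.behavior.singleagent.shaping.potential.PotentialShapedRF;
import burlap.mdp.core.action.Action;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.model.RewardFunction;

public class PotentialShapingExample {

	/**
	 * Hypothetical potential function: returns higher potential for states judged closer to the goal.
	 */
	static class GoalDistancePotential implements PotentialFunction {
		@Override
		public double potentialValue(State s) {
			// Placeholder heuristic; a real implementation would read state variables
			// (e.g., agent position) and return something like -distanceToGoal(s).
			return 0.;
		}
	}

	public static void main(String[] args) {

		// Hypothetical objective reward: -1 per step, encouraging the agent to reach the goal quickly.
		RewardFunction baseRF = new RewardFunction() {
			@Override
			public double reward(State s, Action a, State sprime) {
				return -1.;
			}
		};

		// Must match the discount factor used by the planning or learning algorithm.
		double discount = 0.99;

		// The shaped reward returns baseRF's reward plus discount * p(s') - p(s),
		// which (per Ng et al., 1999) leaves the optimal policy unchanged.
		PotentialShapedRF shapedRF = new PotentialShapedRF(baseRF, new GoalDistancePotential(), discount);

		// shapedRF can now be used in place of baseRF when constructing the domain's model.
	}
}

Because the shaping term telescopes along a trajectory, any potential function can be substituted without changing which policy is optimal; a well-chosen potential mainly serves to speed up learning.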