package org.deeplearning4j.nn.conf;
/**Gradient normalization strategies. These are applied on raw gradients, before the gradients are passed to the
* updater (SGD, RMSProp, Momentum, etc.).
* None = no gradient normalization (default)
*
* RenormalizeL2PerLayer = rescale gradients by dividing by the L2 norm of all gradients for the layer.
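*
* A minimal sketch of this rescaling on a flattened gradient array (plain Java, for illustration only;
* not DL4J's internal implementation):
* <pre>{@code
* static void renormalizeL2PerLayer(double[] g) {
*     double sumSq = 0.0;
*     for (double v : g) sumSq += v * v;
*     double l2 = Math.sqrt(sumSq);
*     if (l2 > 0.0) {                              // avoid division by zero for an all-zero gradient
*         for (int i = 0; i < g.length; i++) g[i] /= l2;
*     }
* }
* }</pre>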
*
* RenormalizeL2PerParamType = rescale gradients by dividing by the L2 norm of the gradients, separately for
* each type of parameter within the layer.
* This differs from RenormalizeL2PerLayer in that here, each parameter type (weight, bias etc) is normalized separately.
* For example, in an MLP/feed-forward network (where G is the gradient vector), the output is as follows (see the sketch after this list):
*
* - GOut_weight = G_weight / l2(G_weight)
* - GOut_bias = G_bias / l2(G_bias)
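*
* A minimal illustrative sketch, assuming the layer's gradients are held in a map keyed by parameter
* type (e.g. "W", "b"); this is not DL4J's internal representation:
* <pre>{@code
* static void renormalizeL2PerParamType(java.util.Map<String, double[]> gradientsByParamType) {
*     for (double[] g : gradientsByParamType.values()) {
*         double sumSq = 0.0;
*         for (double v : g) sumSq += v * v;
*         double l2 = Math.sqrt(sumSq);
*         if (l2 > 0.0) {
*             for (int i = 0; i < g.length; i++) g[i] /= l2;   // each parameter type is normalized on its own
*         }
*     }
* }
* }</pre>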
*
*
*
* ClipElementWiseAbsoluteValue = clip the gradients on a per-element basis.
* For each gradient g, set g <- sign(g) * min(maxAllowedValue, |g|).
* That is, if a parameter gradient has absolute value greater than the threshold, clip it to the threshold (keeping its sign).
* For example, if threshold = 5, then values in range -5<g<5 are unmodified; values <-5 are set
* to -5; values >5 are set to 5.
* This was proposed by Mikolov (2012), Statistical Language Models Based on Neural Networks (thesis),
* http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf
* in the context of learning recurrent neural networks.
* Threshold for clipping can be set in Layer configuration, using gradientNormalizationThreshold(double threshold)
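*
* A minimal sketch of the element-wise clipping (plain Java, for illustration only):
* <pre>{@code
* static void clipElementWiseAbsoluteValue(double[] g, double threshold) {
*     for (int i = 0; i < g.length; i++) {
*         // clip each element to the range [-threshold, threshold], keeping its sign
*         g[i] = Math.signum(g[i]) * Math.min(threshold, Math.abs(g[i]));
*     }
* }
* }</pre>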
*
*
* ClipL2PerLayer = conditional renormalization. Somewhat similar to RenormalizeL2PerLayer, this strategy
* scales the gradients if and only if the L2 norm of the gradients (for entire layer) exceeds a specified
* threshold. Specifically, if G is the gradient vector for the layer, then:
*
* - GOut = G if l2Norm(G) < threshold (i.e., no change)
* - GOut = threshold * G / l2Norm(G) otherwise
*
* Thus, the l2 norm of the scaled gradients will not exceed the specified threshold, though it may be smaller than the threshold.
* See: Pascanu, Mikolov, Bengio (2012), On the difficulty of training Recurrent Neural Networks,
* http://arxiv.org/abs/1211.5063
* Threshold for clipping can be set in Layer configuration, using gradientNormalizationThreshold(double threshold)
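*
* A minimal sketch of the conditional rescaling (plain Java, for illustration only):
* <pre>{@code
* static void clipL2PerLayer(double[] g, double threshold) {
*     double sumSq = 0.0;
*     for (double v : g) sumSq += v * v;
*     double l2 = Math.sqrt(sumSq);
*     if (l2 > threshold) {
*         double scale = threshold / l2;           // rescale so that l2Norm(g) == threshold
*         for (int i = 0; i < g.length; i++) g[i] *= scale;
*     }
*     // otherwise the gradients are left unchanged
* }
* }</pre>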
*
*
* ClipL2PerParamType = conditional renormalization. Very similar to ClipL2PerLayer; however, instead of clipping
* per layer, clipping is applied to each parameter type separately.
* For example, in a recurrent neural network, input weight gradients, recurrent weight gradients and bias gradients are all
* clipped separately. Thus, if one set of gradients is very large, it may be clipped while the other gradients are left
* unmodified.
* Threshold for clipping can be set in Layer configuration, using gradientNormalizationThreshold(double threshold)
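*
* A configuration sketch showing how a strategy and its threshold might be set on a layer builder
* (layer type and hyperparameter values here are illustrative only):
* <pre>{@code
* DenseLayer layer = new DenseLayer.Builder()
*         .nIn(100).nOut(50)
*         .gradientNormalization(GradientNormalization.ClipL2PerParamType)
*         .gradientNormalizationThreshold(1.0)
*         .build();
* }</pre>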
*
* @author Alex Black
*/
public enum GradientNormalization {
None, RenormalizeL2PerLayer, RenormalizeL2PerParamType, ClipElementWiseAbsoluteValue, ClipL2PerLayer, ClipL2PerParamType
}