
org.deeplearning4j.nn.conf.GradientNormalization

package org.deeplearning4j.nn.conf;

/**
 * Gradient normalization strategies. These are applied on raw gradients, before the gradients are passed to the
 * updater (SGD, RMSProp, Momentum, etc.).
 * <p>
 * <b>None</b> = no gradient normalization (default)
 * <p>
 * <b>RenormalizeL2PerLayer</b> = rescale gradients by dividing by the L2 norm of all gradients for the layer.
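 * As an illustration only (a plain-Java sketch, not the DL4J implementation; layerGradient is a hypothetical
 * double[] holding all raw gradients for the layer):
 * <pre>{@code
 * double l2 = 0.0;
 * for (double g : layerGradient) l2 += g * g;
 * l2 = Math.sqrt(l2);
 * for (int i = 0; i < layerGradient.length; i++) {
 *     layerGradient[i] /= l2;   // every element of the layer is divided by the layer-wide L2 norm
 * }
 * }</pre>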
 * <p>
 * <b>RenormalizeL2PerParamType</b> = rescale gradients by dividing by the L2 norm of the gradients, separately for
 * each type of parameter within the layer.<br>
 * This differs from RenormalizeL2PerLayer in that here, each parameter type (weight, bias, etc.) is normalized separately.<br>
 * For example, in an MLP/FeedForward network (where G is the gradient vector), the output is as follows:
 * <ul>
 *     <li>GOut_weight = G_weight / l2(G_weight)</li>
 *     <li>GOut_bias = G_bias / l2(G_bias)</li>
 * </ul>
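 * A sketch of the same rescaling applied per parameter type (plain Java, with hypothetical weightGradient and
 * biasGradient arrays; not the DL4J implementation):
 * <pre>{@code
 * static void renormalizeL2(double[] g) {
 *     double norm = 0.0;
 *     for (double v : g) norm += v * v;
 *     norm = Math.sqrt(norm);
 *     for (int i = 0; i < g.length; i++) g[i] /= norm;
 * }
 *
 * renormalizeL2(weightGradient);   // weights divided by l2(G_weight) only
 * renormalizeL2(biasGradient);     // biases divided by l2(G_bias) only
 * }</pre>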
 * <p>
 * <b>ClipElementWiseAbsoluteValue</b> = clip the gradients on a per-element basis.<br>
 * For each gradient g, set g = sign(g) * min(maxAllowedValue, |g|);
 * i.e., if a parameter gradient has an absolute value greater than the threshold, truncate it.<br>
 * For example, if threshold = 5, then gradient values in the range [-5, 5] are unmodified; values less than -5 are set
 * to -5; values greater than 5 are set to 5.<br>
 * This was proposed by Mikolov (2012), Statistical Language Models Based on Neural Networks (thesis),
 * http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf, in the context of learning recurrent neural networks.<br>
 * The threshold for clipping can be set in the Layer configuration, using gradientNormalizationThreshold(double threshold).
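 * A sketch of the element-wise rule (plain Java; threshold is the value passed to
 * gradientNormalizationThreshold, and layerGradient is a hypothetical double[]):
 * <pre>{@code
 * for (int i = 0; i < layerGradient.length; i++) {
 *     double g = layerGradient[i];
 *     // truncate to [-threshold, threshold]; elements already inside that range are left unchanged
 *     layerGradient[i] = Math.signum(g) * Math.min(threshold, Math.abs(g));
 * }
 * }</pre>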
 * <p>
 * <b>ClipL2PerLayer</b> = conditional renormalization. Somewhat similar to RenormalizeL2PerLayer, this strategy
 * rescales the gradients if and only if the L2 norm of the gradients (for the entire layer) exceeds a specified
 * threshold. Specifically, if G is the gradient vector for the layer, then:
 * <ul>
 *     <li>GOut = G, if l2Norm(G) is less than the threshold (i.e., no change)</li>
 *     <li>GOut = threshold * G / l2Norm(G), otherwise</li>
 * </ul>
 * Thus, the L2 norm of the scaled gradients will not exceed the specified threshold, though it may be smaller.<br>
 * See: Pascanu, Mikolov, Bengio (2012), On the difficulty of training Recurrent Neural Networks,
 * http://arxiv.org/abs/1211.5063<br>
 * The threshold for clipping can be set in the Layer configuration, using gradientNormalizationThreshold(double threshold).
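 * A sketch of the conditional rescaling (plain Java, same hypothetical layerGradient and threshold as above;
 * not the DL4J implementation):
 * <pre>{@code
 * double norm = 0.0;
 * for (double g : layerGradient) norm += g * g;
 * norm = Math.sqrt(norm);
 * if (norm > threshold) {
 *     double scale = threshold / norm;   // after scaling, the layer's gradient L2 norm equals the threshold
 *     for (int i = 0; i < layerGradient.length; i++) layerGradient[i] *= scale;
 * }
 * }</pre>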
 * <p>
 * <b>ClipL2PerParamType</b> = conditional renormalization. Very similar to ClipL2PerLayer, except that instead of clipping
 * per layer, the clipping is done on each parameter type separately.<br>
 * For example, in a recurrent neural network, input weight gradients, recurrent weight gradients and bias gradients are all
 * clipped separately. Thus, if one set of gradients is very large, it may be clipped while leaving the other gradients
 * unmodified.<br>
 * The threshold for clipping can be set in the Layer configuration, using gradientNormalizationThreshold(double threshold).
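 * Sketch: the ClipL2PerLayer rule above, applied to each parameter type's gradient array independently
 * (clipL2 is a hypothetical helper implementing that rule, not a DL4J method):
 * <pre>{@code
 * clipL2(inputWeightGradient, threshold);
 * clipL2(recurrentWeightGradient, threshold);
 * clipL2(biasGradient, threshold);
 * }</pre>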
 *
 * @author Alex Black
 */
public enum GradientNormalization {
    None,
    RenormalizeL2PerLayer,
    RenormalizeL2PerParamType,
    ClipElementWiseAbsoluteValue,
    ClipL2PerLayer,
    ClipL2PerParamType
}
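
For context, a minimal sketch of how one of these strategies is typically selected when configuring a layer. It assumes the DenseLayer builder from the same Deeplearning4j version; gradientNormalizationThreshold(double) is the setter mentioned in the Javadoc above, while the remaining builder calls are standard DL4J layer-configuration methods shown here only for illustration.

import org.deeplearning4j.nn.conf.GradientNormalization;
import org.deeplearning4j.nn.conf.layers.DenseLayer;

public class GradientNormalizationExample {
    public static void main(String[] args) {
        DenseLayer layer = new DenseLayer.Builder()
                .nIn(100)
                .nOut(50)
                // rescale this layer's gradients whenever their L2 norm exceeds 1.0
                .gradientNormalization(GradientNormalization.ClipL2PerLayer)
                .gradientNormalizationThreshold(1.0)   // only used by the Clip* strategies
                .build();
        System.out.println(layer);
    }
}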



