/*
 * Copyright (c) 2016, Peter Abeles. All Rights Reserved.
 *
 * This file is part of DeepBoof
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package deepboof.forward;

/**
 * <p>
 * Batch Normalization [1] determines the mean and standard deviation (stdev) of each input element individually
 * using the training data. It then applies a transform (minus mean, divide stdev) to each individual
 * element to ensure it has zero mean and a standard deviation of 1 across the training set. This alleviates
 * many problems with choosing appropriate initial parameters for inputs across all layers.
 * </p>
 *
 * <p>
 * During training, batch norm computes a mean and variance for each input element. Mean/stdev are computed by
 * finding the mean and stdev for a mini-batch and then applying a decaying average. For
 * evaluation the previously computed mean and stdev are fixed and applied to each input element in an
 * element-wise fashion, see below. The final mean and stdev can be computed from the decaying mean/stdev or
 * from the true mean/stdev across the entire dataset, depending on the implementation.
 * </p>
 *
 * <p>
 * It can optionally also learn two parameters, gamma and beta, which can be used to learn to undo batch
 * normalization if helpful. The complete transformation is shown below:
 * </p>
 *
 * <pre>
 * output[i] = ((x[i]-mean[i])/sqrt(variance[i]+EPS))*gamma[i] + beta[i]
 * </pre>
 *
 * <p>
 * Where 'i' is an element in the tensor. EPS is a small number used to prevent divide by zero errors
 * and is a tuning hyper-parameter. By default EPS is 1e-9 for double and 1e-5 for float.
 * </p>
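 *
 * <p>
 * For illustration, the transform could be applied element-wise as in the sketch below. The helper and its
 * array-based signature are assumptions for this example only; they are not part of this interface or of DeepBoof.
 * </p>
 *
 * <pre>{@code
 * // Hypothetical helper, for illustration only: applies the batch norm transform
 * // element-wise to tensors flattened into double arrays.
 * static void applyBatchNorm( double[] x, double[] mean, double[] variance,
 *                             double[] gamma, double[] beta, double EPS,
 *                             double[] output ) {
 *     for (int i = 0; i < x.length; i++) {
 *         // subtract the mean, divide by the standard deviation, then scale and shift
 *         output[i] = ((x[i] - mean[i]) / Math.sqrt(variance[i] + EPS))*gamma[i] + beta[i];
 *     }
 * }
 * }</pre>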

 * <p>Training update:</p>
 *
 * <pre>
 * mean[i+1]  = learn_rate*mean  + (1.0-learn_rate)*mean[i]
 * stdev[i+1] = learn_rate*stdev + (1.0-learn_rate)*stdev[i] TODO change to variance?
 * </pre>
 *
 * <p>
 * where (mean,stdev) with no index refers to the statistics from the current mini-batch that is being trained on.
 * learn_rate determines how quickly the mean is adjusted and can have a value from 0 to 1; higher values give
 * faster but less stable learning, e.g. 0 = no learning and 1 = old results discarded.
 * </p>
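 *
 * <p>
 * A minimal sketch of this decaying average update is shown below. The helper and its array-based signature
 * are again assumptions for the example, not part of this interface or of DeepBoof.
 * </p>
 *
 * <pre>{@code
 * // Hypothetical helper, for illustration only: folds the current mini-batch's
 * // statistics into the running statistics using a decaying average.
 * static void updateStatistics( double[] mean, double[] stdev,
 *                               double[] batchMean, double[] batchStdev,
 *                               double learnRate ) {
 *     for (int i = 0; i < mean.length; i++) {
 *         mean[i]  = learnRate*batchMean[i]  + (1.0 - learnRate)*mean[i];
 *         stdev[i] = learnRate*batchStdev[i] + (1.0 - learnRate)*stdev[i];
 *     }
 * }
 * }</pre>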

 * <p>Notes:</p>
 * <ul>
 *     <li>The shape of the output will be the same as the shape of the input.</li>
 *     <li>There are other variants for specific situations, e.g. {@link SpatialBatchNorm}.</li>
 * </ul>

[1] Sergey Ioffe, Christian Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing * Internal Covariate Shift" 11 Feb 2015, http://arxiv.org/abs/1502.03167

 * @author Peter Abeles
 */
public interface BatchNorm {

    /**
     * If it returns true then it expects a second set of parameters that define gamma and beta.
     *
     * @return true if gamma and beta are used.
     */
    boolean hasGammaBeta();

    /**
     * Returns the current value of EPS.
     *
     * @return Value of EPS
     */
    double getEPS();

    /**
     * Used to specify the EPS value. Must be invoked before setParameters().
     *
     * @param EPS Value of EPS
     */
    void setEPS(double EPS);
}
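// For illustration only: how a caller might use this interface. MyBatchNorm is a
// hypothetical implementing class assumed for the example, not part of DeepBoof.
//
//   BatchNorm layer = new MyBatchNorm();
//   layer.setEPS(1e-5);                  // must be set before setParameters()
//   if (layer.hasGammaBeta()) {
//       // supply the second parameter set (gamma and beta) as well
//   }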



