package org.deeplearning4j.nn.conf;
/**
* ConvolutionMode defines how convolution operations should be executed for Convolutional and Subsampling layers,
* for a given input size and network configuration (specifically stride/padding/kernel sizes).
* Currently, 3 modes are provided:
*
*
* Strict: Output sizes for Convolutional and Subsampling layers are calculated as follows, in each dimension:
* outputSize = (inputSize - kernelSize + 2*padding) / stride + 1. If outputSize is not an integer, an exception will
* be thrown during network initialization or forward pass.
*
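* As an illustrative sketch (not part of the DL4J API; the method name and exception message are
* assumptions for illustration only), the Strict calculation for a single spatial dimension could
* be written as:
* <pre>{@code
* // Strict mode: the output size must be an exact integer, otherwise fail fast.
* static int strictOutputSize(int inputSize, int kernelSize, int padding, int stride) {
*     int numerator = inputSize - kernelSize + 2 * padding;
*     if (numerator % stride != 0) {
*         throw new IllegalStateException("Strict mode: (" + inputSize + " - " + kernelSize
*                 + " + 2*" + padding + ") = " + numerator + " is not divisible by stride " + stride);
*     }
*     return numerator / stride + 1;
* }
*
* // Example: inputSize=10, kernelSize=3, padding=0, stride=2 -> (10-3)/2 is not an integer -> exception
* // Example: inputSize=11, kernelSize=3, padding=0, stride=2 -> (11-3)/2 + 1 = 5
* }</pre>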
*
*
* Truncate: Output sizes for Convolutional and Subsampling layers are calculated in the same way as in Strict (that
* is, outputSize = (inputSize - kernelSize + 2*padding) / stride + 1) in each dimension.
* If outputSize is an integer, then Strict and Truncate are identical. However, if outputSize is not an integer,
* the output size will be rounded down to an integer value.
* Specifically, ConvolutionMode.Truncate implements the following:
* output height = floor((inputHeight - kernelHeight + 2*paddingHeight) / strideHeight) + 1.
* output width = floor((inputWidth - kernelWidth + 2*paddingWidth) / strideWidth) + 1.
* where 'floor' is the floor operation (i.e., round down to the nearest integer).
*
* The major consequence of this rounding down: a border/edge effect will be seen if/when rounding down is required.
* In effect, some number of inputs along the given dimension (height or width) will not be used as input and hence
* some input activations can be lost/ignored. This can be problematic higher in the network (where the cropped activations
* may represent a significant proportion of the original input), or with large kernel sizes and strides.
* In the given dimension (height or width) the number of truncated/cropped input values is equal to
* (inputSize - kernelSize + 2*padding) % stride. (where % is the modulus/remainder operation).
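*
* As an illustrative sketch (not part of the DL4J API; the method names are assumptions for
* illustration only), the Truncate calculation and the number of cropped input values for a
* single spatial dimension could be written as:
* <pre>{@code
* // Truncate mode: round the output size down; some inputs at the border may be ignored.
* static int truncateOutputSize(int inputSize, int kernelSize, int padding, int stride) {
*     return (inputSize - kernelSize + 2 * padding) / stride + 1;   // integer division == floor for non-negative values
* }
*
* static int croppedInputs(int inputSize, int kernelSize, int padding, int stride) {
*     return (inputSize - kernelSize + 2 * padding) % stride;
* }
*
* // Example: inputSize=10, kernelSize=3, padding=0, stride=2
* //   truncateOutputSize(...) = floor(7/2) + 1 = 4
* //   croppedInputs(...)      = 7 % 2         = 1 (one row/column of input is not used)
* }</pre>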
*
*
*
* Same: Same mode operates differently from Strict/Truncate, in three key ways:
* (a) Manual padding values in the convolution/subsampling layer configuration are not used; padding values are instead calculated
* automatically based on the input size, kernel size and strides.
* (b) The output sizes are calculated differently (see below) compared to Strict/Truncate. Most notably, when stride = 1
* the output size is the same as the input size.
* (c) The calculated padding values may differ for top/bottom and left/right (when they do differ: the right and bottom
* padding may be 1 pixel/row/column more than the top/left padding)
* The output size of a Convolutional/Subsampling layer using ConvolutionMode.Same is calculated as follows:
* output height = ceil( inputHeight / strideHeight )
* output width = ceil( inputWidth / strideWidth )
* where 'ceil' is the ceiling operation (i.e., round up to the nearest integer).
*
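* As an illustrative sketch (not part of the DL4J API; the method name is an assumption for
* illustration only), the Same-mode output size for a single spatial dimension could be computed as:
* <pre>{@code
* // Same mode: the output size depends only on the input size and stride (not the kernel size).
* static int sameOutputSize(int inputSize, int stride) {
*     return (int) Math.ceil((double) inputSize / stride);
* }
*
* // Example: inputSize=10, stride=1 -> 10 (same as the input size)
* // Example: inputSize=10, stride=3 -> ceil(10/3) = 4
* }</pre>
*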
* The padding values for top/bottom and left/right are automatically calculated as follows:
* totalHeightPadding = (outputHeight - 1) * strideHeight + filterHeight - inputHeight
* totalWidthPadding = (outputWidth - 1) * strideWidth + filterWidth - inputWidth
* topPadding = totalHeightPadding / 2 (note: integer division)
* bottomPadding = totalHeightPadding - topPadding
* leftPadding = totalWidthPadding / 2 (note: integer division)
* rightPadding = totalWidthPadding - leftPadding
* Note that if top/bottom padding differ, then bottomPadding = topPadding + 1
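*
* As an illustrative sketch (not part of the DL4J API; the input/kernel/stride values are chosen
* arbitrarily for illustration), the automatic padding split for one dimension (height shown;
* width is analogous) follows the formulas above:
* <pre>{@code
* int inputHeight = 10, strideHeight = 3, filterHeight = 4;
* int outputHeight = (int) Math.ceil((double) inputHeight / strideHeight);                  // = 4
* int totalHeightPadding = (outputHeight - 1) * strideHeight + filterHeight - inputHeight;  // = 3
* int topPadding = totalHeightPadding / 2;                                                  // = 1 (integer division)
* int bottomPadding = totalHeightPadding - topPadding;                                      // = 2 = topPadding + 1
* }</pre>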
*
*
*
* For further information on output sizes for convolutional neural networks, see the "Spatial arrangement" section at
* http://cs231n.github.io/convolutional-networks/
*
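* A brief usage sketch: the mode is set via the layer builder's convolutionMode(...) method
* (the layer hyperparameter values below are arbitrary illustration values):
* <pre>{@code
* import org.deeplearning4j.nn.conf.ConvolutionMode;
* import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
*
* ConvolutionLayer layer = new ConvolutionLayer.Builder(3, 3)
*         .nIn(1)
*         .nOut(16)
*         .stride(2, 2)
*         .convolutionMode(ConvolutionMode.Same)   // padding is computed automatically in Same mode
*         .build();
* }</pre>
*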
* @author Alex Black
*/
public enum ConvolutionMode {
Strict, Truncate, Same
}