# scripts.nn.layers.conv2d_depthwise.dml
#-------------------------------------------------------------
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#-------------------------------------------------------------
/*
* 2D Depthwise Convolutional layer.
*
* Utilizes built-in convolution operators for higher performance.
*/
source("nn/util.dml") as util
forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
int Hin, int Win, int M, int Hf, int Wf,
int strideh, int stridew, int padh, int padw)
return (matrix[double] out, int Hout, int Wout) {
/*
* Computes the forward pass for a 2D depthwise spatial convolutional
* layer with C*M filters of depth 1. The input data has N examples,
* each represented as a 3D volume with C channels unrolled into a
* single vector. For each group of M filters, a 2D convolution is
* applied to 1 unique input channel, yielding M output channels per
* input channel. The resulting C groups of M output channels are
* then concatenated together channel-wise into a single volume of C*M
* output channels. This can also be interpreted as C filters of
* depth 1 that expand each input channel to M output channels, where
* M is a "depth multiplier".
*
* Although there are C*M filters of depth 1, instead of storing W as
* shape `(C*M, 1*Hf*Wf)`, we reshape it to `(C, M*Hf*Wf)` for
* performance reasons.
*
* Inputs:
* - X: Inputs, of shape (N, C*Hin*Win).
* - W: Weights, of shape (C, M*Hf*Wf).
* - b: Biases, of shape (C*M, 1).
* - Hin: Input height.
* - Win: Input width.
* - M: Number of filters per input channel (i.e. depth multiplier).
* - Hf: Filter height.
* - Wf: Filter width.
* - strideh: Stride over height.
* - stridew: Stride over width.
* - padh: Padding for top and bottom sides.
* For same output height as input, set `padh = (Hf - 1) / 2`,
* assuming `strideh = 1`.
* More generally, `padh = (Hin*(strideh-1) + Hf - strideh) / 2`
* preserves the spatial dimensions of the input.
* - padw: Padding for left and right sides.
* For same output width as input, set `padw = (Wf - 1) / 2`,
* assuming `stridew = 1`.
* More generally, `padw = (Win*(stridew-1) + Wf - stridew) / 2`
* preserves the spatial dimensions of the input.
*
* Outputs:
* - out: Outputs, of shape (N, C*M*Hout*Wout).
* - Hout: Output height.
* - Wout: Output width.
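*
* Worked example (sizes are assumptions chosen for illustration):
* The output size is computed as
* Hout = floor((Hin + 2*padh - Hf)/strideh + 1), and likewise for Wout.
* With Hin = Win = 8, Hf = Wf = 3, strideh = stridew = 1, and
* padh = padw = (3 - 1) / 2 = 1, this gives
* Hout = Wout = floor((8 + 2*1 - 3)/1 + 1) = 8,
* so X of shape (N, C*64) yields out of shape (N, C*M*64).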
*/
N = nrow(X)
C = nrow(W)
Hout = as.integer(floor((Hin + 2*padh - Hf)/strideh + 1))
Wout = as.integer(floor((Win + 2*padw - Wf)/stridew + 1))
# create output volume
# NOTE: We initialize to 1s vs. 0s to avoid conversions between sparse and dense formats.
# This is a complete hack until the engine is improved.
out = matrix(1, rows=N, cols=C*M*Hout*Wout)
# depthwise convolution
# TODO: Explore parfor loops further to determine whether they can provide a performance
# benefit. Initial tests show that they are slower than the regular for loop, likely because
# they reduce the multithreaded conv2d op to a single-threaded version. When the
# number of channels >> the number of examples, the parfor loop could be
# faster.
#parfor (c in 1:C, check=0) { # each channel
for (c in 1:C) { # each channel
# run conv2d on each input channel separately, each with a different filter
Xc = X[,((c-1)*Hin*Win + 1):(c*Hin*Win)] # shape (N, 1*Hin*Win)
Wc = matrix(W[c,], rows=M, cols=Hf*Wf) # shape (M, Hf*Wf)
outc = conv2d(Xc, Wc, input_shape=[N,1,Hin,Win], filter_shape=[M,1,Hf,Wf],
stride=[strideh,stridew], padding=[padh,padw]) # shape (N, M*Hout*Wout)
out[,((c-1)*M*Hout*Wout + 1):(c*M*Hout*Wout)] = outc
}
# add bias term to each output filter
out = bias_add(out, b)
}
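# Usage sketch for `forward` (illustrative only; not part of the original layer API).
# The function name `example_forward` and all sizes below are assumptions chosen
# for this example. It also assumes that the unqualified calls resolve to the
# `init` and `forward` functions defined in this file.
example_forward = function()
    return (matrix[double] out, int Hout, int Wout) {
  N = 2    # number of examples
  C = 3    # number of input channels
  Hin = 8  # input height
  Win = 8  # input width
  M = 2    # depth multiplier
  Hf = 3   # filter height
  Wf = 3   # filter width
  # random inputs, of shape (N, C*Hin*Win)
  X = rand(rows=N, cols=C*Hin*Win, pdf="normal")
  # W has shape (C, M*Hf*Wf) and b has shape (C*M, 1)
  [W, b] = init(C, M, Hf, Wf)
  # stride 1 with padding (Hf-1)/2 = 1 keeps Hout = Hin and Wout = Win
  [out, Hout, Wout] = forward(X, W, b, Hin, Win, M, Hf, Wf, 1, 1, 1, 1)
  # out has shape (N, C*M*Hout*Wout)
  print("out: " + nrow(out) + " x " + ncol(out) + ", Hout=" + Hout + ", Wout=" + Wout)
}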
backward = function(matrix[double] dout, int Hout, int Wout,
matrix[double] X, matrix[double] W, matrix[double] b,
int Hin, int Win, int M, int Hf, int Wf,
int strideh, int stridew, int padh, int padw)
return (matrix[double] dX, matrix[double] dW, matrix[double] db) {
/*
* Computes the backward pass for a 2D depthwise spatial convolutional
* layer with C*M filters of depth 1.
*
* Inputs:
* - dout: Gradient wrt `out` from upstream, of
* shape (N, C*M*Hout*Wout).
* - Hout: Output height.
* - Wout: Output width.
* - X: Inputs, of shape (N, C*Hin*Win).
* - W: Weights, of shape (C, M*Hf*Wf).
* - b: Biases, of shape (C*M, 1).
* - Hin: Input height.
* - Win: Input width.
* - M: Number of filters per input channel (i.e. depth multiplier).
* - Hf: Filter height.
* - Wf: Filter width.
* - strideh: Stride over height.
* - stridew: Stride over width.
* - padh: Padding for top and bottom sides.
* - padw: Padding for left and right sides.
*
* Outputs:
* - dX: Gradient wrt `X`, of shape (N, C*Hin*Win).
* - dW: Gradient wrt `W`, of shape (C, M*Hf*Wf).
* - db: Gradient wrt `b`, of shape (C*M, 1).
*/
N = nrow(X)
C = nrow(W)
# create gradient volumes
# NOTE: We initialize to 1s vs. 0s to avoid conversions between sparse and dense formats.
# This is a complete hack until the engine is improved.
dX = matrix(1, rows=N, cols=C*Hin*Win)
dW = matrix(1, rows=C, cols=M*Hf*Wf)
db = matrix(1, rows=C*M, cols=1)
# partial derivatives for depthwise convolution
for (c in 1:C) { # each channel
# extract channel c
doutc = dout[,((c-1)*M*Hout*Wout + 1):(c*M*Hout*Wout)] # shape (N, M*Hout*Wout)
Xc = X[,((c-1)*Hin*Win + 1):(c*Hin*Win)] # shape (N, 1*Hin*Win)
Wc = matrix(W[c,], rows=M, cols=Hf*Wf) # shape (M, 1*Hf*Wf)
# compute gradients for channel c
dWc = conv2d_backward_filter(Xc, doutc, stride=[strideh,stridew], padding=[padh,padw],
input_shape=[N,1,Hin,Win], filter_shape=[M,1,Hf,Wf])
dXc = conv2d_backward_data(Wc, doutc, stride=[strideh,stridew], padding=[padh,padw],
input_shape=[N,1,Hin,Win], filter_shape=[M,1,Hf,Wf])
# store
dX[,((c-1)*Hin*Win + 1):(c*Hin*Win)] = dXc
dW[c,] = matrix(dWc, rows=1, cols=M*Hf*Wf)
}
# partial derivatives for bias vector
db = util::channel_sums(dout, C*M, Hout, Wout)
}
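# Usage sketch for `backward` (illustrative only; not part of the original layer API).
# The function name `example_backward` and all sizes below are assumptions. The
# upstream gradient is set to all ones, which corresponds to the loss sum(out).
# In that case each entry of db should equal N*Hout*Wout, since db sums dout
# over all examples and spatial positions of each output channel.
example_backward = function()
    return (matrix[double] dX, matrix[double] dW, matrix[double] db) {
  N = 2    # number of examples
  C = 3    # number of input channels
  Hin = 8  # input height
  Win = 8  # input width
  M = 2    # depth multiplier
  Hf = 3   # filter height
  Wf = 3   # filter width
  strideh = 1
  stridew = 1
  padh = 1
  padw = 1
  X = rand(rows=N, cols=C*Hin*Win, pdf="normal")
  [W, b] = init(C, M, Hf, Wf)
  # forward pass, to obtain Hout and Wout
  [out, Hout, Wout] = forward(X, W, b, Hin, Win, M, Hf, Wf,
                              strideh, stridew, padh, padw)
  # upstream gradient of the loss sum(out), of shape (N, C*M*Hout*Wout)
  dout = matrix(1, rows=N, cols=C*M*Hout*Wout)
  # dX: (N, C*Hin*Win), dW: (C, M*Hf*Wf), db: (C*M, 1)
  [dX, dW, db] = backward(dout, Hout, Wout, X, W, b, Hin, Win, M, Hf, Wf,
                          strideh, stridew, padh, padw)
  print("dX: " + nrow(dX) + " x " + ncol(dX))
  print("dW: " + nrow(dW) + " x " + ncol(dW))
  print("db: " + nrow(db) + " x " + ncol(db))
}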
init = function(int C, int M, int Hf, int Wf)
return (matrix[double] W, matrix[double] b) {
/*
* Initialize the parameters of this layer.
*
* Note: This is just a convenience function, and parameters
* may be initialized manually if needed.
*
* We use the heuristic by He et al., which limits the magnification
* of inputs/gradients during forward/backward passes by scaling
* unit-Gaussian weights by a factor of sqrt(2/n), under the
* assumption of relu neurons.
* - http://arxiv.org/abs/1502.01852
*
* Inputs:
* - C: Number of input channels (dimensionality of depth).
* - M: Number of filters per input channel (i.e. depth multiplier).
* - Hf: Filter height.
* - Wf: Filter width.
*
* Outputs:
* - W: Weights, of shape (C, M*Hf*Wf).
* - b: Biases, of shape (C*M, 1).
*/
# Note: Each filter is applied to a volume of depth 1, so we only use Hf*Wf in the scaling factor.
W = rand(rows=C, cols=M*Hf*Wf, pdf="normal") * sqrt(2.0/(Hf*Wf))
b = matrix(0, rows=C*M, cols=1)
}
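# Sanity-check sketch for `init` (illustrative only; not part of the original layer API).
# The function name `example_init` and the sizes below are assumptions. Because the
# weights are unit-Gaussian values scaled by sqrt(2/(Hf*Wf)), their root-mean-square
# should be close to that factor when W has many entries; the two printed values
# will agree only approximately because W is random.
example_init = function()
    return (double rms, double expected) {
  C = 64   # number of input channels
  M = 2    # depth multiplier
  Hf = 3   # filter height
  Wf = 3   # filter width
  [W, b] = init(C, M, Hf, Wf)
  # empirical root-mean-square of the sampled weights
  rms = sqrt(mean(W^2))
  # scaling factor applied in `init`
  expected = sqrt(2.0/(Hf*Wf))
  print("rms=" + rms + ", expected=" + expected)
}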