/*
* Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance
* with the License. A copy of the License is located at
*
* http://aws.amazon.com/apache2.0/
*
* or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
* OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
* and limitations under the License.
*/
package ai.djl.nn;
import ai.djl.MalformedModelException;
import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.DataType;
import ai.djl.ndarray.types.LayoutType;
import ai.djl.ndarray.types.Shape;
import ai.djl.nn.convolutional.Conv2d;
import ai.djl.training.ParameterStore;
import ai.djl.training.initializer.Initializer;
import ai.djl.util.PairList;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.function.Predicate;
/**
* A {@code Block} is a composable function that forms a neural network.
*
 * <p>Blocks serve a purpose similar to functions that convert an input NDList to an output
 * NDList. They can represent single operations, parts of a neural network, and even the whole
 * neural network. What makes blocks special is that they contain a number of parameters that are
 * used in their function and are trained during deep learning. As these parameters are trained,
 * the functions represented by the blocks get more and more accurate. Each block consists of the
 * following components:
 *
 * <ul>
 *   <li>Forward function
 *   <li>Parameters
 *   <li>Child blocks
 * </ul>
 *
 * <p>The core purpose of a {@code Block} is to perform an operation on the inputs, and return an
 * output. It is defined in the {@link #forward(ParameterStore, NDList, boolean) forward} method.
 * The forward function could be defined explicitly in terms of parameters or implicitly and could
 * be a combination of the functions of the child blocks.
 *
 * <p>The parameters of a {@code Block} are instances of {@link Parameter} which are required for
 * the operation in the forward function. For example, in a {@link Conv2d} block, the parameters
 * are {@code weight} and {@code bias}. During training, these parameters are updated to reflect
 * the training data, and that forms the crux of learning.
*
 * <p>When building these block functions, the easiest way is to use composition. Similar to how
 * functions are built by calling other functions, blocks can be built by combining other blocks.
 * We refer to the containing block as the parent and the sub-blocks as the children.
 *
 * <p>We provide helpers for creating two common structures of blocks. For blocks that call
 * children in a chain, use {@link SequentialBlock}. If a block calls all of its children in
 * parallel and then combines their results, use {@link ParallelBlock}. For blocks that do not fit
 * these structures, you should directly extend the {@link AbstractBlock} class.
*
*
 * <p>A block does not necessarily have to have children and parameters. For example, {@link
 * SequentialBlock} and {@link ParallelBlock} don't have any parameters, but do have child blocks.
 * Similarly, {@link Conv2d} does not have children, but has parameters. There can be special
 * cases where blocks have neither parameters nor children. One such example is {@link
 * LambdaBlock}. {@link LambdaBlock} takes in a function, and applies that function to its input
 * in the {@link #forward(ParameterStore, NDList, boolean) forward} method.
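 *
 * <p>For example, a minimal sketch of composing a small network from child blocks (the layer
 * sizes are illustrative assumptions, not part of this API):
 *
 * <pre>{@code
 * SequentialBlock net = new SequentialBlock();
 * net.add(Linear.builder().setUnits(128).build()); // a child block with parameters
 * net.add(Activation::relu);                       // wrapped into a LambdaBlock child
 * net.add(Linear.builder().setUnits(10).build());
 * }</pre>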
 *
 * <p>Now that we understand the components of the block, we can explore what the block really
 * represents. A block combined with the recursive, hierarchical structure of its children forms a
 * network. It takes in the input to the network, performs its operation, and returns the output
 * of the network. When a block is added as a child of another block, it becomes a sub-network of
 * that block.
*
 * <p>The life-cycle of a block has 3 stages:
 *
 * <ol>
 *   <li>Construction
 *   <li>Initialization
 *   <li>Training
 * </ol>
 *
 * <p>Construction is the process of building the network. During this stage, blocks are created
 * with appropriate arguments and the desired network is built by creating a hierarchy of parent
 * and child blocks. At this stage, it is a bare-bones network. The parameter values are not
 * created and the shapes of the inputs are not known. The block is ready for initialization.
*
 * <p>Initialization is the process of initializing all the parameters of the block and its
 * children, according to the inputs expected. It involves setting an {@link Initializer},
 * deciding the {@link DataType}, and the shapes of the input. The parameter arrays are {@link
 * ai.djl.ndarray.NDArray}s that are initialized according to the {@link Initializer} set. At this
 * stage, the block is expecting a specific type of input, and is ready to be trained.
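 *
 * <p>A minimal initialization sketch, continuing the {@code net} example above (the initializer
 * and input shape are illustrative assumptions):
 *
 * <pre>{@code
 * NDManager manager = NDManager.newBaseManager();
 * net.setInitializer(new XavierInitializer(), Parameter.Type.WEIGHT);
 * net.initialize(manager, DataType.FLOAT32, new Shape(1, 784));
 * }</pre>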
 *
 * <p>Training is when we start feeding the training data as input to the block, get the output,
 * and try to update the parameters to learn. For more information about training, please refer
 * to the javadoc at {@link ai.djl.training.Trainer}. At the end of training, a block represents a
 * fully-trained model.
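 *
 * <p>Once initialized, a block can also be applied directly via its forward function. A sketch,
 * under the same assumptions as above:
 *
 * <pre>{@code
 * ParameterStore ps = new ParameterStore(manager, false);
 * NDList output = net.forward(ps, new NDList(manager.ones(new Shape(1, 784))), false);
 * }</pre>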
 *
 * <p>It is also possible to freeze parameters and blocks to avoid them being trained. When
 * loading models or building blocks with pre-trained data, they default to being frozen. If you
 * wish to further refine these elements, use {@link Block#freezeParameters(boolean)} to unfreeze
 * them.
*
 * @see "The DJL tutorial on creating your first network"
 * @see "The D2L chapter on blocks and blocks with direct parameters"
*/
public interface Block {
/**
* Applies the operating function of the block once. This method should be called only on blocks
* that are initialized.
*
* @param parameterStore the parameter store
* @param inputs the input NDList
* @param training true for a training forward pass (turn on dropout and layerNorm)
* @return the output of the forward pass
*/
default NDList forward(ParameterStore parameterStore, NDList inputs, boolean training) {
return forward(parameterStore, inputs, training, null);
}
/**
* Applies the operating function of the block once. This method should be called only on blocks
* that are initialized.
*
* @param parameterStore the parameter store
* @param inputs the input NDList
* @param training true for a training forward pass (turn on dropout and layerNorm)
* @param params optional parameters
* @return the output of the forward pass
*/
    NDList forward(
            ParameterStore parameterStore,
            NDList inputs,
            boolean training,
            PairList<String, Object> params);
/**
* A forward call using both training data and labels.
*
     * <p>Within this forward call, it can be assumed that training is true.
*
* @param parameterStore the parameter store
* @param data the input data NDList
* @param labels the input labels NDList
* @param params optional parameters
* @return the output of the forward pass
* @see #forward(ParameterStore, NDList, boolean, PairList)
*/
    default NDList forward(
            ParameterStore parameterStore,
            NDList data,
            NDList labels,
            PairList<String, Object> params) {
        return forward(parameterStore, data, true, params);
    }
/**
     * Sets an {@link Initializer} to all the parameters that match the given parameter type in
     * the block.
     *
     * @param initializer the initializer to set
     * @param type the parameter type for which to set the initializer
*/
void setInitializer(Initializer initializer, Parameter.Type type);
/**
* Sets an {@link Initializer} to the specified direct parameter of the block, overriding the
* initializer of the parameter, if already set.
*
* @param initializer the initializer to be set
* @param paramName the name of the parameter
*/
void setInitializer(Initializer initializer, String paramName);
/**
     * Sets an {@link Initializer} to all the parameters that match the predicate in the block.
     *
     * @param initializer the initializer to be set
     * @param predicate the predicate to indicate which parameters to set
     */
    void setInitializer(Initializer initializer, Predicate<Parameter> predicate);
/**
     * Initializes the parameters of the block, sets requiresGradient if required, and infers the
     * block's input shapes. This method must be called before calling {@code forward}.
*
* @param manager the NDManager to initialize the parameters
* @param dataType the datatype of the parameters
* @param inputShapes the shapes of the inputs to the block
*/
void initialize(NDManager manager, DataType dataType, Shape... inputShapes);
/**
     * Returns whether the block is initialized (the block has input shapes and all parameters
     * have non-null arrays).
*
* @return whether the block is initialized
*/
boolean isInitialized();
/**
     * Guaranteed to throw an exception. Not yet implemented.
*
* @param dataType the data type to cast to
* @throws UnsupportedOperationException always
*/
void cast(DataType dataType);
/**
* Closes all the parameters of the block. All the updates made during training will be lost.
*/
void clear();
/**
     * Returns a {@link PairList} of input names and shapes.
     *
     * @return the {@link PairList} of input names and shapes
*/
    PairList<String, Shape> describeInput();
/**
* Returns a list of all the children of the block.
*
* @return the list of child blocks
*/
BlockList getChildren();
/**
* Returns a list of all the direct parameters of the block.
*
* @return the list of {@link Parameter}
*/
ParameterList getDirectParameters();
/**
* Returns a list of all the parameters of the block, including the parameters of its children
* fetched recursively.
*
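     * <p>For example, a sketch of counting the total number of parameter elements in an
     * initialized block ({@code Pair} here is {@code ai.djl.util.Pair}):
     *
     * <pre>{@code
     * long count = 0;
     * for (Pair<String, Parameter> pair : block.getParameters()) {
     *     count += pair.getValue().getArray().size();
     * }
     * }</pre>
     *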
* @return the list of all parameters of the block
*/
ParameterList getParameters();
/**
* Returns the expected output shapes of the block for the specified input shapes.
*
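     * <p>A minimal usage sketch (the input shape is an illustrative assumption):
     *
     * <pre>{@code
     * Shape[] out = block.getOutputShapes(new Shape[] {new Shape(1, 784)});
     * }</pre>
     *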
* @param inputShapes the shapes of the inputs
* @return the expected output shapes of the block
*/
Shape[] getOutputShapes(Shape[] inputShapes);
/**
* Returns the expected output shapes of the block for the specified input shapes.
*
* @param inputShapes the shapes of the inputs
* @param inputDataTypes the datatypes of the inputs
* @return the expected output shapes of the block
*/
default Shape[] getOutputShapes(Shape[] inputShapes, DataType[] inputDataTypes) {
return getOutputShapes(inputShapes);
}
/**
* Returns the input shapes of the block. The input shapes are only available after the block is
* initialized, otherwise an {@link IllegalStateException} is thrown.
*
* @return the input shapes of the block
*/
Shape[] getInputShapes();
/**
     * Returns the output dataTypes of the block.
     *
     * @return the output dataTypes of the block
     */
    DataType[] getOutputDataTypes();
/**
     * Writes the parameters of the block to the given output stream.
*
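     * <p>A sketch of saving to a file (the path and the use of {@code java.nio.file} are
     * illustrative assumptions):
     *
     * <pre>{@code
     * try (DataOutputStream os =
     *         new DataOutputStream(Files.newOutputStream(Paths.get("block.param")))) {
     *     block.saveParameters(os);
     * }
     * }</pre>
     *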
     * @param os the output stream to save the parameters to
* @throws IOException if an I/O error occurs
*/
void saveParameters(DataOutputStream os) throws IOException;
/**
* Loads the parameters from the given input stream.
*
* @param manager an NDManager to create the parameter arrays
     * @param is the input stream to read the parameter values from
* @throws IOException if an I/O error occurs
* @throws MalformedModelException if the model file is corrupted or unsupported
*/
void loadParameters(NDManager manager, DataInputStream is)
throws IOException, MalformedModelException;
/**
* Freezes or unfreezes all parameters inside the block for training.
*
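     * <p>For example, to unfreeze a loaded pre-trained block for fine-tuning (a sketch; {@code
     * net} is an assumed, already-loaded block):
     *
     * <pre>{@code
     * net.freezeParameters(false); // make all parameters trainable again
     * }</pre>
     *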
* @param freeze true if the parameter should be frozen
* @see Parameter#freeze(boolean)
*/
default void freezeParameters(boolean freeze) {
for (Parameter parameter : getParameters().values()) {
parameter.freeze(freeze);
}
}
/**
* Freezes or unfreezes all parameters inside the block that pass the predicate.
*
     * @param freeze true if the matching parameters should be frozen
     * @param pred the predicate to test which parameters should be affected
     * @see Parameter#freeze(boolean)
     */
    default void freezeParameters(boolean freeze, Predicate<Parameter> pred) {
for (Parameter parameter : getParameters().values()) {
if (pred.test(parameter)) {
parameter.freeze(freeze);
}
}
}
/**
* Validates that actual layout matches the expected layout.
*
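     * <p>A usage sketch (the layouts here are illustrative assumptions):
     *
     * <pre>{@code
     * Block.validateLayout(
     *         new LayoutType[] {LayoutType.BATCH, LayoutType.CHANNEL},
     *         new LayoutType[] {LayoutType.BATCH, LayoutType.CHANNEL});
     * }</pre>
     *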
* @param expectedLayout the expected layout
* @param actualLayout the actual Layout
* @throws UnsupportedOperationException if the actual layout does not match the expected layout
*/
static void validateLayout(LayoutType[] expectedLayout, LayoutType[] actualLayout) {
if (actualLayout.length != expectedLayout.length) {
throw new UnsupportedOperationException(
"Expected layout: "
+ LayoutType.toString(expectedLayout)
+ ", but got: "
+ LayoutType.toString(actualLayout));
}
for (int i = 0; i < actualLayout.length; i++) {
if (actualLayout[i] != LayoutType.UNKNOWN && actualLayout[i] != expectedLayout[i]) {
throw new UnsupportedOperationException(
"Expected layout: "
+ LayoutType.toString(expectedLayout)
+ ", but got: "
+ LayoutType.toString(actualLayout));
}
}
}
}