/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.crunch;
import com.google.common.base.Preconditions;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Iterables;
import com.google.common.collect.Maps;
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
import java.util.concurrent.Callable;
/**
* A specialization of {@code Callable} that executes some sequential logic on the client machine as
* part of an overall Crunch pipeline in order to generate zero or more outputs, some of
* which may be {@code PCollection} instances that are processed by other jobs in the
* pipeline.
*
* <p>{@code PipelineCallable} is intended to be used to inject auxiliary logic into the control
* flow of a Crunch pipeline. This can be used for a number of purposes, such as importing data into
* or exporting data from a cluster using Apache Sqoop, executing a legacy MapReduce job
* or Pig/Hive script within a Crunch pipeline, or sending emails or notifications
* about the status of a long-running pipeline during its execution.
*
* <p>The Crunch planner needs to know three things about a {@code PipelineCallable} instance in order
* to manage it:
* <ol>
* <li>The {@code Target} and {@code PCollection} instances that must have been materialized
* before this instance is allowed to run. This information should be specified via the {@code dependsOn}
* methods of the class.</li>
* <li>The outputs, if any, that will be created after this instance is executed. These outputs may be
* new {@code PCollection} instances that are used as inputs in other Crunch jobs. They should
* be specified by the {@code getOutput(Pipeline)} method of the class, which will be executed immediately
* after this instance is registered with the {@link Pipeline#sequentialDo} method.</li>
* <li>The actual logic to execute, once the dependent Targets and PCollections have been created, in
* order to materialize the output data. This is defined in the {@code call} method of the class.</li>
* </ol>
*
* <p>If a given {@code PipelineCallable} does not have any dependencies, it will be executed before any jobs are run
* by the planner. After that, the planner will keep track of when the dependencies of a given instance
* have been materialized, and then execute the instance as soon as they all exist. The Crunch planner
* uses a thread pool executor to run multiple {@code PipelineCallable} instances simultaneously, but you can
* indicate that an instance should be run by itself by overriding the {@code boolean runSingleThreaded()} method
* below to return {@code true}.
*
* <p>The {@code call} method returns a {@code Status} to indicate whether it succeeded or failed. A failed
* instance, or any exception or error thrown by the {@code call} method, will cause the overall Crunch pipeline
* containing this instance to fail.
*
* <p>A number of helper methods are available as protected methods in this class so that implementations of
* {@code PipelineCallable} can use them within the {@code call} method. They provide access to the
* {@code Target} and {@code PCollection} instances that this instance depends on, as well as the
* {@code Configuration} instance for the overall Pipeline execution.
*