mlcp 9.0.13.8 source code: overview.html

MarkLogic Content Pump

<html>
<head><title>Overview of MarkLogic Connector for Hadoop</title></head>
<body>
<p>
  This bundle provides an API for a MarkLogic Server content
  connector for Apache Hadoop MapReduce. The overview covers the 
  following topics:
</p>
<ul>
  <li><a href="#Introduction">Introduction</a></li>
  <li><a href="#Configuration">Configuration</a></li>
</ul>

<p>
  For detailed information, see the <em>MarkLogic Connector for
  Hadoop Developer's Guide</em>.
</p>

<h2 id="Introduction">Introduction</h2>
<p>
  The MarkLogic Connector for Hadoop API allows you to
  use MarkLogic Server as a Hadoop MapReduce input source,
  an output destination, or both.
</p>
<p>
  The following classes are provided for defining MarkLogic-specific
  key and value types for your MapReduce key-value pairs:
</p>
<ul>
  <li>{@link com.marklogic.mapreduce.NodePath} for keys</li>
  <li>{@link com.marklogic.mapreduce.DocumentURI} for keys</li>
  <li>{@link com.marklogic.mapreduce.MarkLogicNode} for values</li>
</ul>
<p>
  You may also use Apache Hadoop MapReduce types such as Text in
  certain circumstances. See {@link com.marklogic.mapreduce.ValueInputFormat}
  and {@link com.marklogic.mapreduce.KeyValueInputFormat}.
</p>
<p>
  You may generate input data using MarkLogic Server lexicon functions
  by subclassing one of the lexicon function wrapper classes in 
  com.marklogic.mapreduce.functions. Use lexicon functions
  with {@link com.marklogic.mapreduce.ValueInputFormat} and 
  {@link com.marklogic.mapreduce.KeyValueInputFormat}.
</p>
<p>
  The following classes are provided for defining 
  MarkLogic-specific MapReduce input and output formats. 
  Input and output formats need not be the same type.
</p>
<ul>
  <li>{@link com.marklogic.mapreduce.DocumentInputFormat}</li>
  <li>{@link com.marklogic.mapreduce.NodeInputFormat}</li>
  <li>{@link com.marklogic.mapreduce.ValueInputFormat}</li>
  <li>{@link com.marklogic.mapreduce.KeyValueInputFormat}</li>
  <li>{@link com.marklogic.mapreduce.ContentOutputFormat}</li>
  <li>{@link com.marklogic.mapreduce.NodeOutputFormat}</li>
  <li>{@link com.marklogic.mapreduce.PropertyOutputFormat}</li>
</ul>

<h2 id="Configuration">Configuration</h2>
<p>
  Configure the connector using the standard Hadoop configuration
  mechanism. That is, use a Hadoop configuration file to define
  property values, or set properties programmatically on your
  Job's {@link org.apache.hadoop.conf.Configuration} object.
</p>
<p>
  The configuration properties available for the connector are
  described in {@link com.marklogic.mapreduce.MarkLogicConstants}.
</p>
<p>
 When using MarkLogic Server as an input source for MapReduce
 tasks, you may use either basic or advanced input mode. The default 
 is <code>basic</code> mode. The mode is controlled through
 the {@link com.marklogic.mapreduce.MarkLogicConstants#INPUT_MODE
 mapreduce.marklogic.input.mode} property. The following sections
 describe the input modes briefly. For details, see the
 <em>MarkLogic Connector for Hadoop Developer's Guide</em>.
</p>
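<p>
 As a rough illustration of the file-based option, the sketch below renders
 name/value pairs in the <code>configuration</code>/<code>property</code>
 layout used by Hadoop configuration files. The property name
 <code>mapreduce.marklogic.input.mode</code> and the value <code>basic</code>
 come from the text above; the class and method names are hypothetical helpers,
 not part of the connector.
</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the connector: renders properties in the
// <configuration>/<property> layout used by Hadoop configuration files.
final class ConnectorConfigSketch {

    // Escapes the characters that would break XML element content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Emits one <property> element per map entry, in insertion order.
    static String toHadoopXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escapeXml(e.getKey())).append("</name>\n")
              .append("    <value>").append(escapeXml(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Property name and value taken from the overview text.
        props.put("mapreduce.marklogic.input.mode", "basic");
        System.out.print(toHadoopXml(props));
    }
}
```

<p>
 With Hadoop on the classpath, the same pair could instead be set
 programmatically by calling <code>set(name, value)</code> on the job's
 <code>Configuration</code> object.
</p>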

<h3>Configuring the Input Query With a Path Expression</h3>
<p>
 In basic mode, you may supply components of an XQuery path expression
 which the connector uses to generate input data. You may not use this
 option along with a lexicon function class.
</p>
<p>To allow MarkLogic Server to optimize the input query, the path 
 expression is constructed from two components: a 
 {@link com.marklogic.mapreduce.MarkLogicConstants#DOCUMENT_SELECTOR 
 document node selector} and a
 {@link com.marklogic.mapreduce.MarkLogicConstants#SUBDOCUMENT_EXPRESSION
 sub-document expression}.
</p>
<p>
  The input split is not configurable in <code>basic</code> mode. The
  splits are based on a rough count of the number of fragments in
  each forest. Use <code>advanced</code> input mode for more control
  over input split generation.
</p>
<p>
 Conceptually, the input data for each task is constructed from a 
 path expression similar to:
</p>
<pre class="codesample"><code>
$document-selector/$subdocument-expression
</code></pre>
<p>
 Both components of the input path expression are optional. If no 
 document selector is given, <code>fn:collection()</code> is used.
 If no subdocument expression is given, the document nodes returned
 by the document selector are used as the input values.
</p>
<p>Examples:</p>
<pre class="codesample"><code>
document selector: none
subdocument expression: none
  => All document nodes in fn:collection()

document selector: fn:collection("wiki-topics")
subdocument expression: none
  => All document nodes in the "wiki-topics" collection

document selector: fn:collection("wiki-topics")
subdocument expression: //wp:a[@href]
  => All wp:a elements with an href attribute in the "wiki-topics" collection

document selector: fn:collection("wiki-topics")
subdocument expression: //wp:a[@href]/@title
  => The title attributes of all wp:a elements with an href attribute
     in the "wiki-topics" collection
</code></pre>
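<p>
 The composition rule and its defaults can be sketched in a few lines of Java.
 This is only an illustration of the conceptual form
 <code>$document-selector/$subdocument-expression</code>, not the query the
 connector actually generates. The class and method names are hypothetical,
 and appending a sub-document expression that already starts with
 <code>/</code> directly to the selector is an assumption made so that the
 examples above yield valid paths.
</p>

```java
// Hypothetical sketch of the conceptual path expression; the connector's
// real query generation is more involved and is not reproduced here.
final class InputPathSketch {

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Combines the two optional components using the defaults described above.
    static String compose(String documentSelector, String subdocumentExpression) {
        // No document selector: fall back to fn:collection().
        String selector = isBlank(documentSelector)
                ? "fn:collection()"
                : documentSelector.trim();
        // No sub-document expression: the selected document nodes are the input.
        if (isBlank(subdocumentExpression)) {
            return selector;
        }
        String sub = subdocumentExpression.trim();
        // Assumption: an expression that already begins with "/" (for example
        // "//wp:a[@href]") is appended as-is instead of adding another "/".
        return sub.startsWith("/") ? selector + sub : selector + "/" + sub;
    }

    public static void main(String[] args) {
        System.out.println(compose(null, null));
        System.out.println(compose("fn:collection(\"wiki-topics\")", null));
        System.out.println(compose("fn:collection(\"wiki-topics\")", "//wp:a[@href]"));
        System.out.println(compose("fn:collection(\"wiki-topics\")", "//wp:a[@href]/@title"));
    }
}
```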

<h3>Configuring the Input Query with a Lexicon Function</h3>
<p>
 In basic mode, you may gather input data using a MarkLogic Server
 lexicon function. Do not combine this option with the path expression
 configuration properties described above. If both are
 configured for a job, the lexicon function takes precedence.
</p>
<p>
 To use a lexicon function for input, implement a subclass of
 one of the lexicon function wrapper classes in com.marklogic.mapreduce.functions.
 For example, to use <code>cts:element-values</code>, implement a
 subclass of {@link com.marklogic.mapreduce.functions.ElementValues}.
 Override the methods corresponding to the function parameter values
 you want to include in the call.
</p>
<p>
 For details, see "Using a Lexicon to Generate Key-Value Pairs" in
 the <em>MarkLogic Connector for Hadoop Developer's Guide</em>.
</p>
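<p>
 The subclass-and-override pattern described above can be illustrated without
 the connector on the classpath. In the sketch below,
 <code>ElementValuesSketch</code> is a hypothetical stand-in, not the
 connector's <code>ElementValues</code> class, and its method names are
 invented for illustration. It only shows how overriding one method per
 parameter could shape the resulting <code>cts:element-values</code> call.
</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a lexicon function wrapper class.
// It is NOT com.marklogic.mapreduce.functions.ElementValues.
abstract class ElementValuesSketch {

    // Required parameter: the element names whose values are read.
    abstract List<String> elementNames();

    // Optional parameter: override only when options are needed.
    List<String> options() {
        return Collections.emptyList();
    }

    // Builds the text of the call from whatever the subclass overrides.
    // The second argument (start value) is left as the empty sequence.
    final String toCall() {
        String names = elementNames().stream()
                .map(n -> "xs:QName(\"" + n + "\")")
                .collect(Collectors.joining(", ", "(", ")"));
        String opts = options().stream()
                .map(o -> "\"" + o + "\"")
                .collect(Collectors.joining(", ", "(", ")"));
        return "cts:element-values(" + names + ", (), " + opts + ")";
    }
}

// Example subclass: overrides the methods for the parameters it cares about.
final class LinkTitles extends ElementValuesSketch {
    @Override
    List<String> elementNames() {
        return Arrays.asList("wp:a");
    }

    @Override
    List<String> options() {
        return Arrays.asList("descending");
    }

    public static void main(String[] args) {
        System.out.println(new LinkTitles().toCall());
    }
}
```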

<h3>Configuring the Input Query in Advanced Mode</h3>
<p>
 In <code>advanced</code> input mode, you must supply an 
 {@link com.marklogic.mapreduce.MarkLogicConstants#SPLIT_QUERY 
 input split query} and an 
 {@link com.marklogic.mapreduce.MarkLogicConstants#INPUT_QUERY
 input query}.
</p>
<p>
 The split query is used to generate metadata for Hadoop's
 input splits. This query must return a sequence of triples, 
 each of which includes a forest ID, record (fragment) count, 
 and list of host names. The count may be an estimate.
</p>
<p>
 The input query is used to fetch the input data for each map task.
 This query must return data that matches the configured InputFormat
 subclass.
</p>
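<p>
 To make the shape of the split query result more concrete, the sketch below
 groups a flat sequence of items into triples and sums the record counts. The
 flat layout (forest ID, count, host name, repeated) with a single host name
 per triple is an assumption made for illustration, and the class names and
 sample values are hypothetical. Forest IDs are kept as strings here to avoid
 assuming a numeric range.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of split query output; not a connector class.
final class SplitTripleSketch {

    // One (forest ID, estimated record count, host name) triple.
    static final class ForestSplit {
        final String forestId;
        final long recordCount;
        final String host;

        ForestSplit(String forestId, long recordCount, String host) {
            this.forestId = forestId;
            this.recordCount = recordCount;
            this.host = host;
        }
    }

    // Groups a flat item sequence into triples; rejects incomplete triples.
    static List<ForestSplit> group(List<String> items) {
        if (items.size() % 3 != 0) {
            throw new IllegalArgumentException("Expected a multiple of 3 items");
        }
        List<ForestSplit> splits = new ArrayList<>();
        for (int i = 0; i < items.size(); i += 3) {
            splits.add(new ForestSplit(
                    items.get(i),
                    Long.parseLong(items.get(i + 1)),
                    items.get(i + 2)));
        }
        return splits;
    }

    // Sums the per-forest counts; the result is only as exact as the inputs.
    static long totalRecords(List<ForestSplit> splits) {
        long total = 0;
        for (ForestSplit s : splits) {
            total += s.recordCount;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        List<ForestSplit> splits = group(Arrays.asList(
                "101", "2500", "host-a",
                "102", "1800", "host-b"));
        System.out.println(splits.size() + " forests, about "
                + totalRecords(splits) + " records");
    }
}
```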
</body>
</html>

</code></pre>    <br/>
</main>
</div>
</body>
</html>