/**
 * The MarkLogic Data Movement SDK supports long-running write, read,
 * delete, or transform jobs.  Long-running write jobs are enabled by {@link
 * com.marklogic.client.datamovement.WriteBatcher}.
 * Long-running read, delete, or transform jobs are enabled by {@link
 * com.marklogic.client.datamovement.QueryBatcher} which can perform actions
 * {@link com.marklogic.client.datamovement.DataMovementManager#newQueryBatcher(
 * com.marklogic.client.query.StructuredQueryDefinition) on all uris matching a query} or
 * {@link com.marklogic.client.datamovement.DataMovementManager#newQueryBatcher(
 * java.util.Iterator) on all uris provided by an Iterator<String>}.
 *
 * Features:
 *
 *   - WriteBatcher
 *       - offers high-performance import from sources not supported by
 *         mlcp
 *       - batches documents for bulk write, but improves on bulk write's
 *         performance because it
 *           - writes with many parallel threads
 *           - writes round-robin to all hosts in the cluster with forests
 *             for the specified database
 *       - one instance safely receives calls to {@link
 *         com.marklogic.client.datamovement.WriteBatcher#add add} from many
 *         threads
 *       - supports transforms, metadata, and temporal collections
 *   - QueryBatcher
 *       - runs provided code on a set of uris (common use cases include, but
 *         are not limited to, export, delete, and transform)
 *       - provided code can leverage the full feature set of the Java Client
 *         API
 *       - uris are usually matches to a query
 *       - for corner cases, uris can be provided by an {@code Iterator<String>}
 *       - paginates through query matches for best scalability
 *       - paginates with many threads for increased throughput
 *       - directly queries each host in the cluster with forests for the
 *         specified database
 *
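 * The batching and round-robin distribution listed above can be illustrated
 * with a small plain-Java sketch.  It is a hypothetical model, not
 * WriteBatcher's actual implementation, and it does not use the MarkLogic
 * API; the class RoundRobinBatchingSketch and any host names passed to it are
 * invented for illustration.  It groups document uris into fixed-size batches
 * and assigns each batch to the next host in turn.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: NOT the WriteBatcher implementation.
// Groups items into fixed-size batches and assigns each batch to the
// next host in round-robin order.
final class RoundRobinBatchingSketch {

    static Map<String, List<List<String>>> assign(List<String> items, int batchSize, List<String> hosts) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        if (hosts.isEmpty()) throw new IllegalArgumentException("at least one host is required");

        // Keep hosts in their given order so the result is predictable.
        Map<String, List<List<String>>> batchesByHost = new LinkedHashMap<>();
        for (String host : hosts) {
            batchesByHost.put(host, new ArrayList<>());
        }

        int batchNumber = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            // The last batch may be smaller than batchSize.
            int end = Math.min(start + batchSize, items.size());
            List<String> batch = new ArrayList<>(items.subList(start, end));
            // Round-robin: batch 0 -> host 0, batch 1 -> host 1, ... then wrap around.
            String host = hosts.get(batchNumber % hosts.size());
            batchesByHost.get(host).add(batch);
            batchNumber++;
        }
        return batchesByHost;
    }
}
```

 * A real WriteBatcher job also writes those batches from parallel threads;
 * this sketch only shows how the work could be divided among hosts.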
 *
 * 
 * Using Provided Listeners
 *
 * When using QueryBatcher, your custom listeners provided to {@link
 * com.marklogic.client.datamovement.QueryBatcher#onUrisReady onUrisReady} can do
 * anything with each batch of uris, and will usually call the MarkLogic Java
 * Client API to act on them.  However, to simplify common use cases, the
 * following listeners are also provided:
 *
 *   {@link com.marklogic.client.datamovement.ApplyTransformListener}  - Modifies documents in place in the database by applying a {@link com.marklogic.client.document.ServerTransform server-side transform}.
 *   {@link com.marklogic.client.datamovement.ExportListener}          - Downloads each document for further processing in Java.
 *   {@link com.marklogic.client.datamovement.ExportToWriterListener}  - Downloads each document and writes it to a Writer (could be a file, HTTP response, in-memory Writer, etc.).
 *   {@link com.marklogic.client.datamovement.DeleteListener}          - Deletes each batch of documents from the server.
 *   {@link com.marklogic.client.datamovement.UrisToWriterListener}    - Writes each uri to a Writer (could be a file, HTTP response, etc.).
 * 
 *
 *
 * Using QueryBatcher
 *
 * When you need to perform actions on server documents beyond what can be
 * done with the provided listeners, register your
 * custom code with onUrisReady and your code will be run for each batch of
 * uris.
 *
 * For example:
 * {@code
 *     QueryBatcher qhb = dataMovementManager.newQueryBatcher(query)
 *         .withBatchSize(1000)
 *         .withThreadCount(20)
 *         .withConsistentSnapshot()
 *         .onUrisReady(batch -> {
 *             // one document manager per batch, reused for every uri in it
 *             GenericDocumentManager docMgr = client.newDocumentManager();
 *             for ( String uri : batch.getItems() ) {
 *                 // delete only the .txt documents in this batch
 *                 if ( uri.endsWith(".txt") ) {
 *                     docMgr.delete(uri);
 *                 }
 *             }
 *         })
 *         .onQueryFailure(queryBatchException -> queryBatchException.printStackTrace());
 *     JobTicket ticket = dataMovementManager.startJob(qhb);
 *     qhb.awaitCompletion(); // block until every batch has been processed
 *     dataMovementManager.stopJob(ticket);
 *}
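 *
 * The batch size, thread count, and completion semantics in the example above
 * can be modeled without a MarkLogic server.  The following plain-Java
 * simulation is hypothetical (QueryBatchSimulation is not a MarkLogic class):
 * it splits a list of uris into batches and hands each batch to a listener on
 * a fixed-size thread pool, loosely mirroring withBatchSize, withThreadCount,
 * and onUrisReady, then blocks until every batch has been processed, much as
 * awaitCompletion does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical simulation of the QueryBatcher pattern; not MarkLogic code.
final class QueryBatchSimulation {

    static void run(List<String> uris, int batchSize, int threadCount,
                    Consumer<List<String>> onUrisReady) throws InterruptedException {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            for (int start = 0; start < uris.size(); start += batchSize) {
                int end = Math.min(start + batchSize, uris.size());
                // Copy so each task owns an independent batch.
                List<String> batch = new ArrayList<>(uris.subList(start, end));
                // Each batch may run on any pool thread, concurrently with others.
                pool.execute(() -> onUrisReady.accept(batch));
            }
        } finally {
            // Accept no new batches; queued and running batches still finish.
            pool.shutdown();
        }
        // Block until all batches are done (the analogue of awaitCompletion).
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow();
            throw new IllegalStateException("batches did not finish within 1 minute");
        }
    }
}
```

 * Because several pool threads may call the listener at the same time, the
 * listener passed to run must be thread-safe, for example by collecting into
 * a synchronized list.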
 *
 *
 * Using WriteBatcher
 *
 * When you need to write a very large volume of documents and mlcp cannot
 * meet your requirements, use WriteBatcher.
 *
 * For example:
 * {@code
 *     WriteBatcher whb = dataMovementManager.newWriteBatcher()
 *         .withBatchSize(100)
 *         .withThreadCount(20)
 *         .onBatchSuccess(batch -> {
 *             logger.debug("batch # {}, so far: {}", batch.getJobBatchNumber(), batch.getJobResultsSoFar());
 *         })
 *         .onBatchFailure((batch,throwable) -> throwable.printStackTrace() );
 *     JobTicket ticket = dataMovementManager.startJob(whb);
 *     // the add or addAs methods could be called in separate threads on the
 *     // single whb instance
 *     whb.add  ("doc1.txt", new StringHandle("doc1 contents"));
 *     whb.addAs("doc2.txt", "doc2 contents");
 *
 *     whb.flushAndWait(); // send the two docs even though they're not a full batch
 *     dataMovementManager.stopJob(ticket);
 *}
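 *
 * The call to flushAndWait above matters because a batch is normally sent only
 * once it is full.  The following in-memory sketch illustrates that behavior;
 * it is hypothetical (SketchBatcher and its add and flush methods are invented
 * for illustration and are not WriteBatcher's implementation).  add is safe to
 * call from many threads, each full batch is handed to a sink immediately, and
 * flush sends whatever partial batch remains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "send full batches, flush the remainder" behavior.
// Not the WriteBatcher implementation.
final class SketchBatcher {
    private final int batchSize;
    private final Consumer<List<String>> sink; // receives each completed batch
    private List<String> current = new ArrayList<>();

    SketchBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Synchronized section so one instance can take add() calls from many threads.
    void add(String doc) {
        List<String> full = null;
        synchronized (this) {
            current.add(doc);
            if (current.size() >= batchSize) {
                full = current;
                current = new ArrayList<>();
            }
        }
        // Deliver outside the lock so a slow sink does not block other adders.
        if (full != null) sink.accept(full);
    }

    // Sends the partial batch, if any, even though it is not full.
    void flush() {
        List<String> partial;
        synchronized (this) {
            if (current.isEmpty()) return;
            partial = current;
            current = new ArrayList<>();
        }
        sink.accept(partial);
    }
}
```

 * Unlike flushAndWait, this flush delivers synchronously on the calling
 * thread, so there is nothing further to wait for.  The sink may be called
 * from several threads and therefore must itself be thread-safe.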
 *
 *
 * 
 * Writing Custom Listeners
 *
 * As demonstrated above, listeners should be added to each instance of
 * QueryBatcher or WriteBatcher.  Ad-hoc listeners can be written as Java 8
 * lambda expressions.  More sophisticated custom listeners can implement the
 * appropriate listener interface or extend one of the
 * provided listeners listed above.
 *
 * QueryBatchListener (onUrisReady) instances are necessary to do something
 * with the uris fetched by QueryBatcher.  A custom QueryBatchListener can do
 * whatever it needs to: any Java Client API operation that acts on uris could
 * be used, as could any read from or write to an external system.
 * QueryFailureListener (onQueryFailure) instances handle any exceptions
 * encountered while fetching the uris.  WriteBatchListener (onBatchSuccess)
 * instances handle any custom tracking requirements during a WriteBatcher
 * job.  WriteFailureListener (onBatchFailure) instances handle any exceptions
 * encountered while writing the batches formed from documents sent to the
 * WriteBatcher instance.  See the javadocs for each provided listener for an
 * explanation of the various listeners that can be registered for it to
 * call.  See the javadocs, the Java Application Developer's Guide, the source
 * code for the provided listeners, the cookbook examples, and the unit tests
 * for more listener implementation ideas.
 *
 *
 * Listeners Must Be Thread-Safe
 *
 * Since listeners are called asynchronously by all threads in the pool inside
 * the QueryBatcher or WriteBatcher instance, they must only perform
 * thread-safe operations.  For example, accumulating to a collection should
 * only be done with collections wrapped as
 * {@link java.util.Collections#synchronizedCollection synchronized Collections}
 * (or maps wrapped with {@link java.util.Collections#synchronizedMap
 * synchronizedMap}) rather than directly using un-synchronized collections
 * such as ArrayList or HashMap, which are not thread-safe.  Similarly,
 * accumulating to a string should use StringBuffer instead of StringBuilder,
 * since StringBuffer is synchronized (and thus thread-safe).  We also
 * recommend the {@link java.util.concurrent.atomic java.util.concurrent.atomic
 * classes}.
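 *
 * The following plain-Java sketch applies that advice.  It simulates a
 * listener being invoked concurrently by a thread pool (no MarkLogic classes
 * are involved, and ThreadSafeAccumulation is a hypothetical name) and
 * accumulates results with an AtomicLong, a synchronized list, and a
 * StringBuffer, so no updates are lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation of a listener called from many threads at once.
final class ThreadSafeAccumulation {
    final AtomicLong uriCount = new AtomicLong();
    final List<String> seen = Collections.synchronizedList(new ArrayList<String>());
    final StringBuffer log = new StringBuffer();

    // Body of a listener; every update below is thread-safe.
    void onBatch(List<String> batch) {
        uriCount.addAndGet(batch.size());
        seen.addAll(batch); // synchronized wrapper makes addAll safe
        for (String uri : batch) {
            // Build each line first: a single StringBuffer.append call is
            // atomic, so lines from different threads never interleave.
            log.append(uri + "\n");
        }
    }

    // Feeds every batch to onBatch from a pool of worker threads.
    static ThreadSafeAccumulation runConcurrently(List<List<String>> batches, int threads)
            throws InterruptedException {
        ThreadSafeAccumulation acc = new ThreadSafeAccumulation();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> batch : batches) {
            pool.execute(() -> acc.onBatch(batch));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow();
            throw new IllegalStateException("timed out waiting for batches");
        }
        return acc;
    }
}
```

 * Replacing these fields with a plain long, an ArrayList, or a StringBuilder
 * would make the totals unreliable under the same concurrent calls.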
 *
 * Listeners should handle their own exceptions as described below in
 * Handling Exceptions in Listeners.
 *
 *
 * 
 * Handling Exceptions in Listeners
 *
 * Since listeners are called asynchronously, external code cannot wrap the
 * call in a try-catch block.  Instead, a listener can and should handle its
 * own exceptions by wrapping the calls in its body in a try-catch block.  When
 * any listener does not handle its own exceptions and throws any exception
 * (Throwable), the exception is logged at error level with a call like:
 *
 * {@code
 *     logger.error("Exception thrown by an onBatchSuccess listener", throwable);
 *}
 *
 * This achieves logging of exceptions without allowing them to prevent the job
 * from continuing.
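 *
 * A listener can handle its own exceptions by wrapping its body in a
 * try-catch block.  The hypothetical helper below (SafeListeners and guard are
 * invented names, not a MarkLogic API) wraps any batch consumer so that a
 * failure in one batch is passed to an error handler and later batches still
 * run.  In a real job, the error handler would typically log through your
 * slf4j logger or record the batch for a retry.

```java
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical helper: gives a listener body its own try-catch so that
// exceptions never escape to the caller.
final class SafeListeners {
    static <T> Consumer<T> guard(Consumer<T> body, BiConsumer<T, Exception> onError) {
        return batch -> {
            try {
                body.accept(batch);
            } catch (Exception e) {
                // Handle the failure here (log it, record the batch, etc.).
                onError.accept(batch, e);
            }
        };
    }
}
```

 * The sketch catches Exception rather than Throwable so that JVM errors such
 * as OutOfMemoryError are not silently swallowed.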
 *
 * A QueryFailureListener or WriteFailureListener will not be notified of
 * exceptions thrown by other listeners.  Instead, these failure listeners are
 * notified exclusively of exceptions in the operation of QueryBatcher or
 * WriteBatcher.
 *
 * If you want a custom QueryBatchListener or WriteBatchListener to trap its
 * own exceptions and pass them along to callbacks registered with it for
 * exception handling, it can of course do that in a custom way.  Examples of
 * this pattern can be seen in the interface of
 * {@link com.marklogic.client.datamovement.ApplyTransformListener}.
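 *
 * The callback pattern described above can be sketched in plain Java as
 * follows.  This is a hypothetical illustration only: CallbackListener, its
 * onFailure method, and processEvent are invented for this example and do
 * not reproduce the ApplyTransformListener interface.  The listener traps
 * exceptions from its own work and forwards them, together with the failed
 * batch, to every failure callback registered on it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical listener that traps its own exceptions and forwards them to
// failure callbacks registered on it. Names are illustrative, not MarkLogic APIs.
final class CallbackListener {
    private final Consumer<List<String>> work;
    // CopyOnWriteArrayList: safe to iterate while other threads register callbacks.
    private final List<BiConsumer<List<String>, Exception>> failureListeners =
        new CopyOnWriteArrayList<>();

    CallbackListener(Consumer<List<String>> work) {
        this.work = work;
    }

    // Registers a failure callback; returns this for fluent chaining.
    CallbackListener onFailure(BiConsumer<List<String>, Exception> listener) {
        failureListeners.add(listener);
        return this;
    }

    // Called once per batch, possibly from many threads.
    void processEvent(List<String> batch) {
        try {
            work.accept(batch);
        } catch (Exception failure) {
            for (BiConsumer<List<String>, Exception> listener : failureListeners) {
                try {
                    listener.accept(batch, failure);
                } catch (Exception callbackError) {
                    // A failing callback must not stop the remaining callbacks;
                    // in practice, log this through your logging framework.
                    System.err.println("failure listener threw: " + callbackError);
                }
            }
        }
    }
}
```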
 *
 * Pre-installed Listeners
 *
 * Every time you create a new QueryBatcher or WriteBatcher it comes with some
 * pre-installed listeners such as
 * {@link com.marklogic.client.datamovement.HostAvailabilityListener} and a
 * listener to track counts for JobReport.  If you wish to remove these
 * listeners and their associated functionality call one of the following:
 * {@link com.marklogic.client.datamovement.QueryBatcher#setUrisReadyListeners
 * setUrisReadyListeners}, {@link
 * com.marklogic.client.datamovement.QueryBatcher#setQueryFailureListeners
 * setQueryFailureListeners}, {@link
 * com.marklogic.client.datamovement.WriteBatcher#setBatchSuccessListeners
 * setBatchSuccessListeners}, or {@link
 * com.marklogic.client.datamovement.WriteBatcher#setBatchFailureListeners
 * setBatchFailureListeners}.  Removing the functionality of
 * HostAvailabilityListener means it won't do its job of blacklisting
 * unavailable hosts and retrying batches that fail while a host is
 * unavailable.  Removing the listeners that track counts for JobReport
 * means JobReport should no longer be used.  If you would just like to change
 * the settings on HostAvailabilityListener or NoResponseListener, you can do
 * something like the following:
 *
 * {@code
 *    HostAvailabilityListener.getInstance(batcher)
 *      .withSuspendTimeForHostUnavailable(Duration.ofMinutes(60))
 *      .withMinHosts(2);
 *}
 *
 *
 * Enable Logging
 *
 * We have made efforts to provide helpful logging as you use QueryBatcher and
 * WriteBatcher.  Please make sure to enable your slf4j-compliant
 * logging framework.
 *
 */
/*
 * Copyright © 2024 MarkLogic Corporation. All Rights Reserved.
 */
package com.marklogic.client.datamovement;