/**
 * The MarkLogic Data Movement SDK supports long-running write, read,
 * delete, or transform jobs. Long-running write jobs are enabled by {@link
 * com.marklogic.client.datamovement.WriteBatcher}.
 * Long-running read, delete, or transform jobs are enabled by {@link
 * com.marklogic.client.datamovement.QueryBatcher}, which can perform actions
 * {@link com.marklogic.client.datamovement.DataMovementManager#newQueryBatcher(
 * com.marklogic.client.query.StructuredQueryDefinition) on all uris matching a query} or
 * {@link com.marklogic.client.datamovement.DataMovementManager#newQueryBatcher(
 * java.util.Iterator) on all uris provided by an Iterator&lt;String&gt;}.

 *
 * <p>Features:</p>
 *
 * <ul>
 *   <li>WriteBatcher
 *     <ul>
 *       <li>batches documents for bulk write, and improves on performance because it
 *         <ul>
 *           <li>writes with many parallel threads</li>
 *           <li>writes round-robin to all hosts in the cluster with forests for
 *               the specified database</li>
 *         </ul>
 *       </li>
 *       <li>offers high-performance import from sources not supported by mlcp</li>
 *       <li>one instance safely receives calls to {@link
 *           com.marklogic.client.datamovement.WriteBatcher#add add} from many
 *           threads</li>
 *       <li>supports transforms, metadata, and temporal collections</li>
 *     </ul>
 *   </li>
 *   <li>QueryBatcher
 *     <ul>
 *       <li>runs provided code on a set of uris (common use cases include but
 *           are not limited to export, delete, and transform)</li>
 *       <li>provided code can leverage the full feature set of the Java Client API</li>
 *       <li>uris usually are matches to a query</li>
 *       <li>for corner cases uris can be provided by an Iterator&lt;String&gt;</li>
 *       <li>paginates through query matches for best scalability</li>
 *       <li>paginates with many threads for increased throughput</li>
 *       <li>directly queries each host in the cluster with forests for
 *           the specified database</li>
 *     </ul>
 *   </li>
 * </ul>

 * <p><b>Using Provided Listeners</b></p>
 *
 * <p>When using QueryBatcher, your custom listeners provided to {@link
 * com.marklogic.client.datamovement.QueryBatcher#onUrisReady onUrisReady} can do
 * anything with each batch of uris and will usually use the
 * MarkLogic Java Client API to do so. However, to simplify common use cases, the
 * following listeners are also provided:</p>

 *
 * <ul>
 *   <li>{@link com.marklogic.client.datamovement.ApplyTransformListener} - modifies documents in place in the database by applying a {@link com.marklogic.client.document.ServerTransform server-side transform}</li>
 *   <li>{@link com.marklogic.client.datamovement.ExportListener} - downloads each document for further processing in Java</li>
 *   <li>{@link com.marklogic.client.datamovement.ExportToWriterListener} - downloads each document and writes it to a Writer (could be a file, HTTP response, in-memory Writer, etc.)</li>
 *   <li>{@link com.marklogic.client.datamovement.DeleteListener} - deletes each batch of documents from the server</li>
 *   <li>{@link com.marklogic.client.datamovement.UrisToWriterListener} - writes each uri to a Writer (could be a file, HTTP response, etc.)</li>
 * </ul>
 *

 * <p><b>Using QueryBatcher</b></p>
 *
 * <p>When you need to perform actions on server documents beyond what can be
 * done with the provided listeners, register your custom code with
 * onUrisReady and your code will be run for each batch of uris.</p>
 *
 * <p>For example:</p>
 *
 * {@code
 *     QueryBatcher qhb = dataMovementManager.newQueryBatcher(query)
 *         .withBatchSize(1000)
 *         .withThreadCount(20)
 *         .withConsistentSnapshot()
 *         .onUrisReady(batch -> {
 *             for ( String uri : batch.getItems() ) {
 *                 if ( uri.endsWith(".txt") ) {
 *                     client.newDocumentManager().delete(uri);
 *                 }
 *             }
 *         })
 *         .onQueryFailure(queryBatchException -> queryBatchException.printStackTrace());
 *     JobTicket ticket = dataMovementManager.startJob(qhb);
 *     qhb.awaitCompletion();
 *     dataMovementManager.stopJob(ticket);
 * }
 *
 * <p><b>Using WriteBatcher</b></p>
 *
 * <p>When you need to write a very large volume of documents and mlcp
 * cannot meet your requirements, use WriteBatcher.</p>
 *
 * <p>For example:</p>
 *
 * {@code
 *     WriteBatcher whb = dataMovementManager.newWriteBatcher()
 *         .withBatchSize(100)
 *         .withThreadCount(20)
 *         .onBatchSuccess(batch -> {
 *             logger.debug("batch # {}, so far: {}", batch.getJobBatchNumber(), batch.getJobResultsSoFar());
 *         })
 *         .onBatchFailure((batch,throwable) -> throwable.printStackTrace() );
 *     JobTicket ticket = dataMovementManager.startJob(whb);
 *     // the add or addAs methods could be called in separate threads on the
 *     // single whb instance
 *     whb.add  ("doc1.txt", new StringHandle("doc1 contents"));
 *     whb.addAs("doc2.txt", "doc2 contents");
 *
 *     whb.flushAndWait(); // send the two docs even though they're not a full batch
 *     dataMovementManager.stopJob(ticket);
 * }
 *
 * <p><b>Writing Custom Listeners</b></p>
 *
 * <p>As demonstrated above, listeners should be added to each instance of
 * QueryBatcher or WriteBatcher. Ad-hoc listeners can be written as Java 8
 * lambda expressions. More sophisticated custom listeners can implement the
 * appropriate listener interface or extend one of the provided listeners
 * listed above.</p>
 *
 * <p>QueryBatchListener (onUrisReady) instances are necessary to do something
 * with the uris fetched by QueryBatcher. What a custom QueryBatchListener
 * does is completely up to it: any operation on uris offered by any part of
 * the Java Client API could be used, as could any read or write to an
 * external system. QueryFailureListener (onQueryFailure) instances handle any
 * exceptions encountered while fetching the uris. WriteBatchListener
 * (onBatchSuccess) instances handle any custom tracking requirements during a
 * WriteBatcher job. WriteFailureListener (onBatchFailure) instances handle
 * any exceptions encountered writing the batches formed from docs sent to the
 * WriteBatcher instance. See the javadocs for each provided listener for an
 * explanation of the various listeners that can be registered for it to call.
 * See the javadocs, the Java Application Developer's Guide, the source code
 * for the provided listeners, the cookbook examples, and the unit tests for
 * more examples of listener implementation ideas.</p>

 *
 * <p><b>Listeners Must Be Thread-Safe</b></p>
 *
 * <p>Since listeners are called asynchronously by all threads in the pool inside
 * the QueryBatcher or WriteBatcher instance, they must only perform
 * thread-safe operations. For example, accumulating to a collection should
 * only be done with collections wrapped as
 * {@link java.util.Collections#synchronizedCollection synchronized Collections}
 * rather than directly using un-synchronized collections such as HashMap or
 * ArrayList, which are not thread-safe. Similarly, accumulating to a string
 * should use StringBuffer instead of StringBuilder, since StringBuffer is
 * synchronized (and thus thread-safe). We also recommend the {@link
 * java.util.concurrent.atomic java.util.concurrent.atomic classes}.</p>
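The thread-safety guidance above can be sketched in plain Java. This is a minimal, self-contained illustration: the batcher's internal thread pool is simulated with an ExecutorService, and no MarkLogic classes are involved.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadSafeAccumulation {
    public static void main(String[] args) throws InterruptedException {
        // Shared state touched by many listener threads must be thread-safe:
        Collection<String> uris = Collections.synchronizedList(new ArrayList<>());
        AtomicLong count = new AtomicLong();

        // Simulate a batcher's thread pool invoking a listener concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 1000; i++) {
            final String uri = "doc" + i + ".txt";
            pool.submit(() -> {
                uris.add(uri);           // safe: synchronized wrapper
                count.incrementAndGet(); // safe: atomic counter
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);

        // With an unsynchronized ArrayList, the tallies below could be wrong
        // or the list internally corrupted.
        System.out.println(uris.size() + " uris, count=" + count.get());
    }
}
```

The same pattern applies unchanged when the lambda is a real onUrisReady or onBatchSuccess listener.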

 *
 * <p>Listeners should handle their own exceptions as described below in
 * <b>Handling Exceptions in Listeners</b>.</p>

 *
 * <p><b>Handling Exceptions in Listeners</b></p>
 *
 * <p>Since listeners are called asynchronously, external exception handling
 * cannot wrap the call in a try-catch block. Instead, a listener can and
 * should handle its own exceptions by wrapping the calls in its body in a
 * try-catch block. When any listener does not handle its own exceptions and
 * throws any exception (Throwable), the exception is logged at error level
 * with a call like:</p>
 *
 * {@code
 *     logger.error("Exception thrown by an onBatchSuccess listener", throwable);
 * }
 *
 * <p>This achieves logging of exceptions without allowing them to prevent the
 * job from continuing.</p>
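The self-handling pattern described above can be sketched without any MarkLogic classes. Here a listener is modeled as a plain Consumer&lt;String[]&gt; standing in for a QueryBatchListener; its body traps its own Throwable so the (simulated) batcher thread invoking it is never disrupted.

```java
import java.util.function.Consumer;

public class SelfHandlingListener {
    public static void main(String[] args) {
        // A listener body that might throw; it wraps its work in try-catch
        // so no exception escapes back to the calling batcher thread.
        Consumer<String[]> onUrisReady = batch -> {
            try {
                for (String uri : batch) {
                    if (uri == null) {
                        throw new IllegalStateException("bad uri in batch");
                    }
                    System.out.println("processed " + uri);
                }
            } catch (Throwable t) {
                // Handle (or log) locally instead of letting it propagate.
                System.err.println("listener error: " + t.getMessage());
            }
        };

        onUrisReady.accept(new String[]{"doc1.txt", null, "doc2.txt"});
        System.out.println("job continues");
    }
}
```

Because the listener traps its own Throwable, the run continues past the bad batch instead of relying on the framework's fallback error logging.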

 *
 * <p>A QueryFailureListener or WriteFailureListener will not be notified of
 * exceptions thrown by other listeners. Instead, these failure listeners are
 * notified exclusively of exceptions in the operation of QueryBatcher or
 * WriteBatcher.</p>

 *
 * <p>If you wish a custom QueryBatchListener or WriteBatchListener to trap its
 * own exceptions and pass them along to callbacks registered with it for
 * exception handling, it can of course do that in a custom way. Examples of
 * this pattern can be seen in the interface of
 * {@link com.marklogic.client.datamovement.ApplyTransformListener}.</p>
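The pattern of a listener relaying its trapped exceptions to its own registered callbacks can be sketched in plain Java. All names here (RelayingListener, onFailure, processEvent) are illustrative, not part of the Data Movement SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// A hypothetical custom listener that traps its own exceptions and relays
// them to failure callbacks registered with it, in the style the text
// attributes to ApplyTransformListener.
public class RelayingListener {
    private final List<BiConsumer<String, Throwable>> failureListeners = new ArrayList<>();

    public RelayingListener onFailure(BiConsumer<String, Throwable> listener) {
        failureListeners.add(listener);
        return this;
    }

    public void processEvent(String uri) {
        try {
            if (uri.isEmpty()) {
                throw new IllegalArgumentException("empty uri");
            }
            System.out.println("ok: " + uri);
        } catch (Throwable t) {
            // Trap locally, then notify every registered failure callback.
            for (BiConsumer<String, Throwable> listener : failureListeners) {
                listener.accept(uri, t);
            }
        }
    }

    public static void main(String[] args) {
        RelayingListener listener = new RelayingListener()
            .onFailure((uri, t) ->
                System.out.println("failure on \"" + uri + "\": " + t.getMessage()));
        listener.processEvent("doc1.txt");
        listener.processEvent("");
    }
}
```

The batcher never sees the exception; interested parties opt in via the listener's own onFailure registration.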

 *
 * <p><b>Pre-installed Listeners</b></p>
 *
 * <p>Every time you create a new QueryBatcher or WriteBatcher, it comes with some
 * pre-installed listeners, such as
 * {@link com.marklogic.client.datamovement.HostAvailabilityListener} and a
 * listener to track counts for JobReport. If you wish to remove these
 * listeners and their associated functionality, call one of the following:
 * {@link com.marklogic.client.datamovement.QueryBatcher#setUrisReadyListeners
 * setUrisReadyListeners}, {@link
 * com.marklogic.client.datamovement.QueryBatcher#setQueryFailureListeners
 * setQueryFailureListeners}, {@link
 * com.marklogic.client.datamovement.WriteBatcher#setBatchSuccessListeners
 * setBatchSuccessListeners}, or {@link
 * com.marklogic.client.datamovement.WriteBatcher#setBatchFailureListeners
 * setBatchFailureListeners}. Of course, removing the functionality of
 * HostAvailabilityListener means it will no longer black-list unavailable
 * hosts or retry batches that fail when a host is unavailable. And removing
 * the functionality of the listeners that track counts for JobReport means
 * JobReport should no longer be used. If you would just like to change the
 * settings on HostAvailabilityListener or NoResponseListener, you can do
 * something like the following:</p>
 *
 * {@code
 *    HostAvailabilityListener.getInstance(batcher)
 *      .withSuspendTimeForHostUnavailable(Duration.ofMinutes(60))
 *      .withMinHosts(2);
 * }
 *
 * <p><b>Enable Logging</b></p>
 *
 * <p>We have made efforts to provide helpful logging as you use QueryBatcher and
 * WriteBatcher. Please make sure to enable your slf4j-compliant
 * logging framework.</p>

 */
/*
 * Copyright © 2024 MarkLogic Corporation. All Rights Reserved.
 */
package com.marklogic.client.datamovement;



