package edu.byu.hbll.box;

/**
 * Harvests documents, or at least document ids, from a source system. Given an initial state, the
 * client loads documents from the source system, paging through the results until the end is
 * reached. The client should return the resulting documents after each page.
 *
 * <p>If paging is not possible and streaming is necessary, the harvester may use {@link
 * Source#save(java.util.Collection)} to stream in the resulting documents rather than returning
 * them in the {@link HarvestResult}.
 *
 * <p>For Box's purposes, there is only one instance of the harvester per source, and only one
 * thread calls harvest at a time, so a harvester does not need to be thread safe. For performance
 * or other reasons, state may be maintained between calls to {@link #harvest(HarvestContext)}. To
 * resume after an application redeployment, a "cursor" is saved to a database and offered back to
 * the client so that it knows where it left off.
 *
 * <p>A harvester may also gather only ids, leaving the documents themselves to be processed by
 * the processor rather than saved directly. To do so, it returns a list of unprocessed documents
 * in which only the id is set.
 *
 * @author Charles Draper
 */
public interface Harvester extends BoxConfigurable {

  /**
   * Returns the next set or page of documents from the source system. The client knows what the
   * "next" set is by observing the cursor object inside the context and determining which documents
   * to return next. The cursor is an object that is created by the client and returned as part of
   * the {@link HarvestResult} after each set.
   *
   * @param context context informing the harvester what to do next
   * @return a result containing resulting documents and other information for Box
   */
  HarvestResult harvest(HarvestContext context);
}
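
The cursor-driven paging contract described in the Javadoc above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchContext` and `SketchResult` are hypothetical stand-ins invented for this example, not the real `HarvestContext` and `HarvestResult` classes, whose actual methods are not shown here. The cursor is assumed to be a simple integer offset into an in-memory list of ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only. The nested types are hypothetical stand-ins,
// NOT the real Box HarvestContext / HarvestResult classes.
final class PagingHarvesterSketch {

  /** Stand-in context: carries the cursor saved after the previous call (null on the first run). */
  static final class SketchContext {
    final Integer cursor;

    SketchContext(Integer cursor) {
      this.cursor = cursor;
    }
  }

  /** Stand-in result: one page of ids, the cursor to persist, and whether more pages remain. */
  static final class SketchResult {
    final List<String> ids;
    final int cursor;
    final boolean more;

    SketchResult(List<String> ids, int cursor, boolean more) {
      this.ids = ids;
      this.cursor = cursor;
      this.more = more;
    }
  }

  private final List<String> sourceIds; // pretend source system
  private final int pageSize;

  PagingHarvesterSketch(List<String> sourceIds, int pageSize) {
    this.sourceIds = sourceIds;
    this.pageSize = pageSize;
  }

  /** Returns the next page, using the cursor in the context to decide where to resume. */
  SketchResult harvest(SketchContext context) {
    int requested = context.cursor == null ? 0 : context.cursor;
    // Clamp so that a stale or invalid cursor cannot run outside the source list.
    int start = Math.min(Math.max(requested, 0), sourceIds.size());
    int end = Math.min(start + pageSize, sourceIds.size());
    List<String> page = new ArrayList<>(sourceIds.subList(start, end));
    return new SketchResult(page, end, end < sourceIds.size());
  }

  public static void main(String[] args) {
    PagingHarvesterSketch harvester =
        new PagingHarvesterSketch(Arrays.asList("a", "b", "c", "d", "e"), 2);
    Integer cursor = null; // nothing saved yet, so harvesting starts from the beginning
    SketchResult result;
    do {
      result = harvester.harvest(new SketchContext(cursor));
      System.out.println(result.ids + " cursor=" + result.cursor);
      cursor = result.cursor; // the caller would persist this between calls
    } while (result.more);
  }
}
```

In this sketch the harvester itself keeps no position between calls: the caller hands back the last cursor it stored, which is what would allow harvesting to resume after a redeployment.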