package edu.byu.hbll.box;
/**
* Harvests documents, or at least the document ids, from a source system. Given an initial state,
* the client loads documents from the source system, paging through the results until the end is
* reached. The client should return the resulting documents after each page.
*
* If paging is not possible and streaming is necessary, the harvester may use {@link
* Source#save(java.util.Collection)} to stream in the resulting documents rather than returning
* them in the {@link HarvestResult}.
*
* <p>For Box's purposes, there will only be one instance of the harvester per source, and only one
* thread will call harvest at a time, so a harvester does not need to be thread safe. For
* performance or other reasons, state may also be maintained between calls to {@link
* #harvest(HarvestContext)}. So that harvesting can resume after an application redeployment, a
* "cursor" is saved to a database and offered back to the client, telling it where it left off.
*
* <p>The harvester can also be responsible for just gathering ids to be processed by the processor
* rather than documents to be saved. This is done by returning a list of unprocessed documents
* where only the id is set.
*
* @author Charles Draper
*/
public interface Harvester extends BoxConfigurable {
  /**
   * Returns the next set or page of documents from the source system. The client determines what
   * the "next" set is by inspecting the cursor object inside the context. The cursor is an object
   * that is created by the client and returned as part of the {@link HarvestResult} after each
   * set.
   *
   * @param context context informing the harvester what to do next
   * @return a result containing the harvested documents and other information for Box
   */
  HarvestResult harvest(HarvestContext context);
}