All Downloads are FREE. Search and download functionalities are using the official Maven repository.

org.archive.crawler.extras.adaptive.AdaptiveRevisitFrontier_en.utf8 Maven / Gradle / Ivy

The newest version!
description:
AdaptiveRevisitFrontier. EXPERIMENTAL Frontier that will repeatedly visit all 
encountered URIs. Wait time between visits is configurable and is determined 
by separate Processor(s). See WaitEvaluators. See documentation for ARFrontier 
limitations.


bdb-description:
The bdb module to use for the frontier.


controller-description:
The crawl controller.


seeds-description:
The seeds module used to prepare the seeds for crawling.


uri-uniq-filter-description:
The UriUniqFilter implementation used to determine if a URI has already been
fetched.


delay-factor-description:
How many multiples of last fetch elapsed time to wait before recontacting 
same server 


force-queue-assignment-description:
Queue assignment to force on CrawlURIs. Intended to be used 
via overrides 


host-valence-description:
Maximum simultaneous requests in process to a host (queue) 


max-delay-ms-description:
Never wait more than this long, regardless of multiple 


max-retries-description:
Maximum times to emit a CrawlURI without final disposition 


min-delay-ms-description:
Always wait this long after one completion before recontacting 
same server, regardless of multiple 


preference-embed-hops-description:
Number of hops of embeds (ERX) to bump to front of host queue 


queue-ignore-www-description:
Should the queue assignment ignore www in hostnames, effectively 
stripping them away. 


retry-delay-description:
For retryable problems, seconds to wait before a retry 


use-uri-uniq-filter-description:
Should the Frontier use a separate 'already included' datastructure 
or rely on the queues'. 


dir-description:
The directory used by the frontier.


logger-module-description:
The logger module used during the crawl.


server-cache-description:
The server cache used during the crawl.


uri-canonicalization-rules-description:
The canonicalization rules used during the crawl.




© 2015 - 2024 Weber Informatics LLC | Privacy Policy