
org.archive.modules.BeanShellProcessor_en.utf8


This project contains some of the configurable modules that the Heritrix application uses to crawl the web. The modules can also be used in applications other than Heritrix.

There is a newer version: 3.6.0
description:
BeanShellProcessor. Runs the BeanShell script source (supplied directly or via 
a file path) against the current URI. Source should define a script method 
'process(curi)' which will be passed the current CrawlURI.  The script may also 
access this BeanShellProcessor via the 'self' variable and the CrawlController 
via the 'controller' variable.
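
The contract described above can be sketched in plain Java. This is only
an illustration of the shape of a 'process(curi)' method, not Heritrix code:
SketchCrawlURI is a hypothetical stand-in for CrawlURI, and the ".pdf" check
and "pdf-seen" annotation are made up for the example. In an actual BeanShell
script the method would be defined at the top level of the script source, and
BeanShell allows the parameter type to be omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Heritrix's CrawlURI; it models only what
// this sketch needs and is NOT the real class.
class SketchCrawlURI {
    private final String uri;
    private final List<String> annotations = new ArrayList<>();

    SketchCrawlURI(String uri) { this.uri = uri; }

    String getURI() { return uri; }

    List<String> getAnnotations() { return annotations; }
}

class ProcessScriptSketch {
    // The kind of method a script would define as process(curi):
    // it receives the current URI and may inspect or annotate it.
    static void process(SketchCrawlURI curi) {
        if (curi.getURI().endsWith(".pdf")) {
            curi.getAnnotations().add("pdf-seen"); // made-up annotation
        }
    }

    public static void main(String[] args) {
        SketchCrawlURI curi = new SketchCrawlURI("http://example.com/report.pdf");
        process(curi);
        System.out.println(curi.getAnnotations()); // prints [pdf-seen]
    }
}
```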


isolate-threads-description:
Whether each ToeThread should get its own independent script context, or 
whether all threads should share synchronized access to one context. Default 
is true, meaning each thread gets its own isolated context. 
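
The difference between the two modes can be sketched in plain Java. This is
a minimal model under stated assumptions, not the processor's actual
implementation: SketchContext and ContextProvider are hypothetical names, and
a ThreadLocal is used here simply as one common way to give each thread its
own context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a script context: just a bag of variables.
class SketchContext {
    final Map<String, Object> vars = new HashMap<>();
}

// Hypothetical helper that hands out either a per-thread context or
// one shared context guarded by a lock.
class ContextProvider {
    private final boolean isolateThreads;
    private final SketchContext shared = new SketchContext();
    private final ThreadLocal<SketchContext> perThread =
            ThreadLocal.withInitial(SketchContext::new);

    ContextProvider(boolean isolateThreads) { this.isolateThreads = isolateThreads; }

    void run(Consumer<SketchContext> work) {
        if (isolateThreads) {
            work.accept(perThread.get());   // each thread sees its own context
        } else {
            synchronized (shared) {         // all threads take turns on one context
                work.accept(shared);
            }
        }
    }
}

class IsolateThreadsSketch {
    // A worker thread sets a variable; returns how many variables the
    // calling thread can see afterwards.
    static int varsVisibleToCaller(boolean isolate) {
        ContextProvider provider = new ContextProvider(isolate);
        Thread worker = new Thread(() -> provider.run(c -> c.vars.put("seen", 1)));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int[] size = new int[1];
        provider.run(c -> size[0] = c.vars.size());
        return size[0];
    }

    public static void main(String[] args) {
        System.out.println("isolated: " + varsVisibleToCaller(true));  // prints isolated: 0
        System.out.println("shared: " + varsVisibleToCaller(false));   // prints shared: 1
    }
}
```

With isolation, state set by one thread is invisible to the others, so a
script cannot rely on variables accumulating across threads; with a shared
context it can, at the cost of threads waiting on the lock.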


script-file-description:
Path to the BeanShell script file.


manager-description:
The SheetManager used to configure this crawl.  Can be used to look up any 
other module based on its settings path.  The value you specify here will be
made available to the BeanShell script as the variable "manager".



