
Download SiteCrawler JAR file with all dependencies


SiteCrawler from group io.github.jasperroel (version 1.0.0)

This project provides a simple WebCrawler with retry capabilities and the ability to distinguish between HTTP and HTTPS sites. Its biggest feature is support for plugins (or CrawlerActions), which let you hook your own scripts into the crawling process. It also allows you to set "blocked" URLs; those URLs or patterns will not be crawled.
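The three ideas in the description (retries, plugin hooks, and blocked URL patterns) can be illustrated with a small self-contained sketch. This is not the SiteCrawler API: the class `CrawlSketch`, the `PageAction` interface, and every method name below are hypothetical, and the fetcher is a plain function passed in by the caller so the example runs without network access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from SiteCrawler itself.
class CrawlSketch {

    /** Hypothetical plugin hook, invoked once for every page that was fetched. */
    interface PageAction {
        void onPage(String url, String body);
    }

    private final List<Pattern> blocked = new ArrayList<>();
    private final List<PageAction> actions = new ArrayList<>();
    private final int maxAttempts;

    CrawlSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Registers a regular expression; URLs containing a match are skipped. */
    void block(String regex) {
        blocked.add(Pattern.compile(regex));
    }

    /** Registers a plugin that runs after each successful fetch. */
    void addAction(PageAction action) {
        actions.add(action);
    }

    boolean isBlocked(String url) {
        for (Pattern p : blocked) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    /**
     * Fetches one URL with up to maxAttempts tries, then runs all plugins.
     * Returns false if the URL is blocked or every attempt failed.
     */
    boolean visit(String url, Function<String, String> fetcher) {
        if (isBlocked(url)) {
            return false; // blocked URLs are never fetched
        }
        String body = null;
        for (int attempt = 1; attempt <= maxAttempts && body == null; attempt++) {
            try {
                body = fetcher.apply(url); // a null result counts as a failed attempt
            } catch (RuntimeException e) {
                // swallow the failure and fall through to the next attempt
            }
        }
        if (body == null) {
            return false;
        }
        for (PageAction action : actions) {
            action.onPage(url, body);
        }
        return true;
    }

    public static void main(String[] args) {
        CrawlSketch crawler = new CrawlSketch(3);
        crawler.block("/logout");
        crawler.addAction((url, body) -> System.out.println("visited " + url));
        // Stand-in fetcher; a real crawler would issue an HTTP request here.
        crawler.visit("https://example.com/", u -> "<html></html>");
        crawler.visit("https://example.com/logout", u -> "<html></html>");
    }
}
```

Keeping the fetch loop separate from the plugin loop means an exception thrown by a plugin does not trigger another fetch of the same page, which is one reasonable design choice among several.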

Group: io.github.jasperroel Artifact: SiteCrawler

0 downloads
Artifact SiteCrawler
Group io.github.jasperroel
Version 1.0.0
Last update 30. July 2018
Organization Salesforce.com
URL https://github.com/forcedotcom/SiteCrawler
License The BSD 2-Clause License
Dependencies amount 3
Dependencies jcl-over-slf4j, htmlunit, commons-lang
There may be additional transitive dependencies.




