Download all versions of boilerpipe JAR files with all dependencies
boilerpipe from group de.l3s.boilerpipe (version 1.1.0)
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extraction is very fast (milliseconds), needs only the input document (no global or site-level information is required), and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
The algorithms used by the library are based on (and extend) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, New York City, NY, USA.
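To give a rough feel for what "shallow text features" means, the sketch below classifies blocks of an HTML page using only two per-block features: the number of words and the link density (the share of words that sit inside anchor tags). It is a toy illustration and not boilerpipe's API or its actual algorithm. The class name, the method names, the regex-based block splitting, and both thresholds are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch of boilerplate detection with shallow text features.
// NOT part of boilerpipe; all names and thresholds here are hypothetical.
class ShallowFeatureSketch {

    // Assumed thresholds, chosen only for illustration.
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.33;

    private static final Pattern ANCHOR =
            Pattern.compile("(?is)<a\\b[^>]*>(.*?)</a>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Tags treated as block boundaries in this sketch.
    private static final Pattern BLOCK_BREAK = Pattern.compile(
            "(?i)</?(p|div|li|ul|ol|h[1-6]|td|tr|table|br)\\b[^>]*>");

    // Removes all tags and collapses runs of whitespace.
    public static String stripTags(String html) {
        return TAG.matcher(html).replaceAll(" ").trim().replaceAll("\\s+", " ");
    }

    public static int countWords(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // Fraction of the block's words that appear inside <a>...</a>.
    public static double linkDensity(String blockHtml) {
        int total = countWords(stripTags(blockHtml));
        if (total == 0) {
            return 0.0;
        }
        int linked = 0;
        Matcher m = ANCHOR.matcher(blockHtml);
        while (m.find()) {
            linked += countWords(stripTags(m.group(1)));
        }
        return (double) linked / total;
    }

    // A block counts as content if it is long enough and not mostly links.
    public static boolean isContent(String blockHtml) {
        return countWords(stripTags(blockHtml)) >= MIN_WORDS
                && linkDensity(blockHtml) <= MAX_LINK_DENSITY;
    }

    // Keeps the text of content blocks, one block per line.
    public static String extract(String html) {
        StringBuilder out = new StringBuilder();
        for (String block : BLOCK_BREAK.split(html)) {
            if (isContent(block)) {
                out.append(stripTags(block)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html =
                "<div><a href=\"/\">Home</a> <a href=\"/news\">News</a></div>"
                + "<p>The council voted on Tuesday to approve the new budget, "
                + "which increases funding for public transport and libraries.</p>"
                + "<div>Copyright 2010</div>";
        // Prints only the paragraph; the navigation and footer blocks are dropped.
        System.out.print(extract(html));
    }
}
```

Like the library as described above, the sketch needs only the input document itself. A real extractor would parse the HTML properly rather than rely on regular expressions, and would use tuned or learned decision rules instead of two fixed thresholds.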
Artifact boilerpipe
Group de.l3s.boilerpipe
Version 1.1.0
Last update 03. November 2010
Organization not specified
URL http://code.google.com/p/boilerpipe/
License Apache License 2.0
Dependencies amount 0
Dependencies No dependencies
There may be transitive dependencies.
boilerpipe from group de.l3s.boilerpipe (version 1.0.4)
10 downloads
Artifact boilerpipe
Group de.l3s.boilerpipe
Version 1.0.4
Last update 20. May 2010
Organization not specified
URL http://code.google.com/p/boilerpipe/
License Apache License 2.0
Dependencies amount 0
Dependencies No dependencies
There may be transitive dependencies.