Download pdf-extractor JAR 2.0.1 with all dependencies
This is an optimized version of Apache PDFBox. It extracts the rough
structure of a document (pages, blocks of text, and paragraphs, as well
as formatting information) and was built to improve text extraction
results for scientific papers.
The output can easily be transformed to plaintext (toString) or to
an XML format (toXML).
Files of the artifact pdf-extractor version 2.0.1 from the group de.cit-ec.scie.
Artifact pdf-extractor
Group de.cit-ec.scie
Version 2.0.1
Last update 10 December 2014
Organization not specified
URL http://openresearch.cit-ec.de/projects/scie/
License The GNU Affero General Public License, Version 3
Number of dependencies 1
Dependencies pdfbox
There may be transitive dependencies.
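As a minimal sketch, the coordinates listed above (group, artifact, version) could be declared in a Maven pom.xml as follows. This assumes the artifact can be resolved from a repository configured in your build; the page does not state which repository hosts it. With a declaration like this, Maven would normally pull in pdfbox and any transitive dependencies automatically.

```xml
<!-- Assumption: the artifact is resolvable from a repository your build already uses. -->
<dependency>
  <groupId>de.cit-ec.scie</groupId>
  <artifactId>pdf-extractor</artifactId>
  <version>2.0.1</version>
</dependency>
```

Note that the library is licensed under the GNU Affero General Public License, Version 3, which may affect how it can be used in your own project.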
The newest version!
Source code of pdf-extractor version 2.0.1
META-INF
de.citec.scie.pdf
de.citec.scie.pdf.structure