src.it.unimi.dsi.big.mg4j.document.tika.package.html
MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.
This package contains classes that expose Tika
parsers as MG4J {@linkplain it.unimi.dsi.big.mg4j.document.DocumentFactory factories}.
Each type of Tika metadata is mapped, when possible, to an MG4J field.
However, when using an {@link it.unimi.dsi.big.mg4j.document.tika.AutoDetectDocumentFactory} or any other factory in which
metadata fields are user-definable or otherwise variable, it is impossible to
provide a static listing of all available fields, as they depend on the
actual factory used to parse the document. In this case, an instance of
{@link it.unimi.dsi.big.mg4j.document.tika.GreedyTikaField} is used to return some useful data to the caller
by (essentially) concatenating the string representations of all metadata fields.
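
The concatenation idea can be illustrated with a minimal sketch. It is not the actual GreedyTikaField code: the class name GreedyMetadataSketch, the method concatenateAll, the single-space separator, and the use of a plain Map in place of Tika's metadata object are all assumptions made so that the example runs without MG4J or Tika on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only: this is NOT the GreedyTikaField implementation.
// A Map from field name to values stands in for the parsed metadata.
final class GreedyMetadataSketch {
    private GreedyMetadataSketch() {}

    /**
     * Joins every non-empty value of every metadata field, in the map's
     * iteration order, separated by single spaces (separator is an assumption).
     */
    static String concatenateAll(Map<String, List<String>> metadata) {
        StringJoiner joiner = new StringJoiner(" ");
        for (List<String> values : metadata.values()) {
            for (String value : values) {
                if (value != null && !value.isEmpty()) {
                    joiner.add(value);
                }
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Annual report"));
        metadata.put("creator", Arrays.asList("Alice", "Bob"));
        System.out.println(concatenateAll(metadata));
    }
}
```

The point of the sketch is that the caller gets a single block of text without knowing in advance which metadata fields a given parser produces, which is the situation described above for factories with variable fields.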