
src.it.unimi.dsi.big.mg4j.document.tika.package.html


MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.



This package contains classes that expose Tika parsers as MG4J
{@linkplain it.unimi.dsi.big.mg4j.document.DocumentFactory factories}.
Each type of Tika metadata is mapped, where possible, to an MG4J field.
However, with an {@link it.unimi.dsi.big.mg4j.document.tika.AutoDetectDocumentFactory}, or any other factory whose
metadata fields are user-definable or otherwise variable, no static listing
of the available fields can be provided, because they depend on the
actual factory used to parse the document. In that case, an instance of
{@link it.unimi.dsi.big.mg4j.document.tika.GreedyTikaField} returns some useful data to the caller
by (essentially) concatenating the string representations of all metadata fields.
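
The concatenation idea described above can be illustrated with a minimal, self-contained sketch. It is not the actual GreedyTikaField implementation and uses neither the MG4J nor the Tika API: the class name GreedyConcatSketch, the method concatenate, the map-based metadata representation and the single-space separator are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the MG4J GreedyTikaField source.
class GreedyConcatSketch {

    // Joins the string values of every metadata entry into one string,
    // in the map's iteration order, separated by single spaces (an assumption).
    static String concatenate(Map<String, List<String>> metadata) {
        StringBuilder sb = new StringBuilder();
        for (List<String> values : metadata.values()) {
            for (String value : values) {
                // Skip missing or empty values so no stray separators appear.
                if (value == null || value.isEmpty()) continue;
                if (sb.length() > 0) sb.append(' ');
                sb.append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample metadata; the field names are not known in advance.
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Annual report"));
        metadata.put("author", Arrays.asList("A. Author", "B. Author"));
        System.out.println(concatenate(metadata));
        // prints "Annual report A. Author B. Author"
    }
}
```

Because the method never refers to a field by name, it keeps working whatever set of metadata keys a given parser happens to produce, which is the situation the text describes for factories with variable fields.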





