


Archive4J is a suite of tools for compactly storing the term/count information of a document collection.


Archive4J

Archive4J is a free archive engine for large document collections written in Java. By “archive engine” we mean a set of algorithmic tools and implementations that make it possible to build a direct index of a document collection. In particular, for each document we want to be able to recover some basic data such as the length of the document in words, the list of distinct terms appearing in the document, and the number of occurrences of each term in the document (the count). We strive for a very high compression rate, and for very fast random access. To obtain this result, Archive4J combines techniques typical of search engines with succinct data structures.
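
To make the notion of a direct index concrete, here is a minimal in-memory sketch in Java. It is not Archive4J's API and uses no compression or succinct data structures; the class name DirectIndexSketch and its methods are hypothetical, chosen only for illustration. Each document is stored as a sorted array of term identifiers with a parallel array of counts, which is enough to recover the length in words, the distinct terms, and the count of each term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a direct index; not part of Archive4J.
final class DirectIndexSketch {
    // Global dictionary: term -> identifier, and identifier -> term.
    private final Map<String, Integer> termIds = new HashMap<>();
    private final List<String> termNames = new ArrayList<>();
    // Per document: sorted term identifiers and their parallel counts.
    private final List<int[]> docTerms = new ArrayList<>();
    private final List<int[]> docCounts = new ArrayList<>();

    // Tokenizes the text naively, stores the document, and returns its index.
    int add(String text) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            Integer id = termIds.get(word);
            if (id == null) {
                id = termNames.size();
                termIds.put(word, id);
                termNames.add(word);
            }
            counts.merge(id, 1, Integer::sum);
        }
        int[] terms = new int[counts.size()];
        int[] occurrences = new int[counts.size()];
        int i = 0;
        // TreeMap iterates in increasing key order, so terms[] ends up sorted.
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            terms[i] = e.getKey();
            occurrences[i] = e.getValue();
            i++;
        }
        docTerms.add(terms);
        docCounts.add(occurrences);
        return docTerms.size() - 1;
    }

    // Length of the document in words: the sum of all term counts.
    int length(int doc) {
        int sum = 0;
        for (int c : docCounts.get(doc)) sum += c;
        return sum;
    }

    // Distinct terms of the document, in increasing identifier order.
    List<String> terms(int doc) {
        List<String> result = new ArrayList<>();
        for (int id : docTerms.get(doc)) result.add(termNames.get(id));
        return result;
    }

    // Number of occurrences of a term in the document (0 if absent).
    int count(int doc, String term) {
        Integer id = termIds.get(term);
        if (id == null) return 0;
        int pos = Arrays.binarySearch(docTerms.get(doc), id);
        return pos < 0 ? 0 : docCounts.get(doc)[pos];
    }
}
```

Adding the document "to be or not to be" to this sketch yields a length of 6, four distinct terms, and a count of 2 for "be". A real archive engine would replace the plain arrays with compressed encodings while keeping random access by document fast.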

Package Dependencies

Archive4J uses the DSI utilities, MG4J, and three packages providing high-performance containers and algorithms: fastutil 5 or greater, the COLT distribution, and Sux4J. Command-line parsing requires JSAP. Archive4J also uses a number of libraries from the Jakarta Commons project, including collections, lang, configuration and io. All logging is performed using log4j.




