
src.it.unimi.dsi.big.mg4j.index.cluster.package.html


MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.

  
MG4J: Managing Gigabytes for Java
  

  

    

Index partitioning and clustering.

This package contains the classes that provide the infrastructure for index partitioning and clustering. The tools that actually perform partitioning can be found in {@link it.unimi.dsi.big.mg4j.tool}.

An index cluster is a set of local indices that are viewed as a single index. In a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalCluster lexical cluster} each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalCluster documental cluster} each local index contains postings referring to a disjoint subset of the documents of the collection.
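To make the difference between the two kinds of cluster concrete, here is a toy in-memory sketch. It assumes that an index is simply a map from a term to a sorted array of document pointers; the class `ClusterViews` and its methods are hypothetical illustrations, not part of MG4J. A lexical lookup is routed to the single local index that owns the term, while a documental lookup gathers postings from every local index and turns local pointers into global ones, here assuming that each local index covers a contiguous range of documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model, not MG4J's API: an "index" is a map from a term to a sorted array of document pointers.
 */
class ClusterViews {

    /** Lexical cluster: local indices have disjoint terms, and their pointers are already global. */
    static long[] lexicalLookup(List<Map<String, long[]>> locals, String term) {
        for (Map<String, long[]> local : locals) {
            long[] postings = local.get(term);
            if (postings != null) return postings; // at most one local index contains the term
        }
        return new long[0];
    }

    /**
     * Documental cluster: local index i holds global documents [firstDoc[i], firstDoc[i + 1]),
     * and its pointers are local (they start at 0).
     */
    static long[] documentalLookup(List<Map<String, long[]>> locals, long[] firstDoc, String term) {
        List<Long> merged = new ArrayList<>();
        for (int i = 0; i < locals.size(); i++)
            for (long localPointer : locals.get(i).getOrDefault(term, new long[0]))
                merged.add(firstDoc[i] + localPointer); // local pointer -> global pointer
        // The document ranges are increasing, so the concatenation is already sorted.
        return merged.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        // Global index over documents 0..5.
        Map<String, long[]> global = new TreeMap<>();
        global.put("apple", new long[] {0, 2, 5});
        global.put("pear", new long[] {1, 4});

        // Lexical split: each term lives in exactly one local index; pointers stay global.
        List<Map<String, long[]>> lexical = List.of(
            Map.of("apple", new long[] {0, 2, 5}),
            Map.of("pear", new long[] {1, 4}));

        // Documental split: documents 0..2 go to index 0, documents 3..5 to index 1.
        long[] firstDoc = {0, 3};
        List<Map<String, long[]>> documental = List.of(
            Map.of("apple", new long[] {0, 2}, "pear", new long[] {1}),
            Map.of("apple", new long[] {2}, "pear", new long[] {1}));

        for (String term : global.keySet()) {
            System.out.println(term + " global=" + Arrays.toString(global.get(term))
                + " lexical=" + Arrays.toString(lexicalLookup(lexical, term))
                + " documental=" + Arrays.toString(documentalLookup(documental, firstDoc, term)));
        }
    }
}
```

Running `main` prints the same postings for the global index and for both clusters, which is the sense in which a cluster is viewed as a single index.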

Clustering indices requires mapping term numbers and document pointers back and forth between the global index and the local indices. This mapping is provided by {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalClusteringStrategy documental clustering strategies} and {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalClusteringStrategy lexical clustering strategies}.
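One simple way to realize such a mapping, assumed here purely for illustration, is to give each local index a contiguous range of global numbers; the same arithmetic then serves for document pointers in a documental cluster and for term numbers in a lexical cluster. The class `ContiguousMapping` and its method names are hypothetical and do not reproduce the signatures of MG4J's strategy interfaces.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not MG4J's API: local index i owns the global numbers
 * in [first[i], first[i + 1]). The numbers may be document pointers or term numbers.
 */
class ContiguousMapping {
    private final long[] first; // strictly increasing; the last entry is the total count

    ContiguousMapping(long... first) {
        if (first.length < 2) throw new IllegalArgumentException("need at least one range");
        for (int i = 1; i < first.length; i++)
            if (first[i] <= first[i - 1]) throw new IllegalArgumentException("boundaries must increase strictly");
        this.first = first.clone();
    }

    int numberOfLocalIndices() { return first.length - 1; }

    /** Global number -> index of the local index that owns it. */
    int localIndex(long global) {
        if (global < first[0] || global >= first[first.length - 1])
            throw new IndexOutOfBoundsException("global number out of range: " + global);
        int pos = Arrays.binarySearch(first, global);
        // Exact hit: global starts range pos. Miss: the owner is the range before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global number -> number inside its local index. */
    long localNumber(long global) { return global - first[localIndex(global)]; }

    /** (local index, local number) -> global number: the inverse of the two methods above. */
    long globalNumber(int localIndex, long local) {
        if (local < 0 || local >= first[localIndex + 1] - first[localIndex])
            throw new IndexOutOfBoundsException("local number out of range: " + local);
        return first[localIndex] + local;
    }

    public static void main(String[] args) {
        // Documental use: 10 documents split into [0, 4), [4, 7), [7, 10).
        ContiguousMapping docs = new ContiguousMapping(0, 4, 7, 10);
        System.out.println(docs.localIndex(8) + " " + docs.localNumber(8)); // 2 1
        System.out.println(docs.globalNumber(2, 1));                        // 8

        // Lexical use: 5 term numbers split into [0, 2), [2, 5).
        ContiguousMapping terms = new ContiguousMapping(0, 2, 5);
        System.out.println(terms.localIndex(3) + " " + terms.localNumber(3)); // 1 1
    }
}
```

The binary search keeps the global-to-local direction logarithmic in the number of local indices, and the local-to-global direction is a single addition. Requiring strictly increasing boundaries rules out empty ranges, for which `Arrays.binarySearch` would not identify a unique owner.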

Clusters are often generated by partitioning an index (although, for instance, {@link it.unimi.dsi.big.mg4j.tool.Scan} produces a cluster directly as the output of the indexing process). In this case, a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalPartitioningStrategy documental partitioning strategy} or a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalPartitioningStrategy lexical partitioning strategy} explains how to divide the index and remap term numbers and document pointers. Of course, the clustering strategy used to access a cluster must be suitably matched to the partitioning strategy used to create it.
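The requirement that partitioning and clustering strategies match can be illustrated with a round-robin split of document pointers, a scheme chosen here only as an example: document d goes to local index d mod k with local pointer d / k, and the matching clustering map sends (i, p) to p * k + i. The class `InterleavedPartitioning` is hypothetical, not an MG4J class. Its `main` method also reassembles the same local postings with contiguous offsets, that is, with a clustering strategy that does not match the partitioning, and obtains wrong document pointers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not MG4J's API: a round-robin documental partitioning
 * strategy together with the clustering strategy that inverts it.
 */
class InterleavedPartitioning {
    final int k; // number of local indices

    InterleavedPartitioning(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    // Partitioning direction: global pointer -> (local index, local pointer).
    int localIndex(long global) { return (int) (global % k); }
    long localPointer(long global) { return global / k; }

    // Matching clustering direction: (local index, local pointer) -> global pointer.
    long globalPointer(int localIndex, long localPointer) { return localPointer * k + localIndex; }

    /** Splits one global postings list into k local postings lists. */
    long[][] partition(long[] globalPostings) {
        List<List<Long>> parts = new ArrayList<>();
        for (int i = 0; i < k; i++) parts.add(new ArrayList<>());
        for (long g : globalPostings) parts.get(localIndex(g)).add(localPointer(g));
        long[][] out = new long[k][];
        for (int i = 0; i < k; i++) out[i] = parts.get(i).stream().mapToLong(Long::longValue).toArray();
        return out;
    }

    /** Rebuilds the global postings list; local lists interleave, so the result is re-sorted. */
    long[] reassemble(long[][] localPostings) {
        List<Long> all = new ArrayList<>();
        for (int i = 0; i < localPostings.length; i++)
            for (long p : localPostings[i]) all.add(globalPointer(i, p));
        return all.stream().mapToLong(Long::longValue).sorted().toArray();
    }

    public static void main(String[] args) {
        InterleavedPartitioning s = new InterleavedPartitioning(3);
        long[] postings = {0, 2, 4, 5, 9}; // postings of one term in a collection of 10 documents
        long[][] local = s.partition(postings);
        System.out.println(Arrays.deepToString(local));           // [[0, 3], [1], [0, 1]]
        System.out.println(Arrays.toString(s.reassemble(local))); // [0, 2, 4, 5, 9]

        // Mismatched clustering: treat the local indices as contiguous ranges starting at 0, 4, 7.
        long[] offsets = {0, 4, 7};
        List<Long> wrong = new ArrayList<>();
        for (int i = 0; i < local.length; i++)
            for (long p : local[i]) wrong.add(offsets[i] + p);
        System.out.println(wrong); // [0, 3, 5, 7, 8], not the original postings
    }
}
```

With a round-robin split the local postings of different indices interleave in the global numbering, so a cluster built this way has to merge (here, simply re-sort) rather than concatenate.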




