src.it.unimi.dsi.big.mg4j.index.cluster.package.html

MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.


  
MG4J: Managing Gigabytes for Java

Index partitioning and clustering.

This package contains the classes that provide the infrastructure for index partitioning and clustering. The tools that actually perform partitioning can be found in {@link it.unimi.dsi.big.mg4j.tool}.

An index cluster is a set of local indices that are viewed as a single index. In a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalCluster lexical cluster} each local index has a disjoint set of terms, but the document pointers contained in each local index refer to the same documents. In a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalCluster documental cluster} each index contains postings referring to a disjoint subset of a collection.
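
The difference between the two kinds of cluster can be illustrated with a toy inverted index. The following is only a minimal sketch: the class `ClusterKindsSketch`, its methods, and the sample data are hypothetical and are not part of MG4J. For readability, the documental split below keeps global document pointers instead of renumbering them inside each local index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only; none of these names belong to MG4J. */
final class ClusterKindsSketch {

    /** A tiny global index: term -> sorted list of global document pointers. */
    public static Map<String, List<Long>> sampleIndex() {
        Map<String, List<Long>> index = new TreeMap<>();
        index.put("apple", List.of(0L, 2L, 3L));
        index.put("banana", List.of(1L, 3L));
        index.put("cherry", List.of(0L, 1L));
        return index;
    }

    /** Creates n empty local indices. */
    private static List<Map<String, List<Long>>> emptyLocals(int n) {
        List<Map<String, List<Long>>> locals = new ArrayList<>();
        for (int i = 0; i < n; i++) locals.add(new TreeMap<>());
        return locals;
    }

    /**
     * Lexical split: terms smaller than pivot go to local index 0, the others to 1.
     * Each posting list is kept whole, so both local indices refer to the same documents.
     */
    public static List<Map<String, List<Long>>> splitLexically(Map<String, List<Long>> index, String pivot) {
        List<Map<String, List<Long>>> locals = emptyLocals(2);
        for (Map.Entry<String, List<Long>> e : index.entrySet()) {
            int target = e.getKey().compareTo(pivot) < 0 ? 0 : 1;
            locals.get(target).put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return locals;
    }

    /**
     * Documental split: postings for documents below cutpoint go to local index 0,
     * the others to 1. A term may appear in both local indices, but each posting
     * appears in exactly one of them.
     */
    public static List<Map<String, List<Long>>> splitDocumentally(Map<String, List<Long>> index, long cutpoint) {
        List<Map<String, List<Long>>> locals = emptyLocals(2);
        for (Map.Entry<String, List<Long>> e : index.entrySet()) {
            for (long doc : e.getValue()) {
                int target = doc < cutpoint ? 0 : 1;
                locals.get(target).computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(doc);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> index = sampleIndex();
        System.out.println("lexical:    " + splitLexically(index, "b"));
        System.out.println("documental: " + splitDocumentally(index, 2));
    }
}
```

In the lexical split every term lives in exactly one local index, whereas in the documental split every document does.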

Clustering indices requires mapping term numbers and document pointers back and forth between the global index and local indices. This mapping is provided by {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalClusteringStrategy documental clustering strategies} and {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalClusteringStrategy lexical clustering strategies}.
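
As an illustration of such a mapping, the sketch below handles the simple case in which each local index holds a contiguous range of global document pointers. It is a self-contained example under stated assumptions, not the MG4J implementation: the class `ContiguousMappingSketch` and its method names are hypothetical, and the cutpoints are assumed to be strictly increasing (no empty local index).

```java
import java.util.Arrays;

/** Toy mapping between global and local document pointers; not an MG4J class. */
final class ContiguousMappingSketch {

    /**
     * cutpoints[i] is the first global pointer of local index i; the last entry
     * is the total number of documents. Assumed strictly increasing.
     */
    private final long[] cutpoints;

    public ContiguousMappingSketch(long... cutpoints) {
        this.cutpoints = cutpoints.clone();
    }

    /** Global to local: the local index that holds the given global pointer. */
    public int localIndex(long globalPointer) {
        if (globalPointer < cutpoints[0] || globalPointer >= cutpoints[cutpoints.length - 1]) {
            throw new IndexOutOfBoundsException("No local index holds " + globalPointer);
        }
        int pos = Arrays.binarySearch(cutpoints, globalPointer);
        // An exact hit means the pointer opens local index pos. Otherwise
        // binarySearch returns -(insertionPoint) - 1, and the owner is the
        // local index just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global to local: the pointer inside the owning local index. */
    public long localPointer(long globalPointer) {
        return globalPointer - cutpoints[localIndex(globalPointer)];
    }

    /** Local to global: the inverse of the two methods above. */
    public long globalPointer(int localIndex, long localPointer) {
        return cutpoints[localIndex] + localPointer;
    }

    public static void main(String[] args) {
        // Three local indices holding documents [0..4), [4..10) and [10..12).
        ContiguousMappingSketch mapping = new ContiguousMappingSketch(0, 4, 10, 12);
        long global = 9;
        int index = mapping.localIndex(global);
        long local = mapping.localPointer(global);
        System.out.println(global + " -> (" + index + ", " + local + ") -> "
                + mapping.globalPointer(index, local));
    }
}
```

The same arithmetic would apply to term numbers in a lexical cluster whose local indices hold contiguous ranges of terms.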

Clusters are often generated by partitioning an index (although, for instance, {@link it.unimi.dsi.big.mg4j.tool.Scan} produces a cluster as output of the indexing process). In this case, a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalPartitioningStrategy documental partitioning strategy} or a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalPartitioningStrategy lexical partitioning strategy} explains how to divide and remap term numbers and document pointers. Of course, the clustering and partitioning strategies must be suitably matched.
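
One way to read "suitably matched" is that mapping a global pointer to a local one and back must return the original pointer. The sketch below checks this for a round-robin split. It is an assumption-laden illustration: the interfaces `Partitioning` and `Clustering`, the class `MatchedStrategiesSketch`, and all method names are hypothetical stand-ins and not the MG4J interfaces.

```java
/** Toy check that a partitioning and a clustering strategy agree; not MG4J code. */
final class MatchedStrategiesSketch {

    /** Hypothetical global-to-local mapping. */
    interface Partitioning {
        int localIndex(long globalPointer);
        long localPointer(long globalPointer);
    }

    /** Hypothetical local-to-global mapping. */
    interface Clustering {
        long globalPointer(int localIndex, long localPointer);
    }

    /** Document g goes to local index g mod k, at position g div k. */
    public static Partitioning roundRobinPartitioning(final int k) {
        return new Partitioning() {
            @Override public int localIndex(long g) { return (int) (g % k); }
            @Override public long localPointer(long g) { return g / k; }
        };
    }

    /** Inverse of the round-robin partitioning with the same k. */
    public static Clustering roundRobinClustering(final int k) {
        return (localIndex, localPointer) -> localPointer * k + localIndex;
    }

    /** A clustering that assumes contiguous blocks of blockSize documents. */
    public static Clustering blockClustering(final long blockSize) {
        return (localIndex, localPointer) -> localIndex * blockSize + localPointer;
    }

    /** True if every global pointer survives the trip to a local index and back. */
    public static boolean matched(Partitioning p, Clustering c, long numberOfDocuments) {
        for (long g = 0; g < numberOfDocuments; g++) {
            if (c.globalPointer(p.localIndex(g), p.localPointer(g)) != g) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Partitioning p = roundRobinPartitioning(3);
        System.out.println("round robin with round robin: " + matched(p, roundRobinClustering(3), 100));
        System.out.println("round robin with blocks:      " + matched(p, blockClustering(34), 100));
    }
}
```

Pairing the round-robin partitioning with the block-based clustering fails the check as soon as a document lands in a local index other than the first, which is the kind of mismatch the requirement above rules out.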




