MG4J: Managing Gigabytes for Java
Index partitioning and clustering.
This package contains the classes that provide the infrastructure for index partitioning
and clustering. The tools that actually perform partitioning can be found in {@link it.unimi.dsi.big.mg4j.tool}.
An index cluster is a set of local indices that are viewed as a single index. In
a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalCluster lexical cluster}, the local indices contain disjoint sets of terms, but the document pointers contained
in every local index refer to the same documents. In a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalCluster documental cluster},
each local index contains postings referring to a disjoint subset of the documents in the collection.
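To make the distinction concrete, here is a minimal Java sketch over in-memory posting lists. It is not MG4J code. The class `ClusterLookupSketch` and its methods are hypothetical. The sketch also assumes that each documental local index covers a contiguous range of global documents, so that local pointer `p` in index `i` becomes global pointer `offsets[i] + p`. A lexical lookup touches exactly one local index, while a documental lookup must visit all of them and remap pointers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of lexical vs. documental cluster lookup; not the MG4J API. */
final class ClusterLookupSketch {
    private ClusterLookupSketch() {}

    /**
     * Lexical cluster: terms are split among local indices, whose document pointers all
     * refer to the same documents, so a term's posting list comes from exactly one index.
     */
    static List<Integer> lexicalLookup(List<Map<String, List<Integer>>> localIndices, String term) {
        for (Map<String, List<Integer>> index : localIndices) {
            List<Integer> postings = index.get(term);
            if (postings != null) return postings; // term sets are disjoint: first hit is the only hit
        }
        return List.of(); // term unknown to the whole cluster
    }

    /**
     * Documental cluster: documents are split among local indices, each numbering its own
     * documents from 0. Assuming index i holds the global documents starting at offsets[i],
     * the global posting list is the concatenation of the remapped local lists.
     */
    static List<Integer> documentalLookup(List<Map<String, List<Integer>>> localIndices,
                                          int[] offsets, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < localIndices.size(); i++) {
            for (int localPointer : localIndices.get(i).getOrDefault(term, List.of())) {
                result.add(offsets[i] + localPointer); // local -> global document pointer
            }
        }
        return result; // sorted, since offsets increase and each local list is sorted
    }
}
```

For example, with two documental local indices starting at global documents 0 and 3, the local lists `[0, 2]` and `[1]` for the same term merge into the global list `[0, 2, 4]`.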
Clustering indices requires mapping term numbers and document pointers back and forth between the global
index and the local indices. This mapping is provided by {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalClusteringStrategy documental clustering strategies}
and {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalClusteringStrategy lexical clustering strategies}.
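The two directions of the translation a documental clustering strategy has to supply can be sketched as follows. The sketch assumes that each local index holds a contiguous range of global document pointers, delimited by a sorted array of cutpoints. The class `ContiguousDocumentalMapping` and its method names are hypothetical and do not reproduce the MG4J interfaces. Going from global to local uses a binary search over the cutpoints, and going from local to global is a single addition.

```java
import java.util.Arrays;

/**
 * Hypothetical contiguous document mapping: local index i holds the global documents from
 * cutpoints[i] (inclusive) to cutpoints[i + 1] (exclusive). Not the MG4J API.
 */
final class ContiguousDocumentalMapping {
    private final long[] cutpoints; // cutpoints[0] == 0, strictly increasing, last == number of documents

    ContiguousDocumentalMapping(long... cutpoints) {
        this.cutpoints = cutpoints.clone();
    }

    int numberOfLocalIndices() {
        return cutpoints.length - 1;
    }

    /** Global pointer -> number of the local index that contains it. */
    int localIndex(long globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutpoints[cutpoints.length - 1])
            throw new IllegalArgumentException("Document out of range: " + globalPointer);
        int pos = Arrays.binarySearch(cutpoints, globalPointer);
        // Exact hit: the document opens a range. Otherwise pos == -(insertionPoint) - 1,
        // and the containing range is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global pointer -> pointer within its local index. */
    long localPointer(long globalPointer) {
        return globalPointer - cutpoints[localIndex(globalPointer)];
    }

    /** (local index, local pointer) -> global pointer. */
    long globalPointer(int localIndex, long localPointer) {
        return cutpoints[localIndex] + localPointer;
    }
}
```

With cutpoints `0, 100, 250, 400`, global document 150 lives in local index 1 as local document 50. A lexical strategy could plausibly follow the same pattern with term numbers in place of document pointers.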
Clusters are often generated by partitioning an index (although not always: for instance,
{@link it.unimi.dsi.big.mg4j.tool.Scan} produces a cluster as the output of the indexing process). In that case, a
{@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalPartitioningStrategy documental partitioning strategy}
or a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalPartitioningStrategy lexical partitioning strategy}
explains how to divide the index and remap its term numbers and document pointers. Of course, the clustering and partitioning
strategies must be suitably matched: the clustering strategy must undo the remapping applied by the partitioning strategy.
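Why the strategies must match can be shown with a hypothetical round-robin documental scheme. In it, global document `d` goes to local index `d % k` with local pointer `d / k`. The class `RoundRobinStrategies` and its methods are illustrative names only, not MG4J classes. The partitioning side splits pointers, the clustering side inverts exactly that split, and `roundTrips` checks that the composition is the identity.

```java
/** Hypothetical round-robin documental partitioning and the clustering mapping that inverts it. */
final class RoundRobinStrategies {
    private RoundRobinStrategies() {}

    /** Partitioning side: the local index that receives global document d. */
    static int localIndex(long d, int k) {
        return (int) (d % k);
    }

    /** Partitioning side: the pointer assigned to d inside its local index. */
    static long localPointer(long d, int k) {
        return d / k;
    }

    /** Clustering side: must invert the partitioning above, or pointers get scrambled. */
    static long globalPointer(int index, long localPointer, int k) {
        return localPointer * k + index;
    }

    /** Checks that clustering undoes partitioning for every document in [0, n). */
    static boolean roundTrips(long n, int k) {
        for (long d = 0; d < n; d++) {
            if (globalPointer(localIndex(d, k), localPointer(d, k), k) != d) return false;
        }
        return true;
    }
}
```

A mismatch goes wrong quietly rather than loudly. Suppose ten documents are partitioned round-robin into two indices, which are then clustered with a contiguous mapping over the ranges [0, 5) and [5, 10). Document 3 (local index 1, local pointer 1) would then be reported as document 6.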