
MG4J: Managing Gigabytes for Java
Index partitioning and clustering.
This package contains the classes that provide the infrastructure for index partitioning
and clustering. The tools that actually perform partitioning can be found in {@link it.unimi.dsi.big.mg4j.tool}.
An index cluster is a set of local indices that are viewed as a single index. In
a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalCluster lexical cluster} each local index has a disjoint set of terms, but the document pointers contained
in each local index refer to the same documents. In a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalCluster documental cluster}
each local index contains postings referring to a disjoint subset of the document collection.
Clustering indices requires mapping term numbers and document pointers back and forth between the global
index and local indices. This mapping is provided by {@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalClusteringStrategy documental clustering strategies}
and {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalClusteringStrategy lexical clustering strategies}.
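To make the back-and-forth mapping concrete, here is a minimal sketch for the documental case, assuming that each local index holds a contiguous block of global document pointers. The class <code>ContiguousDocumentMapping</code> and its method names are hypothetical and written only for illustration; they are not the MG4J strategy classes, whose actual interfaces should be checked in their own documentation.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of MG4J. */
final class ContiguousDocumentMapping {
    /**
     * cutPoints[i] is the first global document pointer stored in local index i;
     * the last entry is the total number of documents (assumed layout).
     */
    private final long[] cutPoints;

    ContiguousDocumentMapping(long... cutPoints) {
        if (cutPoints.length < 2 || cutPoints[0] != 0)
            throw new IllegalArgumentException("Cut points must start at 0 and describe at least one local index");
        for (int i = 1; i < cutPoints.length; i++)
            if (cutPoints[i] <= cutPoints[i - 1])
                throw new IllegalArgumentException("Cut points must be strictly increasing");
        this.cutPoints = cutPoints.clone();
    }

    int numberOfLocalIndices() {
        return cutPoints.length - 1;
    }

    /** Global to local: which local index holds the given global document pointer. */
    int localIndex(long globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("No such document: " + globalPointer);
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // Exact hit: the pointer is the first document of local index pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owner is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global to local: the pointer as seen from inside the owning local index. */
    long localPointer(long globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    /** Local to global: the inverse of the two methods above. */
    long globalPointer(int localIndex, long localPointer) {
        if (localIndex < 0 || localIndex >= numberOfLocalIndices())
            throw new IndexOutOfBoundsException("No such local index: " + localIndex);
        if (localPointer < 0 || localPointer >= cutPoints[localIndex + 1] - cutPoints[localIndex])
            throw new IndexOutOfBoundsException("No such local document: " + localPointer);
        return cutPoints[localIndex] + localPointer;
    }
}
```

With cut points 0, 4 and 10, global document 9 would live in local index 1 as local document 5, and mapping that pair back gives 9 again.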
Clusters are often generated by partitioning an index (although, for instance,
{@link it.unimi.dsi.big.mg4j.tool.Scan} produces a cluster directly as the output of the indexing process). When a cluster is obtained by partitioning, a
{@linkplain it.unimi.dsi.big.mg4j.index.cluster.DocumentalPartitioningStrategy documental partitioning strategy}
or a {@linkplain it.unimi.dsi.big.mg4j.index.cluster.LexicalPartitioningStrategy lexical partitioning strategy}
explains how to divide and remap term numbers and document pointers. Of course, the clustering and partitioning
strategies must be suitably matched.
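One way to read "suitably matched" is that the local-to-global direction must invert the global-to-local direction. The sketch below checks this for the lexical case, assuming a round-robin assignment of term numbers to local indices. <code>RoundRobinTermMapping</code>, its nested <code>TermClustering</code> interface and all method names are hypothetical and chosen for illustration; they are not the MG4J strategy interfaces.

```java
/** Hypothetical illustration only; not part of MG4J. */
final class RoundRobinTermMapping {
    /** The local-to-global direction, kept separate so a mismatched one can be plugged in. */
    interface TermClustering {
        long globalNumber(int localIndex, long localNumber);
    }

    private final int numberOfLocalIndices;

    RoundRobinTermMapping(int numberOfLocalIndices) {
        if (numberOfLocalIndices <= 0)
            throw new IllegalArgumentException("At least one local index is required");
        this.numberOfLocalIndices = numberOfLocalIndices;
    }

    /** Partitioning direction: the local index that receives the given global term number. */
    int localIndex(long globalNumber) {
        if (globalNumber < 0) throw new IllegalArgumentException("Negative term number");
        return (int) (globalNumber % numberOfLocalIndices);
    }

    /** Partitioning direction: the term number inside the receiving local index. */
    long localNumber(long globalNumber) {
        if (globalNumber < 0) throw new IllegalArgumentException("Negative term number");
        return globalNumber / numberOfLocalIndices;
    }

    /** Clustering direction that matches the round-robin partitioning above. */
    long globalNumber(int localIndex, long localNumber) {
        return localNumber * numberOfLocalIndices + localIndex;
    }

    /** Returns true if the given clustering maps every partitioned term back to itself. */
    boolean isMatchedBy(TermClustering clustering, long numberOfTerms) {
        for (long g = 0; g < numberOfTerms; g++)
            if (clustering.globalNumber(localIndex(g), localNumber(g)) != g) return false;
        return true;
    }
}
```

For example, <code>mapping.isMatchedBy(mapping::globalNumber, n)</code> holds for any number of terms, whereas a clustering that assumes contiguous blocks of terms would fail the check as soon as there are two or more local indices and enough terms to tell the layouts apart.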