
it.unimi.dsi.big.mg4j.index.cluster.ChainedLexicalClusteringStrategy


MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.

package it.unimi.dsi.big.mg4j.index.cluster;

/*		 
 * MG4J: Managing Gigabytes for Java (big)
 *
 * Copyright (C) 2006-2011 Sebastiano Vigna 
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 3 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, see .
 *
 */

import it.unimi.dsi.big.mg4j.index.BitStreamIndex;
import it.unimi.dsi.big.mg4j.index.DiskBasedIndex;
import it.unimi.dsi.big.mg4j.index.Index;
import it.unimi.dsi.util.BloomFilter;
import it.unimi.dsi.big.util.StringMap;

/** A lexical clustering strategy that uses a chain of responsibility to choose the local index:
 * {@linkplain StringMap term maps} out of a given list are queried in order
 * until one contains the given term.
 *
 * <p>If the index cluster has Bloom filters, they will be used to reduce useless accesses to
 * term maps.
 *
 * <p>The intended usage of this class is memory/disk lexical partitioning. Note that a serialised version
 * of this class is empty. It acts just like a placeholder, so that loaders know that they
 * must generate a new instance depending on the indices contained in the cluster.
 *
 * @author Sebastiano Vigna
 */

public class ChainedLexicalClusteringStrategy implements LexicalClusteringStrategy {
	static final long serialVersionUID = 0;
	/** The array of term maps to query, one per local index. */
	private transient final StringMap[] termMap;
	/** An array of optional Bloom filters to reduce term map access, or <code>null</code>. */
	private transient final BloomFilter[] termFilter;

	/** Creates a new chained lexical clustering strategy using additional Bloom filters.
	 *
	 * <p>Note that the static type of the parameter <code>index</code> is
	 * an array of {@link Index}, but the elements of the array must be
	 * {@linkplain DiskBasedIndex disk-based indices}, or an exception will be thrown.
	 *
	 * @param index an array of disk-based indices, from which term maps will be extracted.
	 * @param termFilter an array, parallel to <code>index</code>, of Bloom filters representing the terms contained in each local index.
	 */
	public ChainedLexicalClusteringStrategy( final Index[] index, final BloomFilter[] termFilter ) {
		this.termMap = new StringMap[ index.length ];
		// Extract the term map of each local index; an index without a term map cannot take part in the chain.
		for( int i = index.length; i-- != 0; )
			if ( ( termMap[ i ] = ((BitStreamIndex)index[ i ]).termMap ) == null ) throw new IllegalArgumentException( "Index " + index[ i ] + " has no term map" );
		this.termFilter = termFilter;
	}

	/** Creates a new chained lexical clustering strategy.
	 *
	 * <p>Note that the static type of the parameter <code>index</code> is
	 * an array of {@link Index}, but the elements of the array must be
	 * {@linkplain DiskBasedIndex disk-based indices}, or an exception will be thrown.
	 *
	 * @param index an array of disk-based indices, from which term maps will be extracted.
	 */
	public ChainedLexicalClusteringStrategy( final Index[] index ) {
		this( index, null );
	}

	public int numberOfLocalIndices() {
		return termMap.length;
	}

	public int localIndex( final CharSequence term ) {
		// Return the first local index whose filter (if any) and term map both accept the term.
		for( int i = 0; i < termMap.length; i++ )
			if ( ( termFilter == null || termFilter[ i ].contains( term ) ) && termMap[ i ].getLong( term ) != -1 ) return i;
		return -1;
	}

	public long globalNumber( int localIndex, long localNumber ) {
		throw new UnsupportedOperationException();
	}
}
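The lookup in localIndex() can be tried without MG4J on the classpath. The following is a minimal sketch under stated assumptions, not the library's implementation: ChainedLookupSketch is a hypothetical name, plain java.util.Map instances stand in for term maps, and java.util.function.Predicate instances stand in for Bloom filters (they may answer true for absent terms, but must never answer false for present ones).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for the chained lookup; not part of MG4J. */
final class ChainedLookupSketch {
    private ChainedLookupSketch() {}

    /**
     * Returns the position of the first map that contains the term, or -1 if none does.
     * The filters argument may be null; otherwise it must be parallel to termMaps.
     */
    static int localIndex(final List<Map<String, Long>> termMaps,
                          final List<Predicate<String>> filters,
                          final CharSequence term) {
        final String key = term.toString();
        for (int i = 0; i < termMaps.size(); i++) {
            // A negative filter answer lets us skip the (possibly costly) map access.
            if (filters != null && !filters.get(i).test(key)) continue;
            // First map in the chain that knows the term wins.
            if (termMaps.get(i).containsKey(key)) return i;
        }
        return -1;
    }
}
```

As in the class above, the order of the list matters: when a term appears in more than one map, the earliest position is returned, which is what makes a small in-memory index placed before a large on-disk one take precedence.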




