
MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.

package it.unimi.dsi.big.mg4j.index;

/*		 
 * MG4J: Managing Gigabytes for Java (big)
 *
 * Copyright (C) 2005-2011 Paolo Boldi and Sebastiano Vigna 
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 3 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, see .
 *
 */

import it.unimi.dsi.io.SafelyCloseable;
import it.unimi.dsi.big.util.StringMap;

import java.io.IOException;

/** Provides access to an inverted index.
 *
 * <p>An {@link it.unimi.dsi.big.mg4j.index.Index} contains global read-only metadata. To get actual data
 * from an index, you need to get an index reader via a call to {@link Index#getReader()}. Once
 * you have an index reader, you can ask for the {@linkplain #documents(CharSequence) documents matching a term}.
 *
 * <p>Alternatively, you can perform a read-once scan of the index by calling {@link #nextIterator()},
 * which will return in order the {@linkplain IndexIterator index iterators} of all terms of the underlying index.
 * More generally, {@link #nextIterator()} returns an iterator positioned at the start of the inverted
 * list of the term after the current one. When called just after the reader creation, it returns an
 * index iterator for the first term.
 *
 * <p><strong>Warning</strong>: An index reader is exactly what its name says, a <em>reader</em>. It
 * cannot be used by many threads at the same time, and all its access methods are exclusive: if you
 * obtain a {@linkplain #documents(long) document iterator}, the previous one is no longer valid. However,
 * you can generate many readers and use them concurrently.
 *
 * <p><strong>Warning</strong>: Invoking the {@link it.unimi.dsi.big.mg4j.search.DocumentIterator#dispose()} method
 * on iterators returned by an instance of this class will invoke {@link #close()} on the instance, thus
 * making the instance no longer accessible. This behaviour is necessary to handle cases in which a
 * reader is created on the fly just to create an iterator.
 *
 * @author Paolo Boldi
 * @author Sebastiano Vigna
 * @since 1.0
 */
public interface IndexReader extends SafelyCloseable {

	/** Returns a document iterator over the documents containing a term.
	 *
	 * <p>Note that the index iterator returned by this method will
	 * return <code>null</code> on a call to {@link IndexIterator#term() term()}.
	 *
	 * <p>Note that it is always possible
	 * to call this method with argument 0, even if the underlying index
	 * does not provide random access.
	 *
	 * @param termNumber the number of a term.
	 * @throws UnsupportedOperationException if this index reader is not accessible by term
	 * number.
	 */
	public IndexIterator documents( long termNumber ) throws IOException;

	/** Returns an index iterator over the documents containing a term; the term is
	 * given explicitly.
	 *
	 * <p>Unless the {@linkplain Index#termProcessor term processor} of
	 * the associated index is <code>null</code>, words coming from a query will
	 * have to be processed before being used with this method.
	 *
	 * <p>Note that the index iterator returned by this method will
	 * return <code>term</code> on a call to {@link IndexIterator#term() term()}.
	 *
	 * @param term a term (the term will be downcased if the index is case insensitive).
	 * @throws UnsupportedOperationException if the {@linkplain StringMap term map} is not available for the underlying index.
	 */
	public IndexIterator documents( CharSequence term ) throws IOException;

	/** Returns an {@link IndexIterator} on the term after the current one (optional operation).
	 *
	 * <p>Note that after creation there is no current term. Thus, the first call to this
	 * method will return an {@link IndexIterator} on the first term. As a consequence, repeated
	 * calls to this method provide a way to scan an index sequentially.
	 *
	 * @return the index iterator of the next term, or <code>null</code> if there are no more terms
	 * after the current one.
	 */
	public IndexIterator nextIterator() throws IOException;
}
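
The reader contract described in the Javadoc (a read-once scan through `nextIterator()` that ends with `null`, `term()` returning `null` after a lookup by number, and access methods that invalidate the previous iterator) can be illustrated without MG4J on the classpath. What follows is a minimal sketch under stated assumptions: `IndexReaderSketch`, `ToyReader` and `ToyIterator` are hypothetical classes written only for this illustration, not part of MG4J, and the `-1` end-of-list marker is a convention of the sketch, not MG4J's. A real reader would come from `Index#getReader()`, and query words would first go through the index's term processor; the sketch skips both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A self-contained toy that mirrors the IndexReader contract; none of these are MG4J classes. */
public class IndexReaderSketch {

	/** Hypothetical stand-in for IndexIterator: walks the posting list of one term. */
	static final class ToyIterator {
		private final ToyReader owner;
		private final long generation; // reader generation when this iterator was created
		private final String term;     // null when the iterator was obtained by term number
		private final long[] postings;
		private int pos = 0;
		ToyIterator(ToyReader owner, String term, long[] postings) {
			this.owner = owner;
			this.generation = owner.generation;
			this.term = term;
			this.postings = postings;
		}

		String term() { return term; }
		/** Returns the next document, or -1 at the end of the list (a convention of this sketch only). */
		long nextDocument() {
			// Access methods are exclusive: any newer iterator from the same reader makes this one stale.
			if (generation != owner.generation) throw new IllegalStateException("stale iterator");
			return pos < postings.length ? postings[pos++] : -1;
		}
	}

	/** Hypothetical in-memory reader; terms are numbered in lexicographic order. */
	static final class ToyReader {
		private final List<String> terms = new ArrayList<>();
		private final List<long[]> lists = new ArrayList<>();
		private int current = -1; // no current term right after creation
		long generation = 0;      // bumped every time an iterator is handed out
		ToyReader(Map<String, long[]> index) {
			for (Map.Entry<String, long[]> e : new TreeMap<>(index).entrySet()) {
				terms.add(e.getKey());
				lists.add(e.getValue());
			}
		}

		/** Lookup by term number; term() of the result is null. Only nextIterator() moves the cursor here. */
		ToyIterator documents(long termNumber) {
			generation++;
			return new ToyIterator(this, null, lists.get((int) termNumber));
		}

		/** Lookup by term; unknown terms yield an empty list in this sketch. */
		ToyIterator documents(CharSequence term) {
			int t = terms.indexOf(term.toString());
			generation++;
			return new ToyIterator(this, term.toString(), t < 0 ? new long[0] : lists.get(t));
		}

		/** Iterator on the term after the current one, or null when the terms are exhausted. */
		ToyIterator nextIterator() {
			if (current + 1 >= terms.size()) return null;
			current++;
			generation++;
			return new ToyIterator(this, terms.get(current), lists.get(current));
		}
	}

	public static void main(String[] args) {
		Map<String, long[]> index = new TreeMap<>();
		index.put("java", new long[] { 0, 2 });
		index.put("gigabytes", new long[] { 1 });
		index.put("managing", new long[] { 1, 2, 3 });
		ToyReader reader = new ToyReader(index);

		// Read-once scan: the first call yields the first term, null marks the end.
		ToyIterator it;
		while ((it = reader.nextIterator()) != null) {
			StringBuilder line = new StringBuilder(it.term()).append(':');
			long d;
			while ((d = it.nextDocument()) != -1) line.append(' ').append(d);
			System.out.println(line); // gigabytes: 1, then java: 0 2, then managing: 1 2 3
		}

		ToyIterator byNumber = reader.documents(0);
		System.out.println(byNumber.term()); // prints null
		ToyIterator byTerm = reader.documents("managing");
		try {
			byNumber.nextDocument();
		} catch (IllegalStateException e) {
			System.out.println("previous iterator invalidated");
		}
		System.out.println(byTerm.term() + " first doc " + byTerm.nextDocument()); // managing first doc 1
	}
}
```

The generation counter is just one simple way to enforce the rule that obtaining a new iterator invalidates the previous one; it is not a description of how MG4J itself implements readers. The toy also leaves out the `dispose()`/`close()` coupling and the threading caveats, which the Javadoc covers.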

© 2015 - 2025 Weber Informatics LLC