package it.unimi.dsi.big.mg4j.index;

/*		 
 * MG4J: Managing Gigabytes for Java (big)
 *
 * Copyright (C) 2005-2011 Paolo Boldi and Sebastiano Vigna 
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 3 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, see .
 *
 */

import it.unimi.dsi.io.SafelyCloseable;
import it.unimi.dsi.big.util.StringMap;

import java.io.IOException;

/** Provides access to an inverted index.
*
* An {@link it.unimi.dsi.big.mg4j.index.Index} contains global read-only metadata. To get actual data
* from an index, you need to get an index reader via a call to {@link Index#getReader()}. Once
* you have an index reader, you can ask for the {@linkplain #documents(CharSequence) documents matching a term}.
* 
* <p>Alternatively, you can perform a read-once scan of the index by calling {@link #nextIterator()},
* which will return in order the {@linkplain IndexIterator index iterators} of all terms of the underlying index.
* More generally, {@link #nextIterator()} returns an iterator positioned at the start of the inverted
* list of the term after the current one. When called just after the reader creation, it returns an
* index iterator for the first term.
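* 
* <p>A minimal sketch of the read-once scan described above. It assumes that <code>index</code>
* is an already loaded {@link Index} (how it was obtained is not shown) and it only counts
* the terms; it is an illustration, not part of the contract of this interface.
* 
* <pre>
* IndexReader reader = index.getReader();
* try {
*     long terms = 0;
*     IndexIterator iterator;
*     // The first call returns the iterator of the first term; null marks the end of the term list.
*     while ( ( iterator = reader.nextIterator() ) != null ) {
*         // Use iterator here: the next call to nextIterator() invalidates it.
*         terms++;
*     }
* }
* finally {
*     reader.close();
* }
* </pre>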
* 
* <p><strong>Warning</strong>: An index reader is exactly what it looks like: a reader. It
* cannot be used by many threads at the same time, and all its access methods are exclusive: if you
* obtain a {@linkplain #documents(long) document iterator}, the previous one is no longer valid. However,
* you can generate many readers, and use them concurrently.
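* 
* <p>A minimal sketch of the exclusivity rule, under the assumptions that <code>index</code> is an
* already loaded {@link Index} whose term map is available, and that the two terms (made up
* for illustration) have already been processed as the index requires.
* 
* <pre>
* IndexReader reader = index.getReader();
* IndexIterator first = reader.documents( "managing" );
* // Asking the same reader for another iterator invalidates first.
* IndexIterator second = reader.documents( "gigabytes" );
* 
* // To work on two lists at the same time, use one reader per iterator.
* IndexReader otherReader = index.getReader();
* IndexIterator independent = otherReader.documents( "managing" );
* 
* // Close each reader when done with its iterator.
* reader.close();
* otherReader.close();
* </pre>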
* 
* <p><strong>Warning</strong>: Invoking the {@link it.unimi.dsi.big.mg4j.search.DocumentIterator#dispose()} method
* on iterators returned by an instance of this class will invoke {@link #close()} on the instance, thus
* making the instance no longer accessible. This behaviour is necessary to handle cases in which a
* reader is created on-the-fly just to create an iterator.
*
* @author Paolo Boldi 
* @author Sebastiano Vigna 
* @since 1.0
*/

public interface IndexReader extends SafelyCloseable {

	/** Returns a document iterator over the documents containing a term.
	 * 
	 * <p>Note that the index iterator returned by this method will
	 * return <code>null</code> on a call to {@link IndexIterator#term() term()}.
	 * 
	 * <p>Note that it is always possible
	 * to call this method with argument 0, even if the underlying index
	 * does not provide random access.
	 * 
	 * @param termNumber the number of a term.
	 * @return an index iterator over the documents containing the term with the given number.
	 * @throws UnsupportedOperationException if this index reader is not accessible by term
	 * number.
	 */
	public IndexIterator documents( long termNumber ) throws IOException;

	/** Returns an index iterator over the documents containing a term; the term is
	 *  given explicitly.
	 * 
	 * <p>Unless the {@linkplain Index#termProcessor term processor} of
	 * the associated index is <code>null</code>, words coming from a query will
	 * have to be processed before being used with this method.
	 * 
	 * <p>Note that the index iterator returned by this method will
	 * return <code>term</code> on a call to {@link IndexIterator#term() term()}.
	 *
	 * @param term a term (the term will be downcased if the index is case insensitive).
	 * @return an index iterator over the documents containing the given term.
	 * @throws UnsupportedOperationException if the {@linkplain StringMap term map} is not available for the underlying index.
	 */
	public IndexIterator documents( CharSequence term ) throws IOException;
	
	/** Returns an {@link IndexIterator} on the term after the current one (optional operation).
	 * 
	 * <p>Note that after creation there is no current term. Thus, the first call to this
	 * method will return an {@link IndexIterator} on the first term. As a consequence, repeated
	 * calls to this method provide a way to scan an index sequentially.
	 * 
	 * @return the index iterator of the next term, or <code>null</code> if there are no more terms
	 * after the current one.
	 */
	
	public IndexIterator nextIterator() throws IOException;
}