
src.it.unimi.dsi.big.mg4j.search.IntervalIterator


MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. The big version is a fork of the original MG4J that can handle more than 2^31 terms and documents.

package it.unimi.dsi.big.mg4j.search;

/*		 
 * MG4J: Managing Gigabytes for Java (big)
 *
 * Copyright (C) 2003-2011 Paolo Boldi and Sebastiano Vigna 
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 3 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, see <http://www.gnu.org/licenses/>.
 *
 */

import it.unimi.dsi.fastutil.longs.LongSet;
import it.unimi.dsi.util.Interval;

import java.io.IOException;
import java.util.Iterator;
                                                                                                                                                    
/** An iterator over {@linkplain Interval intervals}.
 *  Apart from the usual methods of a (type-specific) iterator, it
 *  has a special (optional) {@link #reset()} method that allows one to reset
 *  the iterator: the exact meaning of this operation is decided by the
 *  implementing classes. Typically, after a {@link #reset()}, one can
 *  iterate over a new sequence.
 *
 *  <p>This interface also specifies a method {@link #extent()} returning
 *  a positive integer that is supposed to approximate the minimum possible
 *  length of an interval returned by this iterator. This method returns -1
 *  if this extent cannot be computed.
 */

public interface IntervalIterator extends Iterator<Interval> {

	/** Resets the internal state of this iterator for a new document.
	 *
	 * <p>To reduce object creation, interval iterators are usually created in a lazy
	 * fashion by document iterators when they are needed. However, this implies that
	 * every time the document iterator is moved, some internal state of the interval iterator must be reset
	 * (e.g., because on the new document some of the component interval iterators are now
	 * {@link IntervalIterators#TRUE}).
	 */
	public void reset() throws IOException;

	/** Returns an approximation of a lower bound for the length of an interval
	 *  returned by this iterator.
	 *
	 * @return an approximation of a lower bound for the length of an interval.
	 */
	public int extent();

	/** Returns the next interval provided by this interval iterator, or <code>null</code> if no more intervals are available.
	 *
	 * <p>This method implements fully lazy iteration over intervals. Fully lazy iteration
	 * does not provide a <code>hasNext()</code> method: you have to actually ask for the next
	 * element and check the return value. Fully lazy iteration is much lighter on method calls (half) and
	 * in most (if not all) MG4J classes leads to a much simpler logic. Moreover, {@link #nextInterval()}
	 * can be specified as throwing an {@link IOException}, which avoids the pernicious proliferation
	 * of try/catch blocks in very short, low-level methods (it was having a detectable impact on performance).
	 *
	 * @return the next interval, or <code>null</code> if no more intervals are available.
	 */
	public Interval nextInterval() throws IOException;

	/** Provides the set of terms that span the current interval.
	 *
	 * <p>For each interval returned by MG4J, there is a set of terms that caused the interval to be returned.
	 * The terms appear inside the interval, and certainly at its extremes.
	 *
	 * <p>Note that the results of this method must be taken with a grain of salt: there might be different sets of terms
	 * causing the current interval, and only one will be returned.
	 *
	 * @param terms a set of integers that will be filled with the terms spanning the current interval.
	 */
	public void intervalTerms( LongSet terms );
}
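
The fully lazy iteration described in the `nextInterval()` documentation can be sketched as follows. This is a minimal, self-contained illustration rather than MG4J code: `Span`, `LazySpanIterator`, `over` and `drain` are hypothetical stand-ins invented for the example, and `Span` only assumes that an interval is a closed range of positions. The point is the loop shape: one call per element, with `null` marking the end, and a single `throws IOException` on the caller instead of try/catch blocks around each step.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LazyIntervalDemo {

	/** Hypothetical stand-in for an interval: a closed range of positions (assumption). */
	public static final class Span {
		public final int left, right;

		public Span(int left, int right) {
			this.left = left;
			this.right = right;
		}

		public int length() {
			return right - left + 1;
		}
	}

	/** Hypothetical cut-down interface keeping only the fully lazy method. */
	public interface LazySpanIterator {
		/** Returns the next span, or null if no more spans are available. */
		Span nextInterval() throws IOException;
	}

	/** A toy iterator over a fixed array; a real implementation would read from an index. */
	public static LazySpanIterator over(final Span... spans) {
		return new LazySpanIterator() {
			private int next = 0;

			@Override
			public Span nextInterval() {
				return next < spans.length ? spans[next++] : null;
			}
		};
	}

	/** Drains the iterator with the fully lazy idiom: no hasNext(), null ends the loop. */
	public static List<Span> drain(LazySpanIterator it) throws IOException {
		List<Span> result = new ArrayList<>();
		Span span;
		while ((span = it.nextInterval()) != null) result.add(span);
		return result;
	}

	public static void main(String[] args) throws IOException {
		for (Span s : drain(over(new Span(0, 2), new Span(4, 4)))) {
			System.out.println(s.left + ".." + s.right + " (length " + s.length() + ")");
		}
	}
}
```

Compared with the `hasNext()`/`next()` protocol, each element costs one method call instead of two, which is the saving the documentation refers to.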




