it.unimi.dsi.io.WordReader

The DSI utilities are a mishmash of classes accumulated during the last twenty years in projects developed at the DSI (Dipartimento di Scienze dell'Informazione, i.e., Information Sciences Department), now DI (Dipartimento di Informatica, i.e., Informatics Department), of the Università degli Studi di Milano.

package it.unimi.dsi.io;

/*
 * DSI utilities
 *
 * Copyright (C) 2005-2018 Paolo Boldi and Sebastiano Vigna
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 3 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, see .
 *
 */

import it.unimi.dsi.lang.MutableString;

import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;

/** An interface providing methods to break the input from a reader into words.
 *
 * <p>The intended implementations of this interface should decorate
 * a given reader (see, for instance, {@link it.unimi.dsi.io.FastBufferedReader}).
 * The reader can be changed at any time using {@link #setReader(Reader)}.
 *
 * <p>This interface is heavily oriented towards reusability and
 * streaming. It is conceived so that at most one method call has
 * to be performed per word, rather than per character,
 * and that implementations may completely avoid object creation by
 * {@linkplain #setReader(Reader) explicitly setting the underlying reader}.
 *
 * <p>The standard implementation ({@link it.unimi.dsi.io.FastBufferedReader}) breaks
 * words in the trivial way. More complex implementations (e.g., for languages requiring
 * segmentation) can subclass {@link it.unimi.dsi.io.FastBufferedReader} or provide their
 * own implementation.
 */
public interface WordReader extends Serializable {

	/** Extracts the next word and non-word.
	 *
	 * <p>If this method returns true, a new non-empty word, and possibly
	 * a new non-word, have been extracted. As an exception, the first call
	 * to this method after creation or after a call to {@link #setReader(Reader)}
	 * may return an empty word: both word and nonWord are maximal, so the first
	 * word is empty whenever the input starts with non-word characters.
	 *
	 * @param word the next word returned by the underlying reader.
	 * @param nonWord the non-word following the next word returned by the underlying reader.
	 * @return true if a new word was processed, false otherwise (in which
	 * case both word and nonWord are unchanged).
	 */
	public abstract boolean next(MutableString word, MutableString nonWord) throws IOException;

	/** Resets the internal state of this word reader, which will start again reading from the given reader.
	 *
	 * @param reader the new reader providing characters.
	 * @return this word reader.
	 */
	public abstract WordReader setReader(Reader reader);

	/** Returns a copy of this word reader.
	 *
	 * <p>This method must return a word reader with a behaviour that
	 * matches exactly that of this word reader.
	 *
	 * @return a copy of this word reader.
	 */
	public abstract WordReader copy();
}
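The contract of `next()` (maximal words and non-words, a possibly empty first word, unchanged buffers at end of input) and the reuse pattern based on `setReader()` can be made concrete with a minimal sketch. This is not dsiutils code: `SimpleWordReader`, `LetterOrDigitWordReader` and `WordReaderDemo` are hypothetical names, `StringBuilder` stands in for `it.unimi.dsi.lang.MutableString` so that the sketch compiles without the library, and treating letters and digits as the only word characters is an assumption that may differ from the rules used by `FastBufferedReader`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import java.io.StringReader;

/** Hypothetical mirror of WordReader; StringBuilder stands in for MutableString. */
interface SimpleWordReader extends Serializable {
	boolean next(StringBuilder word, StringBuilder nonWord) throws IOException;
	SimpleWordReader setReader(Reader reader);
	SimpleWordReader copy();
}

/** Hypothetical implementation: word characters are letters and digits (an assumption). */
class LetterOrDigitWordReader implements SimpleWordReader {
	private static final long serialVersionUID = 1L;
	/** Sentinel meaning "no lookahead character buffered". */
	private static final int NONE = -2;

	/** The decorated reader; transient because readers are generally not serializable. */
	private transient Reader reader;
	/** One character of lookahead: first character of the next word, -1 at EOF, or NONE. */
	private int pending = NONE;

	public LetterOrDigitWordReader() {}

	public LetterOrDigitWordReader(Reader reader) {
		setReader(reader);
	}

	@Override
	public SimpleWordReader setReader(Reader reader) {
		this.reader = reader;
		pending = NONE; // forget any state left over from the previous reader
		return this;
	}

	private static boolean isWordChar(int c) {
		return c != -1 && Character.isLetterOrDigit(c);
	}

	private int read() throws IOException {
		if (pending != NONE) {
			final int c = pending;
			pending = NONE;
			return c;
		}
		return reader.read();
	}

	@Override
	public boolean next(StringBuilder word, StringBuilder nonWord) throws IOException {
		int c = read();
		if (c == -1) return false; // end of input: word and nonWord are left unchanged
		word.setLength(0); // reuse the caller's buffers instead of allocating new ones
		nonWord.setLength(0);
		// Maximal run of word characters; empty only on a first call whose input starts with a non-word character.
		while (isWordChar(c)) {
			word.append((char) c);
			c = read();
		}
		// Maximal run of non-word characters following the word.
		while (c != -1 && !isWordChar(c)) {
			nonWord.append((char) c);
			c = read();
		}
		pending = c; // push back the first character of the next word (or -1 at EOF)
		return true;
	}

	@Override
	public SimpleWordReader copy() {
		// Same word-breaking behaviour, but no reader attached: call setReader() before use.
		return new LetterOrDigitWordReader();
	}
}

class WordReaderDemo {
	public static void main(String[] args) throws IOException {
		final SimpleWordReader wordReader = new LetterOrDigitWordReader();
		final StringBuilder word = new StringBuilder(), nonWord = new StringBuilder();
		for (String text : new String[] { "Hello, world!", "  42 is the answer" }) {
			wordReader.setReader(new StringReader(text)); // one word reader for all inputs
			while (wordReader.next(word, nonWord)) System.out.println("[" + word + "][" + nonWord + "]");
		}
	}
}
```

For the first string the demo prints `[Hello][, ]` and `[world][!]`; the second string starts with spaces, so its first line is `[][  ]`, the empty first word that the contract allows. The sketch keeps a single character of lookahead for simplicity; a production implementation would rather read through an internal buffer, and whether `copy()` should also share the underlying reader is an assumption here (this version deliberately does not).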
