
it.unimi.dsi.parser.callback.Callback


The DSI utilities are a mishmash of classes accumulated during the last twenty years in projects developed at the DSI (Dipartimento di Scienze dell'Informazione, i.e., Information Sciences Department), now DI (Dipartimento di Informatica, i.e., Informatics Department), of the Università degli Studi di Milano.

package it.unimi.dsi.parser.callback;

/*
 * DSI utilities
 *
 * Copyright (C) 2005-2020 Sebastiano Vigna
 *
 *  This library is free software; you can redistribute it and/or modify it
 *  under the terms of the GNU Lesser General Public License as published by the Free
 *  Software Foundation; either version 3 of the License, or (at your option)
 *  any later version.
 *
 *  This library is distributed in the hope that it will be useful, but
 *  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 *  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
 *  for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program; if not, see <http://www.gnu.org/licenses/>.
 *
 */

import it.unimi.dsi.lang.MutableString;
import it.unimi.dsi.parser.Attribute;
import it.unimi.dsi.parser.BulletParser;
import it.unimi.dsi.parser.Element;

import java.util.Map;

/** A callback for the {@linkplain it.unimi.dsi.parser.BulletParser bullet parser}.
 *
 * <p>This interface is very loosely inspired by the SAX2 interface. However, it
 * strives to be simple, and to be StringFree™.
 *
 * <p>By contract, all implementations of this interface are bound to be reusable:
 * by calling {@link #startDocument()}, a callback can be used again.
 * It must be safe to call {@link #startDocument()} any number of times.
 *
 */
public interface Callback {

	/** A singleton empty callback array. */
	Callback[] EMPTY_CALLBACK_ARRAY = new Callback[0];

	/** Configure the parser for usage with this callback.
	 *
	 * <p>When a callback is registered with a parser, it needs to set up
	 * the parser so that all data required by the callback is actually parsed.
	 * The configuration must be a monotone process: you
	 * can only set properties and add attribute types to
	 * be parsed.
	 */
	void configure(BulletParser parser);

	/** Receive notification of the beginning of the document.
	 *
	 * <p>The callback must use this method to reset its internal state so
	 * that it can be reused. It must be safe to invoke this method
	 * several times.
	 */
	void startDocument();

	/** Receive notification of the start of an element.
	 *
	 * <p>For simple elements, this is the only notification that the
	 * callback will ever receive.
	 *
	 * @param element the element whose opening tag was found.
	 * @param attrMap a map from {@link it.unimi.dsi.parser.Attribute}s to {@link MutableString}s.
	 * @return true to keep the parser parsing, false to stop it.
	 */
	boolean startElement(Element element, Map<Attribute,MutableString> attrMap);

	/** Receive notification of the end of an element.
	 *
	 * <p>Warning: unless specific decorators are used, in
	 * general a callback will just receive notifications for elements
	 * whose closing tag appears explicitly in the document.
	 *
	 * <p>This method will never be called for elements without closing tags,
	 * even if such a tag is found.
	 *
	 * @param element the element whose closing tag was found.
	 * @return true to keep the parser parsing, false to stop it.
	 */
	boolean endElement(Element element);

	/** Receive notification of character data inside an element.
	 *
	 * <p>You must not write into {@code text}, as it could be passed
	 * around to many callbacks.
	 *
	 * <p>{@code flowBroken} will be true iff
	 * the flow was broken before {@code text}. This feature makes it possible
	 * to extract quickly the text in a document without looking at the elements.
	 *
	 * @param text an array containing the character data.
	 * @param offset the start position in the array.
	 * @param length the number of characters to read from the array.
	 * @param flowBroken whether the flow is broken at the start of {@code text}.
	 * @return true to keep the parser parsing, false to stop it.
	 */
	boolean characters(char[] text, int offset, int length, boolean flowBroken);

	/** Receive notification of the content of a CDATA section.
	 *
	 * <p>CDATA sections in an HTML document are the result of meeting
	 * a {@code STYLE} or {@code SCRIPT} element. In that case, the element
	 * will be passed as first argument.
	 *
	 * <p>You must not write into {@code text}, as it could be passed
	 * around to many callbacks.
	 *
	 * @param element the element enclosing the CDATA section, or {@code null} if the
	 * CDATA section was created with explicit markup.
	 * @param text an array containing the character data.
	 * @param offset the start position in the array.
	 * @param length the number of characters to read from the array.
	 * @return true to keep the parser parsing, false to stop it.
	 */
	boolean cdata(Element element, char[] text, int offset, int length);

	/** Receive notification of the end of the document. */
	void endDocument();
}
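
The contract described in the Javadoc (reset state in startDocument(), never write into text, use flowBroken to separate runs of text) can be illustrated with a minimal sketch. It is not code from the DSI utilities: SimpleCallback is a hypothetical, trimmed-down stand-in that mirrors only three methods of Callback so the example compiles without the library, and TextExtractor, text() and the sample input are likewise assumptions made for illustration.

```java
/** Sketch only: none of these types belong to the DSI utilities. */
public class TextExtractorSketch {

    /** Hypothetical stand-in mirroring three methods of Callback. */
    public interface SimpleCallback {
        void startDocument();
        boolean characters(char[] text, int offset, int length, boolean flowBroken);
        void endDocument();
    }

    /** Collects character data, starting a new line wherever the flow was broken. */
    public static final class TextExtractor implements SimpleCallback {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startDocument() {
            // Reset all internal state so the callback can be reused;
            // calling this several times in a row is harmless.
            buffer.setLength(0);
        }

        @Override
        public boolean characters(char[] text, int offset, int length, boolean flowBroken) {
            // Separate this run from the previous one if the flow was broken before it.
            if (flowBroken && buffer.length() > 0) buffer.append('\n');
            // Copy the requested slice; the shared array is only read, never modified.
            buffer.append(text, offset, length);
            return true; // keep the parser going
        }

        @Override
        public void endDocument() {
            // Nothing to release in this sketch.
        }

        /** Returns the text collected since the last startDocument(). */
        public String text() {
            return buffer.toString();
        }
    }

    public static void main(String[] args) {
        TextExtractor extractor = new TextExtractor();
        char[] shared = "Title body more".toCharArray();

        // Simulate the calls a parser might make for one document.
        extractor.startDocument();
        extractor.characters(shared, 0, 5, true);   // "Title"
        extractor.characters(shared, 6, 4, true);   // "body", after a flow break
        extractor.characters(shared, 10, 5, false); // " more", same flow
        extractor.endDocument();
        System.out.println(extractor.text());

        // Reuse the same instance for a second document.
        extractor.startDocument();
        extractor.characters(shared, 6, 4, false);
        extractor.endDocument();
        System.out.println(extractor.text());
    }
}
```

A real implementation would implement Callback itself, register any attribute types it needs in configure(BulletParser), and return false from a notification method when it wants the parser to stop early.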




