// BibRefQuery.java
//
//    [email protected]
//    March 2001
//

/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */
package org.biojava.bibliography;

import java.io.InputStream;
import java.util.Enumeration;
import java.util.Hashtable;

/**
 * <p>
 * The interface <code>BibRefQuery</code> is a fundamental part of the Bibliographic Query
 * Service. It allows searching for and retrieving citations from a bibliographic
 * repository. The query methods again return a <code>BibRefQuery</code>, which
 * allows the query to be refined further. When the caller is satisfied with the query
 * results, the retrieval methods can be used to get either a list of citations (of type
 * {@link BibRef}), or an XML document representing citations.
 * </p>
 *
 * <p>
 * Squeezing all query and retrieval methods into one interface makes it possible to build
 * very flexible systems, both distributed (where the client and repository parts are
 * executed on different computers) and stand-alone (both parts are linked
 * together into one process).
 * </p>
 *
 * <ul>
 *   <li> For example, a client can be linked together
 *        with a repository implementation. These two parts communicate
 *        entirely via the <code>BibRefQuery</code> interface. Each of them can be replaced
 *        without changing the other one.
 *   <li> In another example, a client uses the <code>BibRefQuery</code> interface to
 *        communicate only with a local implementation of a distributed architecture
 *        (a CORBA communication protocol in this case). The repository implementation
 *        is similarly shielded by the same interface from the communication protocol layer.
 *   <li> Yet another example of a distributed architecture shows
 *        the parts which must be changed when a
 *        different communication protocol is used (SOAP-HTTP in this
 *        case). Note that both the client and the repository implementation
 *        remain untouched.
 * </ul>
 *

 * <p>
 * An implementation is advised to use the following constructor:
 * </p>
 * <pre>
 *    public NameOfAnImplementation (String[] args, Hashtable props) {...}
 * </pre>
 * <p>
 * where both <code>args</code> and <code>props</code> contain implementation-specific
 * parameters and properties. However, some properties are
 * more likely to be used; the suggested names for them are defined
 * either in this interface or in the "sister" interface {@link BibRefSupport}.
 * </p>
 *
 * <p>
 * Using this constructor makes it easier to load different
 * implementations dynamically.
 * </p>
 *

 * <p>
 * The methods of the <code>BibRefQuery</code> interface can be divided into three groups.
 * The first group deals with connections to bibliographic repositories. It contains the
 * methods {@link #connect connect}, {@link #disconnect disconnect}, {@link #destroy destroy},
 * {@link #getCollectionId getCollectionId}, and {@link #isReady isReady}.
 * </p>
 *
 * <p>
 * The second and most interesting group contains the query methods.
 * As mentioned above, these methods (mostly) return another query collection
 * which is again query-able. This group contains the methods {@link #find find},
 * {@link #findByAuthor findByAuthor}, {@link #findById findById}, {@link #query query},
 * {@link #getBibRefCount getBibRefCount}, and {@link #sort sort}.
 * </p>
 *
 * <p>
 * The last group has methods for retrieving citations from the resulting query collection.
 * The retrieval methods can also return citations that are not fully populated with all
 * available attribute data (for example, long abstracts can be asked for later).
 * This group contains the methods {@link #getAllBibRefs getAllBibRefs},
 * {@link #getBibRefs getBibRefs}, {@link #getAllIDs getAllIDs},
 * {@link #getAllBibRefsAsXML getAllBibRefsAsXML},
 * {@link #getBibRefsAsXML getBibRefsAsXML}, and {@link #getBibRefAsXML getBibRefAsXML}.
 * </p>
 *

 * <h3>Simple and Qualified Attribute Names</h3>
 *
 * <p>
 * There are several places where method arguments represent attribute names:
 * </p>
 * <ul>
 *   <li> In query methods, such as {@link #find find}, a list of attributes that should be
 *        searched.
 *   <li> The query results are citations represented as {@link BibRef} instances
 *        but not necessarily fully populated; they may contain only a subset of attributes,
 *        as determined by the <em>excluded</em> attribute lists used in several methods.
 *   <li> The results may be ordered by one or more attributes (method {@link #sort sort}).
 * </ul>
 *
 * <p>
 * Therefore, this interface defines several rules for specifying attribute names whenever
 * they have to be expressed as strings. Following these rules makes
 * implementations interoperable. They can, of course, be ignored if interoperability
 * is not an issue.
 * </p>
 *
 * <p>
 * The following rules define how to create stringified names for individual attributes.
 * </p>
 * <ol>
 *   <li> The recommended practice is to take attribute names from a controlled
 *        vocabulary; see details in the {@link BibRefSupport} interface.
 *   <li> The stringified names of attributes of class {@link BibRef} are equal to the
 *        member names of this class. For example, <code>identifier</code>, <code>type</code>,
 *        <code>title</code>, <code>authors</code>.
 *   <li> The stringified names of attributes of sub-classes derived from class {@link BibRef},
 *        and of attributes of other classes, are also equal to the member names, but
 *        they must additionally be qualified by the resource type using two underscores
 *        (<code>__</code>). For example, <code>Book__isbn</code>,
 *        <code>JournalArticle__from_journal</code>, <code>Journal__name</code>.
 *        <p>
 *        The somewhat unusual double underscore is suggested here because in some
 *        query languages (where the stringified attribute names can be used as
 *        variables) an underscore is the only non-alphabetic character allowed in
 *        variable names.
 *        </p>
 *   <li> The qualification part of the stringified name (together with the underscores) can
 *        be omitted if there is no ambiguity. For example, if an implementation does not
 *        use the property name <code>isbn</code> anywhere else, <code>Book__isbn</code> can
 *        be replaced by the simple <code>isbn</code>.
 *        <p>
 *        Be aware, however, that dropping the qualifier may compromise extensibility,
 *        because a caller that expects a unique attribute name may break if another
 *        citation type with the same attribute name is added.
 *        </p>
 *   <li> The stringified names of the attributes from {@link BibRef#properties dynamic properties}
 *        are equal to their property names, applying the qualification rule defined above.
 *        Thus, for example, an attribute <code>registry_number</code> hidden in member
 *        {@link BibRef#properties} will be stringified simply as <code>registry_number</code>,
 *        and an attribute <code>location</code> hidden in the properties of a sub-class
 *        representing books will be stringified as <code>Book__location</code>.
 *   <li> The stringified names of the attributes from {@link BibRef#properties dynamic properties}
 *        for instances without their own sub-class must be qualified (as described above) by
 *        the contents of their {@link BibRef#type}. For example, a citation can be of type
 *        <code>letter</code>, but there is no sub-class <code>Letter</code>. Therefore, the
 *        attribute <code>type</code> has the value <code>letter</code>. This value is then
 *        used to create a qualified stringified name such as <code>letter__subject</code>.
 *   <li> The stringified names should be considered case-insensitive. Thus,
 *        <code>book__location</code> is the same as <code>Book__location</code>, and
 *        <code>journalarticle__issue</code> equals <code>JournalArticle__issue</code>.
 * </ol>
 *

 * <h3>Query Matching and Ordering Criteria</h3>
 *
 * <p>
 * Several methods dealing with queries and sorting use a list of criteria.
 * The criteria define how the matching or ordering should be done.
 * </p>
 *
 * <p>
 * Each criterion is fully defined by an instance of {@link BiblioCriterion}.
 * Such definitions can be obtained from a controlled vocabulary; see
 * {@link BibRefSupport#getSupportedCriteria()}.
 * </p>
 *
 * <p>
 * Because each criterion is uniquely identifiable by its name, the querying and
 * sorting methods use only lists of criteria names, not lists of full criteria
 * definitions.
 * </p>
 *

 * <h3>Excluded and Only-included Attributes</h3>
 *
 * <p>
 * Several methods use a parameter with <em>excluded</em> attributes, or a
 * parameter with <em>only-included</em> attributes.
 * There are two different meanings and uses of such attribute lists.
 * </p>
 *
 * <p>
 * The first meaning is used by the query methods. They return a new query collection.
 * For practical and performance reasons it may sometimes be useful to define
 * in advance that the citations representing the resulting query collection
 * do not need to contain all attributes. The <em>excluded</em> list of attribute
 * names defines which attributes are not needed; a typical use is to exclude
 * abstracts, which may be quite long.
 * Thus, using the <em>excluded</em> list in a query method means that the
 * resulting query collection will never have all attributes fully filled with data
 * (unless, of course, the implementation ignores the <em>excluded</em> list).
 * </p>
 *
 * <p>
 * The second meaning is for the retrieval methods. They return citations from
 * the current query collection and can decide that only some attributes in the returned
 * citations are filled with data (such a parameter list is always named
 * <code>onlyAttrs</code>). It may again mean that less data is transferred,
 * but it is a post-act decision: the query collection already has all the
 * data and merely does not return them now, so the next retrieval method (on the
 * same collection) can still retrieve them.
 * </p>
 *
 * <p>
 * Both uses may be applied in different scenarios, and their efficiency is
 * very dependent on the repository implementation. Sometimes the creation of a
 * query collection already includes heavy data manipulation; in that case, the
 * first usage may help with performance. But sometimes the resulting query
 * collection is more or less a virtual collection and the real data transfer
 * happens only when the citations are being retrieved. In this case, the latter
 * scenario may be more efficient.
 * </p>
 *
 * @author Martin Senger
 * @author Matthew Pocock
 * @version $Id$
 * @see BibRef
 * @see BibRefSupport
 * @see BiblioCriterion
 * @since 1.3
 */
public interface BibRefQuery {

    /**

     * <p>
     * A property name specifying a list of excluded attribute names
     * (the type of the property value should be String[]).
     * </p>
     *
     * <p>
     * The list is used to define attributes which are not returned in the
     * resulting citations (see the discussion on excluded attributes
     * in the interface description).
     * </p>
     *
     * @see #find find
     * @see #query query
     */
    static final String PROP_EXCLUDED_ATTRS = "excluded";

    /**
     * A property name specifying a list of searching and ordering criteria
     * names (the type of the property value should be String[]). See the
     * discussion on criteria in the interface description.
     *
     * @see #find find
     * @see #query query
     */
    static final String PROP_CRITERIONS = "criterions";

    /**************************************************************************

     * <p>
     * It returns an identification of the current query collection.
     * </p>
     *
     * <p>
     * At the beginning, the identification usually contains a bibliographic
     * repository name or a description of its contents. But later, usually after
     * {@link #connect} or after the first query, the identification may contain
     * information rich enough to re-create the whole collection
     * (e.g. it can contain IDs of all records in the given collection).
     * </p>
     *
     * <p>
     * An implementation is not required to provide a persistent collection
     * identification. However, if it does provide one, it should also be
     * able to accept the same identifier in the {@link #connect(byte[]) connect}
     * method, and to use it to re-create the same collection.
     * </p>
     *
     * @return an identification of the current collection (may be null)
     *************************************************************************/
    byte[] getCollectionId();

    /**************************************************************************

     * <p>
     * It creates a connection to a bibliographic repository and/or performs
     * all initialization steps needed for further communication.
     * </p>
     *
     * <p>
     * However, there should be no need to call this method explicitly;
     * the other methods should do it automatically before they need something
     * from the repository.
     * </p>
     *
     * @throws BibRefException if the connection cannot be established
     *************************************************************************/
    void connect() throws BibRefException;

    /**************************************************************************

     * <p>
     * It creates a connection to a bibliographic repository and/or performs
     * all initialization steps needed for further communication,
     * and it makes the collection described by <code>collectionId</code>
     * the current collection.
     * </p>
     *
     * @see #connect connect without parameters
     * @param collectionId a (usually persistent) token that allows
     *        a collection to be re-created; it is the same as an identifier
     *        returned earlier by method {@link #getCollectionId}
     * @throws BibRefException if the connection cannot be established, or if the
     *         collection with the given ID cannot be re-created
     *************************************************************************/
    void connect (byte[] collectionId) throws BibRefException;

    /**************************************************************************
     * It checks if the repository is available. The semantics of
     * <em>available</em> depend on the implementation.
     *
     * @return true if it is ready
     *************************************************************************/
    boolean isReady();

    /**************************************************************************

     * <p>
     * It disconnects from the repository.
     * </p>
     *
     * <p>
     * The caller can use this method to announce that the current query
     * collection will not be needed soon. However, it may still be possible
     * to connect to it again later.
     * </p>
     *
     * @see #destroy destroy for more permanent action
     *************************************************************************/
    void disconnect();

    /*************************************************************************

     * <p>
     * It frees all resources related to this query collection.
     * </p>
     *
     * <p>
     * The caller explicitly announces no interest in the current
     * query collection at all. The existence of two separate
     * methods, {@link #disconnect} and <code>destroy</code>, allows more flexibility
     * for cases where an implementation deals with, for example,
     * temporary repositories.
     * </p>
     *
     * @see #disconnect disconnect for less permanent action
     * @throws BibRefException if the connection to the repository is broken
     *************************************************************************/
    void destroy() throws BibRefException;

    /*************************************************************************

     * <p>
     * The easiest direct method for querying a repository.
     * </p>
     *
     * <p>
     * It is modeled on web-based searches: a caller can specify
     * virtually anything in the list of keywords and the implementation tries
     * to search for these in as many attributes as possible and reasonable,
     * applying logical AND between them. However, a caller can also
     * limit the search to the attributes specified in the searched list.
     * </p>
     *
     * <p>
     * Note that there is no real query language used by this method;
     * therefore, this method is not suitable for queries requiring
     * logical operators (other than AND).
     * </p>
     *
     * <p>
     * The query result can be influenced by the additional properties:
     * </p>
     * <ul>
     *   <li> Property {@link #PROP_EXCLUDED_ATTRS} is of type String[]
     *        and contains a list of attribute names which should not be
     *        included in the resulting query collection. See the discussions on
     *        excluded attributes and on stringified attribute names
     *        in the interface description.
     *   <li> Property {@link #PROP_CRITERIONS} is also of type String[]
     *        and contains a list of criteria names. The caller specifies here
     *        which criteria she wishes, and this method can change this property
     *        and return here the criteria really used for the query.
     *        See also the discussion about criteria in the interface description.
     * </ul>
     *
     * @param keywords keywords or phrases that are being looked for
     * @param attrs attribute names that should be searched; if this list is
     *        empty the implementation should search all reasonable
     *        attributes
     * @param properties specify attributes excluded from the results and
     *        requested criteria for the query
     * @return a new query (and query-able) collection
     * @throws BibRefException if the query failed (which can have many reasons :-))
     *         (note that an empty result does not cause an exception)
     *************************************************************************/
    BibRefQuery find (String[] keywords, String[] attrs, Hashtable properties)
        throws BibRefException;

    /*************************************************************************

     * <p>
     * This is a convenience method for a common query.
     * </p>
     *
     * <p>
     * The search is done only for attributes having non-empty values in
     * parameter <code>author</code>. For example, a search for citations written
     * by authors with surname <code>Doe</code> can be specified by sending an
     * instance of <code>BiblioPerson</code> with <code>surname</code> filled with
     * <code>Doe</code> and with other attributes empty. Or, a search for
     * institution <code>EBI</code> can be specified by sending an instance of
     * <code>BiblioOrganisation</code> with <code>name</code> containing <code>EBI</code>.
     * </p>
     *
     * <p>
     * The query result can be influenced by the additional properties:
     * </p>
     * <ul>
     *   <li> Property {@link #PROP_EXCLUDED_ATTRS} is of type String[]
     *        and contains a list of attribute names which should not be
     *        included in the resulting query collection. See the discussions on
     *        excluded attributes and on stringified attribute names
     *        in the interface description.
     *   <li> Property {@link #PROP_CRITERIONS} is also of type String[]
     *        and contains a list of criteria names. The caller specifies here
     *        which criteria she wishes, and this method can change this property
     *        and return here the criteria really used for the query.
     *        See also the discussion about criteria in the interface description.
     * </ul>
     *
     * @see #find find
     * @see BiblioPerson
     * @see BiblioOrganisation
     * @see BiblioService
     * @param author contains one or more attributes that are being searched for
     * @param properties specify attributes excluded from the results and
     *        requested criteria for the query
     * @return a new query (and query-able) collection
     * @throws BibRefException if the query failed (which can have many reasons :-))
     *         (note that an empty result does not cause an exception)
     *************************************************************************/
    BibRefQuery findByAuthor (BiblioProvider author, Hashtable properties)
        throws BibRefException;

    /*************************************************************************

     * <p>
     * This is a convenience method returning just one citation.
     * </p>
     *
     * <p>
     * It queries the current collection in order to find and retrieve
     * a citation with the given identifier. What can be used as an
     * identifier depends on the implementation; see {@link BibRef#identifier}.
     * </p>
     *
     * @see #findById(String,String[]) findById with limited returned attributes
     * @param bibRefId an identifier of a citation that is being looked for
     * @return a found bibliographic reference (citation)
     * @throws BibRefException if such a citation was not found (or something else
     *         bad happened)
     *************************************************************************/
    BibRef findById (String bibRefId) throws BibRefException;

    /*************************************************************************

     * <p>
     * This is a convenience method returning just one citation, perhaps with
     * a limited number of attributes.
     * </p>
     *
     * <p>
     * It queries the current collection in order to find and retrieve
     * a citation with the given identifier. What can be used as an
     * identifier depends on the implementation; see {@link BibRef#identifier}.
     * </p>
     *
     * <p>
     * The returned citation will contain at least the attributes whose names are
     * specified by the parameter <code>onlyAttrs</code> (see the discussion on
     * only-included attributes in the interface description).
     * </p>
     *
     * <p>
     * It is meant to provide a more lightweight citation. The
     * implementation may provide more attributes than specified in
     * <code>onlyAttrs</code> (e.g. it is always recommended to include an
     * attribute representing a unique identifier of the citation even
     * if it is not asked for).
     * </p>
     *
     * <p>
     * Note that one can ask only for attributes that are available in the
     * current collection. If the collection was already created
     * <em>without</em> some attributes (using property
     * {@link #PROP_EXCLUDED_ATTRS}, e.g. in method {@link #find find}) one cannot
     * expect to get them even if they are asked for by the parameter
     * <code>onlyAttrs</code>.
     * </p>
     *
     * @see #findById(String) findById
     * @param bibRefId an identifier of a citation that is being looked for
     * @param onlyAttrs a list of attribute names; at least these attributes
     *        will be included in the returned citation
     * @return a found bibliographic reference (citation)
     * @throws BibRefException if such a citation was not found (or something else
     *         bad happened)
     *************************************************************************/
    BibRef findById (String bibRefId, String[] onlyAttrs) throws BibRefException;

    /*************************************************************************

     * <p>
     * It queries the current collection using a query language.
     * </p>
     *
     * <p>
     * Use this method when the simple {@link #find find} method is not sufficient,
     * for example, when more logical or relational operators are needed
     * to express the query.
     * </p>
     *
     * <p>
     * This specification does not propose any specific query language
     * to use (but may in the future). Roughly speaking, the query
     * method takes a query string and passes it to the repository
     * implementation, and if the implementation understands the query
     * the world is saved.
     * </p>
     *
     * <p>
     * Again, the query result can be influenced by the additional properties:
     * </p>
     * <ul>
     *   <li> Property {@link #PROP_EXCLUDED_ATTRS} is of type String[]
     *        and contains a list of attribute names which should not be
     *        included in the resulting query collection. See the discussions on
     *        excluded attributes and on stringified attribute names
     *        in the interface description.
     *   <li> Property {@link #PROP_CRITERIONS} is also of type String[]
     *        and contains a list of criteria names. The caller specifies here
     *        which criteria she wishes, and this method can change this property
     *        and return here the criteria really used for the query.
     *        See also the discussion about criteria in the interface description.
     * </ul>
     *
     * @see #find find
     * @param query an expression in a query language
     * @param properties specify attributes excluded from the results and
     *        requested criteria for the query
     * @return a new query (and query-able) collection
     * @throws BibRefException if the query failed (which can have many reasons :-))
     *         (note that an empty result does not cause an exception)
     *************************************************************************/
    BibRefQuery query (String query, Hashtable properties) throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns the number of citations in the current collection.
     * </p>
     *
     * @return the size of this collection
     * @throws BibRefException if a connection with the repository is broken
     *************************************************************************/
    int getBibRefCount() throws BibRefException;

    /*************************************************************************

     * <p>
     * It sorts the current collection and returns another collection which is
     * a sorted copy of the current collection.
     * </p>
     *
     * <p>
     * Strictly speaking, this is not a query method, but it also returns
     * a query collection.
     * </p>
     *
     * <p>
     * The sorting result can be influenced by an additional property
     * {@link #PROP_CRITERIONS} (of type String[]) containing
     * a list of sorting criteria names. The caller specifies here
     * which criteria she wishes, and this method can change this property
     * and return here the criteria really used for sorting.
     * </p>
     *
     * @param orderedBy a list of attribute names that the collection should
     *        be sorted by
     * @param properties may contain {@link #PROP_CRITERIONS} with the requested
     *        sorting criteria names
     * @return a sorted collection
     * @throws BibRefException if sorting failed (which may also happen when
     *         the collection is too large)
     *************************************************************************/
    BibRefQuery sort (String[] orderedBy, Hashtable properties) throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns all citations from the current collection as a
     * (possibly big) array. Obviously, the repository implementation
     * may limit the number of returned records.
     * </p>
     *
     * <p>
     * Some attributes may be missing (empty) if the property
     * {@link #PROP_EXCLUDED_ATTRS} was used for creating the current
     * collection. See the discussion on excluded attributes
     * in the interface description.
     * </p>
     *
     * @see #getAllBibRefs(String[]) getAllBibRefs with limited returned attributes
     * @return all citations from the current collection
     * @throws BibRefException if the collection is too large, or if the connection
     *         to the repository is broken
     *************************************************************************/
    BibRef[] getAllBibRefs() throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns all citations from the current collection as a
     * (possibly big) array, perhaps with a limited number of attributes.
     * </p>
     *
     * <p>
     * The returned citations will contain at least the attributes whose names are
     * specified by the parameter <code>onlyAttrs</code>. It is meant to provide
     * more lightweight citations. The implementation may provide more
     * attributes than specified in <code>onlyAttrs</code> (e.g. it may always be
     * good to include an attribute representing a unique identifier of a
     * citation even if it is not asked for). See the discussion on
     * only-included attributes in the interface description.
     * </p>
     *
     * <p>
     * Note that one can ask only for attributes that are available in the
     * current collection. If the collection was already created
     * <em>without</em> some attributes (using property
     * {@link #PROP_EXCLUDED_ATTRS}, e.g. in method {@link #find find}) one
     * cannot expect to get them even if they are asked for by the parameter
     * <code>onlyAttrs</code>.
     * </p>
     *
     * @see #getAllBibRefs getAllBibRefs with all attributes
     * @see #getAllIDs getAllIDs
     *
     * @param onlyAttrs attributes to attempt to include
     * @return all citations from the current collection
     * @throws BibRefException if the collection is too large, or if the connection
     *         to the repository is broken
     *************************************************************************/
    BibRef[] getAllBibRefs (String[] onlyAttrs) throws BibRefException;

    /*************************************************************************

     * <p>
     * A convenience method returning just the identifiers of all current citations.
     * </p>
     *
     * @return a list of all identifiers
     * @throws BibRefException if the collection is too large, or if the connection
     *         to the repository is broken
     *************************************************************************/
    String[] getAllIDs() throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns an enumeration of all citations from the current collection.
     * The type of elements in the enumeration is {@link BibRef} (or one of its
     * sub-classes).
     * </p>
     *
     * <p>
     * Some attributes may be missing (empty) if the property
     * {@link #PROP_EXCLUDED_ATTRS} was used for creating the current
     * collection.
     * </p>
     *
     * @see #getAllBibRefs getAllBibRefs
     * @return an enumeration of all citations
     * @throws BibRefException if the connection to the repository is broken
     *************************************************************************/
    Enumeration getBibRefs() throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns an enumeration of all citations from the current collection,
     * perhaps with a limited number of attributes.
     * The type of elements in the enumeration is {@link BibRef} (or one of its
     * sub-classes).
     * </p>
     *
     * <p>
     * The citations available through the enumeration will contain at least
     * the attributes whose names are specified by the parameter <code>onlyAttrs</code>.
     * It is meant to provide more lightweight citations. The implementation
     * may provide more attributes than specified in <code>onlyAttrs</code> (e.g.
     * it may always be good to include an attribute representing a unique
     * identifier of a citation even if it is not asked for).
     * </p>
     *
     * <p>
     * Note that one can ask only for attributes that are available in the
     * current collection. If the collection was already created
     * <em>without</em> some attributes (using property
     * {@link #PROP_EXCLUDED_ATTRS}, e.g. in method {@link #find find}) one cannot
     * expect to get them even if they are asked for by the parameter
     * <code>onlyAttrs</code>.
     * </p>
     *
     * @see #getAllBibRefs getAllBibRefs
     * @see #getBibRefs getBibRefs with all attributes
     *
     * @param onlyAttrs attributes to attempt to fetch
     * @return an enumeration of all citations
     * @throws BibRefException if the connection to the repository is broken
     *************************************************************************/
    Enumeration getBibRefs (String[] onlyAttrs) throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns all citations from the current collection as an XML stream.
     * The contents of such an XML stream depend heavily on the repository.
     * </p>
     *
     * <p>
     * Some attributes may be missing (empty) if the property
     * {@link #PROP_EXCLUDED_ATTRS} was used for creating the current
     * collection.
     * </p>
     *
     * @see #getAllBibRefs getAllBibRefs
     * @return an XML data stream containing all citations from the current
     *         collection
     * @throws BibRefException if the collection is too large, or if the connection
     *         to the repository is broken
     *************************************************************************/
    InputStream getAllBibRefsAsXML() throws BibRefException;

    /*************************************************************************

     * <p>
     * It returns an enumeration of all citations from the current collection.
     * The type of elements in the enumeration is String.
     * Each element represents one citation as an XML string.
     * The contents of such an XML string depend heavily on the repository.
     * </p>
     *
     * <p>
     * Some attributes may be missing (empty) if the property
     * {@link #PROP_EXCLUDED_ATTRS} was used for creating the current
     * collection.
     * </p>
     *
     * @see #getBibRefs getBibRefs
     * @see #getAllBibRefsAsXML getAllBibRefsAsXML
     * @return an enumeration of all citations
     * @throws BibRefException if the connection to the repository is broken
     *************************************************************************/
    Enumeration getBibRefsAsXML() throws BibRefException;

    /*************************************************************************

     * <p>
     * A convenience utility method converting a given citation to its
     * XML representation. It is useful, for example, in cases when a
     * program annotates citations on-the-fly and needs them in the
     * same XML format.
     * </p>
     *
     * <p>
     * The XML format depends on the repository where the citation comes from.
     * </p>
     *
     * @param bibRef a citation being converted into an XML format
     * @return an XML representation of <code>bibRef</code>
     * @throws BibRefException if the implementation needs it :-)
     *************************************************************************/
    String getBibRefAsXML (BibRef bibRef) throws BibRefException;

}
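
The rules for stringified attribute names given in the interface documentation (qualification with a double underscore, optional dropping of the qualifier, case-insensitive comparison) could be sketched roughly as below. The class `AttrNames` and its methods `qualify`, `sameAttribute` and `simpleName` are hypothetical helpers invented for this illustration; they are not part of BioJava.

```java
import java.util.Locale;

// Hypothetical helper (not part of BioJava) illustrating the naming rules
// for stringified attribute names described in the BibRefQuery documentation.
final class AttrNames {

    // The resource type and the member name are joined by two underscores.
    static final String SEPARATOR = "__";

    private AttrNames() { }

    // Builds a qualified name such as "Book__isbn".
    static String qualify(String resourceType, String attribute) {
        return resourceType + SEPARATOR + attribute;
    }

    // Stringified names are compared case-insensitively.
    static boolean sameAttribute(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    // Returns the part after the first "__", or the whole name if it is unqualified.
    static String simpleName(String name) {
        int pos = name.indexOf(SEPARATOR);
        return pos < 0 ? name : name.substring(pos + SEPARATOR.length());
    }
}
```

With these helpers, `AttrNames.qualify("Book", "isbn")` yields `Book__isbn`, and `book__location` compares equal to `Book__location`. Whether the qualifier may safely be dropped still depends on the implementation, as the documentation warns.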




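
The interface documentation advises implementations to offer a constructor taking `(String[] args, Hashtable props)` so that different implementations can be loaded dynamically. A minimal sketch of such loading with standard Java reflection follows. `DemoRepository` and `QueryLoader` are hypothetical names used only here; a real implementation class would implement `BibRefQuery`, which is left out so that the sketch stays self-contained.

```java
import java.lang.reflect.Constructor;
import java.util.Hashtable;

// Hypothetical implementation class used only for this sketch; a real one
// would implement BibRefQuery.
class DemoRepository {
    final String[] args;
    final Hashtable<String, Object> props;

    // Constructor with the parameter list advised by the documentation.
    public DemoRepository(String[] args, Hashtable<String, Object> props) {
        this.args = args;
        this.props = props;
    }
}

// Hypothetical loader: instantiates any class that offers that constructor.
final class QueryLoader {

    private QueryLoader() { }

    static Object load(String className, String[] args, Hashtable<String, Object> props) {
        try {
            Class<?> cls = Class.forName(className);
            // Look up the public (String[], Hashtable) constructor by its parameter types.
            Constructor<?> ctor = cls.getConstructor(String[].class, Hashtable.class);
            return ctor.newInstance(args, props);
        } catch (ReflectiveOperationException e) {
            // Class not found, constructor missing, or the constructor itself failed.
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }
}
```

Because the loader depends only on the class name and the constructor shape, a caller can choose the repository implementation at run time, for example from a configuration value.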