
/*-
 * Copyright (C) 2011, 2018 Oracle and/or its affiliates. All rights reserved.
 *
 * This file was distributed by Oracle as part of a version of Oracle NoSQL
 * Database made available at:
 *
 * http://www.oracle.com/technetwork/database/database-technologies/nosqldb/downloads/index.html
 *
 * Please see the LICENSE file included in the top-level directory of the
 * appropriate version of Oracle NoSQL Database for a copy of the license and
 * additional information.
 */

package oracle.kv.avro;

import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.specific.SpecificRecord;

import oracle.kv.Consistency;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.Value;

/**
 * A catalog of Avro schemas and bindings for a store.
 * <p>
 * Manages schemas and provides {@link AvroBinding}s for use with the Avro
 * data format. The bindings are used along with {@link KVStore} APIs for
 * storing and retrieving key-value pairs. The bindings are used to serialize
 * Avro values before writing them, and deserialize Avro values after reading
 * them. An AvroCatalog is obtained by calling {@link KVStore#getAvroCatalog}.
 * <p>
 * WARNING: We strongly recommend using an {@link AvroBinding}. NoSQL
 * Database will leverage Avro in the future to provide additional features and
 * capabilities.
 * <p>
 * WARNING: To take advantage of the Avro data format, the bindings in
 * this class must be used. The {@link Value} byte array is constructed by the
 * binding to include an internal reference to the schema used for
 * serialization. The {@link Value} byte array may not be manipulated directly
 * by the application.
 *

 * <h3>Avro Schemas</h3>
 * <p>
 * When the Avro data format is used, each stored value must be associated with
 * an Avro schema. The Avro schema describes the fields allowed in the value,
 * along with their data types. An Avro schema is created by the application
 * developer, added to the store using the NoSQL Database administration
 * interface, and used in the client API via the {@code AvroCatalog} class.
 * <p>
 * An Avro schema is created in JSON format, typically using a text editor and
 * initially saved in a text file. Of course, to create an Avro schema the
 * developer must understand the Avro schema syntax. For more information see
 * Avro Schemas in the Getting Started Guide and the Avro schema
 * specification.
 * <p>
 * Once created and saved in a text file, the schema is added to the store
 * using the {@code ddl add-schema} administrative command, using the text file
 * as input; see Adding Schema in the Getting Started Guide. Until a schema is
 * added, it may not be used in the client API to store values. The use of the
 * schema in the client API is described further below.
 * <p>
 * Note that the use of Avro schemas allows serialized values to be stored in a
 * very space-efficient binary format. Each value is stored without any
 * metadata other than a small internal schema identifier, between 1 and 4
 * bytes in size. One such reference is stored per key-value pair. In this
 * way, the serialized Avro data format is always associated with the schema
 * used to serialize it, with minimal overhead. This association is made
 * transparently to the application, and the internal schema identifier is
 * managed by the bindings supplied by the {@code AvroCatalog} class. The
 * application never sees or uses the internal identifier directly.
 * <p>
 * Two example schemas are shown below along with the administrative commands
 * for adding them to the store. These schemas are used further below in other
 * examples.
 * <p>
 * The schemas might be stored in a simple text file, {@code schema1.txt}:
 * <pre>
 *  {
 *    "type": "record",
 *    "name": "MemberInfo",
 *    "namespace": "avro",
 *    "fields": [
 *      {"name": "name", "type": {
 *          "type": "record",
 *          "name": "FullName",
 *          "fields": [
 *              {"name": "first", "type": "string", "default": ""},
 *              {"name": "last", "type": "string", "default": ""}
 *          ]
 *      }, "default": {}},
 *      {"name": "age", "type": "int", "default": 0}
 *    ]
 *  }
 * </pre>
 * The administrative command for adding the above schemas is:
 * <pre>
 *  > ddl add-schema -file schema1.txt
 * </pre>
 *

 * <h3>Schema Evolution</h3>
 * <p>
 * A schema may be changed, even after data values are stored using that
 * schema, using the {@code ddl add-schema} administrative command with the
 * {@code -evolve} option; see Changing Schema in the Getting Started Guide.
 * The modified schema is saved in a text file, which is passed to this command
 * as input. For example, fields may be added, removed or renamed.
 * <p>
 * For example, if a middle name property is added in the future to the
 * schema, it might be stored in {@code schema2.txt}. Note that a new field
 * must be given a default value.
 * <pre>
 *  {
 *    "type": "record",
 *    "name": "MemberInfo",
 *    "namespace": "avro",
 *    "fields": [
 *      {"name": "name", "type": {
 *          "type": "record",
 *          "name": "FullName",
 *          "fields": [
 *              {"name": "first", "type": "string", "default": ""},
 *              {"name": "middle", "type": "string", "default": ""},
 *              {"name": "last", "type": "string", "default": ""}
 *          ]
 *      }, "default": {}},
 *      {"name": "age", "type": "int", "default": 0}
 *    ]
 *  }
 * </pre>
 * The administrative command for adding the new version of the schema is:
 * <pre>
 *  > ddl add-schema -file schema2.txt -evolve
 * </pre>

 * <p>
 * When a schema is changed, multiple versions of the schema will exist and be
 * maintained by the store. The version of the schema used to serialize a
 * value, before writing it to the store, is called the writer
 * schema. The writer schema is specified by the application when
 * creating a binding. It is associated with the value when calling the
 * binding's {@link AvroBinding#toValue} method to serialize the data. As
 * mentioned above, the writer schema is associated internally with every
 * stored value.
 * <p>
 * The reader schema is used to deserialize a value after reading it
 * from the store. Like the writer schema, the reader schema is specified by
 * the client application when creating a binding. It is used to deserialize
 * the data when calling the binding's {@link AvroBinding#toObject} method,
 * after reading a value from the store.
 * <p>
 * When the reader and writer schemas are different, schema evolution is
 * applied during deserialization. Schema evolution is applied by transforming
 * the data during deserialization, so that data stored according to the writer
 * schema is transformed to conform to the reader schema. When the reader and
 * writer schemas are the same, no data transformation is necessary. Also note
 * that no data transformation takes place during serialization; i.e., data is
 * always written according to the writer schema.
 * <p>
 * Reader and writer schemas can be different when a client is changed to use a
 * new version of the schema, and then reads data that was written using the
 * old version. Schema versions can also be different when two clients are
 * operating concurrently using two different versions of a schema. In a
 * distributed system such as NoSQL Database, it is normally not possible or
 * desirable to upgrade all clients simultaneously, since this would require
 * downtime. Therefore, for some period of time there will be a mix of clients
 * operating concurrently using different versions of a schema. Fortunately,
 * this situation is handled gracefully by virtue of schema evolution.
 * <p>
 * For example, imagine that a new field is added to a schema and there are two
 * versions of the schema. The new field is only present in the new version of
 * the schema. The new field must be assigned a default value in the new
 * schema. There are three possible cases.

 * <ol>
 * <li>The writer schema and reader schema are the same. Schema evolution is
 * not necessary and no data transformation is applied.
 * </li>
 * <li>The writer schema is the old version and the reader schema is the new
 * version. Because the writer schema is the old version, the new field is
 * not present in the stored data. When a client uses the new version as a
 * reader schema, the new field will appear to the client as having the
 * default value.
 * </li>
 * <li>The writer schema is the new version and the reader schema is the old
 * version. Because the writer schema is the new version, the new field is
 * present in the stored data. When a client uses the old version as a
 * reader schema, the new field will not appear at all to the client.
 * </li>
 * </ol>
 * <p>
 * If instead a field were deleted from a schema, the same rules would apply
 * but with the roles reversed. Renaming a field is also possible by adding a
 * field alias to the schema; in this case the field is accessible by both the
 * old and new name. For more information see Schema Evolution in the Getting
 * Started Guide and the detailed rules for schema evolution in the Avro schema
 * specification.
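 * <p>
 * The three cases above can be modeled with a small, self-contained sketch.
 * It is not Avro's resolver and does not use the NoSQL Database API: records
 * are plain maps, a schema is reduced to a map of field name to default
 * value, and the {@code EvolutionSketch} class and its {@code resolve} method
 * are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of reader/writer schema resolution for flat records.
// A "schema" here is only a map of field name -> default value.
final class EvolutionSketch {

    // Transforms data written with the writer schema so that it conforms
    // to the reader schema: fields unknown to the reader are dropped, and
    // fields missing from the stored data take the reader's default.
    static Map<String, Object> resolve(Map<String, Object> stored,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            result.put(name, stored.containsKey(name)
                                 ? stored.get(name)
                                 : field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Old version: first, last. New version adds middle, default "".
        Map<String, Object> oldSchema = new LinkedHashMap<>();
        oldSchema.put("first", "");
        oldSchema.put("last", "");
        Map<String, Object> newSchema = new LinkedHashMap<>();
        newSchema.put("first", "");
        newSchema.put("middle", "");
        newSchema.put("last", "");

        Map<String, Object> writtenOld = new LinkedHashMap<>();
        writtenOld.put("first", "Ada");
        writtenOld.put("last", "Lovelace");

        Map<String, Object> writtenNew = new LinkedHashMap<>(writtenOld);
        writtenNew.put("middle", "King");

        // Case 2: old writer, new reader; middle appears with its default.
        System.out.println(resolve(writtenOld, newSchema));
        // Case 3: new writer, old reader; middle is not visible.
        System.out.println(resolve(writtenNew, oldSchema));
    }
}
```

 * In this model, as in the rules above, nothing is transformed at write time;
 * the transformation happens only when stored data is read with a different
 * reader schema.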

 * <p>
 * To support schema evolution, be sure never to change a schema's name or
 * namespace. A schema is uniquely identified by its Avro full name, which is
 * similar to a full Java class name and consists of a combination of the Avro
 * schema namespace and the schema name.
 *
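 * <p>
 * As a minimal sketch of how a full name is formed, the helper below joins a
 * namespace and a name with a dot. The {@code FullNameSketch} class is a
 * hypothetical name used only for illustration; with Avro on the classpath
 * the full name would normally be read from the schema object itself.

```java
// Builds an Avro full name from a namespace and a schema name.
final class FullNameSketch {

    static String fullName(String namespace, String name) {
        // A name that already contains a dot is treated as a full name,
        // and a missing namespace leaves the name unchanged.
        if (name.indexOf('.') >= 0
                || namespace == null
                || namespace.isEmpty()) {
            return name;
        }
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        // The example schema above: namespace "avro", name "MemberInfo".
        System.out.println(fullName("avro", "MemberInfo"));
    }
}
```

 * For the example schema above, the full name is {@code avro.MemberInfo}.
 * This is the same kind of string that is used as the key of the maps passed
 * to and returned from the multiple schema methods of this interface.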

 * <h3>Avro schema restrictions</h3>
 * <p>
 * A top-level schema, that is, the schema of a value stored in a key-value
 * pair, must have the Avro type {@code record}.
 *

 * <h3>Choosing a Binding</h3>
 * <p>
 * The {@code AvroCatalog} provides a variety of {@link AvroBinding}s that
 * serialize and deserialize the Avro data format. A summary of each binding
 * is below.
 * <ul>
 * <li>{@link SpecificAvroBinding} is recommended when the schema(s) of the
 * object(s) in the database are known when the application is being written.
 * The names of the fields, and how to access them, are known at build time.
 * A POJO (Plain Old Java Object) class for each schema is generated using
 * the Avro compiler tools. The POJO classes have property getters and
 * setters that provide type safety. This makes the
 * {@code SpecificAvroBinding} the easiest of the bindings to use.
 * </li>
 * <li>{@link GenericAvroBinding} is recommended when the schema(s) of the
 * object(s) in the database are not known at build time. Rather than access
 * the objects using predefined getters and setters, a program using
 * {@code GenericAvroBinding} passes in the names of the fields to a
 * generalized getter to retrieve data from an Avro object. For example, a
 * generalized NoSQL Database record browser would require this capability.
 * </li>
 * <li>{@link JsonAvroBinding} is recommended when interoperability with
 * other components or external systems that use JSON objects is needed.
 * With the {@code JsonAvroBinding}, the Jackson API is used to manipulate
 * JSON data objects. Note that certain Avro data types are not conveniently
 * represented as JSON values; see {@code JsonAvroBinding} for details.
 * </li>
 * <li>{@link RawAvroBinding} is recommended when an "escape" from the
 * built-in serialization provided by the other bindings is needed. The
 * {@code RawAvroBinding} does not perform serialization, but instead allows
 * specifying the Avro binary data as a byte array. Serialization can be
 * performed in any way desired, or not at all in the case where Avro binary
 * data is exchanged with other components or external systems. Because it is
 * low level and provides complete flexibility, the {@code RawAvroBinding}
 * provides the least safety and is the most difficult of the bindings to
 * use.
 * </li>
 * </ul>
 * <p>
 * The detailed trade-offs for using each type of binding are described in
 * their javadoc: {@link SpecificAvroBinding}, {@link GenericAvroBinding},
 * {@link JsonAvroBinding}, and {@link RawAvroBinding}.
 *

 * <h3>Single schema and multiple schema bindings</h3>
 * <p>
 * Specific, generic and JSON bindings have a single schema variant
 * ({@link #getSpecificBinding getSpecificBinding},
 * {@link #getGenericBinding getGenericBinding} and
 * {@link #getJsonBinding getJsonBinding}) and a multiple schema variant
 * ({@link #getSpecificMultiBinding getSpecificMultiBinding},
 * {@link #getGenericMultiBinding getGenericMultiBinding} and
 * {@link #getJsonMultiBinding getJsonMultiBinding}).
 * <p>
 * A single schema binding provides type checking. Only values with the given
 * schema (or class, in the case of a specific class binding) can be used with
 * the binding. A single schema specific class binding provides
 * compile-time type checking, while a single schema generic or JSON binding
 * provides run-time type checking.
 * <p>
 * A single schema binding is safer than a multiple schema binding and often
 * preferable for that reason. However, a multiple schema binding may be more
 * useful when retrieving key-value pairs of different types. A
 * {@link KVStore} method may return values of different types if the
 * application stores multiple types for a single key, or if a method is
 * called that returns multiple key-value pairs such as
 * {@link KVStore#multiGet multiGet},
 * {@link KVStore#multiGetIterator multiGetIterator}, or
 * {@link KVStore#storeIterator storeIterator}. There are several ways of
 * determining which type is returned in these cases.

 * <ul>
 * <li>The key in the key-value pair may indicate the value type according
 * to application specific knowledge of the key structure. In this case,
 * using a single schema binding may be appropriate. The application can
 * choose which binding to use based on the key structure.
 * </li>
 * <li>The schema name or a common property of the object may be used to
 * determine the value type. In this case a multiple schema binding can be
 * used to return a {@link SpecificRecord}, {@link GenericRecord} or
 * {@link JsonRecord}, and then the schema name or a property of the object
 * can be examined.
 * </li>
 * <li>For a specific binding, the class may be used to determine the value
 * type. In this case a multiple schema binding can be used to return the
 * {@link SpecificRecord}, and then {@code instanceof} can be used to
 * determine the concrete class.
 * </li>
 * </ul>
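 * <p>
 * The last approach in the list above can be sketched without any Avro or
 * NoSQL Database classes. The types below are hypothetical stand-ins: in a
 * real application {@code MemberInfoStandIn} and {@code AddressStandIn} would
 * be classes generated by the Avro compiler tools, and the value passed to
 * {@code describe} would come from a multiple schema specific binding.

```java
// Stand-in for org.apache.avro.specific.SpecificRecord.
interface RecordStandIn { }

// Stand-ins for classes generated by the Avro compiler tools.
final class MemberInfoStandIn implements RecordStandIn {
    final String lastName;
    MemberInfoStandIn(String lastName) { this.lastName = lastName; }
}

final class AddressStandIn implements RecordStandIn {
    final String city;
    AddressStandIn(String city) { this.city = city; }
}

final class DispatchSketch {

    // Determines the concrete class of a deserialized value and handles
    // each known type separately; unknown types are rejected explicitly.
    static String describe(RecordStandIn value) {
        if (value instanceof MemberInfoStandIn) {
            return "member " + ((MemberInfoStandIn) value).lastName;
        }
        if (value instanceof AddressStandIn) {
            return "address in " + ((AddressStandIn) value).city;
        }
        throw new IllegalArgumentException(
            "Unexpected record type: " + value.getClass().getName());
    }
}
```

 * Rejecting unknown types explicitly, rather than ignoring them, makes it
 * easier to notice when a new value type has been stored under keys that the
 * application iterates over.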

 * <p>
 * Note that both single and multiple schema bindings perform schema evolution
 * when deserializing a value. The deserialized value will conform to the
 * schema specified as an argument of the getXxxBinding or getXxxMultiBinding
 * method.
 * <p>
 * A special use case for a generic or JSON multiple schema binding is when the
 * application treats values dynamically based on their schema, rather than
 * using a fixed set of schemas that is known in advance to the client
 * application. In this case the {@link #getCurrentSchemas getCurrentSchemas}
 * method can be used to obtain a map of the most current schemas, which can
 * be passed to {@link #getGenericMultiBinding getGenericMultiBinding} or
 * {@link #getJsonMultiBinding getJsonMultiBinding}.
 *

 * <h3>Using Schemas with Bindings</h3>
 * <p>
 * A client application normally embeds a copy of the schemas it uses, rather
 * than getting the current schemas from the store. The client's schemas are
 * specified when a binding is created by one of the getXxxBinding methods.
 * This supports schema evolution (as described above), in that the
 * {@link AvroBinding#toObject toObject} method will transform the serialized
 * data such that the returned object conforms to the schema known to the
 * application.
 * <p>
 * The application specifies its known, embedded schemas in different ways,
 * depending on the type of binding used.

 * <ul>
 * <li>If an Avro specific binding is used, the schema is specified when the
 * specific class is generated using the Avro compiler tools. The schema
 * text (in JSON format) is included in the generated code as a static String
 * field, and is internally available to the binding.
 * </li>
 * <li>If a generic binding or JSON binding is used, the application's
 * schemas must be explicitly embedded in the application. For example, the
 * application might maintain the text (in JSON format) of its schemas in the
 * application source code (in static String fields) or in a resource file
 * included in the application jar. To create {@link Schema} objects from
 * the schema text, the {@link org.apache.avro.Schema.Parser Schema.Parser}
 * class may be used by the application. After creating it, the schema
 * object is passed to the getXxxBinding method. A schema object is also
 * passed to the constructor of {@link GenericRecord}, {@link JsonRecord} and
 * {@link RawRecord}.
 * </li>
 * </ul>
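 * <p>
 * Because each distinct schema object is meant to be created once and then
 * reused, the parsing step described in the list above can be wrapped in a
 * small registry. The sketch below is an illustration under stated
 * assumptions, not part of this API: {@code SchemaRegistrySketch} is a
 * hypothetical class, the type parameter {@code T} stands in for
 * {@code org.apache.avro.Schema}, and the parser function stands in for the
 * application's own parsing call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Parses each distinct schema text once and returns the same object on
// every later request. T stands in for org.apache.avro.Schema.
final class SchemaRegistrySketch<T> {

    private final Map<String, T> parsed = new ConcurrentHashMap<>();
    private final Function<String, T> parser;

    SchemaRegistrySketch(Function<String, T> parser) {
        this.parser = parser;
    }

    // Returns the schema object for the given JSON text, invoking the
    // parser only the first time this exact text is seen.
    T get(String schemaText) {
        return parsed.computeIfAbsent(schemaText, parser);
    }
}
```

 * With Avro on the classpath, such a registry might be created as
 * {@code new SchemaRegistrySketch<Schema>(text -> new Schema.Parser().parse(text))}
 * and held in a single long-lived field, so that the same {@code Schema}
 * instance is passed to the getXxxBinding methods every time.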

 * <p>
 * As described further above, all schemas used by an application must be
 * defined using the NoSQL Database administrative interface. If a schema
 * specified by the application via the client API has not been defined in the
 * store, an {@link UndefinedSchemaException} will be thrown by the
 * getXxxBinding method (if the schema is passed to this method), or by one of
 * the methods of the returned binding. Matching of the application specified
 * schemas with schemas in the store is performed using the
 * {@link Schema#equals} method.
 * <p>
 * One exception to the above is that an application may choose to use the
 * current version of schemas in the store that are returned by
 * {@link #getCurrentSchemas getCurrentSchemas}; in this case the set of
 * schemas used in the application need not be fixed at build time. A second
 * exception is when the application chooses to use a raw binding and does not
 * serialize or deserialize the data, for example, when the serialized byte
 * array is copied to or from another component or system.

 * <p>
 * WARNING: The application should not create new {@code Schema} objects
 * unnecessarily, since schema creation is an expensive operation. The
 * expected approach is to create each distinct {@code Schema} only once, and
 * reuse that object whenever it is needed. Also note that all {@code Schema}
 * objects created by the application and passed to an API method in this
 * package are cached. This cache is associated with the {@code AvroCatalog}
 * instance, which is associated with the {@code KVStore} instance. The cached
 * references to the {@code Schema} objects are not discarded until the
 * {@code KVStore} instance is closed and discarded. For example, a very
 * undesirable approach would be for the application to create a new
 * {@code Schema} object for each serialization or deserialization operation;
 * in this case, performance would suffer greatly and the cached schemas would
 * eventually fill the JVM heap.
 *
 * @since 2.0
 *
 * @deprecated as of 4.0, use the table API instead.
 */
@Deprecated
public interface AvroCatalog {

    /**
     * Returns a binding for representing values as instances of a generated
     * Avro specific class, for a single given class.
     *
     * @param cls an Avro specific class that was previously generated using
     * the Avro code generation tools.
     *
     * @return the AvroBinding that can be used for serialization and
     * deserialization.
     *
     * @throws UndefinedSchemaException if the schema associated with the given
     * class parameter has not been defined using the NoSQL Database
     * administration interface.
     *
     * @see SpecificAvroBinding
     */
    public <T extends SpecificRecord>
        SpecificAvroBinding<T> getSpecificBinding(Class<T> cls);

    /**
     * Returns a binding for representing values as instances of generated Avro
     * specific classes, for any Avro specific class.
     *
     * @return the AvroBinding that can be used for serialization and
     * deserialization.
     *
     * @see SpecificAvroBinding
     */
    public SpecificAvroBinding<SpecificRecord> getSpecificMultiBinding();

    /**
     * Returns a binding for representing a value as an Avro
     * {@link GenericRecord}, for values that conform to a single given
     * expected schema.
     *
     * @param schema the Avro schema expected for all values and
     * {@link GenericRecord}s used with this binding.
     *
     * @return the AvroBinding that can be used for serialization and
     * deserialization.
     *
     * @throws UndefinedSchemaException if the given schema has not been
     * defined using the NoSQL Database administration interface.
     */
    public GenericAvroBinding getGenericBinding(Schema schema);

    /**
     * Returns a binding for representing a value as an Avro
     * {@link GenericRecord}, for values that conform to multiple given
     * expected schemas.
     *
     * @param schemas the Avro schemas expected for all values and
     * {@link GenericRecord}s used with this binding. The key in the map is
     * the full name of the schema.
     *
     * @return the AvroBinding that can be used for serialization and
     * deserialization.
     *
     * @throws UndefinedSchemaException if any of the given schemas has not
     * been defined using the NoSQL Database administration interface.
     */
    public GenericAvroBinding
        getGenericMultiBinding(Map<String, Schema> schemas);

    /**
     * Returns a binding for representing a value as a {@link JsonRecord}, for
     * values that conform to a single given expected schema.
     *
     * @param schema the Avro schema expected for all values and
     * {@link JsonRecord}s used with this binding.
     *
     * @return the AvroBinding that can be used for serialization and
     * deserialization.
     *
     * @throws UndefinedSchemaException if the given schema has not been
     * defined using the NoSQL Database administration interface.
     */
    public JsonAvroBinding getJsonBinding(Schema schema);

    /**
     * Returns a binding for representing a value as a {@link JsonRecord}, for
     * values that conform to multiple given expected schemas.
     *
     * @param schemas the Avro schemas expected for all values and
     * {@link JsonRecord}s used with this binding. The key in the map is the
     * full name of the schema.
     *
     * @return the AvroBinding that can be used for serialization and
     * deserialization.
     *
     * @throws UndefinedSchemaException if any of the given schemas has not
     * been defined using the NoSQL Database administration interface.
     */
    public JsonAvroBinding getJsonMultiBinding(Map<String, Schema> schemas);

    /**
     * Returns a binding for representing a value as a {@link RawRecord}
     * containing the raw Avro serialized byte array and its associated schema.
     *
     * @return the AvroBinding that can be used for packaging and unpackaging
     * the serialized value.
     */
    public RawAvroBinding getRawBinding();

    /**
     * Returns an immutable Map containing the most current version of all
     * schemas from the {@link KVStore} client schema cache. The Map key is
     * the full name of the schema.

     * <p>
     * A special use case for a generic or JSON multiple schema binding is when
     * the application treats values dynamically based on their schema, rather
     * than using a fixed set of known schemas. The
     * {@link #getCurrentSchemas getCurrentSchemas} method can be used to
     * obtain a map of the most current schemas, which can be passed to
     * {@link #getGenericMultiBinding getGenericMultiBinding} or
     * {@link #getJsonMultiBinding getJsonMultiBinding}. See
     * {@link GenericAvroBinding} and {@link JsonAvroBinding} for an example
     * of this use case.
     *
     * @return an immutable Map of full schema name to schema object.
     */
    public Map<String, Schema> getCurrentSchemas();

    /**
     * Refreshes the cache of stored schemas, adding any new schemas or new
     * versions of schemas to the cache that have been stored via the
     * administration interface since the cache was last refreshed.
     * <p>
     * Calling this method is normally not necessary, since the schema cache is
     * automatically refreshed whenever a schema is specified via any of the
     * Avro binding APIs, and that schema is not already present in the cache.

     * <p>
     * Calling this method periodically may be necessary when the
     * {@link KVStore} handle is long lived, the {@link #getCurrentSchemas}
     * method is used to obtain current schemas, and the application wishes to
     * obtain schemas that were recently added using the administration
     * interface.
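     * <p>
     * One way to arrange a periodic refresh is a single background thread
     * that runs the refresh at a fixed delay. The sketch below uses only the
     * JDK; {@code PeriodicRefreshSketch} is a hypothetical helper, and the
     * {@code Runnable} it accepts stands in for a call such as
     * {@code catalog.refreshSchemaCache(null)}.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Runs a refresh action repeatedly on one daemon thread. No NoSQL
// Database classes are used; the action is supplied by the caller.
final class PeriodicRefreshSketch implements AutoCloseable {

    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "schema-refresh");
            thread.setDaemon(true);
            return thread;
        });

    PeriodicRefreshSketch(Runnable refresh, long period, TimeUnit unit) {
        // Fixed delay: the next run is scheduled only after the previous
        // one completes, so refreshes never overlap.
        executor.scheduleWithFixedDelay(refresh, period, period, unit);
    }

    // Stops further refreshes; call this when the store handle is closed.
    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

     * Because a single thread performs every refresh, this arrangement avoids
     * concurrent calls from multiple threads. The period should be generous,
     * for example minutes rather than milliseconds. Note that with
     * {@code scheduleWithFixedDelay}, an exception thrown by the action
     * suppresses all later runs, so the action should catch and log its own
     * failures.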

     * <p>
     * WARNING: Calling this method often from multiple threads may cause
     * blocking during the query for schema changes. Also note calling this
     * method often could impact the performance of other operations, since it
     * queries kv pairs in the store.
     *
     * @param consistency determines the consistency associated with the read
     * used to query for new schemas. If null, the
     * {@link KVStoreConfig#getConsistency default consistency} is used.
     */
    public void refreshSchemaCache(Consistency consistency);
}




