ConvertAvroSchema
Description:
This processor is used to convert data between two Avro formats, such as those coming from the ConvertCSVToAvro or ConvertJSONToAvro processors. The input and output content of the flow files should be Avro data files. The processor includes support for the following basic type conversions:
- Anything to String, using the data's default String representation
- String types to numeric types int, long, double, and float
- Conversion to and from optional Avro types
In addition, fields can be renamed or unpacked from a record type by using the dynamic properties.
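The basic type conversions listed above can be sketched in a few lines of Python. This is only an illustration of the rules, not the processor's implementation: `convert_value` is a hypothetical helper, and Python's `str()` output can differ from the Java String representation the processor would produce.

```python
def convert_value(value, target_type):
    """Convert one value to a target Avro type (hypothetical helper).

    target_type is either a primitive type name such as "long", or a list
    such as ["null", "long"] describing an optional (nullable) type.
    """
    if isinstance(target_type, list):
        # Optional type: null passes through, otherwise use the non-null branch.
        if value is None:
            if "null" in target_type:
                return None
            raise ValueError("null is not allowed for this type")
        non_null = [t for t in target_type if t != "null"]
        return convert_value(value, non_null[0])
    if value is None:
        raise ValueError("null is not allowed for a required type")
    if target_type == "string":
        # Anything to String, using the value's default string form.
        return str(value)
    if target_type in ("int", "long"):
        return int(value)
    if target_type in ("float", "double"):
        return float(value)
    raise ValueError(f"unsupported target type: {target_type}")
```

For instance, `convert_value("42", "long")` yields the integer 42, and `convert_value(None, ["null", "string"])` yields `None`.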
Mapping Example:
Throughout this example, we will refer to input data with the following schema:
{
  "type": "record",
  "name": "CustomerInput",
  "namespace": "org.apache.example",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "companyName",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "revenue",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "parent",
      "type": ["null", {
        "type": "record",
        "name": "parent",
        "fields": [
          {
            "name": "name",
            "type": ["null", "string"],
            "default": null
          },
          {
            "name": "id",
            "type": "string"
          }
        ]
      }],
      "default": null
    }
  ]
}
Although the id and revenue fields are declared as strings, they logically hold long and double values, respectively.
By default, fields with matching names will be mapped automatically, so the following output schema could be converted
without using dynamic properties:
{
  "type": "record",
  "name": "SimpleCustomerOutput",
  "namespace": "org.apache.example",
  "fields": [
    {
      "name": "id",
      "type": "long"
    },
    {
      "name": "companyName",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "revenue",
      "type": ["null", "double"],
      "default": null
    }
  ]
}
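As a rough illustration of name-based mapping, the sketch below copies each output field from the input field with the same name and converts the string values to the declared output type. The function `convert_by_name` and the sample record values are made up for this example; the processor itself operates on Avro data files rather than Python dictionaries.

```python
# Python constructors standing in for the Avro numeric types.
NUMERIC = {"int": int, "long": int, "float": float, "double": float}

def convert_by_name(record, output_fields):
    """Build an output record by reading each field from the same-named input field."""
    out = {}
    for field in output_fields:
        name, ftype = field["name"], field["type"]
        value = record.get(name)
        # For an optional type such as ["null", "double"], use the non-null branch.
        if isinstance(ftype, list):
            ftype = next(t for t in ftype if t != "null")
        if value is None:
            # Simplification: this sketch does not reject nulls in required fields.
            out[name] = None
        elif ftype == "string":
            out[name] = str(value)
        else:
            out[name] = NUMERIC[ftype](value)
    return out

# Fields of the SimpleCustomerOutput schema shown above.
output_fields = [
    {"name": "id", "type": "long"},
    {"name": "companyName", "type": ["null", "string"], "default": None},
    {"name": "revenue", "type": ["null", "double"], "default": None},
]

# Invented sample record that follows the CustomerInput schema.
customer = {"id": "1001", "companyName": "Acme", "revenue": "2500000.50",
            "parent": {"name": None, "id": "7"}}

converted = convert_by_name(customer, output_fields)
```

The parent field is dropped here because the output schema has no field with that name.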
To rename companyName to name and to extract the parent's id field, both a schema and dynamic properties must be provided.
For example, to convert to the following schema:
{
  "type": "record",
  "name": "SimpleCustomerOutput",
  "namespace": "org.apache.example",
  "fields": [
    {
      "name": "id",
      "type": "long"
    },
    {
      "name": "name",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "revenue",
      "type": ["null", "double"],
      "default": null
    },
    {
      "name": "parentId",
      "type": ["null", "long"],
      "default": null
    }
  ]
}
The following dynamic properties would be used:
"companyName" -> "name"
"parent.id" -> "parentId"