org.apache.lucene.util.encoding.package.html
Encoding
Offers various encoders and decoders for integers, as well as the
mechanisms to create new ones. The superclass of all encoders is
{@link org.apache.lucene.util.encoding.IntEncoder}, and most encoders
have a matching {@link
org.apache.lucene.util.encoding.IntDecoder} implementation (not all
encoders need a decoder).
An encoder encodes the integers that are passed to {@link
org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} into the
output stream that was set on it (see {@link
org.apache.lucene.util.encoding.IntEncoder#reInit(OutputStream)
reInit}). One should always call {@link
org.apache.lucene.util.encoding.IntEncoder#close() close} once all
integers have been encoded, so that the encoder can finish properly. Some
encoders buffer values in memory and encode them in batches in order to
optimize the encoding, so not closing them may result in loss of
information or a corrupt stream.
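To illustrate why closing matters, here is a minimal, self-contained sketch of an encoder that buffers values and writes them in batches. It does not use the Lucene classes: `BufferingSketch`, its batch size of 4, and the one-byte-per-value output are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a batching encoder; not part of Lucene.
class BufferingSketch {
    private static final int BATCH = 4;            // assumed batch size
    private final int[] buffer = new int[BATCH];
    private int count = 0;
    private OutputStream out;

    void reInit(OutputStream out) {
        this.out = out;
        count = 0;
    }

    // Values are only collected here; nothing reaches the stream until a batch is full.
    void encode(int value) throws IOException {
        buffer[count++] = value;
        if (count == BATCH) {
            flush();
        }
    }

    // Writes any partially filled batch, then closes the stream.
    void close() throws IOException {
        flush();
        out.close();
    }

    private void flush() throws IOException {
        for (int i = 0; i < count; i++) {
            out.write(buffer[i] & 0xFF);           // one byte per value, for brevity
        }
        count = 0;
    }

    // Encodes the values and returns how many bytes actually reached the stream.
    static int bytesWritten(int[] values, boolean callClose) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BufferingSketch encoder = new BufferingSketch();
        encoder.reInit(bytes);
        try {
            for (int v : values) {
                encoder.encode(v);
            }
            if (callClose) {
                encoder.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6};
        System.out.println(bytesWritten(values, false)); // 4: the last two values are still buffered
        System.out.println(bytesWritten(values, true));  // 6: close flushed the partial batch
    }
}
```

Without the call to `close`, the two values left in the partial batch never reach the stream, which is the kind of information loss described above.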
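Typical usage of an encoder looks like this:
import java.io.ByteArrayOutputStream;
import org.apache.lucene.util.encoding.IntEncoder;
import org.apache.lucene.util.encoding.VInt8IntEncoder;

int[] data = { /* the values to encode */ };
IntEncoder encoder = new VInt8IntEncoder();
// Declared as ByteArrayOutputStream so that toByteArray() is available below.
ByteArrayOutputStream out = new ByteArrayOutputStream();
encoder.reInit(out);
for (int val : data) {
  encoder.encode(val);
}
encoder.close();
// Print the bytes in binary
byte[] bytes = out.toByteArray();
for (byte b : bytes) {
  // Mask to 8 bits so that negative bytes are not printed as 32-bit values.
  System.out.println(Integer.toBinaryString(b & 0xFF));
}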
Each encoder also implements {@link
org.apache.lucene.util.encoding.IntEncoder#createMatchingDecoder()
createMatchingDecoder}, which returns the matching decoder for this encoder.
As mentioned above, not all encoders have a matching decoder of their own
(for example, some of the encoder filters explained next); nevertheless,
every encoder should return a decoder from that method. To complete the
example above, one can iterate over the decoded values like this:
// 'encoder' and 'bytes' are the variables from the encoding example above.
IntDecoder d = encoder.createMatchingDecoder();
d.reInit(new ByteArrayInputStream(bytes));
long val;
while ((val = d.decode()) != IntDecoder.EOS) {
  System.out.println(val);
}
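The loop above relies on decode returning a long, so that the end-of-stream marker can lie outside the int range and never collide with a decoded value. The following self-contained sketch shows that idea with a generic variable-length byte scheme (7 bits per byte). It does not use the Lucene classes, its byte layout is not claimed to match VInt8IntEncoder, and `VarIntSketch` and the value chosen for its `EOS` constant are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for an encoder/decoder pair; not the Lucene implementation.
class VarIntSketch {
    // Sentinel outside the int range, so it cannot be confused with a decoded value.
    static final long EOS = 0x100000000L;

    // Writes 7 bits per byte, low-order group first; a set high bit means "more bytes follow".
    static void encode(int value, ByteArrayOutputStream out) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Returns the next value, or EOS once the stream is exhausted.
    static long decode(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            return EOS;
        }
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended in the middle of a value");
            }
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Encodes all values, then decodes them again with the same loop shape as above.
    static int[] roundTrip(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            encode(v, out);
        }
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        int[] decoded = new int[values.length];
        int n = 0;
        try {
            long val;
            while ((val = decode(in)) != EOS) {
                decoded[n++] = (int) val;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new int[] {1, 127, 128, 300})));
        // prints [1, 127, 128, 300]
    }
}
```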
Some encoders do not contain any encoding logic of their own. Those are
called {@link
org.apache.lucene.util.encoding.IntEncoderFilter}s. A filter is an
encoder which delegates the encoding task to a given encoder, but
performs additional logic before the values are sent for encoding. An
example is {@link org.apache.lucene.util.encoding.DGapIntEncoder},
which encodes the gaps between values rather than the values themselves.
Another example is {@link
org.apache.lucene.util.encoding.SortingIntEncoder}, which sorts all the
values in ascending order before they are sent for encoding. This
encoder aggregates the values in its {@link
org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} implementation,
and encoding only happens upon calling {@link
org.apache.lucene.util.encoding.IntEncoder#close() close}.
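The effect of combining the two filters can be sketched without the Lucene classes. The self-contained example below sorts the values and then replaces each one by its distance from the previous value; the gaps are typically much smaller than the values themselves, which is what makes them cheaper for a variable-length encoder. `GapSketch` is a hypothetical name, and measuring the first gap from zero is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration of sorting followed by gap encoding; not part of Lucene.
class GapSketch {
    // Sorts a copy of the input, then replaces each value by its distance from the previous one.
    static int[] toGaps(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int[] gaps = new int[sorted.length];
        int previous = 0;                      // assumption: the first gap is measured from 0
        for (int i = 0; i < sorted.length; i++) {
            gaps[i] = sorted[i] - previous;
            previous = sorted[i];
        }
        return gaps;
    }

    // Restores the sorted values by accumulating the gaps.
    static int[] fromGaps(int[] gaps) {
        int[] values = new int[gaps.length];
        int previous = 0;
        for (int i = 0; i < gaps.length; i++) {
            previous += gaps[i];
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] gaps = toGaps(new int[] {1000, 1003, 1001, 1010});
        System.out.println(Arrays.toString(gaps));           // prints [1000, 1, 2, 7]
        System.out.println(Arrays.toString(fromGaps(gaps))); // prints [1000, 1001, 1003, 1010]
    }
}
```

Note that the original order of the values is not recoverable after sorting, and that all values must be known before the first gap can be emitted, which is why a sorting encoder has to defer the actual encoding until it is closed.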
Extending IntEncoder
Extending {@link org.apache.lucene.util.encoding.IntEncoder} is
straightforward. One only needs to implement {@link
org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} and
{@link org.apache.lucene.util.encoding.IntEncoder#createMatchingDecoder()
createMatchingDecoder}, as the base implementation takes care of
re-initializing the output stream and closing it. The following example
illustrates how one can write an encoder (and a matching decoder) which
'tags' the stream with the type/ID of the encoder. Such tagging is important
in scenarios where an application uses different encoders for different
streams and wants to manage a mapping from an encoder ID to an
IntEncoder/IntDecoder implementation, so that the proper decoder can be
initialized on the fly:
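import java.io.IOException;
import java.io.OutputStream;
import org.apache.lucene.util.encoding.IntDecoder;
import org.apache.lucene.util.encoding.IntEncoder;
import org.apache.lucene.util.encoding.IntEncoderFilter;

public class TaggingIntEncoder extends IntEncoderFilter {

  public TaggingIntEncoder(IntEncoder encoder) {
    super(encoder);
  }

  @Override
  public void encode(int value) throws IOException {
    // No filtering is done on the values; they go straight to the wrapped encoder.
    encoder.encode(value);
  }

  @Override
  public IntDecoder createMatchingDecoder() {
    return new TaggingIntDecoder();
  }

  @Override
  public void reInit(OutputStream out) {
    try {
      super.reInit(out);
      // Assumes the application has a static EncodersMap class which is able to
      // return a unique ID for a given encoder.
      int encoderID = EncodersMap.getID(encoder);
      // Write the tag before any encoded value reaches the stream.
      out.write(encoderID);
    } catch (IOException e) {
      // write() may fail; rethrown unchecked to keep the sample short.
      throw new RuntimeException(e);
    }
  }

  @Override
  public String toString() {
    return "Tagging (" + encoder.toString() + ")";
  }
}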
And the matching decoder:
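import java.io.IOException;
import java.io.InputStream;
import org.apache.lucene.util.encoding.IntDecoder;

public class TaggingIntDecoder extends IntDecoder {

  // Will be initialized upon calling reInit.
  private IntDecoder decoder;

  @Override
  public void reInit(InputStream in) {
    try {
      super.reInit(in);
      // Read the ID of the encoder that tagged this stream.
      int encoderID = in.read();
      // Assumes EncodersMap can return the proper IntEncoder given the ID.
      decoder = EncodersMap.getEncoder(encoderID).createMatchingDecoder();
    } catch (IOException e) {
      // read() may fail; rethrown unchecked to keep the sample short.
      throw new RuntimeException(e);
    }
  }

  @Override
  public long decode() throws IOException {
    return decoder.decode();
  }

  @Override
  public String toString() {
    // Parentheses are required: without them, the string concatenation binds
    // before the null check and the check never takes effect.
    return "Tagging (" + (decoder == null ? "none" : decoder.toString()) + ")";
  }
}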
The example implements TaggingIntEncoder as a filter over another
encoder. Even though it does not filter the actual values, presenting it as a
filter is a natural fit because it delegates all encoding to the wrapped
encoder. This is only example code, and one can choose to implement it in
whatever way makes sense for the application. For simplicity, error checking
was omitted from the sample code.
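The sample code assumes an application-provided EncodersMap class. One possible shape for such a registry is sketched below, as a self-contained example that compiles without Lucene on the classpath. Everything in it is hypothetical: the `Encoder` interface stands in for IntEncoder, `PlainEncoder` and `CompactEncoder` are placeholder implementations, and the IDs 1 and 2 are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry mapping encoder types to IDs and back; not part of Lucene.
class EncodersMapSketch {
    // Stand-in for IntEncoder, so that the sketch has no external dependencies.
    interface Encoder { }
    static final class PlainEncoder implements Encoder { }
    static final class CompactEncoder implements Encoder { }

    private static final Map<Class<?>, Integer> IDS = new HashMap<>();
    private static final Map<Integer, Supplier<Encoder>> FACTORIES = new HashMap<>();

    static {
        register(1, PlainEncoder.class, PlainEncoder::new);
        register(2, CompactEncoder.class, CompactEncoder::new);
    }

    static void register(int id, Class<? extends Encoder> type, Supplier<Encoder> factory) {
        // The tag is written with OutputStream.write(int), which keeps only the low 8 bits.
        if (id < 0 || id > 255) {
            throw new IllegalArgumentException("ID must fit in one byte: " + id);
        }
        IDS.put(type, id);
        FACTORIES.put(id, factory);
    }

    // Returns the ID registered for the encoder's class.
    static int getID(Encoder encoder) {
        Integer id = IDS.get(encoder.getClass());
        if (id == null) {
            throw new IllegalArgumentException("Unregistered encoder: " + encoder.getClass());
        }
        return id;
    }

    // Creates a fresh encoder for the given ID, from which a decoder can then be obtained.
    static Encoder getEncoder(int id) {
        Supplier<Encoder> factory = FACTORIES.get(id);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown encoder ID: " + id);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(getID(new CompactEncoder()));                // prints 2
        System.out.println(getEncoder(1) instanceof PlainEncoder);      // prints true
    }
}
```

Keying the registry on the encoder's class is the simplest choice, but it cannot distinguish two filters of the same class that wrap different encoders; an application with such chains would need a richer key.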