All Downloads are FREE. Search and download functionalities are using the official Maven repository.

org.apache.lucene.codecs.lucene60.Lucene60PointsFormat Maven / Gradle / Ivy

There is a newer version: 4.0.0
Show newest version
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.codecs.lucene60;


import java.io.IOException;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.codecs.PointsFormat;
import org.apache.lucene.codecs.PointsReader;
import org.apache.lucene.codecs.PointsWriter;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;

/**
 * Lucene 6.0 point format, which encodes dimensional values in a block KD-tree structure
 * for fast 1D range and N dimesional shape intersection filtering.
 * See this paper for details.
 *
 * 

This data structure is written as a series of blocks on disk, with an in-memory perfectly balanced * binary tree of split values referencing those blocks at the leaves. * *

The .dim file has both blocks and the index split * values, for each field. The file starts with {@link CodecUtil#writeIndexHeader}. * *

The blocks are written like this: * *

    *
  • count (vInt) *
  • delta-docID (vInt) count (delta coded docIDs, in sorted order) *
  • packedValuecount (the byte[] value of each dimension packed into a single byte[]) *
* *

After all blocks for a field are written, then the index is written: *

    *
  • numDims (vInt) *
  • maxPointsInLeafNode (vInt) *
  • bytesPerDim (vInt) *
  • count (vInt) *
  • packed index (byte[]) *
* *

The packed index uses hierarchical delta and prefix coding to compactly encode the file pointer for * all leaf blocks, once the tree is traversed, as well as the split dimension and split value for each * inner node of the tree. * *

After all fields blocks + index data are written, {@link CodecUtil#writeFooter} writes the checksum. * *

The .dii file records the file pointer in the .dim file where each field's * index data was written. It starts with {@link CodecUtil#writeIndexHeader}, then has: * *

    *
  • fieldCount (vInt) *
  • (fieldNumber (vInt), fieldFilePointer (vLong))fieldCount *
* *

After all fields blocks + index data are written, {@link CodecUtil#writeFooter} writes the checksum. * * @lucene.experimental */ public final class Lucene60PointsFormat extends PointsFormat { static final String DATA_CODEC_NAME = "Lucene60PointsFormatData"; static final String META_CODEC_NAME = "Lucene60PointsFormatMeta"; /** * Filename extension for the leaf blocks */ public static final String DATA_EXTENSION = "dim"; /** * Filename extension for the index per field */ public static final String INDEX_EXTENSION = "dii"; static final int DATA_VERSION_START = 0; static final int DATA_VERSION_CURRENT = DATA_VERSION_START; static final int INDEX_VERSION_START = 0; static final int INDEX_VERSION_CURRENT = INDEX_VERSION_START; /** Sole constructor */ public Lucene60PointsFormat() { } @Override public PointsWriter fieldsWriter(SegmentWriteState state) throws IOException { return new Lucene60PointsWriter(state); } @Override public PointsReader fieldsReader(SegmentReadState state) throws IOException { return new Lucene60PointsReader(state); } }





© 2015 - 2025 Weber Informatics LLC | Privacy Policy