All Downloads are FREE. Search and download functionalities are using the official Maven repository.

smile.neighbor.package-info Maven / Gradle / Ivy

The newest version!
/*******************************************************************************
 * Copyright (c) 2010 Haifeng Li
 *   
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *  
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *******************************************************************************/

/**
 * Nearest neighbor search. Nearest neighbor search is an optimization problem
 * for finding closest points in metric spaces. The problem is: given a set S
 * of points in a metric space M and a query point q ∈ M, find the closest
 * point in S to q.
 * 

* The simplest solution to the NNS problem is to compute the distance from * the query point to every other point in the database, keeping track of * the "best so far". This algorithm, sometimes referred to as the naive * approach, has a running time of O(Nd) where N is the cardinality of S * and d is the dimensionality of M. There are no search data structures * to maintain, so linear search has no space complexity beyond the storage * of the database. Surprisingly, naive search outperforms space partitioning * approaches on higher dimensional spaces. *

* Space partitioning, a branch and bound methodology, has been applied to the * problem. The simplest is the k-d tree, which iteratively bisects the search * space into two regions containing half of the points of the parent region. * Queries are performed via traversal of the tree from the root to a leaf by * evaluating the query point at each split. Depending on the distance * specified in the query, neighboring branches that might contain hits * may also need to be evaluated. For constant dimension query time, * average complexity is O(log N) in the case of randomly distributed points. * Alternatively the R-tree data structure was designed to support nearest * neighbor search in dynamic context, as it has efficient algorithms for * insertions and deletions. *

* In case of general metric space branch and bound approach is known under * the name of metric trees. Particular examples include vp-tree and BK-tree. *

* Locality sensitive hashing (LSH) is a technique for grouping points in space * into 'buckets' based on some distance metric operating on the points. * Points that are close to each other under the chosen metric are mapped * to the same bucket with high probability. *

* The cover tree has a theoretical bound that is based on the dataset's * doubling constant. The bound on search time is O(c12 log n) where c is * the expansion constant of the dataset. * * @author Haifeng Li */ package smile.neighbor;





© 2015 - 2025 Weber Informatics LLC | Privacy Policy