/*
* Copyright (c) 2010-2021 Haifeng Li. All rights reserved.
*
* Smile is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Smile is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Smile. If not, see <https://www.gnu.org/licenses/>.
*/
/**
* Feature extraction. Feature extraction transforms data from a
* high-dimensional space to a space of fewer dimensions. The data
* transformation may be linear, as in principal component analysis (PCA),
* but many nonlinear dimensionality reduction techniques also exist.
*
* The main linear technique for dimensionality reduction, principal component
* analysis, performs a linear mapping of the data to a lower dimensional
* space in such a way that the variance of the data in the low-dimensional
* representation is maximized. In practice, the correlation matrix of the
* data is constructed and the eigenvectors of this matrix are computed.
* The eigenvectors that correspond to the largest eigenvalues (the principal
* components) can now be used to reconstruct a large fraction of the variance
* of the original data. Moreover, the first few eigenvectors can often be
* interpreted in terms of the large-scale physical behavior of the system.
* The original space has been reduced (with data loss, but hopefully
* retaining the most important variance) to the space spanned by a few
* eigenvectors.
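*
* Below is a minimal, illustrative sketch in plain Java (not this package's
* API) of that idea: center the data, form the covariance matrix, and extract
* its leading eigenvector by power iteration. The fixed iteration count and
* the use of the covariance (rather than correlation) matrix are simplifying
* assumptions of the sketch.
* <pre>{@code
* // Sketch: first principal component via power iteration on the covariance matrix.
* public static double[] firstComponent(double[][] x) {
*     int n = x.length, d = x[0].length;
*     // center the data
*     double[] mean = new double[d];
*     for (double[] row : x)
*         for (int j = 0; j < d; j++) mean[j] += row[j] / n;
*     // sample covariance matrix
*     double[][] c = new double[d][d];
*     for (double[] row : x)
*         for (int i = 0; i < d; i++)
*             for (int j = 0; j < d; j++)
*                 c[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);
*     // power iteration: repeatedly apply C and renormalize
*     double[] v = new double[d];
*     java.util.Arrays.fill(v, 1.0 / Math.sqrt(d));
*     for (int iter = 0; iter < 100; iter++) {
*         double[] w = new double[d];
*         for (int i = 0; i < d; i++)
*             for (int j = 0; j < d; j++) w[i] += c[i][j] * v[j];
*         double norm = 0;
*         for (double wi : w) norm += wi * wi;
*         norm = Math.sqrt(norm);
*         for (int i = 0; i < d; i++) v[i] = w[i] / norm;
*     }
*     return v; // eigenvector associated with the largest eigenvalue
* }
* }</pre>
* Projecting each centered sample onto the returned vector yields the first
* principal component score; further components can be obtained analogously
* after deflating the covariance matrix.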
*
* Compared to the regular batch PCA algorithm, the generalized Hebbian
* algorithm (GHA) is an adaptive method that finds the k largest eigenvectors
* of the covariance matrix, assuming that the associated eigenvalues are
* distinct. GHA works with an arbitrarily large sample size and its storage
* requirement is modest.
* Another attractive feature is that, in a non-stationary environment, it
* has an inherent ability to track gradual changes in the optimal solution
* in an inexpensive way.
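*
* The following is a minimal sketch, in plain Java rather than this package's
* API, of one generalized Hebbian update (Sanger's rule) for a single centered
* sample; the learning rate and the layout of the weight matrix (one row per
* estimated eigenvector) are assumptions of the sketch.
* <pre>{@code
* // One GHA (Sanger's rule) step: w is k x d, x is a centered sample, rate is the learning rate.
* public static void ghaUpdate(double[][] w, double[] x, double rate) {
*     int k = w.length, d = x.length;
*     double[] y = new double[k];               // outputs y = W x
*     for (int i = 0; i < k; i++)
*         for (int j = 0; j < d; j++) y[i] += w[i][j] * x[j];
*     double[][] delta = new double[k][d];
*     for (int i = 0; i < k; i++)
*         for (int j = 0; j < d; j++) {
*             double back = 0;                  // feedback from components 1..i
*             for (int m = 0; m <= i; m++) back += y[m] * w[m][j];
*             delta[i][j] = rate * y[i] * (x[j] - back);
*         }
*     for (int i = 0; i < k; i++)               // apply the updates
*         for (int j = 0; j < d; j++) w[i][j] += delta[i][j];
* }
* }</pre>
* Repeated over a (possibly unbounded) stream of samples, the rows of w tend
* toward the top k eigenvectors without ever forming the covariance matrix
* explicitly.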
*
* Random projection is a promising linear dimensionality reduction technique
* for learning mixtures of Gaussians. The key idea of random projection arises
* from the Johnson-Lindenstrauss lemma: if points in a vector space are
* projected onto a randomly selected subspace of suitably high dimension,
* then the distances between the points are approximately preserved.
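*
* A minimal sketch, assuming a Gaussian random matrix scaled by 1/sqrt(k)
* (one common choice; sparse and sign-based matrices are also used):
* <pre>{@code
* // Project n points from d dimensions down to k dimensions with a random Gaussian matrix.
* public static double[][] randomProject(double[][] x, int k, long seed) {
*     int n = x.length, d = x[0].length;
*     java.util.Random rng = new java.util.Random(seed);
*     double[][] r = new double[d][k];
*     for (int i = 0; i < d; i++)
*         for (int j = 0; j < k; j++) r[i][j] = rng.nextGaussian() / Math.sqrt(k);
*     double[][] y = new double[n][k];
*     for (int p = 0; p < n; p++)
*         for (int j = 0; j < k; j++)
*             for (int i = 0; i < d; i++) y[p][j] += x[p][i] * r[i][j];
*     return y;   // pairwise distances in y approximate those in x
* }
* }</pre>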
*
* Principal component analysis can be employed in a nonlinear way by means
* of the kernel trick. The resulting technique, kernel PCA, is capable of
* constructing nonlinear mappings that maximize the variance in the data.
* Other prominent nonlinear techniques
* include manifold learning techniques such as locally linear embedding
* (LLE), Hessian LLE, Laplacian eigenmaps, and LTSA. These techniques
* construct a low-dimensional data representation using a cost function
* that retains local properties of the data, and can be viewed as defining
* a graph-based kernel for Kernel PCA. More recently, techniques have been
* proposed that, instead of defining a fixed kernel, try to learn the kernel
* using semidefinite programming. The most prominent example of such a
* technique is maximum variance unfolding (MVU). The central idea of MVU
* is to exactly preserve all pairwise distances between nearest neighbors
* (in the inner product space), while maximizing the distances between points
* that are not nearest neighbors.
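*
* To make the kernel trick concrete, here is a sketch (plain Java, with a
* Gaussian kernel of bandwidth sigma as an assumed choice) of the kernel PCA
* preprocessing step: build the kernel matrix and double-center it; the top
* eigenvectors of the centered matrix then give the nonlinear components.
* <pre>{@code
* // Build a Gaussian kernel matrix and double-center it: K' = K - 1K - K1 + 1K1.
* public static double[][] centeredKernel(double[][] x, double sigma) {
*     int n = x.length;
*     double[][] k = new double[n][n];
*     for (int i = 0; i < n; i++)
*         for (int j = 0; j < n; j++) {
*             double d2 = 0;
*             for (int t = 0; t < x[i].length; t++) {
*                 double diff = x[i][t] - x[j][t];
*                 d2 += diff * diff;
*             }
*             k[i][j] = Math.exp(-d2 / (2 * sigma * sigma));
*         }
*     double[] rowMean = new double[n];   // also the column means, since k is symmetric
*     double grandMean = 0;
*     for (int i = 0; i < n; i++) {
*         for (int j = 0; j < n; j++) rowMean[i] += k[i][j] / n;
*         grandMean += rowMean[i] / n;
*     }
*     double[][] kc = new double[n][n];
*     for (int i = 0; i < n; i++)
*         for (int j = 0; j < n; j++)
*             kc[i][j] = k[i][j] - rowMean[i] - rowMean[j] + grandMean;
*     return kc;
* }
* }</pre>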
*
* An alternative approach to neighborhood preservation is through the
* minimization of a cost function that measures differences between
* distances in the input and output spaces. Important examples of such
* techniques include classical multidimensional scaling (which is identical
* to PCA), Isomap (which uses geodesic distances in the data space), diffusion
* maps (which use diffusion distances in the data space), t-SNE (which
* minimizes the divergence between distributions over pairs of points),
* and curvilinear component analysis.
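*
* A minimal sketch of the kind of cost such methods minimize: the raw stress
* between pairwise distances in the input space x and the embedding y (the
* unweighted form is an assumption; each method weights or transforms the
* distances differently).
* <pre>{@code
* // Sum of squared differences between pairwise distances in x and in y.
* public static double stress(double[][] x, double[][] y) {
*     double s = 0;
*     for (int i = 0; i < x.length; i++)
*         for (int j = i + 1; j < x.length; j++) {
*             double dx = 0, dy = 0;
*             for (int t = 0; t < x[i].length; t++) dx += (x[i][t] - x[j][t]) * (x[i][t] - x[j][t]);
*             for (int t = 0; t < y[i].length; t++) dy += (y[i][t] - y[j][t]) * (y[i][t] - y[j][t]);
*             s += Math.pow(Math.sqrt(dx) - Math.sqrt(dy), 2);
*         }
*     return s;
* }
* }</pre>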
*
* A different approach to nonlinear dimensionality reduction is through the
* use of autoencoders, a special kind of feed-forward neural network with
* a bottleneck hidden layer. The training of deep autoencoders is typically
* performed using greedy layer-wise pre-training (e.g., using a stack of
* restricted Boltzmann machines) followed by a fine-tuning stage based
* on backpropagation.
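*
* A minimal sketch of the autoencoder idea (a single linear-tanh bottleneck
* and untrained placeholder weights are assumptions of the sketch; in practice
* the weights are learned by backpropagation to minimize this error):
* <pre>{@code
* // Encode x through a narrow hidden layer and measure the reconstruction error.
* // encode is h x d, decode is d x h, with h < d (the bottleneck).
* public static double reconstructionError(double[] x, double[][] encode, double[][] decode) {
*     int h = encode.length;
*     double[] hidden = new double[h];
*     for (int i = 0; i < h; i++) {
*         for (int j = 0; j < x.length; j++) hidden[i] += encode[i][j] * x[j];
*         hidden[i] = Math.tanh(hidden[i]);    // nonlinearity in the bottleneck layer
*     }
*     double error = 0;
*     for (int j = 0; j < x.length; j++) {
*         double xr = 0;
*         for (int i = 0; i < h; i++) xr += decode[j][i] * hidden[i];
*         error += (xr - x[j]) * (xr - x[j]); // squared reconstruction error
*     }
*     return error;
* }
* }</pre>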
*
* @author Haifeng Li
*/
package smile.feature.extraction;