All Downloads are FREE. Search and download functionalities are using the official Maven repository.

smile.feature.package-info Maven / Gradle / Ivy

The newest version!
/*
 * Copyright (c) 2010-2021 Haifeng Li. All rights reserved.
 *
 * Smile is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Smile is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Smile.  If not, see .
 */

/**
 * Feature generation, normalization and selection.
 * 

* Feature generation (or constructive induction) studies methods that modify * or enhance the representation of data objects. Feature generation techniques * search for new features that describe the objects better than the attributes * supplied with the training instances. *

* Many machine learning methods such as Neural Networks and SVM with Gaussian * kernel also require the features properly scaled/standardized. For example, * each variable is scaled into interval [0, 1] or to have mean 0 and standard * deviation 1. Although some method such as decision trees can handle nominal * variable directly, other methods generally require nominal variables converted * to multiple binary dummy variables to indicate the presence or absence of a * characteristic. *

* Feature selection is the technique of selecting a subset of relevant * features for building robust learning models. By removing most irrelevant * and redundant features from the data, feature selection helps improve the * performance of learning models by alleviating the effect of the curse of * dimensionality, enhancing generalization capability, speeding up learning * process, etc. More importantly, feature selection also helps researchers * to acquire better understanding about the data. *

* Feature selection algorithms typically fall into two categories: feature * ranking and subset selection. Feature ranking ranks the features by a * metric and eliminates all features that do not achieve an adequate score. * Subset selection searches the set of possible features for the optimal subset. * Clearly, an exhaustive search of optimal subset is impractical if large * numbers of features are available. Commonly, heuristic methods such as * genetic algorithms are employed for subset selection. * * @author Haifeng Li */ package smile.feature;





© 2015 - 2025 Weber Informatics LLC | Privacy Policy