
/*
 * Copyright (c) 2010-2021 Haifeng Li. All rights reserved.
 *
 * Smile is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Smile is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Smile. If not, see .
 */

/**
 * Model validation and selection.
 * <p>
 * Model validation is the task of confirming that the outputs of a statistical
 * model are acceptable with respect to the real data-generating process.
 * A model can be validated only relative to some application area; a model
 * that is valid for one application might be invalid for others.
 * <p>
 * Model validation can be based on two types of data: data that was used
 * in the construction of the model and data that was not. Validation based
 * on the first type usually involves analyzing the goodness of fit of the
 * model or analyzing whether the residuals appear random (i.e., residual
 * diagnostics). Validation based on the first type alone is often inadequate.
 * Validation based on the second type usually involves analyzing whether
 * the model's predictive performance deteriorates non-negligibly when
 * applied to pertinent new data.
 * <p>
 * Model selection is the task of selecting a statistical model from
 * a set of candidate models, given data. In the simplest cases,
 * a pre-existing set of data is considered. However, the task can also
 * involve designing experiments so that the data collected is
 * well-suited to the problem of model selection.
 * <p>
 * Once the set of candidate models has been chosen, statistical analysis
 * allows us to select the best of these models. What is meant by "best" is
 * controversial. A good model selection technique balances goodness
 * of fit with simplicity. More complex models can adapt their shape to fit
 * the data more closely, but the additional parameters may not
 * represent anything useful. Goodness of fit is generally determined
 * using a likelihood ratio approach, or an approximation of it,
 * leading to a chi-squared test. Complexity is generally measured
 * by counting the number of parameters in the model. Given candidate models
 * of similar predictive or explanatory power, the simplest model is most
 * likely to be the best choice (Occam's razor).
 *
 * @author Haifeng Li
 */
package smile.validation;
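The following is a minimal plain-Java sketch that makes the two types of validation data concrete. It does not use Smile's own API. It fits a straight line by ordinary least squares on a training split. It then reports the mean squared error on that split (goodness of fit, the first type) and on two held-out points (predictive performance, the second type). The class name `HoldoutValidation`, the helpers `fitLine` and `mse`, and the toy data are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical holdout-validation sketch in plain Java (not Smile's API).
 * Fits y = a + b*x by ordinary least squares on a training split and
 * compares the training error with the error on held-out data.
 */
public class HoldoutValidation {

    /** Hypothetical helper: returns {intercept, slope} of the least-squares line through (x, y). */
    public static double[] fitLine(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    /** Hypothetical helper: mean squared error of the line {intercept, slope} on (x, y). */
    public static double mse(double[] coef, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (coef[0] + coef[1] * x[i]);
            sum += residual * residual;
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        // Made-up toy data; the last two points are held out from fitting.
        double[] xTrain = {0, 1, 2, 3, 4, 5};
        double[] yTrain = {0.1, 1.9, 4.2, 5.8, 8.1, 9.9};
        double[] xTest = {6, 7};
        double[] yTest = {12.2, 13.8};

        double[] coef = fitLine(xTrain, yTrain);
        System.out.printf("intercept=%.3f slope=%.3f%n", coef[0], coef[1]);
        // First type of data: goodness of fit on the data used to build the model.
        System.out.printf("training MSE = %.4f%n", mse(coef, xTrain, yTrain));
        // Second type of data: predictive performance on data not used to build it.
        System.out.printf("held-out MSE = %.4f%n", mse(coef, xTest, yTest));
    }
}
```

A held-out error much larger than the training error is the kind of non-negligible deterioration that validation on the second type of data is meant to detect. With real data the held-out split would normally be larger and chosen at random.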
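The trade-off between goodness of fit and parameter count can be sketched using the maximized Gaussian log-likelihood of a regression model. The sketch below is plain Java, not Smile's API. It computes the likelihood-ratio statistic for two nested models. It also computes two common penalized criteria, AIC and BIC. These criteria are an assumption here, since the text above does not name a specific one. The class `ModelSelection`, its methods, and the residual sums of squares are hypothetical. The threshold 3.841 is the 95th percentile of the chi-squared distribution with one degree of freedom.

```java
/**
 * Hypothetical model-selection sketch in plain Java (not Smile's API):
 * likelihood-ratio statistic plus AIC/BIC penalties on the parameter count.
 */
public class ModelSelection {

    /** Maximized Gaussian log-likelihood given the residual sum of squares over n points. */
    public static double gaussianLogLikelihood(double rss, int n) {
        double sigma2 = rss / n; // maximum-likelihood estimate of the noise variance
        return -0.5 * n * (Math.log(2 * Math.PI * sigma2) + 1.0);
    }

    /** Akaike information criterion: 2k - 2 ln L (lower is better). */
    public static double aic(double logLik, int k) {
        return 2.0 * k - 2.0 * logLik;
    }

    /** Bayesian information criterion: k ln n - 2 ln L (lower is better). */
    public static double bic(double logLik, int k, int n) {
        return k * Math.log(n) - 2.0 * logLik;
    }

    /** Likelihood-ratio statistic 2 (ln L_complex - ln L_simple) for nested models. */
    public static double likelihoodRatio(double logLikSimple, double logLikComplex) {
        return 2.0 * (logLikComplex - logLikSimple);
    }

    public static void main(String[] args) {
        int n = 50;
        // Made-up residual sums of squares for two nested regression models.
        double rssSimple = 12.0;  // line: intercept, slope, noise variance -> k = 3
        double rssComplex = 11.5; // quadratic: one extra coefficient       -> k = 4
        double llSimple = gaussianLogLikelihood(rssSimple, n);
        double llComplex = gaussianLogLikelihood(rssComplex, n);

        System.out.printf("AIC simple=%.2f complex=%.2f%n", aic(llSimple, 3), aic(llComplex, 4));
        System.out.printf("BIC simple=%.2f complex=%.2f%n", bic(llSimple, 3, n), bic(llComplex, 4, n));

        // Under the simpler model the statistic is approximately chi-squared with
        // df = 4 - 3 = 1; 3.841 is the 95th percentile of that distribution.
        double lr = likelihoodRatio(llSimple, llComplex);
        System.out.printf("LR = %.3f, reject simpler model at 5%%: %b%n", lr, lr > 3.841);
    }
}
```

With these made-up numbers, AIC slightly favors the quadratic model. BIC and the likelihood-ratio test both keep the simpler line. This is a small example of why what counts as "best" is controversial.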