/*
* Copyright (c) 2010-2021 Haifeng Li. All rights reserved.
*
* Smile is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Smile is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Smile. If not, see <https://www.gnu.org/licenses/>.
*/
package smile.feature.importance;
import java.util.stream.Stream;
/**
* SHAP (SHapley Additive exPlanations) is a game theoretic approach to
* explain the output of any machine learning model. It connects optimal
* credit allocation with local explanations using the classic Shapley
* values from game theory.
*
* SHAP leverages local methods designed to explain a prediction
* <code>f(x)</code> based on a single input <code>x</code>.
* The local methods are defined as any interpretable approximation
* of the original model. In particular, SHAP employs additive feature
* attribution methods.
*
* SHAP values attribute to each feature the change in the expected
* model prediction when conditioning on that feature. They explain
* how to get from the base value <code>E[f(z)]</code> that would be
* predicted if we did not know any features to the current output
* <code>f(x)</code>.
*
* In game theory, the Shapley value of a player is its marginal
* contribution to a coalition, averaged over all possible orders in
* which the players can join.
*
* <h2>References</h2>
* <ol>
* <li>Lundberg, Scott M., and Su-In Lee. A unified approach to interpreting model predictions. NIPS, 2017.</li>
* <li>Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. Consistent individualized feature attribution for tree ensembles.</li>
* </ol>
*
* @param <T> the data type of model input objects.
*
* @author Haifeng Li
*/
public interface SHAP<T> {
    /**
     * Returns the SHAP values. For regression, the length of the SHAP values
     * is the same as the number of features. For classification, the SHAP
     * values form a flattened <code>p x k</code> array, where <code>p</code>
     * is the number of features and <code>k</code> is the number of classes.
     * The first <code>k</code> elements are the SHAP values of the first
     * feature over the <code>k</code> classes, respectively. The remaining
     * features follow in the same order.
     *
     * @param x an instance.
     * @return the SHAP values.
     */
    double[] shap(T x);

    /**
     * Returns the average of absolute SHAP values over a data set.
     * @param data the data set.
     * @return the average of absolute SHAP values over a data set.
     */
    default double[] shap(Stream<T> data) {
        return smile.math.MathEx.colMeans(
            data.map(x -> {
                double[] values = shap(x);
                // Take absolute values in place before averaging per column.
                for (int i = 0; i < values.length; i++) {
                    values[i] = Math.abs(values[i]);
                }
                return values;
            }).toArray(double[][]::new)
        );
    }
}
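
/*
 * Illustrative sketch only, not part of the Smile API. The class and method
 * names below are hypothetical. It spells out the game-theoretic definition
 * quoted in the SHAP Javadoc above: the Shapley value of a player is its
 * marginal contribution averaged over all coalitions, here computed by brute
 * force over every subset. The cost is O(n * 2^n), so this is only feasible
 * for a handful of features, and it is not meant to reflect how any
 * implementation of the SHAP interface computes its values.
 *
 * Usage, assuming a linear model f(x) = w . x with baseline z, where the
 * coalition value is f evaluated with x_i for members and z_i otherwise:
 * the result should equal w_i * (x_i - z_i) for each feature i.
 */
final class ShapleyEnumerationExample {
    private ShapleyEnumerationExample() { }

    /**
     * Computes exact Shapley values by enumerating all coalitions.
     *
     * @param n the number of players (features); must be small.
     * @param value the coalition value function. Bit i of the mask is set
     *              when player i belongs to the coalition.
     * @return the Shapley value of each player.
     */
    static double[] shapley(int n, java.util.function.IntToDoubleFunction value) {
        // weight[s] = s! (n - s - 1)! / n! = 1 / (n * C(n - 1, s)),
        // the probability that a given coalition of size s precedes the player.
        double[] weight = new double[n];
        for (int s = 0; s < n; s++) {
            double binom = 1.0; // C(n - 1, s)
            for (int j = 1; j <= s; j++) {
                binom = binom * (n - 1 - s + j) / j;
            }
            weight[s] = 1.0 / (n * binom);
        }

        double[] phi = new double[n];
        for (int mask = 0; mask < (1 << n); mask++) {
            double without = value.applyAsDouble(mask);
            int size = Integer.bitCount(mask);
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) == 0) {
                    // Marginal contribution of player i when joining this coalition.
                    double with = value.applyAsDouble(mask | (1 << i));
                    phi[i] += weight[size] * (with - without);
                }
            }
        }
        return phi;
    }
}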