This rule raises an issue when an invalid nested estimator parameter is set on a Pipeline.
Why is this an issue?
In the sklearn library, when using the Pipeline class, it is possible to modify the parameters of the nested estimators. This is done by calling the Pipeline method set_params and specifying the name of the estimator and the name of the parameter to update, separated by a double underscore (__).
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
pipe = Pipeline(steps=[("clf", SVC())])
pipe.set_params(clf__C=10)
In the example above, the regularization parameter C is set to the value 10 for the classifier called clf.
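To confirm that the update reached the nested estimator, the value can be read back. The following is a minimal sketch; the variable names nested_C and reported_C are only illustrative.

```python
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline(steps=[("clf", SVC())])
pipe.set_params(clf__C=10)

# The nested estimator itself now carries the new value.
nested_C = pipe.named_steps["clf"].C

# The same value is reported under the double-underscore name.
reported_C = pipe.get_params()["clf__C"]
```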
Such parameters can also be set through the param_grid parameter, for example when using GridSearchCV.
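As a rough sketch of the param_grid form, the same double-underscore naming is used for the dictionary keys. The iris dataset, the candidate values for C and cv=3 are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[("clf", SVC())])

# Keys follow the <step name>__<parameter name> convention.
param_grid = {"clf__C": [0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=3)
search.fit(X, y)

# The selected value is reported under the same nested name.
best_C = search.best_params_["clf__C"]
```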
Providing invalid parameters that do not exist on the estimator can lead to unexpected behavior or runtime errors.
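For illustration, in the scikit-learn versions we are aware of, passing an unknown nested parameter to set_params raises a ValueError. The sketch below captures that error instead of letting it propagate; the exact wording of the message varies between versions, so it is not reproduced here.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[("reduce_dim", PCA())])

try:
    # PCA has no parameter named C, so this call is expected to fail.
    pipe.set_params(reduce_dim__C=2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```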
This rule checks that the parameters provided to the set_params method of a Pipeline instance, or through the param_grid parameter of a GridSearchCV, are valid for the nested estimators.
How to fix it
To fix this issue, provide valid parameters to the nested estimators.
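One way to find out which nested names are valid is to inspect the keys returned by get_params. The sketch below assumes this approach; the helper invalid_params is hypothetical and not part of scikit-learn.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[("reduce_dim", PCA())])

# With deep=True, get_params also lists the nested <step>__<parameter> names.
nested_names = sorted(
    name for name in pipe.get_params(deep=True) if name.startswith("reduce_dim__")
)


def invalid_params(estimator, names):
    """Hypothetical helper: return the names the estimator does not accept."""
    valid = estimator.get_params(deep=True)
    return [name for name in names if name not in valid]


rejected = invalid_params(pipe, ["reduce_dim__C", "reduce_dim__n_components"])
```

Running such a check before calling set_params or building a param_grid makes a misspelled or misplaced parameter name visible early.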
Code examples
Noncompliant code example
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
pipe = Pipeline(steps=[('reduce_dim', PCA())])
pipe.set_params(reduce_dim__C=2) # Noncompliant: the parameter C does not exist for the PCA estimator
Compliant solution
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
pipe = Pipeline(steps=[('reduce_dim', PCA())])
pipe.set_params(reduce_dim__n_components=2) # Compliant
Resources
Documentation
- Scikit-Learn documentation - Access to nested parameters
- Scikit-Learn documentation - GridSearchCV reference