org.sonar.l10n.py.rules.python.S6971.html

This rule raises an issue when a Scikit-Learn transformer used in a Pipeline with caching enabled is accessed directly, instead of through the Pipeline.
Why is this an issue?
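When a Pipeline uses a cache (its memory parameter is set) and the transformers are passed to it as instances stored in variables, those variables still give direct access to the original transformer objects.
This is an issue because, when a Pipeline with caching is fitted, all the transformers are cloned and only the clones are fitted and stored in the Pipeline. The original objects referenced by the variables are therefore never fitted, so reading their fitted attributes raises an error or yields unexpected results.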
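The cloning behavior can be checked with a minimal sketch like the one below, assuming a recent scikit-learn release. It fits the same two-step pipeline with and without a cache and compares the original scaler with the step stored in the Pipeline. The helper fit_pipeline and the temporary cache directory are illustrative additions, not part of the rule.

```python
import shutil
import tempfile

from sklearn.datasets import load_diabetes
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

X, y = load_diabetes(return_X_y=True)


def fit_pipeline(memory):
    """Hypothetical helper: fit a scaler + KNN pipeline and inspect the scaler."""
    scaler = RobustScaler()
    pipeline = Pipeline(
        [("scaler", scaler), ("knn", KNeighborsRegressor(n_neighbors=5))],
        memory=memory,
    )
    pipeline.fit(X, y)
    fitted = pipeline.named_steps["scaler"]
    return {
        # Is the step stored in the pipeline the very object we passed in?
        "same_object": fitted is scaler,
        # RobustScaler only gets center_ once it has been fitted.
        "original_fitted": hasattr(scaler, "center_"),
        "step_fitted": hasattr(fitted, "center_"),
    }


# Without caching, the transformer is fitted in place.
no_cache = fit_pipeline(memory=None)

# With caching, a clone is fitted and stored in the pipeline instead.
cache_dir = tempfile.mkdtemp()  # temporary cache location for this sketch
try:
    with_cache = fit_pipeline(memory=cache_dir)
finally:
    shutil.rmtree(cache_dir, ignore_errors=True)

print(no_cache)
print(with_cache)
```

With caching, the original scaler stays unfitted while the step inside the Pipeline is fitted, which is why the variable should not be used after fitting.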
How to fix it
Replace the direct access to the transformer with an access through the named_steps attribute of the Pipeline, which returns the fitted clone.
Code examples
Noncompliant code example
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import RobustScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

diabetes = load_diabetes()
scaler = RobustScaler()
knn = KNeighborsRegressor(n_neighbors=5)

pipeline = Pipeline([
    ('scaler', scaler),
    ('knn', knn)
], memory="cache")

pipeline.fit(diabetes.data, diabetes.target)
print(scaler.center_) # Noncompliant: raises an AttributeError because the original scaler was never fitted

Compliant solution
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import RobustScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

diabetes = load_diabetes()
scaler = RobustScaler()
knn = KNeighborsRegressor(n_neighbors=5)

pipeline = Pipeline([
    ('scaler', scaler),
    ('knn', knn)
], memory="cache")

pipeline.fit(diabetes.data, diabetes.target)
print(pipeline.named_steps['scaler'].center_) # Compliant

Resources
Documentation

   Scikit-Learn - Pipelines and composite estimators: Side effect of caching transformers