This rule raises an issue when a machine learning estimator or optimizer is instantiated without specifying the important hyperparameters.
Why is this an issue?
When instantiating an estimator or an optimizer, default values for any hyperparameters that are not specified will be used. Relying on the default
values can lead to non-reproducible results across different versions of the library.
Furthermore, the default values might not be the best choice for the specific problem at hand and can lead to suboptimal performance.
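A quick way to see which defaults an estimator is silently relying on is scikit-learn's get_params() method. Below is a minimal sketch; the estimator choice is only illustrative:

from sklearn.neighbors import KNeighborsClassifier

# Instantiating without arguments picks up whatever defaults this library version ships with.
clf = KNeighborsClassifier()

# get_params() lists the values actually in effect, e.g. n_neighbors=5 in current scikit-learn releases.
print(clf.get_params())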
Here are the estimators and the parameters considered by this rule:

Scikit-learn

    Estimator                         Hyperparameters
    ------------------------------    ------------------------------
    AdaBoostClassifier                learning_rate
    AdaBoostRegressor                 learning_rate
    GradientBoostingClassifier        learning_rate
    GradientBoostingRegressor         learning_rate
    HistGradientBoostingClassifier    learning_rate
    HistGradientBoostingRegressor     learning_rate
    RandomForestClassifier            min_samples_leaf, max_features
    RandomForestRegressor             min_samples_leaf, max_features
    ElasticNet                        alpha, l1_ratio
    NearestNeighbors                  n_neighbors
    KNeighborsClassifier              n_neighbors
    KNeighborsRegressor               n_neighbors
    NuSVC                             nu, kernel, gamma
    NuSVR                             C, kernel, gamma
    SVC                               C, kernel, gamma
    SVR                               C, kernel, gamma
    DecisionTreeClassifier            ccp_alpha
    DecisionTreeRegressor             ccp_alpha
    MLPClassifier                     hidden_layer_sizes
    MLPRegressor                      hidden_layer_sizes
    PolynomialFeatures                degree, interaction_only
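For the estimators above, the flagged hyperparameters can simply be passed as keyword arguments at construction time. A minimal sketch (the values shown are illustrative, not recommendations):

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# min_samples_leaf and max_features are the parameters the rule checks for random forests.
forest = RandomForestClassifier(min_samples_leaf=1, max_features="sqrt")

# C, kernel and gamma are the parameters the rule checks for SVC.
svm = SVC(C=1.0, kernel="rbf", gamma="scale")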
PyTorch

    Optimizer     Hyperparameters
    ----------    --------------------------------
    Adadelta      lr, weight_decay
    Adagrad       lr, weight_decay
    Adam          lr, weight_decay
    AdamW         lr, weight_decay
    SparseAdam    lr
    Adamax        lr, weight_decay
    ASGD          lr, weight_decay
    LBFGS         lr
    NAdam         lr, weight_decay, momentum_decay
    RAdam         lr, weight_decay
    RMSprop       lr, weight_decay, momentum
    Rprop         lr
    SGD           lr, weight_decay, momentum
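The same applies to the optimizers above: their hyperparameters are ordinary keyword arguments. A minimal sketch (the model and the values are placeholders for illustration):

from torch import nn
from torch.optim import SGD

model = nn.Linear(10, 2)  # placeholder model

# lr, weight_decay and momentum are the parameters the rule checks for SGD.
optimizer = SGD(model.parameters(), lr=0.01, weight_decay=1e-4, momentum=0.9)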
How to fix it in Scikit-Learn
Specify the hyperparameters when instantiating the estimator.
Code examples
Noncompliant code example
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier()  # Noncompliant: n_neighbors is not specified; different values can change the predictor's behaviour significantly
Compliant solution
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(  # Compliant
    n_neighbors=5
)
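If a good value is not known upfront, it can be chosen by an explicit search instead of being left to the default. A minimal sketch using scikit-learn's GridSearchCV (the dataset and candidate values are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several explicit values for n_neighbors and keep the best one.
search = GridSearchCV(KNeighborsClassifier(n_neighbors=5), param_grid={"n_neighbors": [3, 5, 7, 11]})
search.fit(X, y)
print(search.best_params_)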
How to fix it in PyTorch
Specify the hyperparameters when instantiating the optimizer.
Code examples
Noncompliant code example
from my_model import model
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=0.001)  # Noncompliant: weight_decay is not specified; different values can change the optimizer's behaviour significantly
Compliant solution
from my_model import model
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=0.001, weight_decay=0.003)  # Compliant
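Once its hyperparameters are set explicitly, the optimizer is used as usual in the training loop. A minimal sketch (the model, batch and loss function are placeholders for illustration):

import torch
from torch import nn
from torch.optim import AdamW

model = nn.Linear(10, 1)                                  # placeholder model
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)   # placeholder batch

optimizer = AdamW(model.parameters(), lr=0.001, weight_decay=0.003)
loss_fn = nn.MSELoss()

optimizer.zero_grad()   # clear gradients from the previous step
loss = loss_fn(model(inputs), targets)
loss.backward()         # compute gradients
optimizer.step()        # apply the AdamW update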
Resources
Articles & blog posts
- Probst, P., Boulesteix, A.-L., & Bischl, B. (2019). Tunability: Importance of Hyperparameters of Machine Learning Algorithms. Journal of Machine Learning Research, 20(53), 1-32.
- van Rijn, J. N., & Hutter, F. (2018). Hyperparameter Importance Across Datasets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2367-2376).
Documentation
- PyTorch Documentation - torch.optim
External coding guidelines
- Code Smells for Machine Learning Applications - Hyperparameter not Explicitly Set