This rule raises an issue when a machine learning estimator or optimizer is instantiated without specifying the important hyperparameters.
Why is this an issue?
When instantiating an estimator or an optimizer, default values for any hyperparameters that are not specified will be used. Relying on the default
values can lead to non-reproducible results across different versions of the library.
Furthermore, the default values might not be the best choice for the specific problem at hand and can lead to suboptimal performance.
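A quick way to see which defaults an estimator is silently relying on is scikit-learn's get_params() method. Below is a minimal sketch; the estimator choice is only illustrative:

from sklearn.neighbors import KNeighborsClassifier

# Instantiating without arguments picks up whatever defaults this library version ships with.
clf = KNeighborsClassifier()

# get_params() lists the values actually in effect, e.g. n_neighbors=5 in current scikit-learn releases.
print(clf.get_params())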
Here are the estimators and the parameters considered by this rule:

Scikit-learn

    Estimator                         Hyperparameters
    ------------------------------    ------------------------------
    AdaBoostClassifier                learning_rate
    AdaBoostRegressor                 learning_rate
    GradientBoostingClassifier        learning_rate
    GradientBoostingRegressor         learning_rate
    HistGradientBoostingClassifier    learning_rate
    HistGradientBoostingRegressor     learning_rate
    RandomForestClassifier            min_samples_leaf, max_features
    RandomForestRegressor             min_samples_leaf, max_features
    ElasticNet                        alpha, l1_ratio
    NearestNeighbors                  n_neighbors
    KNeighborsClassifier              n_neighbors
    KNeighborsRegressor               n_neighbors
    NuSVC                             nu, kernel, gamma
    NuSVR                             C, kernel, gamma
    SVC                               C, kernel, gamma
    SVR                               C, kernel, gamma
    DecisionTreeClassifier            ccp_alpha
    DecisionTreeRegressor             ccp_alpha
    MLPClassifier                     hidden_layer_sizes
    MLPRegressor                      hidden_layer_sizes
    PolynomialFeatures                degree, interaction_only
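For the estimators above, the flagged hyperparameters can simply be passed as keyword arguments at construction time. A minimal sketch (the values shown are illustrative, not recommendations):

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# min_samples_leaf and max_features are the parameters the rule checks for random forests.
forest = RandomForestClassifier(min_samples_leaf=1, max_features="sqrt")

# C, kernel and gamma are the parameters the rule checks for SVC.
svm = SVC(C=1.0, kernel="rbf", gamma="scale")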
PyTorch

    Optimizer     Hyperparameters
    ----------    --------------------------------
    Adadelta      lr, weight_decay
    Adagrad       lr, weight_decay
    Adam          lr, weight_decay
    AdamW         lr, weight_decay
    SparseAdam    lr
    Adamax        lr, weight_decay
    ASGD          lr, weight_decay
    LBFGS         lr
    NAdam         lr, weight_decay, momentum_decay
    RAdam         lr, weight_decay
    RMSprop       lr, weight_decay, momentum
    Rprop         lr
    SGD           lr, weight_decay, momentum
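The same applies to the optimizers above: their hyperparameters are ordinary keyword arguments. A minimal sketch (the model and the values are placeholders for illustration):

from torch import nn
from torch.optim import SGD

model = nn.Linear(10, 2)  # placeholder model

# lr, weight_decay and momentum are the parameters the rule checks for SGD.
optimizer = SGD(model.parameters(), lr=0.01, weight_decay=1e-4, momentum=0.9)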
How to fix it in Scikit-Learn
Specify the hyperparameters when instantiating the estimator.
Code examples
Noncompliant code example
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier()  # Noncompliant: n_neighbors is not specified; different values can change the predictor's behaviour significantly
Compliant solution
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(  # Compliant
    n_neighbors=5
)
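If a good value is not known upfront, it can be chosen by an explicit search instead of being left to the default. A minimal sketch using scikit-learn's GridSearchCV (the dataset and candidate values are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several explicit values for n_neighbors and keep the best one.
search = GridSearchCV(KNeighborsClassifier(n_neighbors=5), param_grid={"n_neighbors": [3, 5, 7, 11]})
search.fit(X, y)
print(search.best_params_)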
How to fix it in PyTorch
Specify the hyperparameters when instantiating the optimizer.
Code examples
Noncompliant code example
from my_model import model
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=0.001)  # Noncompliant: weight_decay is not specified; different values can change the optimizer's behaviour significantly
Compliant solution
from my_model import model
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=0.001, weight_decay=0.003)  # Compliant
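Once its hyperparameters are set explicitly, the optimizer is used as usual in the training loop. A minimal sketch (the model, batch and loss function are placeholders for illustration):

import torch
from torch import nn
from torch.optim import AdamW

model = nn.Linear(10, 1)                                  # placeholder model
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)   # placeholder batch

optimizer = AdamW(model.parameters(), lr=0.001, weight_decay=0.003)
loss_fn = nn.MSELoss()

optimizer.zero_grad()   # clear gradients from the previous step
loss = loss_fn(model(inputs), targets)
loss.backward()         # compute gradients
optimizer.step()        # apply the AdamW update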
Resources
Articles & blog posts
- Probst, P., Boulesteix, A.-L., & Bischl, B. (2019). Tunability: Importance of Hyperparameters of Machine Learning Algorithms. Journal of Machine Learning Research, 20(53), 1-32.
- van Rijn, J. N., & Hutter, F. (2018). Hyperparameter Importance Across Datasets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2367-2376).
Documentation
- PyTorch Documentation - torch.optim
External coding guidelines
- Code Smells for Machine Learning Applications - Hyperparameter not Explicitly Set