org.sonar.l10n.py.rules.python.S6741.html

This rule raises an issue when the pandas.DataFrame.values attribute is used instead of the pandas.DataFrame.to_numpy() method.

Why is this an issue?

The values attribute and the to_numpy() method in pandas both return a NumPy representation of the DataFrame. However, to_numpy() is recommended over values for the following reasons:

  • Future compatibility: The values attribute is considered a legacy feature, while to_numpy() is the recommended way to extract data and is considered more future-proof.
  • Data type consistency: If the DataFrame has columns with different data types, NumPy chooses a common data type that can hold all the data. This may lead to loss of information, unexpected type conversions, or increased memory usage. With to_numpy(), you can select the common type yourself by passing the dtype argument.
  • View vs copy: The values attribute can return a view or a copy of the data depending on whether the data needs to be transposed. This can lead to confusion when modifying the extracted data. In contrast, to_numpy() has a copy argument that forces it to always return a new NumPy array, ensuring that any changes you make won't affect the original DataFrame.
  • Missing values control: With to_numpy(), you can specify the value used for missing entries in the DataFrame through the na_value argument, while values always uses numpy.nan for missing values.
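
The dtype, copy, and missing-value options mentioned above can be sketched as follows. The DataFrame and variable names are illustrative assumptions and are not part of the rule description; the exact behavior may vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (an assumption for this sketch): one integer column
# and one float column that contains a missing value.
df = pd.DataFrame({"count": [1, 2, 3], "score": [0.5, None, 1.5]})

# Without arguments, a common dtype is chosen for all columns (float64 here).
default_arr = df.to_numpy()

# dtype: request the element type explicitly.
small_arr = df.to_numpy(dtype="float32")

# na_value: substitute a value for missing entries during the conversion.
filled_arr = df.to_numpy(na_value=0.0)

# copy=True: the result is a new array, so writing to it leaves df untouched.
copied_arr = df.to_numpy(copy=True)
copied_arr[0, 0] = 99.0
```

After the last assignment, df still holds 1 in the first row of its count column, because copied_arr does not share memory with the DataFrame.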

How to fix it

Use the to_numpy() method instead of the values attribute to get a NumPy representation of the DataFrame.

Code examples

Noncompliant code example

import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'B', 'A', 'C'],
    'Y': [10, 7, 12, 5]
})

arr = df.values # Noncompliant: using the 'values' attribute is not recommended

Compliant solution

import pandas as pd

df = pd.DataFrame({
    'X': ['A', 'B', 'A', 'C'],
    'Y': [10, 7, 12, 5]
})

arr = df.to_numpy() # Compliant
