org.sonar.l10n.py.rules.python.S6734.html

This rule raises an issue when the inplace parameter is set to True when modifying a Pandas DataFrame.

Why is this an issue?

Using inplace=True when modifying a Pandas DataFrame means that the method will modify the DataFrame in place, rather than returning a new object:

df.an_operation(inplace=True)

When inplace is False (which is the default behavior), a new object is returned instead:

df2 = df.an_operation(inplace=False)
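As a concrete illustration of the two forms, the sketch below uses drop on a small, made-up DataFrame (the data and variable names are assumptions, not part of the rule). With inplace=True the call mutates the object and returns None; without it, the original is left untouched and a new object is returned.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# In-place form: df itself loses column "b" and the call returns None.
returned = df.drop(columns="b", inplace=True)

# Default form (inplace=False): the original is untouched, a new object comes back.
df_orig = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = df_orig.drop(columns="b")

print(returned)               # None
print(list(df.columns))       # ['a']
print(list(df_orig.columns))  # ['a', 'b']
print(list(df2.columns))      # ['a']
```

Because the in-place form returns None, accidentally writing df = df.drop(..., inplace=True) silently replaces the DataFrame with None.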

Generally speaking, the motivation for modifying an object in place is to improve efficiency by avoiding a copy of the original object. Unfortunately, many methods that support the inplace keyword either cannot actually operate in place, or make a copy as a consequence of the operations they perform, regardless of whether inplace is True. For example, the following methods can never operate in place:

  • drop (dropping rows)
  • dropna
  • drop_duplicates
  • sort_values
  • sort_index
  • eval
  • query

Because of this, inplace=True cannot be relied on to deliver efficiency gains.
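The short sketch below, on made-up data, shows that reassignment and inplace=True yield the same sorted frame for sort_values, so nothing is lost functionally by preferring reassignment. It does not measure memory or speed; it only compares results.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": [3, 1, 2]})

# Explicit reassignment: sort_values returns a new, sorted DataFrame.
sorted_df = df.sort_values("a")

# In-place variant, applied to a separate copy purely for comparison.
inplace_df = df.copy()
inplace_df.sort_values("a", inplace=True)

# Both approaches end up with identical contents and index.
same = sorted_df.equals(inplace_df)
print(same)
```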

Additionally, using inplace=True may trigger a SettingWithCopyWarning and make the overall intention of the code unclear. In the following example, modifying df2 does not modify the original df DataFrame, and a warning is emitted:

import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
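One possible way to express the same intent without inplace=True is sketched below. It assumes the goal is to change only the filtered subset, so it takes an explicit copy of the selection and reassigns the column; whether that matches the real intent depends on the surrounding code.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Take an explicit copy so df2 is clearly independent of df.
df2 = df[df['a'] > 1].copy()

# Reassign the column instead of mutating it in place.
df2['b'] = df2['b'].replace({'x': 'abc'})

print(df2['b'].tolist())  # ['abc', 'y']
print(df['b'].tolist())   # ['x', 'y', 'z'] (original unchanged)
```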

In general, side effects such as object mutation can be the source of subtle bugs, and explicit reassignment is considered safer.

When intermediate results are not needed, method chaining is a more explicit alternative to the inplace parameter. For instance, one may write:

df.drop('City', axis=1, inplace=True)
df.sort_values('Name', inplace=True)
df.reset_index(drop=True, inplace=True)

With method chaining, the previous example can be rewritten as:

result = df.drop('City', axis=1).sort_values('Name').reset_index(drop=True)
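A runnable version of the chained form is sketched below. The sample data is hypothetical; only the City and Name column names come from the example above.

```python
import pandas as pd

# Hypothetical sample data; "Age" is an extra column added for illustration.
df = pd.DataFrame({
    "Name": ["Carol", "Alice", "Bob"],
    "City": ["Paris", "Oslo", "Rome"],
    "Age": [41, 29, 35],
})

# Each step returns a new DataFrame, so the calls can be chained.
result = (
    df.drop("City", axis=1)
      .sort_values("Name")
      .reset_index(drop=True)
)

print(result)
print(list(df.columns))  # the original still has all three columns
```

The original df is left unchanged, which makes it straightforward to inspect both the input and the output when debugging.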

For these reasons, it is recommended to avoid inplace=True in favor of more explicit and less error-prone alternatives.

How to fix it

To fix this issue, avoid using the inplace=True parameter. Opt for method chaining when intermediate results are not needed, or for explicit reassignment when the intention is to perform a single, simple operation.

Code examples

Noncompliant code example

import pandas as pd
def foo(df):
    df.drop(columns='A', inplace=True)  # Noncompliant: Using inplace=True is error-prone and should be avoided

Compliant solution

import pandas as pd
def foo(df):
    df = df.drop(columns='A')  # OK: explicit reassignment
    return df  # return the new object so the caller can use it

Resources

Documentation




