org.sonar.l10n.py.rules.python.S6734.html Maven / Gradle / Ivy
This rule raises an issue when the inplace
parameter is set to True
when modifying a Pandas DataFrame.
Why is this an issue?
Using inplace=True
when modifying a Pandas DataFrame means that the method will modify the DataFrame in place, rather than returning a
new object:
df.an_operation(inplace=True)
When inplace
is False
(which is the default behavior), a new object is returned instead:
df2 = df.an_operation(inplace=False)
Generally speaking, the motivation for modifying an object in place is to improve efficiency by avoiding the creation of a copy of the original
object. Unfortunately, many methods supporting the inplace keyword either cannot actually be done inplace, or make a copy as a consequence of the
operations they perform, regardless of whether inplace
is True
or not. For example, the following methods can never operate
in place:
- drop (dropping rows)
- dropna
- drop_duplicates
- sort_values
- sort_index
- eval
- query
Because of this, expecting efficiency gains through the use of inplace=True
is not reliable.
Additionally, using inplace=True
may trigger a SettingWithCopyWarning
and make the overall intention of the code unclear.
In the following example, modifying df2
will not modify the original df
dataframe, and a warning will be raised:
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
In general, side effects such as object mutation may be the source of subtle bugs and explicit reassignment is considered safer.
When intermediate results are not needed, method chaining is a more explicit alternative to the inplace
parameter. For instance, one
may write:
df.drop('City', axis=1, inplace=True)
df.sort_values('Name', inplace=True)
df.reset_index(drop=True, inplace=True)
Through method chaining, this previous example may be rewritten as:
result = df.drop('City', axis=1).sort_values('Name').reset_index(drop=True)
For these reasons, it is therefore recommended to avoid using inplace=True
in favor of more explicit and less error-prone
alternatives.
How to fix it
To fix this issue, avoid using the inplace=True
parameter. Either opt for method chaining when intermediary results are not needed, or
for explicit reassignment when the intention is to perform a simple operation.
Code examples
Noncompliant code example
import pandas as pd
def foo():
df.drop(columns='A', inplace=True) # Noncompliant: Using inplace=True is error-prone and should be avoided
Compliant solution
import pandas as pd
def foo():
df = df.drop(columns='A') # OK: explicit reassignment
Resources
Documentation
- Pandas Enhancement Proposal - PDEP-8: In-place methods in pandas