org.sonar.l10n.py.rules.python.S5795.html Maven / Gradle / Ivy
This rule raises an issue when the identity operator is used with cached literals.
Why is this an issue?
The identity operators is
and is not
check if the same object is on both sides, i.e. a is b
returns
True
if id(a) == id(b)
.
The CPython interpreter caches certain built-in values for integers, bytes, floats, strings, frozensets and tuples. When a value is cached, all its
references are pointing to the same object in memory; their ids are identical.
The following example illustrates this caching mechanism:
my_int = 1
other_int = 1
id(my_int) == id(other_int) # True
In both assignments (to my_int
and other_int
), the assigned value 1
comes from the interpreter cache, only
one integer object 1
is created in memory. This means both variables are referencing the same object. For this reason, their ids are
identical and my_int is other_int
evaluates to True
. This mechanism allows the interpreter for better performance, saving
memory space, by not creating new objects every time for commonly accessed values.
However this caching mechanism does not apply to every value:
my_int = 1000
id(my_int) == id(1000) # False
my_int is 1000 # False
In this example the integer 1000
is not cached. Each reference to 1000
creates an new integer object in memory with a new
id. This means that my_int is 1000
is always False
, as the two objects have different ids.
This is the reason why using the identity operators on integers, bytes, floats, strings, frozensets and tuples is unreliable as the behavior
changes depending on the value.
Moreover the caching behavior is not part of the Python language specification and could vary between interpreters. CPython 3.8 warns about comparing literals using identity operators.
This rule raises an issue when at least one operand of an identity operator:
- is of type
int
, bytes
, float
, frozenset
or tuple
.
- is a string literal.
If you need to compare these types you should use the equality operators instead ==
or !=
.
Exceptions
The only case where the is
operator could be used with a cached type is with "interned" strings. The Python interpreter provides a way
to explicitly cache any string literals and benefit from improved performances, such as:
- saved memory space.
- faster string comparison: as only their memory address need to be compared.
- faster dictionary lookup: if the dictionary keys are interned, the lookup can be done by comparing memory address as well.
This explicit caching is done through interned strings (i.e. sys.intern("some string")
).
from sys import intern
my_text = "text"
intern("text") is intern(my_text) # True
Note however that interned strings don’t necessarily have the same identity as string literals.
It is also important to note that interned strings may be garbage collected, so in order to benefit from their caching mechanism, a reference to
the interned string should be kept.
How to fix it
Use the equality operators (==
or !=
) to compare int
, bytes
, float
,
frozenset
, tuple
and string literals.
Code examples
Noncompliant code example
my_int = 2000
my_int is 2000 # Noncompliant: the integer 2000 may not be cached, the identity operator could return False.
() is tuple() # Noncompliant: this will return True only because the CPython interpreter cached the empty tuple.
(1,) is tuple([1]) # Noncompliant: comparing non empty tuples will return False as none of these objects are cached.
Compliant solution
my_int = 2000
my_int == 2000
() == tuple()
(1,) == tuple([1])
Resources
Documentation
- Python Documentation - Changes in Python behaviour.
- Python Documentation - sys.intern
Articles & blog posts
- Adam Johnson’s Blog - Why does Python 3.8 log a
SyntaxWarning for 'is' with literals?
- Trey Hunner’s Blog - Equality vs
identity