org.sonar.l10n.py.rules.python.S5868.html


There is a newer version: 4.26.0.19456

Why is this an issue?
Placing a Unicode grapheme cluster (a character that must be encoded as multiple
code points) inside a character class of a regular expression is likely to lead
to unintended behavior.
For instance, the grapheme cluster c̈ requires two code points: one for 'c', followed by one for the umlaut
modifier '\u{0308}'. If it is placed within a character class, such as [c̈], the regex engine treats the character class as the
enumeration [c\u{0308}] instead. The class therefore matches every 'c' and every combining umlaut (one that is not fused with its
base letter into a single code point), which is very unlikely to be the intended behavior.
This rule raises an issue every time Unicode Grapheme Clusters are used within a character class of a regular expression.
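The following is a minimal sketch of the behavior described above, using only Python's standard `re` and `unicodedata` modules. The variable names are illustrative and not part of the rule. It shows that the cluster is two code points and that a character class built from it accepts each code point separately, but never the cluster as a whole.

```python
import re
import unicodedata

# 'c' followed by COMBINING DIAERESIS; rendered as a single glyph.
cluster = "c\u0308"

# The cluster is two code points, even though it displays as one character.
code_points = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster]

# Inside a character class, each code point becomes its own alternative.
char_class = re.compile("[" + cluster + "]")
matches_plain_c = char_class.fullmatch("c") is not None
matches_bare_umlaut = char_class.fullmatch("\u0308") is not None

# A character class consumes exactly one code point, so it cannot match both.
matches_whole_cluster = char_class.fullmatch(cluster) is not None
```

Here `matches_plain_c` and `matches_bare_umlaut` are both true, while `matches_whole_cluster` is false.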
Noncompliant code example
import re
re.sub(r"[c̈d̈]", "X", "cc̈d̈d")  # Noncompliant: returns "XXXXXX" instead of the expected "cXXd"

Compliant solution
import re
re.sub(r"c̈|d̈", "X", "cc̈d̈d")  # returns "cXXd"
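When the set of grapheme clusters is longer or built at runtime, the alternation can be generated instead of written by hand. The sketch below assumes a hypothetical helper named `alternation`; it is not part of the rule or of the `re` module, and it is only one possible way to apply the compliant pattern.

```python
import re

def alternation(clusters):
    """Build a non-capturing alternation that matches each cluster as a whole.

    Hypothetical helper for illustration only.
    """
    # Try longer clusters first so a shorter one that is a prefix of a
    # longer one cannot shadow it.
    ordered = sorted(clusters, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(c) for c in ordered) + ")"

# Same clusters as in the compliant solution, written with explicit escapes.
pattern = alternation(["c\u0308", "d\u0308"])
result = re.sub(pattern, "X", "cc\u0308d\u0308d")
```

With this input, `result` is `"cXXd"`: the plain 'c' and 'd' are left alone because each alternative matches only a base letter followed by its combining umlaut.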