The edge cases 2to3 might not fix
21 Dec 2020
A year after Python 2 officially reached end of life, 2to3 is still my favourite tool for porting Python 2 code to Python 3.
Only recently, while using it on a legacy code base, I found one of the edge cases 2to3 will not fix for you.
Consider this Python function, which running 2to3 left completely untouched.
It worked fine in Python 2, but raises RecursionError in Python 3.
(It is of questionable quality; I didn’t write it originally.)
def safe_escape(value):
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    elif hasattr(value, '__iter__'):
        value = [safe_escape(v) for v in value]
    elif isinstance(value, str):
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    return value
But why? It turns out that strings in Python 2 don’t have the __iter__ method, but they do in Python 3.
In Python 3, the hasattr(value, '__iter__') condition is therefore true when value is a string.
The list comprehension then iterates over each character of the string and calls safe_escape on it (the recursion part).
But each of those characters is itself a one-character string with an __iter__ attribute, and iterating over it yields that same string again, so the recursion never bottoms out and quickly hits the max recursion depth set by your Python interpreter.
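The behaviour described above can be reproduced in a few lines of Python 3. This is only a sketch: naive_escape is a hypothetical, stripped-down stand-in for the iterable branch of safe_escape, not the original code.

```python
# Python 3: str defines __iter__, so the hasattr check also matches strings.
has_iter = hasattr("abc", "__iter__")

# Iterating a one-character string yields that same one-character string.
chars_of_a = list("a")


def naive_escape(value):
    # Hypothetical stand-in for the iterable branch of safe_escape.
    if hasattr(value, "__iter__"):
        return [naive_escape(v) for v in value]
    return value


try:
    naive_escape("a")
    outcome = "returned"
except RecursionError:
    # Each call recurses into "a" again until the interpreter limit is hit.
    outcome = "RecursionError"

print(has_iter, chars_of_a, outcome)
```

Even a single-character input is enough to trigger the error, because the recursion never reaches a value without __iter__.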
In this function it was easy to fix, of course:
- Either the order of the two elifs can be swapped
- or we exclude strings from the iter-check (elif hasattr(value, '__iter__') and not isinstance(value, str))
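As a sketch of the first option, here is the same function with the two elif branches swapped, so strings are handled before the generic iterable check. The import line is an assumption, since the original snippet does not show where OrderedDict comes from.

```python
from collections import OrderedDict  # assumed import, not shown in the original


def safe_escape(value):
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    elif isinstance(value, str):
        # Checked before the iterable branch, so strings never recurse.
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    elif hasattr(value, '__iter__'):
        value = [safe_escape(v) for v in value]
    return value


print(safe_escape(['<b>', {'key"': "it's"}, 42]))
```

Note that bytes objects also have __iter__ in Python 3 and would still land in the iterable branch, coming back as a list of integers. Whether that matters depends on what the function is actually fed.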
The more labour-intensive fix would be to rewrite it entirely, since all it really does is recursively URL-encode values (but only four characters). Maybe there’s a (bad) reason it only URL-encodes these four characters, so that was a can of worms I didn’t want to open.
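To illustrate why a rewrite is not a drop-in change, the sketch below compares the four replacements with the standard library's urllib.parse.quote. The helper escape_four and the sample string are made up for this comparison.

```python
from urllib.parse import quote


def escape_four(text):
    # Hypothetical helper: the same four replacements as the original function.
    for char, code in (('<', '%3C'), ('>', '%3E'), ('"', '%22'), ("'", '%27')):
        text = text.replace(char, code)
    return text


sample = '<a href="x y">'

four_only = escape_four(sample)        # spaces and '=' are left alone
fully_quoted = quote(sample, safe='')  # spaces and '=' are encoded as well

print(four_only)
print(fully_quoted)
```

The two agree on the four characters themselves, but quote also encodes spaces, '=' and other reserved characters, so callers relying on the old output could see different results.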
Anyway, the main lesson for me was: even though Python 2 is gone, you might still need to remember its quirks.