The edge cases 2to3 might not fix
21 Dec 2020
Almost a year after Python 2 officially reached its end of life,
2to3 is still my favourite tool for porting Python 2 code to Python 3.
Only recently, when using it on a legacy code base, I found one of the edge cases
2to3 will not fix for you.
Consider this function in Python, left completely untouched by running 2to3.
It worked fine in Python 2, but throws RecursionError in Python 3.
(It is of questionable quality; I didn’t write it originally.)
    from collections import OrderedDict

    def safe_escape(value):
        if isinstance(value, dict):
            value = OrderedDict([
                (safe_escape(k), safe_escape(v)) for k, v in value.items()
            ])
        elif hasattr(value, '__iter__'):
            value = [safe_escape(v) for v in value]
        elif isinstance(value, str):
            value = value.replace('<', '%3C')
            value = value.replace('>', '%3E')
            value = value.replace('"', '%22')
            value = value.replace("'", '%27')
        return value
But why? It turns out that strings in Python 2 don’t have an __iter__ method (they were still iterable, but through __getitem__), while in Python 3 they do.
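A quick way to see the difference is to ask a few values whether they have the attribute. This is just an illustrative check run under Python 3; the variable names are made up for the example.

```python
# Python 3: str defines __iter__, just like list and dict do.
# (In Python 2 the same check returned False for a str.)
str_has_iter = hasattr("abc", "__iter__")
list_has_iter = hasattr(["abc"], "__iter__")
int_has_iter = hasattr(42, "__iter__")

print(str_has_iter, list_has_iter, int_has_iter)
```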
What happens in Python 3 is that the hasattr(value, '__iter__') condition becomes true when value is a string.
The list comprehension then iterates over each character of the string, and the function calls itself for each one (the recursion part).
But each of those characters is itself a one-character string that also has the __iter__ attribute, and iterating over it yields the same character again.
The recursion never bottoms out and quickly hits the maximum recursion depth set by your Python interpreter.
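The endless loop can be reproduced with a stripped-down version of the iter-check branch. The function name visit is hypothetical and only exists for this sketch; it keeps nothing of the original function except the hasattr test and the list comprehension.

```python
def visit(value):
    # Same shape as the iter-check branch: recurse into anything with __iter__.
    if hasattr(value, '__iter__'):
        return [visit(v) for v in value]
    return value


# Iterating over a one-character string yields that same string again,
# so there is no smaller value for the recursion to end on.
single_char_items = list("a")

try:
    visit("a")
    outcome = "finished"
except RecursionError:
    outcome = "RecursionError"

print(single_char_items, outcome)
```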
In this function it was easy to fix of course:
- either swap the order of the two elifs,
- or exclude strings from the iter-check (elif hasattr(value, '__iter__') and not isinstance(value, str)).
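For completeness, here is roughly what the function looks like with the second option applied. Treat it as a sketch: only the elif condition is changed, and the sample inputs at the bottom are invented for illustration.

```python
from collections import OrderedDict


def safe_escape(value):
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    # Strings are iterable in Python 3, so keep them out of this branch.
    elif hasattr(value, '__iter__') and not isinstance(value, str):
        value = [safe_escape(v) for v in value]
    elif isinstance(value, str):
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    return value


# Made-up inputs: a plain string and a nested structure.
escaped_string = safe_escape('<a href="x">')
escaped_nested = safe_escape({'q': ['<b>', "it's"], 'n': 1})
print(escaped_string)
```

Swapping the two elifs gives the same result for strings; the explicit isinstance exclusion just makes the intent visible to the next person who reads the condition.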
The more labour-intensive fix would be to rewrite it entirely, since all it really does is recursively URL-encode values (and only four characters at that). Maybe there’s a (bad) reason it only encodes these four characters, so that was a can of worms I didn’t want to open.
Anyway, the main lesson for me was: even though Python 2 is gone, you might still need to remember its quirks.