The edge cases 2to3 might not fix

21 Dec 2020

A year after Python 2 reached its official end of life, 2to3 is still my favourite tool for porting Python 2 code to Python 3. Only recently, while using it on a legacy code base, I ran into one of the edge cases 2to3 will not fix for you. Consider this function, which 2to3 leaves completely untouched. It worked fine in Python 2, but raises RecursionError in Python 3 as soon as it encounters a non-empty string. (It is of questionable quality; I didn’t write it originally.)

from collections import OrderedDict


def safe_escape(value):
    # Dicts: escape keys and values, keeping insertion order
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    # Any other iterable: escape each element
    elif hasattr(value, '__iter__'):
        value = [safe_escape(v) for v in value]
    # Strings: percent-encode four characters
    elif isinstance(value, str):
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    return value

But why? It turns out strings in Python 2 don’t have the __iter__ method, but they do in Python 3. So in Python 3 the hasattr(value, '__iter__') condition is true when value is a string. The list comprehension then iterates over each character of the string and calls safe_escape on it (the recursion part). But… each of those characters is itself a one-character string with the __iter__ attribute, and iterating it yields the same character again. The recursion never bottoms out and quickly reaches the max recursion depth set by your Python interpreter.
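The behaviour described above can be checked in a Python 3 interpreter. This is a minimal sketch; descend is a hypothetical stand-in for the iterable branch of safe_escape, not code from the original project.

```python
# Python 3: str defines __iter__, so the hasattr check matches strings too.
has_iter = hasattr("abc", "__iter__")

# Iterating a string yields one-character strings...
chars = list("abc")

# ...and iterating a one-character string yields that same string again.
single = list("a")


def descend(value):
    # Hypothetical reduction of safe_escape to its iterable branch only.
    if hasattr(value, "__iter__"):
        return [descend(v) for v in value]
    return value


# A single character is enough to recurse until the interpreter gives up.
try:
    descend("a")
    hit_limit = False
except RecursionError:
    hit_limit = True
```

Under Python 2 the same hasattr check on a str is false, which is why the original function never took this branch for strings.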

In this function it was easy to fix, of course:

Either swap the order of the two elifs,
or exclude strings from the __iter__ check (elif hasattr(value, '__iter__') and not isinstance(value, str))
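As a sketch of the first option, here is the function with the string branch moved ahead of the __iter__ check. Everything else is kept as in the original; the sample inputs in the usage lines are made up for illustration.

```python
from collections import OrderedDict


def safe_escape(value):
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    # Strings are handled before the generic iterable check, so they are
    # never iterated character by character.
    elif isinstance(value, str):
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    elif hasattr(value, '__iter__'):
        value = [safe_escape(v) for v in value]
    return value


# Illustrative inputs (not from the original code base)
escaped_text = safe_escape("<a>")
escaped_nested = safe_escape(["<", {"k": "'"}])
untouched = safe_escape(5)
```

One thing to keep in mind with either fix: bytes objects still pass the __iter__ check in Python 3 and would come back as a list of integers. Whether that matters depends on what the callers pass in.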

The more labour-intensive fix would be rewriting it entirely, since all it really does is recursively URL-encode values (but for four characters only). Maybe there’s a (bad) reason it only URL-encodes these four characters, so that was a can of worms I didn’t want to open.
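To illustrate why a rewrite is not a drop-in change, here is a small comparison between the four-character replacement and the standard library's urllib.parse.quote. This is only a sketch: escape_four is a hypothetical helper mirroring the string branch of safe_escape, and the sample string is made up.

```python
from urllib.parse import quote


def escape_four(text):
    # Hypothetical helper: the same four replacements as safe_escape.
    for char, encoded in (('<', '%3C'), ('>', '%3E'), ('"', '%22'), ("'", '%27')):
        text = text.replace(char, encoded)
    return text


sample = "50% <b>"

# Only '<' and '>' are touched; the '%' and the space stay as they are.
four_chars_only = escape_four(sample)

# quote() with safe='' also encodes '%' and the space.
full_quote = quote(sample, safe='')
```

Any caller that relies on spaces or percent signs passing through unchanged would see different output after switching to full URL encoding.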

Anyway, the main lesson for me was: even though Python 2 is gone, you might still need to remember its quirks.

Bart Broere
