The edge cases 2to3 might not fix

A year after Python 2 was officially deprecated, 2to3 is still my favourite tool for porting Python 2 code to Python 3. Only recently, while using it on a legacy code base, I ran into one of the edge cases 2to3 will not fix for you. Consider the function below, which 2to3 left completely untouched. It worked fine in Python 2, but raises RecursionError in Python 3. (It is of questionable quality; I didn’t write it originally.)

from collections import OrderedDict


def safe_escape(value):
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    elif hasattr(value, '__iter__'):
        value = [safe_escape(v) for v in value]
    elif isinstance(value, str):
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    return value

But why? It turns out strings in Python 2 don’t have an __iter__ method, but they do in Python 3. In Python 3 the hasattr(value, '__iter__') condition is therefore true when value is a string, so the string never reaches the isinstance(value, str) branch. Instead, the list comprehension iterates over the string’s characters and the function calls itself on each one (the recursion part). But each character is itself a one-character string, which also has the __iter__ attribute and yields itself again when iterated. The recursion never bottoms out and quickly hits the maximum recursion depth set by your Python interpreter.
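To see the mechanism in isolation, here is a minimal sketch for Python 3. The helper naive_walk is a hypothetical name, not part of the original code base; it only mimics the iter-check branch of safe_escape.

```python
# Python 3 only: str defines __iter__ here (it did not in Python 2).
has_iter = hasattr("a", "__iter__")

# Iterating a one-character string yields that same one-character string.
yields_itself = list("a") == ["a"]


def naive_walk(value):
    # Hypothetical helper with the same shape as the iter-check branch:
    # recurse into anything that looks iterable.
    if hasattr(value, "__iter__"):
        return [naive_walk(v) for v in value]
    return value


try:
    naive_walk("a")
    outcome = "finished"
except RecursionError:
    # "a" iterates to "a", which iterates to "a", and so on.
    outcome = "RecursionError"

print(has_iter, yields_itself, outcome)
```

Numbers and nested lists of numbers pass through naive_walk without trouble; only non-empty strings send it into the loop.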

In this function it was easy to fix, of course:

  1. Either swap the order of the two elifs,
  2. or exclude strings from the iter-check (elif hasattr(value, '__iter__') and not isinstance(value, str))
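As a sketch of the first option, this is the same function with the string branch moved ahead of the iterable check. Apart from that reordering, the added import and the comments, the logic is unchanged, and it assumes Python 3.

```python
from collections import OrderedDict


def safe_escape(value):
    if isinstance(value, dict):
        value = OrderedDict([
            (safe_escape(k), safe_escape(v)) for k, v in value.items()
        ])
    elif isinstance(value, str):
        # Strings are handled first, so they never reach the iterable branch.
        value = value.replace('<', '%3C')
        value = value.replace('>', '%3E')
        value = value.replace('"', '%22')
        value = value.replace("'", '%27')
    elif hasattr(value, '__iter__'):
        # Lists, tuples, sets and other iterables are escaped element by element.
        value = [safe_escape(v) for v in value]
    return value


print(safe_escape(['<b>', {'key': "it's"}, 42]))
```

One caveat with either fix: bytes objects also have __iter__ in Python 3, so they still take the iterable branch and come back as a list of integers.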

The more labour-intensive fix would be to rewrite it entirely, since all it really does is recursively URL-encode values (but for four characters only). Maybe there’s a (bad) reason it only URL-encodes these four characters, so that was a can of worms I didn’t want to open.

Anyway, the main lesson for me was: even though Python 2 is gone, you might still need to remember its quirks.