Why pip and Homebrew make a dangerous cocktail

pip install will happily replace anything in /usr/local/

pip, a Python package manager, can install two types of Python packages: source distributions and binary distributions.

With a source distribution, pip simply runs the package's setup.py, which often contains a call to setuptools.setup. You could argue that pip installing source distributions is remote code execution (RCE) by design.

Binary distributions (often wheels) are not intended to run code immediately during install. They simply copy files using all kinds of logic defined in pip itself. This should make them less dangerous.

Homebrew is “The Missing Package Manager for macOS”, and probably the most popular way for macOS users to get Python: Python 3.* was installed with brew around 500,000 times in the last 30 days. “Homebrew installs packages to their own directory and then symlinks their files into /opt/homebrew (on Apple Silicon).” Note that on Intel Macs the prefix is /usr/local/ instead. “Homebrew won’t install files outside its prefix and you can place a Homebrew installation wherever you like.”

In this post I’ll assume a Python 3.9 installation, performed with brew install python@3.9. I’ll demonstrate that a malicious Python package can replace files in the Homebrew prefix directory, by default /usr/local for Intel Macs and /opt/homebrew/ for ARM Macs.

By defining the following setup.py we could even replace the python3.9 executable itself. I replace python3.9 with an executable that simply outputs Not Python to demonstrate the issue:

from setuptools import setup

setup(name='malware',
      version='3.2.1',
      description='malware',
      url='https://example.com',
      author='',
      author_email='[email protected]',
      # Every file in /usr/local can be poisoned by including data_files.
      # If they already existed, the executable flag is preserved
      # This is just one example of a file that can be replaced:
      data_files=[("Cellar/python@3.9/3.9.17_1/bin", ["python3.9"])],
      packages=[],
      install_requires=[])

If a data file has the same path as an existing file, and the existing file has executable bits set, they will remain set!
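
To check where a relative data_files directory would end up for a given interpreter, you can ask sysconfig for the "data" path of its install scheme. The following is a minimal sketch, not pip's actual logic: the helper name resolve_data_file is hypothetical, and it assumes that relative data_files directories are resolved against that "data" path, which is how the example above reaches the Cellar.

```python
import os
import sysconfig


def resolve_data_file(directory: str, filename: str) -> str:
    """Return the absolute path a data_files entry would map to.

    Assumption: relative data_files directories are resolved against the
    "data" path of the interpreter's default install scheme.
    """
    data_root = sysconfig.get_path("data")
    # Only the base name of the source file is kept in the target directory.
    target = os.path.join(data_root, directory, os.path.basename(filename))
    return os.path.normpath(target)


if __name__ == "__main__":
    target = resolve_data_file("Cellar/python@3.9/3.9.17_1/bin", "python3.9")
    print("data root:", sysconfig.get_path("data"))
    print("target:   ", target)
    print("exists:   ", os.path.exists(target))
```

If the printed target exists and is writable by your user, a package installed by that interpreter could plausibly overwrite it.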

Furthermore, real world attacks will be much more subtle than the one above. An attacker could patch some malware into a dynamic library while preserving its original functionality. Packages like lief will help you do that with ease.

Of course, before posting this here, I tried to find out how well known this issue is. After some discussion with the people running the security mailing lists at Python and the Python Packaging Authority, the conclusion is: There is no fix for this, at least not without breaking someone’s legitimate uses.

It is possible to think of some countermeasures to reduce the risk, however. One countermeasure could be screening packages for executables and libraries in places they don’t typically belong. This can be done in three steps:

  1. Download the package and its dependencies using pip download, without installing them. The --only-binary=:all: option is important because pip download would otherwise run a source distribution's setup.py to find out its dependencies.
mkdir /tmp/scan
python3 -m pip download --only-binary=:all: -d /tmp/scan package_name
  2. Apply the following script to detect any executables or libraries in places they don’t belong (using libmagic):
import glob
import zipfile

import magic  # brew install libmagic && python3.9 -m pip install python-magic
from tqdm import tqdm  # python3.9 -m pip install tqdm

for wheel_path in tqdm(glob.glob("/tmp/scan/*.whl")):
    # open each wheel as a zip archive and close it again when done
    with zipfile.ZipFile(wheel_path) as wheel:
        for file in wheel.filelist:
            # a ".data/data/" path indicates the file was added using data_files:
            if '.data/data/' in file.filename:
                # let libmagic find out what it is:
                magic_guess = magic.from_buffer(wheel.open(file.filename).read())
                # when not on macOS, change this to something relevant to your platform:
                if 'Mach-O' in magic_guess:
                    print(wheel.filename, file.filename, magic_guess)

Note that in this example I only check for Mach-O libraries and executables. This reduces false positives. For example, if a package author sets include_package_data=True there will be lots of Python files in the data directory as well. This scan also does not check for shell scripts or other types of executables. To cast a wider net, check for the words 'executable' or 'library' in the output of libmagic.

  3. Inspect the output of step 2, and decide if you still want to run pip install on the same target and its transitive dependencies.
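
If you would rather not depend on libmagic, a rougher version of the wider net can be sketched with the standard library alone by looking at the first bytes of each data file. This is an illustration under assumptions, not a replacement for the script above: the prefix table is not exhaustive, and the names MAGIC_PREFIXES, classify and scan_wheel are made up for this example.

```python
import zipfile

# Leading bytes of some common native or script formats (not exhaustive).
MAGIC_PREFIXES = {
    b"\xfe\xed\xfa\xce": "Mach-O 32-bit (big-endian)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (little-endian)",
    b"\xfe\xed\xfa\xcf": "Mach-O 64-bit (big-endian)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (little-endian)",
    b"\xca\xfe\xba\xbe": "Mach-O universal binary (or Java class file)",
    b"\x7fELF": "ELF",
    b"MZ": "PE/DOS executable",
    b"#!": "script with shebang",
}


def classify(head):
    """Return a label if the leading bytes match a known prefix, else None."""
    for prefix, label in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return label
    return None


def scan_wheel(path):
    """List (filename, label) for suspicious files under .data/data/ in a wheel."""
    findings = []
    with zipfile.ZipFile(path) as wheel:
        for info in wheel.infolist():
            # Only look at files that were added through data_files.
            if ".data/data/" not in info.filename or info.is_dir():
                continue
            with wheel.open(info) as member:
                label = classify(member.read(4))
            if label:
                findings.append((info.filename, label))
    return findings


if __name__ == "__main__":
    import glob

    for wheel_path in glob.glob("/tmp/scan/*.whl"):
        for filename, label in scan_wheel(wheel_path):
            print(wheel_path, filename, label)
```

Unlike the Mach-O-only check, this also flags shell scripts and ELF binaries, at the cost of more false positives.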

Of course, it’s best to combine this scan-before-you-install with some long-standing best practices:

Firstly, avoid installing source distributions with pip when possible, by passing the --only-binary=:all: option.
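
One way to make this a habit is to route installs through a small wrapper that always adds the option. The sketch below is only an illustration; the function names are hypothetical, and it simply shells out to pip for the interpreter that runs it.

```python
import subprocess
import sys


def binary_only_install_command(packages):
    """Build a pip install command that refuses source distributions."""
    return [sys.executable, "-m", "pip", "install", "--only-binary=:all:", *packages]


def install_binary_only(packages):
    """Run pip; raises CalledProcessError if pip exits with an error."""
    subprocess.run(binary_only_install_command(packages), check=True)


if __name__ == "__main__":
    # Usage: python install_wheels_only.py package_name [more_packages ...]
    if len(sys.argv) > 1:
        install_binary_only(sys.argv[1:])
```

With this option pip fails instead of falling back to a source distribution when no wheel is available, which is the point: you find out before any setup.py runs.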

Secondly, follow the recommendations here, and configure Homebrew to use a directory that is not on the PATH. This means disregarding Homebrew’s own documentation, which warns that it might be inconvenient not to have the prefix set to /usr/local.
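
To get a feel for your own exposure, you can list the directories on your PATH that your user account can write to, since those are places where an unprivileged install could replace an executable. This is a minimal sketch; the helper name writable_path_dirs is hypothetical, and os.access only reports what the current user may do at the moment it is called.

```python
import os


def writable_path_dirs(path_value=None):
    """Return the PATH entries that exist and are writable by the current user."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    seen = set()
    result = []
    for entry in path_value.split(os.pathsep):
        # Skip empty entries and duplicates.
        if not entry or entry in seen:
            continue
        seen.add(entry)
        if os.path.isdir(entry) and os.access(entry, os.W_OK):
            result.append(entry)
    return result


if __name__ == "__main__":
    for directory in writable_path_dirs():
        print("writable and on PATH:", directory)
```

On a default Homebrew setup you would expect the bin directory of the Homebrew prefix to show up in this list.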

To summarize: installing things always introduces an inherent risk, but the way Homebrew’s Python is configured might pose an additional risk.