Running a PyPI registry for Windows using GitHub Actions and GitHub Pages

“This page is not a pip package index.” https://www.lfd.uci.edu/~gohlke/pythonlibs/

python -m pip install --extra-index-url https://pypi.bartbroe.re <yourpackagehere>

When trying to build Python projects for Windows, I often end up on Christoph Gohlke’s collection of Python wheels for Windows. Most of the time, I can download the package I’m looking for, compiled for Windows, and continue my day. But wouldn’t it be nice if these packages were exposed in a proper Python package index?

The standard for a simple Python package index (PEP 503) is easy to implement. You need one HTML page listing all the packages, with an a-tag per package that links to a subpage for that package. Each of these subpages in turn contains an a-tag for each provided wheel.
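To make that layout concrete, here is a minimal sketch of how such pages could be generated. The function names and the input shape (a dict of project name to wheel URLs) are assumptions for illustration, not the scraper's actual code; only the name normalization rule comes from PEP 503.

```python
import html
import re


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_page(links):
    """Render (href, text) pairs as a bare HTML page of a-tags."""
    anchors = "\n".join(
        f'<a href="{html.escape(href)}">{html.escape(text)}</a><br>'
        for href, text in links
    )
    return f"<!DOCTYPE html>\n<html><body>\n{anchors}\n</body></html>\n"


def render_index(wheels):
    """Map relative file paths to HTML for a {project: [wheel URL, ...]} dict."""
    pages = {}
    root_links = []
    for project in sorted(wheels):
        name = normalize(project)
        # The root page links to one subpage per project
        root_links.append((f"{name}/", name))
        # On a project page, the link text of each wheel is its file name
        wheel_links = [(url, url.rsplit("/", 1)[-1]) for url in wheels[project]]
        pages[f"{name}/index.html"] = render_page(wheel_links)
    pages["index.html"] = render_page(root_links)
    return pages
```

Writing each entry of the returned dict to a file under docs/ would give a directory tree that a static host can serve as an index.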

To turn the web page into a package index, you only need to scrape it, find the packages, find the wheels, and build the new set of HTML pages.

But… the download URLs were obfuscated with JavaScript.

function dl1(ml, mi) {
    var ot = "https://download.lfd.uci.edu/pythonlibs/";
    for (var j = 0; j < mi.length; j++) ot += String.fromCharCode(ml[mi.charCodeAt(j) - 47]);
    location.href = ot;
}

function dl(ml, mi) {
    mi = mi.replace('&lt;', '<');
    mi = mi.replace('&#62;', '>');
    mi = mi.replace('&#38;', '&');
    setTimeout(function (l) {
        dl1(ml, mi)
    }, 1500, 1);
}

dl([101,53,106,110,46,105,118,50,115,104,97,100,99,49,116,54,108,51,119,95,112,52,109,113,45,47], 
   "761FC50=H9:@G6363&lt;G;C@&#62;G;C@&#62;EGA42B9E:&#62;D3A8?");
// this triggers a download: https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/ad3-2.2.1-cp36-cp36m-win_amd64.whl

This code, reconstructed in our Python scraper, looks like this:

ml = [101, 53, 106, 110, 46, 105, 118, 50, 115, 104, 97, 100, 99, 
      49, 116, 54, 108, 51, 119, 95, 112, 52, 109, 113, 45, 47]
mi = "761FC50=H9:@G6363&lt;G;C@&#62;G;C@&#62;EGA42B9E:&#62;D3A8?"


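def deobfuscate_download_url(ml, mi):
    """Rebuild the path that the page's dl()/dl1() JavaScript would navigate to."""
    # Undo the HTML escaping of the scraped string (mirrors the replacements in dl())
    mi = mi.replace('&lt;', '<')
    mi = mi.replace('&#62;', '>')
    mi = mi.replace('&#38;', '&')
    # Each character of mi, offset by 47, is an index into the table of character codes ml
    return ''.join(chr(ml[ord(char) - 47]) for char in mi)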

print("https://download.lfd.uci.edu/pythonlibs/" + deobfuscate_download_url(ml, mi))
# https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/ad3-2.2.1-cp36-cp36m-win_amd64.whl

And… the server seemed to check the User-Agent header of the request, so we tell it we are Mozilla/5.0 rather than something like python-requests/{package version} {runtime}/{runtime version} {uname}/{uname -r}.
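A minimal sketch of that header override, using only the standard library so it runs without extra dependencies. The helper names are hypothetical; with requests, the equivalent is passing headers={"User-Agent": "Mozilla/5.0"} to requests.get.

```python
import urllib.request

PAGE_URL = "https://www.lfd.uci.edu/~gohlke/pythonlibs/"


def browser_like_request(url):
    """Build a request that announces itself as Mozilla/5.0."""
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch(url, timeout=30):
    """Download a page as text, sending the overridden User-Agent."""
    with urllib.request.urlopen(browser_like_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch(PAGE_URL) would then return the HTML of the page, assuming the server still accepts this User-Agent.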

Now we have a scraper that can find all packages and wheels on this page, and we can build our own package index from the result.
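The step of finding the wheels could be sketched roughly as follows: search the page source for dl(...) calls and decode each one. The regular expression and the sample markup are assumptions about how the calls appear in the HTML, not a copy of the real page or of the actual scraper.

```python
import html
import re

# Matches calls such as dl([101,53,...], "761FC5...") and captures both arguments
DL_CALL = re.compile(r'dl\(\[([\d,\s]+)\],\s*"([^"]*)"\)')


def find_wheel_paths(page_html):
    """Return the decoded download path of every dl(...) call in the page."""
    paths = []
    for codes, mi in DL_CALL.findall(page_html):
        ml = [int(code) for code in codes.split(",")]
        # html.unescape covers the &lt;, &#62; and &#38; entities in one go
        mi = html.unescape(mi)
        paths.append("".join(chr(ml[ord(char) - 47]) for char in mi))
    return paths


# Hypothetical anchor markup wrapping the dl(...) call shown earlier
SAMPLE = (
    "<a href='javascript:;' onclick='dl([101,53,106,110,46,105,118,50,115,104,"
    "97,100,99,49,116,54,108,51,119,95,112,52,109,113,45,47], "
    '"761FC50=H9:@G6363&lt;G;C@&#62;G;C@&#62;EGA42B9E:&#62;D3A8?")'
    "'>ad3 wheel</a>"
)

print(find_wheel_paths(SAMPLE))
# ['s2jqpv5t/ad3-2.2.1-cp36-cp36m-win_amd64.whl']
```

Prefixing each path with https://download.lfd.uci.edu/pythonlibs/ gives the wheel URLs to list on the per-package pages.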

Using GitHub Actions, I scheduled a periodic run of the scraper, which commits its output back to its own repository. This has the advantage that we can host the package index with GitHub Pages, which makes the whole thing free to operate.

This is the GitHub Actions workflow that runs periodically:

name: Update PyPi registry
on: 
  schedule:
    - cron:  '25 */4 * * *' # at minute 25 of every fourth hour
jobs:
  build:
    name: Update registry
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@master
    - name: Set up Python 3.7
      uses: actions/setup-python@v1
      with:
        python-version: 3.7
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install requests
    - name: Remove old package index
      run: |
        mv docs/CNAME ./CNAME
        rm -rf docs/*
        mv ./CNAME docs/CNAME
    - name: Scrape Christoph Gohlke
      run: |
        python scrape.py
    - name: Commit files
      run: |
        git config --local user.email "[email protected]"
        git config --local user.name "PyPi updater"
        git add *
        git commit -m "Update PyPi registry" -a
    - name: Push changes
      uses: ad-m/github-push-action@master
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}

I’m hosting this on pypi.bartbroe.re until it eventually breaks, so it’s usable with:

python -m pip install --extra-index-url https://pypi.bartbroe.re <yourpackagehere>