Running a PyPi registry for Windows using Github Actions and Github Pages
07 Apr 2020

“This page is not a pip package index.”
— https://www.lfd.uci.edu/~gohlke/pythonlibs/

python -m pip install --extra-index-url https://pypi.bartbroe.re <yourpackagehere>
When trying to build Python projects for Windows, I often end up on Christoph Gohlke’s collection of Python wheels for Windows. Most of the time, I can download the package I’m looking for, compiled for Windows, and continue my day. But wouldn’t it be nice if these packages were exposed in a proper Python package index?
The standard for a simple Python package index is really easy. You need one HTML page listing all the packages, with an a-tag per package linking to a subpage for that package. Each of these subpages should in turn contain an a-tag for every provided wheel.
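Generating those two levels of pages is little more than string formatting. A minimal sketch (the package and wheel below are just example data, and `docs/` is assumed as the output directory for Github Pages):

```python
import os

# Example data: package name -> list of (wheel filename, download URL)
packages = {
    "ad3": [
        ("ad3-2.2.1-cp36-cp36m-win_amd64.whl",
         "https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/ad3-2.2.1-cp36-cp36m-win_amd64.whl"),
    ],
}

os.makedirs("docs", exist_ok=True)

# Top-level page: one a-tag per package, linking to its subpage
with open("docs/index.html", "w") as f:
    links = "".join(f'<a href="{name}/">{name}</a><br>' for name in packages)
    f.write(f"<html><body>{links}</body></html>")

# One subpage per package: one a-tag per wheel
for name, wheels in packages.items():
    os.makedirs(f"docs/{name}", exist_ok=True)
    with open(f"docs/{name}/index.html", "w") as f:
        links = "".join(f'<a href="{url}">{filename}</a><br>'
                        for filename, url in wheels)
        f.write(f"<html><body>{links}</body></html>")
```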
To turn the web page into a package index, you would only need to scrape it: find the packages, find the wheels, and build the new set of HTML pages.
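Collecting the links comes down to walking the a-tags. A sketch with the standard library's html.parser (the tag fed in below is a simplified stand-in for the real page's markup):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the link text and onclick attribute of every a-tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_attrs = dict(attrs)

    def handle_data(self, data):
        if self._current_attrs is not None:
            self.links.append((data, self._current_attrs.get("onclick")))
            self._current_attrs = None

parser = LinkCollector()
parser.feed('<a onclick="javascript:dl([101,53], \'76\')">ad3-2.2.1.whl</a>')
print(parser.links)
```

The onclick attribute matters here, because the download URLs are not plain hrefs, as the next section shows.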
But… the download URLs turned out to be obfuscated with JavaScript:
function dl1(ml, mi) {
    var ot = "https://download.lfd.uci.edu/pythonlibs/";
    for (var j = 0; j < mi.length; j++) ot += String.fromCharCode(ml[mi.charCodeAt(j) - 47]);
    location.href = ot;
}

function dl(ml, mi) {
    mi = mi.replace('&lt;', '<');
    mi = mi.replace('&gt;', '>');
    mi = mi.replace('&amp;', '&');
    setTimeout(function (l) {
        dl1(ml, mi)
    }, 1500, 1);
}
dl([101,53,106,110,46,105,118,50,115,104,97,100,99,49,116,54,108,51,119,95,112,52,109,113,45,47],
"761FC50=H9:@G6363<G;C@>G;C@>EGA42B9E:>D3A8?");
// this triggers a download: https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/ad3-2.2.1-cp36-cp36m-win_amd64.whl
This code, reconstructed in our Python scraper, looks like this:
ml = [101, 53, 106, 110, 46, 105, 118, 50, 115, 104, 97, 100, 99,
      49, 116, 54, 108, 51, 119, 95, 112, 52, 109, 113, 45, 47]
mi = "761FC50=H9:@G6363<G;C@>G;C@>EGA42B9E:>D3A8?"


def deobfuscate_download_url(ml, mi):
    # Undo the HTML entity encoding of the second argument
    mi = mi.replace('&lt;', '<')
    mi = mi.replace('&gt;', '>')
    mi = mi.replace('&amp;', '&')
    # Each character of mi indexes into the ml lookup table
    output = ''
    for i in range(len(mi)):
        output += chr(ml[ord(mi[i]) - 47])
    return output


print("https://download.lfd.uci.edu/pythonlibs/" + deobfuscate_download_url(ml, mi))
# https://download.lfd.uci.edu/pythonlibs/s2jqpv5t/ad3-2.2.1-cp36-cp36m-win_amd64.whl
And… the server seemed to be checking the User-Agent header of the request, so we tell it we are Mozilla/5.0 and not something like python-requests/{package version} {runtime}/{runtime version} {uname}/{uname -r}.
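In the scraper this amounts to overriding the default header on a requests Session (the exact Mozilla/5.0 string below is just an example of a browser-like value):

```python
import requests

# requests identifies itself as "python-requests/<version>" by default;
# replace that with a browser-like User-Agent string
session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Every request made through this session now sends the browser User-Agent,
# e.g. session.get("https://www.lfd.uci.edu/~gohlke/pythonlibs/")
```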
Now we have a scraper that can find all packages and wheels on this page, and we can build our own package index from them.
Using Github Actions, I scheduled a periodic run of the scraper that commits the result back to its own repository. This has the advantage that the package index can be hosted with Github Pages, which makes this entire thing free to operate.
This is the Github Action that periodically runs:
name: Update PyPi registry
on:
  schedule:
    - cron: '25 */4 * * *' # every four hours
jobs:
  build:
    name: Update registry
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master
      - name: Set up Python 3.7
        uses: actions/setup-python@v1
        with:
          python-version: 3.7
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install requests
      - name: Remove old package index
        run: |
          mv docs/CNAME ./CNAME
          rm -rf docs/*
          mv ./CNAME docs/CNAME
      - name: Scrape Christoph Gohlke
        run: |
          python scrape.py
      - name: Commit files
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "PyPi updater"
          git add *
          git commit -m "Update PyPi registry" -a
      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
I’m hosting this on pypi.bartbroe.re, until it eventually breaks, so it’s usable with:

python -m pip install --extra-index-url https://pypi.bartbroe.re <yourpackagehere>
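If you don’t want to type the flag every time, the same setting can go into pip’s configuration file (~/.config/pip/pip.conf on Linux, %APPDATA%\pip\pip.ini on Windows):

```ini
[global]
extra-index-url = https://pypi.bartbroe.re
```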