Making Pyodide more powerful using a CORS proxy

Browsers are strict about Cross-Origin Resource Sharing (CORS) to protect users from leaking credentials to untrusted domains. This can be a hurdle when you try to use Pyodide (Python in the browser).

Previously, I shimmed Python's requests library to be usable from Pyodide, but I won't use that shim here: it has since been superseded by a better way to patch this into the requests and aiohttp libraries. To illustrate the CORS problem in isolation, I'll simply use JavaScript's fetch in this post.

First, I made a few modifications to the httpbin project, so we can simulate responses with various CORS headers. If we try to get a file from a host that has a CORS policy with Access-Control-Allow-Origin set to http://example.com

fetch('https://httpbin.example.com/response-headers?Access-Control-Allow-Origin=http://example.com')

we might get the following error:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://httpbin.example.com/response-headers?Access-Control-Allow-Origin=http://example.com. (Reason: CORS header ‘Access-Control-Allow-Origin’ does not match ‘http://example.com’).

And/Or this one if a Content Security Policy is in place:

Content Security Policy: The page’s settings observed the loading of a resource at https://httpbin.example.com/response-headers (“default-src”). A CSP report is being sent.

Or the following error if the header is missing completely:

Access to fetch at 'https://httpbin.example.com/no-cors-headers' from origin 'https://notebook.example.com' has been blocked by CORS policy: No ‘Access-Control-Allow-Origin’ header is present on the requested resource. If an opaque response serves your needs, set the request’s mode to ‘no-cors’ to fetch the resource with CORS disabled.

So a lot of stars have to align for our request to work. The same request would be trivial with an HTTP client that is not a browser: CORS policies are only enforced by browsers, as mentioned in MDN's documentation on CORS:

The Cross-Origin Resource Sharing standard works by adding new HTTP headers that let servers describe which origins are permitted to read that information from a web browser.

So it's the server that determines which hosts can see which contents. Having the server change its headers is not feasible in most cases, so let's introduce CORS proxies. A CORS proxy is a server in the middle that simply strips or replaces the headers that are causing trouble.
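Conceptually the proxy's job is small: pass the upstream response through, but rewrite the troublesome headers. A minimal sketch of that rewriting step (the function name and the choice to substitute a wildcard origin are mine, not cors-anywhere's):

```python
def strip_cors_headers(headers, allow_origin='*'):
    """Drop the restrictive Access-Control-* headers from an upstream
    response and substitute a permissive Access-Control-Allow-Origin."""
    cleaned = {
        name: value for name, value in headers.items()
        if not name.lower().startswith('access-control-')
    }
    cleaned['Access-Control-Allow-Origin'] = allow_origin
    return cleaned

upstream = {
    'Content-Type': 'application/json',
    'Access-Control-Allow-Origin': 'http://example.com',
}
print(strip_cors_headers(upstream))
```

A real proxy also has to forward status codes, bodies, and preflight (OPTIONS) requests, which is exactly what cors-anywhere handles for us below.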

A minimal example to set this up, using the cors-anywhere project:

/* corsserver.js */
var cors_proxy = require('cors-anywhere');
cors_proxy.createServer({
    originWhitelist: [],
    requireHeader: [],
    removeHeaders: []
}).listen(8080, '0.0.0.0', function() {
    console.log('Running CORS Anywhere');
});

Start it with:

npm install cors-anywhere  # install the dependency to your project
node corsserver.js  # run the server

Typical HTTP proxies work with the CONNECT method. This one doesn't: it works by appending the URL you want to fetch to the URL of the CORS proxy. There's a good reason this suffixing is used: browsers don't expose the HTTP CONNECT method typically used for proxies.
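In code, routing a request through such a proxy is just string concatenation. A tiny helper sketch (the names are my own):

```python
def via_cors_proxy(url, proxy_base='https://corsproxy.example.com'):
    """Rewrite a target URL so it is fetched through the CORS proxy,
    which expects the target URL appended to its own URL."""
    return f'{proxy_base}/{url}'

print(via_cors_proxy('https://httpbin.example.com/get'))
```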

Using our proxy, the modified request looks something like this:

fetch('https://corsproxy.example.com/https://httpbin.example.com/response-headers?Access-Control-Allow-Origin=http://example.com')

Note that the response is now available, and no errors appear in the browser's console. The Access-Control-* headers have been stripped. With some work this could be added to the requests shim as well.

If you actually want to put this into production, there are some things you might want to do:

  • If your server requires authentication, make sure the CORS proxy also requires this. “Open proxies” on the internet will attract a lot of malicious traffic.
  • Put the notebook and the proxy behind the same reverse proxy, so they share a domain. This makes setting up the authentication a lot easier as well.
  • If you set up authentication, make sure the CORS proxy checks for authentication, but also make sure it doesn’t forward the Cookie header (see removeHeaders parameter).
  • Set up monitoring on the proxy.
  • Consider whether an origin whitelist on the CORS proxy is compatible with your use case; it removes some risk.

TLDR: If you are deploying an application that uses Pyodide (a project like Starboard, Quadratic or Jupyter Lite), it might make sense to also deploy a CORS proxy.

Plotly fixes for Starboard Notebook using eval

Plotting data is an essential part of data analysis, and there are many libraries available for this task. Plotly is a popular library for creating interactive plots in Python. However, it turned out not to be trivial to use this in Starboard Notebook. In this blog post, I’ll describe how I fixed this.

Plotly is not shipped with Pyodide’s default distribution. It needs to be installed with micropip. The code below tries to create a small scatterplot.

(Note that all code in this post is intended to be run using Pyodide, not in the typical Python interpreter)

# to populate the entry in Pyodide's sys.modules and keep plotly happy
import pandas

import micropip
await micropip.install('plotly')

import plotly.express as px

x = [1, 2, 3]
y = [1, 2, 3]
fig = px.scatter(x=x, y=y)
fig  # results in: AttributeError: module 'webbrowser' has no attribute 'get'

This is not enough to render the plot, unfortunately. Plotly tries to open a web browser; understandably, it doesn't realize Python is already running in one.

A convention within notebooks is that HTML content is available from an object's _repr_html_ method. This is the HTML equivalent of Python's standard __repr__ method, which returns a (hopefully nice) string representation of an object. Although plotly has implemented this convention since this pull request, it seems to try a renderer first, which in turn tries to open a web browser.
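The convention itself is easy to demonstrate with a toy object (illustrative code, not plotly's implementation):

```python
class Highlighted:
    """A value that notebooks would render as bold HTML."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Plain-text fallback, used by the standard REPL.
        return f'Highlighted({self.text!r})'

    def _repr_html_(self):
        # Notebook frontends call this method, when present,
        # in preference to __repr__.
        return f'<b>{self.text}</b>'

print(Highlighted('hello')._repr_html_())
```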

To fix this in your own notebook, there are two options.

  1. Patch the _repr_html_ method on the figure in a hacky way:
    from functools import partial
    
    fig._repr_html_ = partial(fig.to_html, include_plotlyjs=True, full_html=False)
    fig
    
  2. Create an HTML element and fill it with the output of the to_html method:
     from js import document
    
     html = fig.to_html(
         include_plotlyjs=True,  # include the Javascript library code
         full_html=False,  # don't build a full HTML document
     )
    
     div = document.createElement('div')
     div.innerHTML = html
     div
    

Either of these two fixes will eliminate the error, but there's one problem left: dumping a script tag into the DOM will not get it evaluated automatically. This Stack Overflow post, the MDN documentation and the standard itself confirm this.

This is a bit silly, because JavaScript added in other places (like an onerror attribute) may still execute, as mentioned in the MDN documentation.

div = document.createElement('div')
div.innerHTML = "<img src='picturethat404s.gif' onerror='alert(1)'>"
div

So as a security measure it's far from bulletproof, but it does take away the ability to add working script tags to the DOM this way. Using innerHTML with user input is still (very much) not recommended, but using it with safe input that contains script tags will not achieve the desired result. Much of the risk remains, while some of the reward is gone.

Of course, we could find all the newly created script tags and eval them from a JavaScript notebook cell. To do that, we select all script tags enclosed in a cell output div (recognized by the class name cell-bottom). Let's add this JavaScript cell:

document.querySelectorAll('div.cell-bottom * script[type|="text/javascript"]').forEach(
    function(e) { eval(e.textContent); }
)

This gets the plots rendered! As a solution, I'm still not happy with it though. In most cases, this code is not part of the story you want to tell in a notebook, and a notebook should not need these kinds of distracting hacks to get the plots to render.

So, after talking about it on the Starboard Notebook Discord and on GitHub, we agreed on this solution: gzuidhof/starboard-notebook#138. Immediately after HTML output is added to the DOM, a script loops over all script tags that are JavaScript and evaluates them. It is no longer necessary to add the JavaScript cell.

Special cases aren’t special enough to break the rules. - Zen of Python

What's nice about this fix is that we don't have to implement special code for every possible plotting library under the sun. That is something that I think is getting out of hand: plotting libraries have large collections of special renderers for different notebooks (Kaggle Notebooks, Azure Notebooks, etc.), and vice versa, notebook software has all kinds of special extensions to support the many plotting libraries. This fix is a small step away from that: anything that has a _repr_html_ with some scripts in it will now be evaluated.

Fair warning: Both eval and .innerHTML should not be used with untrusted (user) input. The reason I think it can be used here, is because a user will always be the one providing their own code. It gets a bit scarier when notebooks come from untrusted places. It will also be a bigger risk when other security measures, like CORS, are not configured properly on the server.

If you're interested in the open source contributions that followed this post, here are some links:

Towards a universal DRU fireplace remote using the Flipper Zero

In the manual of my gas fireplace’s remote I came across this bit of text:

Voordat het toestel in gebruik wordt genomen, moet een communicatiecode ingesteld worden tussen de afstandsbediening en de ontvanger. De code wordt willekeurig gekozen uit de 65000 codes die beschikbaar zijn. Hierdoor is de kans klein dat andere afstandsbedieningen in uw omgeving dezelfde code gebruiken en de werking van uw toestel beïnvloeden.

Translated into English, it says something like:

Before using the device, a communication code needs to be set between the remote and the receiver. This code is chosen randomly from the 65000 codes that are available. Because of this, the chances are slim that a different remote in your environment uses the same code, which would interfere with the working of your device.

The number 65000 is suspiciously close to 2^16 (65536). This means that the Mertik GV60 (the remote type) might send a “unique-enough” 2-byte identifier over the air, along with the command for the heater.

Since this remote transmits at 433.92 MHz, it can be interesting to see what the Flipper Zero makes of this signal. To do this, I used the “Read Raw” functionality in the Sub-GHz app on the Flipper.

Flipper detecting the frequency of the DRU fireplace remote
Flipper after reading the signal of the DRU fireplace remote

Dumping files for two different remotes, and for four different operations (higher, lower, ignite, turn off), we end up with eight files:

  • Remote0_higher.sub
  • Remote0_lower.sub
  • Remote0_ignite.sub
  • Remote0_off.sub
  • Remote1_higher.sub
  • Remote1_lower.sub
  • Remote1_ignite.sub
  • Remote1_off.sub

Since only one of these remotes works with my fireplace, it's safe to assume they have different identifiers. That will be useful later, when we compare the signals.

Reading a bit more in the manual, it also seemed unlikely to me that there was an actual bi-directional handshake when connecting a remote to the fireplace. To pair it, you need to put the receiver in pairing mode, and press the flame higher or lower button within 20 seconds. This makes me suspect that the 2-byte identifier is hardcoded in the remote, since the remote itself does not have to be put in some kind of pairing mode.

Now we need to make sense of the Flipper Zero’s .sub-files. The documentation mentions that a raw .sub file contains timings, but does not have a lot of information beyond that:

RAW_Data, contains an array of timings, specified in micro seconds. Values must be non-zero, start with a positive number, and interleaved (change sign with each value).
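Those rules translate directly into a small validator. A sketch (my own code, not part of the Flipper firmware):

```python
def is_valid_raw_data(timings):
    """Check a list of timings against the documented RAW_Data rules:
    values must be non-zero, start positive, and alternate in sign."""
    if not timings or timings[0] <= 0:
        return False
    # Adjacent values must have opposite signs; a product below zero
    # also rules out any zero values.
    return all(a * b < 0 for a, b in zip(timings, timings[1:]))

print(is_valid_raw_data([100, -50, 200, -75]))  # alternating, starts positive
print(is_valid_raw_data([-100, 50]))            # starts negative: invalid
```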

Of course I am not the first person to look at these files, and I found the fzsubtk script on GitHub. In the absence of a software license, I just used it as inspiration to make my own visualisation.

While parsing the .sub file, I discovered something that probably shouldn't happen when dumping these files: I had a RAW_Data line that started with a negative value, which should not be possible. Of course I submitted this as a GitHub issue: flipperzero-firmware#2260. I quickly received a reply, and it should be fixed in newer versions of the Flipper Zero firmware.

import numpy


def read_sub_ghz_file(filename):
    """
    Read a .sub file as produced by Flipper Zero, and prepare it for plotting.

    This method contains some fixes that might truncate some of the data.
    These should be fixed with a newer release of the Flipper Zero firmware.
    """
    with open(filename, 'r') as f:
        values, durations = [], []
        for line in f.readlines():
            if line.startswith("RAW_Data:"):
                data = [int(x) for x in line[10:].split(' ')]
                # The two fixes below are for Github issue flipperzero-firmware#2260
                if data[0] > 0 and data[1] > 0:
                    data = data[2:]
                if data[0] < 0:
                    data = data[1:]
                for i, point in enumerate(data):
                    if i % 2 == 0:
                        values.append(point)
                    else:
                        durations.append(point)
    values = numpy.cumsum(numpy.abs(numpy.array(values)))
    durations = numpy.abs(numpy.array(durations))
    max_len = min(len(durations), len(values))
    return values[:max_len], durations[:max_len]


from matplotlib import pyplot

remote1_lower = read_sub_ghz_file('Remote1_lower.sub')
remote0_lower = read_sub_ghz_file('Remote0_lower.sub')

# all the numbers below don't mean anything, and are just to align the plot a bit
pyplot.figure(figsize=(16, 8))
pyplot.ylim(-500, 2000)
pyplot.xlim(-2500, 15000)
pyplot.step(remote0_lower[0] - 941300,
            remote0_lower[1], where='pre')
pyplot.step(remote1_lower[0] - 761825,
            remote1_lower[1] - 400, where='pre')
pyplot.show()

Now that we have plotted the signals produced by two different remotes nicely, it is time to start speculating on the encoding. My best guess currently is that we’re looking for a 3-byte sequence: two bytes to identify the remote, and one byte that specifies the command to execute. These are the raw bits I think I can read from the plot:

signal_blue =   '1100001001000000110011000'
signal_orange = '1000000100001000010011111'

len(signal_blue) // 8
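If the 3-byte guess is right, the first 24 bits would chunk into two identifier bytes and one command byte. Purely speculative, including the bit boundaries:

```python
signal_blue = '1100001001000000110011000'  # raw bits read from the plot

def to_bytes(bits, n_bytes=3):
    """Split a bit string into n_bytes 8-bit integers,
    ignoring any trailing bits."""
    return [int(bits[i * 8:(i + 1) * 8], 2) for i in range(n_bytes)]

print([f'{b:#04x}' for b in to_bytes(signal_blue)])
```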

There are many different ways to encode a digital signal over analog radio. This video by Jacob Schrum explains some common ones quite well, and has helpful examples.

I might return to this project later, in an attempt to find the encoding. I’ll be familiarizing myself with some signal processing tools, or perhaps try to bruteforce all possible encodings with some custom scripting.

Replaying the signal is nice, but the end goal of course is to create a Flipper application that can ignite any DRU fireplace. Sources used:

  1. https://github.com/cberetta/flipperzero_scripts/blob/master/docs/fzsubtk_example.md
  2. https://www.bouwmansvuurtotaal.nl/wp-content/uploads/2016/04/Bouwmans_Vuur_Totaal_handleiding_compleet_Dru-Metrik-GV60.pdf
  3. https://www.kachelsenhaardenwinkel.nl/winkel/onderdelen/merk/dru/afstandsbediening-mertik-g6r-h4t28-xd/
  4. https://docs.flipperzero.one/sub-ghz/frequencies
  5. https://www.youtube.com/watch?v=i_TLLACZuRk&ab_channel=JacobSchrum
  6. https://github.com/flipperdevices/flipperzero-firmware/blob/27ee0f73f7b81e6791c07f036939015dec9f6a85/documentation/file_formats/SubGhzFileFormats.md
  7. https://github.com/flipperdevices/flipperzero-firmware/issues/2260

Exploring how to create binary wheels for Pythonista

This is a follow-up on the previous post on how to get pip working with Pythonista. We ended with a working pip, but didn't yet have a way to install binary packages (like scipy and scikit-learn).

Using Oracle Cloud, which offers (free!) aarch64 instances, I tried to build some Python wheels for my iPhone.

sudo apt install zlib1g-dev make libssl-dev curl  # build dependencies
git clone https://github.com/deadsnakes/python3.6  # Python 3.6, matching Pythonista's version
cd python3.6
./configure
make
curl -L https://bootstrap.pypa.io/pip/3.6/get-pip.py > ./get-pip.py
./python get-pip.py  # bootstrap pip into the fresh build
./python -m pip wheel scikit-learn  # build wheels for scikit-learn and its dependencies

After uploading it to my PyPI repository, we can try to install it using pip.

Pip installing a custom wheel built on the Oracle Cloud

Except, unfortunately, this wheel is still not in the expected format:

Not a supported wheel on this platform

This is related to how the wheel file format is specified in PEP 427. The short summary is that the platform tag is visible in the filename: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl.
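Pulling the tags back out of a wheel filename is mechanical. A sketch that assumes the optional build tag is absent:

```python
def wheel_tags(filename):
    """Split a wheel filename into its name, version and tag components.
    Assumes the optional build tag is not present."""
    distribution, version, python_tag, abi_tag, platform_tag = (
        filename.removesuffix('.whl').split('-')
    )
    return {
        'distribution': distribution,
        'version': version,
        'python_tag': python_tag,
        'abi_tag': abi_tag,
        'platform_tag': platform_tag,
    }

print(wheel_tags('scipy-1.5.4-cp36-cp36m-linux_aarch64.whl'))
```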

To find out what platform tags Pythonista asks for on iPhones, I added a debug line in pip/_internal/index/package_finder.py and ran pip again.

Debug line in package_finder.py

This resulted in the following list of platform tags:

Cleaner output of logging

The Python wheels will need to comply with this expected platform tag: instead of cp36-cp36m-manylinux2014_aarch64, it looks like it needs to say cp36-cp36-darwin_21_5_0_iphone12,8.

Update: Right after finishing this post I noticed that the platform tag changed to cp36-cp36-macosx_15_0_iphone12,8 (instead of the darwin tag above). This might have been caused by an iOS update.

After running auditwheel repair on the created wheel, all relevant system libraries are copied into the wheel.

Now I run the following script to modify the wheel to use the expected platform tag:

import zipfile


with zipfile.ZipFile('scipy-1.5.4-cp36-cp36m-linux_aarch64.whl', 'r') as input_wheel:
    with zipfile.ZipFile('scipy-1.5.4-cp36-cp36-macosx_15_0_iphone12,8.whl', 'w',
                         compression=zipfile.ZIP_DEFLATED) as output_wheel:
        for input_zipinfo in input_wheel.infolist():
            if input_zipinfo.filename.endswith('.dist-info/WHEEL'):
                output_wheel.writestr(
                    input_zipinfo.filename,
                    input_wheel.read(input_zipinfo.filename).replace(
                        b'cp36-cp36m-linux_aarch64',
                        b'cp36-cp36-macosx_15_0_iphone12,8')
                )
            elif input_zipinfo.filename.endswith('.dist-info/RECORD'):
                output_wheel.writestr(
                    input_zipinfo.filename,
                    input_wheel.read(input_zipinfo.filename).replace(
                        b'.cpython-36m-aarch64-linux-gnu',
                        b'')
                )
            else:
                output_wheel.writestr(
                    input_zipinfo.filename.replace('.cpython-36m-aarch64-linux-gnu', ''),
                    input_wheel.read(input_zipinfo.filename)
                )

Now it is recognized by pip as suitable for the platform and installs without issue.

Pip installing scipy wheel

When trying to use the newly installed scipy, however, it still can't find the correct shared objects.

import scipy

No binary module

If we try to directly import this shared object using ctypes, we can see better why it will not work:


DLLs need to be Mach-O, instead of the a.out format.
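A quick way to see which format a shared object actually is, is to look at its leading magic bytes. A sketch (magic values taken from the ELF and Mach-O formats; only the 64-bit little-endian Mach-O variant is checked here):

```python
def binary_format(first_bytes):
    """Guess an object file's format from its leading magic bytes."""
    if first_bytes.startswith(b'\x7fELF'):
        return 'ELF'     # what a Linux-built .so contains
    if first_bytes.startswith(b'\xcf\xfa\xed\xfe'):
        return 'Mach-O'  # 64-bit little-endian Mach-O, as iOS expects
    return 'unknown'

print(binary_format(b'\x7fELF\x02\x01\x01'))
```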

But how does Pythonista include the non-standard-library packages it ships with? To find out, I made a copy of the app itself. This was quite easy, since Pythonista ships with Python:

Dumping the app

Using this dump, I could determine that the extra packages like numpy and matplotlib all live in Frameworks/Py3Kit.framework/pylib/site-packages. However, the shared objects that would normally also live there are missing.

If we decompile the app's Py3Kit.framework executable, we can see that it actually contains the binary Python modules that were missing from site-packages. They are all added as built-in Python modules, using the _PyImport_AppendInittab function from Python's C API.

void PYK3Interpreter::registerBuiltinModules(ID param_1,SEL param_2)

{
  _PyImport_AppendInittab("speech",_PyInit_speech);
  _PyImport_AppendInittab("reminders",_PyInit_reminders);
  _PyImport_AppendInittab("contacts",_PyInit_contacts);
  _PyImport_AppendInittab("sound",_PyInit_sound);
  _PyImport_AppendInittab("linguistictagger",_PyInit_linguistictagger);
  _PyImport_AppendInittab("_ui",_PyInit__ui);
  _PyImport_AppendInittab("_notification",_PyInit__notification);
  _PyImport_AppendInittab("_pythonista",_PyInit__pythonista);
  _PyImport_AppendInittab("_keyboard",_PyInit__keyboard);
  _PyImport_AppendInittab("_dialogs",_PyInit__dialogs);
  _PyImport_AppendInittab("_appex",_PyInit__appex);
  _PyImport_AppendInittab("_font_cache",_PyInit__font_cache);
  _PyImport_AppendInittab("_scene2",_PyInit__scene2);
  _PyImport_AppendInittab("console",_PyInit_console);
  _PyImport_AppendInittab("_clipboard",_PyInit__clipboard);
  _PyImport_AppendInittab("_photos",_PyInit__photos);
  _PyImport_AppendInittab("_photos2",_PyInit__photos2);
  _PyImport_AppendInittab("_webbrowser",_PyInit__webbrowser);
  _PyImport_AppendInittab("_twitter",_PyInit__twitter);
  _PyImport_AppendInittab("location",_PyInit_location);
  _PyImport_AppendInittab("_motion",_PyInit__motion);
  _PyImport_AppendInittab("keychain",_PyInit_keychain);
  _PyImport_AppendInittab("_cb",_PyInit__cb);
  _PyImport_AppendInittab("_canvas",_PyInit__canvas);
  _PyImport_AppendInittab("_imaging",_PyInit__imaging);
  _PyImport_AppendInittab("_imagingft",_PyInit__imagingft);
  _PyImport_AppendInittab("_imagingmath",_PyInit__imagingmath);
  _PyImport_AppendInittab("_imagingmorph",_PyInit__imagingmorph);
  _PyImport_AppendInittab("_np_multiarray",_PyInit_multiarray);
  _PyImport_AppendInittab("_np_scalarmath",_PyInit_scalarmath);
  _PyImport_AppendInittab("_np_umath",_PyInit_umath);
  _PyImport_AppendInittab("_np_fftpack_lite",_PyInit_fftpack_lite);
  _PyImport_AppendInittab("_np__compiled_base",_PyInit__compiled_base);
  _PyImport_AppendInittab("_np__umath_linalg",_PyInit__umath_linalg);
  _PyImport_AppendInittab("_np_lapack_lite",_PyInit_lapack_lite);
  _PyImport_AppendInittab("_np_mtrand",&_PyInit_mtrand);
  _PyImport_AppendInittab("_np__capi",_PyInit__capi);
  _PyImport_AppendInittab("_mpl__backend_agg",_PyInit__backend_agg);
  _PyImport_AppendInittab("_mpl__image",_PyInit__image);
  _PyImport_AppendInittab("_mpl__path",_PyInit__path);
  _PyImport_AppendInittab("_mpl_ttconv",_PyInit_ttconv);
  _PyImport_AppendInittab("_mpl__cntr",_PyInit__cntr);
  _PyImport_AppendInittab("_mpl_ft2font",_PyInit_ft2font);
  _PyImport_AppendInittab("_mpl__png",_PyInit__png);
  _PyImport_AppendInittab("_mpl__delaunay",_PyInit__delaunay);
  _PyImport_AppendInittab("_mpl__qhull",_PyInit__qhull);
  _PyImport_AppendInittab("_mpl__tri",_PyInit__tri);
  _PyImport_AppendInittab("_counter",_PyInit__counter);
  _PyImport_AppendInittab("_AES",_PyInit__AES);
  _PyImport_AppendInittab("_ARC2",_PyInit__ARC2);
  _PyImport_AppendInittab("_ARC4",_PyInit__ARC4);
  _PyImport_AppendInittab("_Blowfish",_PyInit__Blowfish);
  _PyImport_AppendInittab("_CAST",_PyInit__CAST);
  _PyImport_AppendInittab("_DES3",_PyInit__DES3);
  _PyImport_AppendInittab("_DES",_PyInit__DES);
  _PyImport_AppendInittab("_MD2",_PyInit__MD2);
  _PyImport_AppendInittab("_MD4",_PyInit__MD4);
  _PyImport_AppendInittab("_RIPEMD160",_PyInit__RIPEMD160);
  _PyImport_AppendInittab("_SHA224",_PyInit__SHA224);
  _PyImport_AppendInittab("_SHA256",_PyInit__SHA256);
  _PyImport_AppendInittab("_SHA512",_PyInit__SHA512);
  _PyImport_AppendInittab("_XOR",_PyInit__XOR);
  _PyImport_AppendInittab("strxor",_PyInit_strxor);
  _PyImport_AppendInittab("pykit_io",_PyInit_pykit_io);
  return;
}
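From the Python side, modules registered this way show up as built-ins rather than as files on disk, which would explain the empty site-packages. In a regular CPython, the same mechanism is visible for the standard built-in modules:

```python
import importlib.util
import sys

# Modules compiled into the interpreter binary (the effect that
# _PyImport_AppendInittab has) are listed as built-ins and have
# no file behind them.
print('sys' in sys.builtin_module_names)

spec = importlib.util.find_spec('sys')
print(spec.origin)  # 'built-in', not a path to a file
```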

In a follow-up post I'll look into compiling the wheels with Mach-O shared libraries (or bundles, as Apple calls them).

Adding pip to Pythonista for iOS

Pythonista is probably the most popular Python app for iOS. This post is a summary of the work I did to get pip working. Here’s how to do it:

Installing pip

import requests
import sys
from io import BytesIO
from zipfile import ZipFile

# Get the location of the Python 3 site-packages
site_packages = next(filter(
  lambda x: 'site-packages-3' in x,
  sys.path
))

# extract directly into site-packages
ZipFile(BytesIO(requests.get(
    'https://files.pythonhosted.org/packages/90/a9/1ea3a69a51dcc679724e3512fc2aa1668999eed59976f749134eb02229c8/pip-21.3-py3-none-any.whl'
).content)).extractall(site_packages)

print("Downloaded pip")

This downloads pip to the site-packages folder for Python 3. Pythonista calls this folder site-packages-3.

Now that we have pip set up, we can start downloading our first package:

Using pip from Pythonista

import pip
import sys

site_packages = next(filter(
  lambda x: 'site-packages-3' in x,
  sys.path
))

print(
  pip.main(f'install --target {site_packages} tqdm'.split(' '))
)

This works a bit differently from how you would typically use pip: since we use it as a library, we call the pip.main function with a list of arguments (created by .split(' ')).

The default directory pip tries is not writable, as it's part of the Pythonista app bundle. We therefore point pip at our site-packages-3 folder using --target. Note that this probably won't work yet for dependencies with binary extensions (libraries like scipy, etc.).

Of course, I also tried to use StaSh. It seemed quite suitable at first, but on closer inspection, the pip it contains is not the standard version: it ships its own pip.py, which approximates the canonical pip's behaviour.

In a follow-up post I'll explore how to use pip to install binary wheels on your iDevice. This will involve building wheels specifically for iOS and maybe even setting up a PyPI mirror.