Plotly fixes for Starboard Notebook using eval

Plotting data is an essential part of data analysis, and there are many libraries available for this task. Plotly is a popular library for creating interactive plots in Python. However, getting it to work in Starboard Notebook turned out not to be trivial. In this blog post, I’ll describe how I fixed that.

Plotly is not shipped with Pyodide’s default distribution. It needs to be installed with micropip. The code below tries to create a small scatterplot.

(Note that all code in this post is intended to be run in Pyodide, not in a regular Python interpreter.)

# to populate the entry in Pyodide's sys.modules and keep plotly happy
import pandas

import micropip
await micropip.install('plotly')

import plotly.express as px

x = [1, 2, 3]
y = [1, 2, 3]
fig = px.scatter(x=x, y=y)
fig  # results in: AttributeError: module 'webbrowser' has no attribute 'get'

Unfortunately, this is not enough to render the plot. Plotly tries to open a web browser. Understandably, it doesn’t realize that Python is already running in one.

A convention within notebooks is that an object’s HTML representation is available from its _repr_html_ method. This is the HTML equivalent of Python’s standard __repr__ method, which returns a (hopefully nice) string representation of an object. Although plotly has implemented this convention since this pull request, it seems to try a renderer first, which in turn tries to open a web browser.
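To make the convention concrete, here is a minimal sketch in plain Python. The Badge class and the display helper are hypothetical names used only for illustration; a real notebook front end does its own lookup, which may differ in detail.

```python
class Badge:
    """Hypothetical object with both a plain and an HTML representation."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        # Plain-text representation, used by the regular interpreter.
        return f"Badge({self.label!r})"

    def _repr_html_(self):
        # Rich representation that a notebook can insert into the page.
        return f"<strong>{self.label}</strong>"


def display(obj):
    """Roughly what a notebook does: prefer HTML, fall back to repr()."""
    html_method = getattr(obj, "_repr_html_", None)
    if callable(html_method):
        return html_method()
    return repr(obj)


print(display(Badge("ok")))  # uses _repr_html_
print(display(42))           # int has no _repr_html_, so repr() is used
```

The point is that the notebook only needs a string of HTML from the object; it does not need to know which library produced it.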

To fix this in your own notebook, there are two options.

  1. Patch the _repr_html_ method on the figure in a hacky way:
    from functools import partial
    
    fig._repr_html_ = partial(fig.to_html, include_plotlyjs=True, full_html=False)
    fig
    
  2. Create an HTML element and fill it with the output of the to_html method:
     from js import document
    
     html = fig.to_html(
         include_plotlyjs=True,  # include the JavaScript library code
         full_html=False,  # don't build a full HTML document
     )
    
     div = document.createElement('div')
     div.innerHTML = html
     div
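The patch in the first option relies on functools.partial pre-filling keyword arguments. Below is a minimal sketch of that mechanism in plain Python. FakeFigure is a hypothetical stand-in, not the plotly class; its to_html only mimics the two parameters used above.

```python
from functools import partial


class FakeFigure:
    """Hypothetical stand-in for a plotly figure; not the real class."""

    def to_html(self, include_plotlyjs=True, full_html=True):
        body = "<div>plot</div>"
        if include_plotlyjs:
            body = "<script>/* plotly.js */</script>" + body
        if full_html:
            return "<html><body>" + body + "</body></html>"
        return body


fig = FakeFigure()
# partial() pre-fills the keyword arguments, so a notebook can call
# fig._repr_html_() without arguments and still get an HTML fragment.
fig._repr_html_ = partial(fig.to_html, include_plotlyjs=True, full_html=False)

print(fig._repr_html_())
```

Because the attribute is set on this one instance, every new figure needs the same patch, which is part of why it feels hacky.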
    

Either of these two fixes will eliminate the error, but one problem remains: a script tag that is inserted into the DOM as an HTML string (for example through innerHTML) is not executed automatically. This Stack Overflow post, the MDN documentation and the standard itself confirm this.

This is a bit silly, because JavaScript added in other places (such as an onerror attribute) might still execute, as mentioned in the MDN documentation.

div = document.createElement('div')
div.innerHTML = "<img src='picturethat404s.gif' onerror='alert(1)'>"
div

So as a security measure it’s far from bulletproof, but it does take away the functionality of adding script tags to the DOM this way. Using innerHTML with user input is still (very much) not recommended, but using it with safe input that contains script tags will not achieve the desired result. There’s still a lot of risk, but some of the reward is no longer there.

Of course, we could find all the newly created script tags and eval them from a JavaScript notebook cell. To do that, we select all script tags that are enclosed in a cell output div (recognized by the class name cell-bottom). Let’s add this JavaScript cell:

document.querySelectorAll('div.cell-bottom * script[type|="text/javascript"]').forEach(
    function(e) { eval(e.textContent); }
)

This will get the plots rendered! I’m still not happy with it as a solution, though. In most cases, this code is not part of the story you want to tell in a notebook. A notebook should not contain these kinds of distracting hacks just to get the plots to render.

So, after talking about it on the Starboard Notebook Discord and on GitHub, we agreed on this solution (gzuidhof/starboard-notebook#138): immediately after adding HTML output to the DOM, a script loops over all script tags that contain JavaScript and evaluates them. With that in place, adding the JavaScript cell is no longer necessary.
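The selection step can be illustrated in plain Python on an HTML string. This is only a sketch and not the code from the Starboard fix, which works on the DOM in the browser. The ScriptCollector class, the collect_scripts helper and the set of accepted type values are assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumption: a script counts as JavaScript when it has no type attribute
# or one of these common values; the real fix may use a different rule.
JAVASCRIPT_TYPES = {"", "text/javascript", "application/javascript"}


class ScriptCollector(HTMLParser):
    """Collect the source text of every JavaScript <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []     # finished script bodies, in document order
        self._buffer = None   # text chunks while inside a matching script

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type in JAVASCRIPT_TYPES:
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append("".join(self._buffer))
            self._buffer = None


def collect_scripts(html):
    """Return the bodies of all JavaScript script tags found in html."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


fragment = (
    "<div id='plot'></div>"
    "<script type='text/javascript'>console.log('render plot');</script>"
    "<script type='application/json'>{\"not\": \"code\"}</script>"
)
print(collect_scripts(fragment))
```

Only the first script is collected; the JSON block is data, not code, and is skipped. Handing each collected string to the browser for evaluation is deliberately left out of this sketch.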

Special cases aren’t special enough to break the rules. - Zen of Python

What’s nice about this fix is that we don’t have to implement special code for every possible plotting library under the sun. I think that is actually getting out of hand. Plotting libraries have large collections of special renderers for different notebooks (Kaggle Notebooks, Azure Notebooks etc.). Vice versa, notebook software has all kinds of special extensions to support the many plotting libraries. This fix is a small step towards preventing more of that: anything that has a _repr_html_ with some scripts in it will now have those scripts evaluated.

Fair warning: neither eval nor .innerHTML should be used with untrusted (user) input. I think they can be used here because the user is always the one providing their own code. It gets a bit scarier when notebooks come from untrusted places. The risk is also bigger when other security measures, like CORS, are not configured properly on the server.
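For text that does come from an untrusted source and only needs to be shown, not rendered as markup, one common precaution is to escape it before it goes anywhere near innerHTML. A minimal sketch with Python's standard library, reusing the onerror example from earlier in this post:

```python
from html import escape

untrusted = "<img src='picturethat404s.gif' onerror='alert(1)'>"

# escape() replaces &, <, > and (by default) both quote characters with
# HTML entities, so the browser shows the text instead of parsing a tag.
safe = escape(untrusted)
print(safe)
```

This does not help with plot output, which has to stay real HTML with real scripts, so it is a complement to the trust argument above rather than a replacement for it.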

If you’re interested in the open source contributions that will follow this post, here are some links: