Plotly fixes for Starboard Notebook using eval
15 Apr 2023

Plotting data is an essential part of data analysis, and there are many libraries available for this task. Plotly is a popular library for creating interactive plots in Python. However, using it in Starboard Notebook turned out not to be trivial. In this blog post, I’ll describe how I fixed this.
Plotly is not shipped with Pyodide’s default distribution; it needs to be installed with micropip. The code below tries to create a small scatterplot. (Note that all code in this post is intended to be run in Pyodide, not in a regular Python interpreter.)
# to populate the entry in Pyodide's sys.modules and keep plotly happy
import pandas
import micropip
await micropip.install('plotly')
import plotly.express as px
x = [1, 2, 3]
y = [1, 2, 3]
fig = px.scatter(x=x, y=y)
fig # results in: AttributeError: module 'webbrowser' has no attribute 'get'
Unfortunately, this is not enough to render the plot: Plotly tries to open a web browser. Understandably, it doesn’t realize that Python is already running inside a web browser.
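One way a library could cope with this is to treat a failed attempt to open a browser as non-fatal. The sketch below only illustrates that idea; it is not plotly’s code. try_open_in_browser is a hypothetical helper, and its browser_module parameter exists only so that the Pyodide situation (a webbrowser module without get) can be simulated with a stand-in object.

```python
import types
import webbrowser


def try_open_in_browser(url: str, browser_module=webbrowser) -> bool:
    """Hypothetical helper: try to open url, but never crash if that fails.

    Returns True if a browser accepted the URL, False otherwise.
    """
    try:
        controller = browser_module.get()
        return bool(controller.open(url))
    except Exception:
        # Pyodide: webbrowser may have no get() at all (AttributeError).
        # Headless machines: webbrowser.get() raises webbrowser.Error.
        return False


# Simulate the Pyodide situation with a stand-in module that has no get().
pyodide_like = types.SimpleNamespace()
print(try_open_in_browser("https://example.com/plot", browser_module=pyodide_like))  # False
```

Catching Exception broadly is a deliberate choice in this sketch: the caller only wants to know whether a browser opened, so any failure simply means "render some other way".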
A convention within notebooks is that an object’s HTML representation is available from its _repr_html_ method. This is the HTML equivalent of Python’s standard __repr__ method, which returns a (hopefully nice) string representation of an object. Although plotly has implemented this convention since this pull request, it seems to try a renderer first, which in turn tries to open a web browser.
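To make the convention concrete, here is a minimal sketch of an object that offers both representations, plus a hypothetical best_representation helper that prefers HTML when an object provides it. The helper only approximates what notebook frontends do; it is not Starboard’s or IPython’s actual display code.

```python
import html


class Greeting:
    """Hypothetical object that follows the notebook display convention."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        # Plain-text representation, used by the regular Python REPL.
        return f"Greeting({self.name!r})"

    def _repr_html_(self) -> str:
        # HTML representation, picked up by notebook frontends.
        return f"<b>Hello, {html.escape(self.name)}!</b>"


def best_representation(obj) -> tuple[str, str]:
    """Hypothetical helper: prefer HTML if the object offers it, else use repr()."""
    repr_html = getattr(obj, "_repr_html_", None)
    if callable(repr_html):
        result = repr_html()
        if result is not None:
            return "text/html", result
    return "text/plain", repr(obj)


print(best_representation(Greeting("Starboard")))  # ('text/html', '<b>Hello, Starboard!</b>')
print(best_representation(42))  # ('text/plain', '42')
```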
To fix this in your own notebook, there are two options:
- Patch the _repr_html_ method on the figure in a hacky way:
  from functools import partial
  fig._repr_html_ = partial(fig.to_html, include_plotlyjs=True, full_html=False)
  fig
- Create an HTML element and fill it with the output of the to_html method:
  from js import document
  html = fig.to_html(
      include_plotlyjs=True,  # include the Javascript library code
      full_html=False,  # don't build a full HTML document
  )
  div = document.createElement('div')
  div.innerHTML = html
  div
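The first option works because assigning to fig._repr_html_ creates an instance attribute that shadows the method defined on the class, while functools.partial bakes the keyword arguments into the call. This assumes the notebook looks the method up on the object itself, as getattr does. Plotly isn’t needed to show the mechanism, so the sketch below uses a hypothetical FakeFigure stand-in whose to_html only imitates the two keyword arguments used above, and whose default _repr_html_ simply raises to represent the failing renderer.

```python
from functools import partial


class FakeFigure:
    """Hypothetical stand-in for a plotly Figure; only imitates to_html's keywords."""

    def to_html(self, include_plotlyjs=True, full_html=True) -> str:
        body = "<div>plot</div>"
        if include_plotlyjs:
            body = "<script>/* plotly.js */</script>" + body
        if full_html:
            body = f"<html><body>{body}</body></html>"
        return body

    def _repr_html_(self) -> str:
        # Stands in for the problematic default, which tries to open a browser.
        raise RuntimeError("tried to open a web browser")


fig = FakeFigure()
# The instance attribute shadows the class method, so only this figure is patched.
fig._repr_html_ = partial(fig.to_html, include_plotlyjs=True, full_html=False)
print(fig._repr_html_())  # <script>/* plotly.js */</script><div>plot</div>

other = FakeFigure()  # an unpatched figure still uses the class method
```

Because the patch lives on a single instance, it has to be repeated for every figure, which is part of what makes it hacky.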
Either of these fixes eliminates the error, but one problem is left: script tags that are added to the DOM through innerHTML are not executed automatically. This Stack Overflow post, the MDN documentation and the standard itself confirm this.
This is a bit silly, because JavaScript added in other places (like an onerror attribute) can still execute, as mentioned in the MDN documentation.
div = document.createElement('div')
div.innerHTML = "<img src='picturethat404s.gif' onerror='alert(1)'>"
div
So as a security measure it’s far from bulletproof, but it does take away the ability to add working script tags to the DOM this way.
Using innerHTML with user input is still (very much) not recommended, but using it with safe input that contains script tags will not achieve the desired result. There’s still a lot of risk, but some of the reward is no longer there.
Of course, we could find all the newly created script tags and eval them from a JavaScript notebook cell. To do that, we select all script tags that are enclosed in a cell output div (recognized by the class name cell-bottom). Let’s add this JavaScript cell:
document.querySelectorAll('div.cell-bottom * script[type|="text/javascript"]').forEach(
function(e) { eval(e.textContent); }
)
This gets the plots rendered! Still, I’m not happy with it as a solution. In most cases, this code is not part of the story you want to tell in a notebook, and a notebook should not need distracting hacks like this to render its plots.
So, after discussing it on the Starboard Notebook Discord and on GitHub, we agreed on this solution: gzuidhof/starboard-notebook#138. Immediately after HTML output is added to the DOM, a script loops over all JavaScript script tags in it and evaluates them, so the extra JavaScript cell is no longer necessary.
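The selection step can be illustrated outside the browser. The sketch below is a Python approximation using the standard library’s html.parser; Starboard’s actual implementation is JavaScript working on the DOM, and ScriptCollector is a hypothetical name. As a simplifying assumption, a script counts as JavaScript when it has no type attribute, an empty one, or one matching the [type|="text/javascript"] selector from the cell above.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collect the source of inline JavaScript <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: list[str] = []
        self._in_js_script = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        # Valueless attributes come through as None, hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        self._in_js_script = (
            script_type == ""
            or script_type == "text/javascript"
            or script_type.startswith("text/javascript-")  # CSS |= semantics
        )
        if self._in_js_script:
            self.scripts.append("")

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the last entry.
        if self._in_js_script:
            self.scripts[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_js_script = False


output_html = (
    '<div><script type="text/javascript">window.PLOTLYENV = {};</script>'
    '<script type="application/json">{"not": "code"}</script>'
    "<script>console.log('untyped');</script></div>"
)
collector = ScriptCollector()
collector.feed(output_html)
collector.close()
print(collector.scripts)  # ['window.PLOTLYENV = {};', "console.log('untyped');"]
```

The application/json block is skipped, which matches the browser’s behavior of treating such scripts as data rather than code.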
Special cases aren’t special enough to break the rules. - Zen of Python
What’s nice about this fix is that we don’t have to implement special code for every possible plotting library under the sun. I think this is getting out of hand: plotting libraries have large collections of special renderers for different notebooks (Kaggle Notebooks, Azure Notebooks, etc.), and conversely, notebook software has all kinds of special extensions to support the many plotting libraries. This fix is a small step towards preventing more of that: the scripts in any object’s _repr_html_ output will now be evaluated.
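Because the fix isn’t tied to plotly, any object can ship interactive output this way. Below is a minimal sketch with a hypothetical Counter class whose _repr_html_ returns a button plus an inline script that wires it up; with the fix in place, rendering such an object should make the button count clicks. The random element id (via uuid) is a design choice so that several outputs on the same page don’t collide, and json.dumps turns the id into a valid JavaScript string literal.

```python
import json
import uuid


class Counter:
    """Hypothetical object whose HTML output relies on an inline script."""

    def __init__(self, start: int = 0) -> None:
        self.start = start

    def _repr_html_(self) -> str:
        # A unique id, so several rendered counters don't clash on one page.
        element_id = f"counter-{uuid.uuid4().hex}"
        return (
            f'<div><button id="{element_id}">{int(self.start)}</button>'
            '<script type="text/javascript">'
            f"(function () {{ const b = document.getElementById({json.dumps(element_id)});"
            " b.onclick = () => { b.textContent = Number(b.textContent) + 1; }; })();"
            "</script></div>"
        )


print(Counter(3)._repr_html_())
```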
Fair warning: neither eval nor .innerHTML should be used with untrusted (user) input. I think they can be used here because users always provide their own code. It gets a bit scarier when notebooks come from untrusted places. The risk is also bigger when other security measures, like CORS, are not configured properly on the server.
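When an object’s _repr_html_ has to include text from an untrusted source, escaping that text keeps it from being parsed as markup, so it cannot smuggle in an onerror handler or a script tag. Here is a minimal sketch using the standard library’s html.escape, with render_comment as a hypothetical helper and the onerror payload from the earlier example as input. This only guards against markup injection; it does not make running an untrusted notebook safe.

```python
import html


def render_comment(author: str, text: str) -> str:
    """Hypothetical helper: build an HTML fragment from untrusted strings."""
    # html.escape replaces &, < and > (and, with quote=True, the default, also
    # " and ') so the browser shows the text instead of parsing it as markup.
    return (
        f'<div class="comment"><b>{html.escape(author)}</b>: '
        f"{html.escape(text)}</div>"
    )


fragment = render_comment("mallory", "<img src='picturethat404s.gif' onerror='alert(1)'>")
print(fragment)
# <div class="comment"><b>mallory</b>: &lt;img src=&#x27;picturethat404s.gif&#x27; onerror=&#x27;alert(1)&#x27;&gt;</div>
```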
If you’re interested in the open source contributions that will follow this post, here are some links:
- gzuidhof/starboard-notebook#138 Evaluate all script (grand)children of HTML output to render output of Bokeh and Plotly
- plotly/plotly.py#4162 If _repr_html_ is invoked, plotly in some cases tries to open a webbrowser
- plotly/plotly.py#4161 Be more forgiving when anything with opening a webbrowser fails