(Subjectively) better Streamlit filtering

A common use case for me when building a dashboard in Streamlit is filtering the displayed data. Some of the examples I read on the Streamlit blog were quite “code heavy”.

Although the behaviour of these existing examples was nice enough, they all needed custom classes or methods, resulting in many lines of code. The code wasn’t Pythonic enough for my taste.

After some trial and error, I came up with a solution. It needs only one small additional function, which can be an anonymous lambda if the logic is simple enough. The function is passed to pandas’ apply and returns True for every row when the filter is not used. This keeps the filters intuitive to use while avoiding lots of logic in classes and methods.
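To show the idea on its own, here is a minimal sketch in plain pandas, without Streamlit. The sample data and the helper name `name_mask` are made up for illustration; in a dashboard the query string would come from an input widget instead.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
people = pd.DataFrame({"Name": ["Ava", "Caleb", "Harper"]})

def name_mask(frame, name_query):
    # An empty query matches every row, so an unused filter has no effect
    return frame["Name"].apply(
        lambda name: name_query in name if name_query else True
    )

unused = name_mask(people, "")    # every row is kept
used = name_mask(people, "Ca")    # only names containing "Ca" are kept

print(people[unused])
print(people[used])
```

Because the unused mask is True everywhere, it can be combined with other masks using `&` without changing the result.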

Here’s the solution, as proposed to the official Streamlit documentation in streamlit/docs#709:

Live filtering of a dataset can be achieved by combining st.dataframe and input elements like the select_slider, text_input or multiselect. In the example below, a sample DataFrame will be filtered using these three different elements. We can write custom filtering logic using the apply method provided by Pandas. The custom logic is defined using anonymous lambda functions, which default to True if a filter is not used. This ensures that it’s not mandatory to provide values for each filter.

import pandas
import streamlit as st

# Some sample data:
employees = pandas.DataFrame([
    {"Name": "Ava Reynolds", "Age": 38, "Skills": ["Python", "Javascript"]},
    {"Name": "Caleb Roberts", "Age": 29, "Skills": ["juggling", "karate", "Python"]},
    {"Name": "Harper Anderson", "Age": 51, "Skills": ["sailing", "French", "Javascript"]}
])

# Create an input element and apply the filter to the DataFrame with employees
age_input = st.sidebar.select_slider("Minimum age", options=range(0, 100))
# Use >= so that employees exactly at the minimum age are included
age_filter = employees["Age"] >= age_input

# Filter the name field, but default to True if the filter is not used
name_input = st.sidebar.text_input("Name")
name_filter = employees["Name"].apply(lambda name: name_input in name if name_input else True)

# Filter the skills, but default to True if no skills are selected
# Options contains all unique values in the multilabel column Skills
skills_input = st.sidebar.multiselect("Skills", options=employees["Skills"].explode().unique())
skills_filter = employees["Skills"].apply(
    # We check whether any of the selected skills are in the row, defaulting to True if the input is not specified
    # To check whether all of the selected skills are there, simply replace `any` with `all`
    lambda skills: any(skill in skills for skill in skills_input) if skills_input else True
)

# Apply the three different filters and display the data
# Since the default when the filter is not used is True, we can simply use the & operator
employees_filtered = employees[age_filter & name_filter & skills_filter]
st.dataframe(employees_filtered, hide_index=True)
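The comments above mention swapping `any` for `all` in the skills filter. The sketch below shows one way to make that switch explicit, again in plain pandas so it can be run without Streamlit. The helper `skills_mask` and its `require_all` parameter are hypothetical names introduced for this illustration, not part of pandas or Streamlit.

```python
import pandas as pd

employees = pd.DataFrame([
    {"Name": "Ava Reynolds", "Age": 38, "Skills": ["Python", "Javascript"]},
    {"Name": "Caleb Roberts", "Age": 29, "Skills": ["juggling", "karate", "Python"]},
    {"Name": "Harper Anderson", "Age": 51, "Skills": ["sailing", "French", "Javascript"]},
])

def skills_mask(skills_column, selected, require_all=False):
    """Boolean mask for a column of lists; an empty selection keeps every row."""
    # Hypothetical switch: `all` requires every selected skill, `any` just one
    match = all if require_all else any
    return skills_column.apply(
        lambda skills: match(skill in skills for skill in selected) if selected else True
    )

selected = ["Python", "Javascript"]
any_mask = skills_mask(employees["Skills"], selected)
all_mask = skills_mask(employees["Skills"], selected, require_all=True)
no_selection = skills_mask(employees["Skills"], [])

print(employees[any_mask]["Name"].tolist())  # all three employees
print(employees[all_mask]["Name"].tolist())  # only Ava Reynolds
```

In a dashboard, `require_all` could be driven by a checkbox or toggle, so users can choose between "has any of these skills" and "has all of these skills".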