Pay for the cheapest Netflix, get the most premium one

TLDR: In this post I show how to take advantage of Netflix delivering your new subscription before your payment starts. You could do this manually, but of course "it’s more fun to compute".

Netflix has an interesting upgrade flow: Once you upgrade, you get the upgraded plan for the remainder of the billing period. You only start paying your new fee at the start of the new period.

However, if you upgrade and then downgrade within the same billing period, you’ll still get the upgraded plan for the remainder of the current billing period. At the start of the new billing period, you’ll simply be back on your original plan. Of course, there’s nothing stopping you from doing the same thing again. So if you repeat this every billing period, you can have the best plan for the lowest price.

This raises the question: “This can’t be intentional, can it?” After I submitted a short bug report, Netflix replied that it is indeed intended behaviour:

We actually received a similar report previously about this one and [decided] that this is actually an intended functionality.

So… here’s the code, using Selenium:

import logging
from collections import namedtuple

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

logging.basicConfig(level=logging.INFO)  # make sure the INFO messages below are shown
logger = logging.getLogger(__name__)

Configuration = namedtuple('Configuration', ['username', 'password'])

config = Configuration(username='username_here',
                       password='password_here')
options = webdriver.ChromeOptions()
options.binary_location = "./headless-chromium"
browser = webdriver.Chrome(executable_path='./chromedriver',
                           chrome_options=options)
browser.implicitly_wait(time_to_wait=10)
browser.get('https://www.netflix.com/ChangePlan')
browser.find_element_by_id('id_userLoginId').send_keys(
    config.username
)
browser.find_element_by_id('id_password').send_keys(
    config.password
)
browser.find_element_by_css_selector(
    'button.btn.login-button.btn-submit.btn-small'
).click()
try:
    message = browser.find_element_by_css_selector(
        'div.ui-message-contents'
    ).text
    logger.info('Page contains infobox (probably stating that Netflix '
                'has already been upgraded this month)')
    logger.info(message)
    logger.info('Nothing left to do')
    quit(0)
except NoSuchElementException:
    # The upgrade has not been done this month yet, because there's no
    # infobox saying so
    current_plan = browser.find_element_by_css_selector(
        'li.selected > div > h2 > div > div > span.plan-name'
    ).text
    logger.info(f'Currently the {current_plan} plan is selected')
    plans = browser.find_elements_by_css_selector('span.plan-name')
    # Now we click the premium plan (the exact term here may be
    # language dependent)
    for plan in plans:
        if plan.text == 'Premium':
            plan.click()
    browser.find_element_by_css_selector(
        'button.btn.save-plan-button.btn-blue.btn-small'
    ).click()
    browser.find_element_by_css_selector(
        'button.btn.modal-action-button.btn-blue.btn-small'
    ).click()
    logger.info('Upgraded to Premium')
    # Now we downgrade to our original plan
    browser.get('https://www.netflix.com/ChangePlan')
    # The plan elements found earlier go stale after reloading the page,
    # so we have to look them up again
    plans = browser.find_elements_by_css_selector('span.plan-name')
    for plan in plans:
        if plan.text == current_plan:
            plan.click()
    browser.find_element_by_css_selector(
        'button.btn.save-plan-button.btn-blue.btn-small'
    ).click()
    browser.find_element_by_css_selector(
        'button.btn.modal-action-button.btn-blue.btn-small'
    ).click()
    logging.info("Downgraded to the original plan again")

Of course, this trick has to be deployed to AWS Lambda; we can’t be bothered to do this by hand each month 🙈. I am working on that using Selenium and serverless Chrome.
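
The Lambda entry point will probably be little more than a wrapper around the script above. A minimal sketch, assuming the usual flags for running headless Chromium inside Lambda and a hypothetical upgrade_and_downgrade() function that contains the logic shown above:

from selenium import webdriver

def run(event, context):
    # Flags commonly needed to run headless Chromium inside AWS Lambda;
    # the exact set may differ per Chromium build
    options = webdriver.ChromeOptions()
    options.binary_location = './headless-chromium'
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--single-process')
    options.add_argument('--disable-gpu')
    browser = webdriver.Chrome(executable_path='./chromedriver',
                               chrome_options=options)
    try:
        upgrade_and_downgrade(browser)  # hypothetical: the logic shown above
    finally:
        browser.quit()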

Disclaimer: This code may or may not do what you and I expect. Run it at your own risk. In the worst case, it may actually upgrade your account, without doing the downgrade.

ETL pipeline out of Polar Flow into Runkeeper, using Serverless

TLDR: In this post, I show how to use the Serverless framework to hastily stitch together two undocumented APIs. Using this current-year-equivalent-of-a-cronjob, I export my data from Polar Flow to Runkeeper. The end result is in the repo polar-flow-to-runkeeper.

The (admittedly first-world) problem I faced recently is that I used Runkeeper in the past, but my running data now gets synced to Polar Flow. Both accounts work well enough, so I would like to keep them in sync. Neither service provides a documented API. Like most other (web) apps, however, there is an undocumented API.

Undocumented APIs

Using undocumented APIs is becoming more and more of an alternative to web scraping, since most apps and websites now separate content and logic better. Filling content into web pages is no longer done server-side, but increasingly client-side.

Fair warning: code that relies on undocumented APIs is guaranteed to fail at some point. The fact that these APIs are undocumented means that they can go away, be replaced, or change behaviour at any time.

Finding out how to call the API is relatively easy: the Network tab of your favourite browser’s Developer Tools will give you most of the answers. Look for requests that are marked as XHR and return JSON.

Authentication can often be replicated by storing the cookies returned by a call to the login page. The requests library has a Session class that automates this. To authenticate to Polar Flow, all you need to do is extend requests.Session with a call that logs in to the service:

from requests import Session

class PolarFlowClient(Session):

    def __init__(self):
        super().__init__()

    def login(self, username, password):
        return self.post('https://flow.polar.com/login',
                         data={"email": username,
                               "password": password,
                               "returnUrl": '/'})

After instantiating PolarFlowClient and calling its login method, you’re good to submit your API requests.
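
For example, fetching this year’s activities (the getCalendarEvents endpoint is the same one used in the pipeline below):

flow = PolarFlowClient()
flow.login('you@example.com', 'your-password')
activities = flow.get('https://flow.polar.com/training/getCalendarEvents',
                      params={'start': '01.01.2019', 'end': '31.12.2019'}).json()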

We do something similar for Runkeeper. Check the repository to see the code for that.

Serverless

The Serverless Framework is a tool for deploying code that runs on-demand. The central idea is that you slice the code you would like to run into functional units that can be called individually. This can be a very cost-effective way to run a service, without having to rent cloud machines 24/7. It also offers scheduling functionality, which I use here to sync data periodically.

All big cloud providers have products that the Serverless Framework can deploy to. For this project, I chose AWS Lambda. Amazon’s documentation will guide you through authenticating your development machine and installing the required tooling.

I copied some code from the examples repo provided by the Serverless Framework. After modifying serverless.yml, it looked like this:

service: polar-flow-to-runkeeper  # How the service will be known
frameworkVersion: ">=1.2.0 <2.0.0"  # Provided by the example ¯\_(ツ)_/¯
plugins:
  - serverless-python-requirements  # Makes sure requirements are installed
provider:
  name: aws  # I use AWS Lambda in this project
  runtime: python3.7  # Python >3.6 for the f-strings :-)
  memorySize: 256  # Overwrite the default memory size. Default is 1024 (MB).
  timeout: 60  # It's important to set this higher than you actually expect 
               # the function to run (in seconds).
functions:
  cron:  # 'cron' is the name of the function, you could list more below
    handler: handler.run  # Path to the function
    events:
      - schedule: rate(30 minutes)  # Run every thirty minutes
                                    # (not a suggested workout schedule)
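
The config points at handler.run, so there needs to be a module handler.py with a function run. A minimal sketch of that module, assuming the sync logic from the next section is wrapped in a (hypothetical) sync() function:

# handler.py -- the module and function referenced by 'handler: handler.run'
def run(event, context):
    # 'event' carries the schedule trigger payload and 'context' the Lambda
    # runtime information; neither is needed for a simple sync job
    sync()  # hypothetical function wrapping the pipeline from the next section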

The pipeline

Syncing is as simple as looping over the entries in Polar Flow and uploading each one to Runkeeper if it doesn’t exist in our list of already uploaded entries. That looks something like this:

# Import statements have been omitted.
# The objects flow and runkeeper are authenticated API clients.
# synced_runs is a list we keep in MongoDB (MLab has a free tier)
year = datetime.datetime.now().year
activities = flow.get('https://flow.polar.com/training/getCalendarEvents',
                      params={'start': f'01.01.{year}',
                              'end': f'31.12.{year}'}).json()
activities = filter(lambda x: x['listItemId'] not in synced_runs, 
                    activities)
for activity in activities:
    tcx_export = flow.get(
        'https://flow.polar.com/api/export/training/tcx/' +
        str(activity['listItemId'])
    )
    response = runkeeper.post(
        'https://runkeeper.com/trackMultipleFileUpload',
        data={'handleUpload': 'handleUpload'},
        files={'trackFiles': ('import.tcx', tcx_export.text,
                              'application/octet-stream')}
    )
    synced_runs.append(activity['listItemId'])
# afterwards, synced_runs is put back into the database
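
The synced_runs bookkeeping could look roughly like this with pymongo; the connection string, database and collection names below are placeholders, not what the repository actually uses:

from pymongo import MongoClient

client = MongoClient('mongodb://user:password@example.mlab.com:12345/sync')  # placeholder URI
runs_collection = client['sync']['synced_runs']  # hypothetical db/collection names

# Load the list before running the sync loop above...
document = runs_collection.find_one({'_id': 'runs'}) or {'_id': 'runs', 'ids': []}
synced_runs = document['ids']

# ...and write it back afterwards
runs_collection.replace_one({'_id': 'runs'},
                            {'_id': 'runs', 'ids': synced_runs},
                            upsert=True)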

Now that everything has been put together, it can be deployed by typing serverless deploy.

Data quality issues with Runkeeper’s importer

Using Runkeeper’s importer unfortunately means you lose some data quality:

  • Looking into the response provided by Runkeeper shows that lots of activities can’t actually be imported (swimming, for example).

  • Runkeeper also flattens your heart rate, which sounds really scary, but just means it sets the heart rate to your average across the entire activity.

  • Given the same set of GPS points as Polar Flow, Runkeeper could calculate a different distance.

Final thoughts

Playing around with Serverless is cool if there’s not much at stake. The synchronization of Polar Flow and Runkeeper fits comfortably in Amazon’s free tier. The tooling is intuitive, and there are a lot of provided examples.

An important concern I have is the cost of using Serverless. It starts out cheap, but Serverless functions have great potential to become a very expensive hobby when building larger projects: eventually, you might want Serverless functions that call other Serverless functions. You’ll need a very good overview of your system to avoid infinite or circular call chains. The only exit condition might be your credit card bottoming out.

Python tools for the BlackVue dashcam

TLDR: Out of the box, the options for getting data out of BlackVue dashcams in open formats are limited. I have created some tools to improve this. Run pip install blackvue_acc blackvue_gps blackvue_wifi to get them all.

Recently I bought a BlackVue dashcam, equipped with GPS, an accelerometer and Wi-Fi. This makes it a nice (and connected) source of data. The default way this data comes out of the camera is not great though: the apps and desktop software are closed source, and some features even require a BlackVue cloud account.

In this post I write about the data in these dashcams, how to get it out, and how to parse it.

  • blackvue_acc: package for extracting and parsing the accelerometer data
  • blackvue_gps: package for extracting and parsing the GPS data
  • blackvue_wifi: package for interacting with the HTTP API over WiFi

GPS data

All data the dashcam produces is embedded in the mp4 files as data streams. gandy92 created blackclue, a script that parses these data streams and gives the extracted files a logical extension.

The resulting file with the GPS records has the extension .nmea. The contents are a Unix timestamp in square brackets, followed by an NMEA ‘sentence’. The 1990s internet provides more information about NMEA records (1) (2). Fortunately, there is a Python package, pynmea2, that can parse these records. My script (blackvue_gps) first extracts the timestamp with a regex, then takes the parsed NMEA records that have a latitude and longitude, and writes the result to a csv.

The code looks somewhat like the code block below (full source):

# imports and looping are omitted here
unix_ms = re.findall(r'(?!\[)[0-9]*(?=\])', line)[0]
parsed_nmea = pynmea2.parse(line.split(']')[-1])
if hasattr(parsed_nmea, 'latitude') and hasattr(parsed_nmea, 'longitude'):
    lat = parsed_nmea.latitude
    lon = parsed_nmea.longitude

If you just want csv files with your gps records, run

pip install blackvue_gps

to install the tool and then run

blackvue_gps --to-csv /path/to/your/folder

or

blackvue_gps --to-csv /path/to/your/file.mp4

When I visualised the data in this csv file, the high resolution of the GPS data really stood out: the location is logged about once a second.
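
A quick way to eyeball that resolution is to plot the points. A small sketch, assuming the csv has lat and lon columns (the actual file and column names depend on the blackvue_gps output):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('recording.gps.csv')  # hypothetical output filename
plt.plot(df['lon'], df['lat'], marker='.', linestyle='none')
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.show()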

Accelerometer data

The previously mentioned blackclue library already splits the binary .3gf blobs into fields and creates a .3gf.txt file. This file is basically a csv, but separated by a variable number of spaces instead of commas. Python’s built-in csv module has no support for regex separators, but pandas.read_csv does, so I used that, realising it might be a bit overkill for the job.

       0 00000000 0064 0002 0012      0    100      2     18
       1 00000064 0078 fffa 0028    100    120     -6     40
       2 000000dc 0054 0008 000a    220     84      8     10
       3 00000153 0082 fffa 000e    339    130     -6     14
       4 000001b7 009e fffe 0010    439    158     -2     16
       5 0000021b 007a 0004 0004    539    122      4      4
       6 00000286 0076 fffc 0010    646    118     -4     16
       7 000002ed 0088 0006 002e    749    136      6     46
       8 00000351 0078 fffe 0016    849    120     -2     22

The block above contains an example of the first few lines of a .3gf.txt file. This also shows high resolution data, with measurements about 10 times per second.

Its columns are:

  • an index,
  • the milliseconds since the start of the video (in hex and in decimal),
  • the forces in three directions as signed 2-byte integers (in hex and in decimal)

Looking at the video associated with the file, it seems that 1 G(-force) corresponds to the integer value 128 (gravity when stationary). Please remember that this is not yet confirmed. The order of the columns appears to be y, x, z, based on examining the data next to a video with some action in each direction.

The code looks somewhat like the code block below (full source):

# imports and looping omitted here
columnnames = ['id', 'ms_since_start_hex', 'y_hex',
               'x_hex', 'z_hex', 'ms_since_start',
               'y_int', 'x_int', 'z_int']
df = pandas.read_csv(filename,
                     sep=r'\s+',
                     names=columnnames)
df['y'] = df['y_int'] / 128  # wild guess factor is high here
df['x'] = df['x_int'] / 128  # 1G is assumed to be 128 as integer
df['z'] = df['z_int'] / 128  # based on what the sensor for y registers
df.to_csv(filename.replace('.3gf.txt', '.acc.csv'))

If you just want csv files with your accelerometer data, run

pip install blackvue_acc

to install the tool and then run

blackvue_acc --to-csv /path/to/your/folder

or

blackvue_acc --to-csv /path/to/your/file.mp4

Connecting over Wi-Fi

A file johnhamelink published on GitHub had some good pointers about the HTTP API the dashcam exposes over Wi-Fi. With this, I wrote a script to list and download all recordings.

Again, installation is as easy as

pip install blackvue_wifi

Downloading all data from a dashcam connected over Wi-Fi can then be done with:

blackvue_wifi --download-all-recordings ./

The dashcam also has upload functionality, which creates the possibility of extracting the GPS and accelerometer data and saving it back on the device itself. When the video files are later removed from the device to make space for new recordings, the extracted data can persist. You could do this with a permanently connected Raspberry Pi, for example; a rough sketch of such a job follows below.
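
A sketch of the download-and-extract part of such a job, using the command line tools from this post via subprocess (the storage path and retention period are made up, and uploading the extracted files back to the dashcam is left out):

import subprocess
import time
from pathlib import Path

RECORDINGS = Path('/home/pi/blackvue')  # hypothetical storage location on the Pi

# Pull everything off the dashcam over Wi-Fi, then extract GPS and
# accelerometer data to csv files next to the videos
subprocess.run(['blackvue_wifi', '--download-all-recordings', str(RECORDINGS)], check=True)
subprocess.run(['blackvue_gps', '--to-csv', str(RECORDINGS)], check=True)
subprocess.run(['blackvue_acc', '--to-csv', str(RECORDINGS)], check=True)

# Remove videos older than 30 days; the extracted csv files stay around
for mp4 in RECORDINGS.glob('*.mp4'):
    if mp4.stat().st_mtime < time.time() - 30 * 24 * 3600:
        mp4.unlink()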

Conclusion

BlackVue dashcams are a nice source of data, and with some scripting you can get all of it out in open formats. In a future post, I’ll show ways to combine the data from these sources in an interactive visualisation.

If you wish to contribute features or bug fixes to the blackvue Python packages, here’s a list of issues you could work on:

  • GeoJSON output for blackvue_gps
  • Changing settings of the dashcam with blackvue_wifi
  • Unit tests (and maybe tox configuration) for all packages

Using a Google Spreadsheet as an app's backend. What could possibly go wrong?

TLDR: You could use a Google Spreadsheet as the backend of an app, but you really shouldn’t. Using Python to programmatically update parts of a spreadsheet makes way more sense than using a spreadsheet to get data in Python.

Imagine making an app for someone who would like to distribute a schedule for an event, a schedule that can change often during the event itself. You can’t bundle the schedule with the app, because users might end up with outdated data. Probably, you’ll choose to set up a REST API with some endpoints.

Now you need a way for the event organiser to update data. And what administrative interface would be more familiar to an event organiser than a spreadsheet?

What if I told you you could use a Google Spreadsheet as a database backend for your app? With the Python package gsheets and a web framework like Flask, you can have your API endpoint online in a matter of minutes. You are not even limited to reading data: gsheets can write to the sheet as well.

In the example code below, we want to get a list of sessions, with each session hosting multiple pitches. We do this join by letting columns in the spreadsheet refer to id columns of rows on a different sheet.

from flask import Flask, Response
from gsheets import Sheets
import json
from datetime import datetime

sheets = Sheets.from_files('client_secret.json', 'storage.json')
spreadsheet = sheets['identifier-of-your-spreadsheet-here']
app = Flask(__name__)


def process_sheet(spreadsheet):
    sessions = spreadsheet.find('Sessions')
    pitches = spreadsheet.find('Pitches')
    sessions_merged = []
    for session in sessions.values()[1:]:
        included_pitches = []
        for pitch in pitches.values():
            if pitch[4] == session[0]:  # nested for loops are a very sad and 
                                        # expensive way to join data
                included_pitches.append({"title": pitch[3],
                                         "speaker": pitch[1],
                                         "summary": pitch[2]})
        sessions_merged.append(
            {"name": session[1],
             "start": datetime.strptime(session[3], 
                                        '%d-%m-%Y %H:%M').isoformat(),
             "end": datetime.strptime(session[4], 
                                      '%d-%m-%Y %H:%M').isoformat(),
             "location": session[5],
             "host": session[2],
             "summaries": included_pitches
             }
        )
    return sessions_merged


@app.route('/sessions')
def sessions():
    response = Response(json.dumps(process_sheet(spreadsheet)))
    response.headers['Access-Control-Allow-Origin'] = '*'
    return response
    
    
if __name__ == "__main__":
    app.run()

Obviously, this code is far too brittle to be anywhere near a production system. Any wrong input, data type, change of column order, empty id column, empty row, or other change could crash the complete API endpoint. The sweet spot for using a script like this is rapid prototyping (e.g. at hackathons). The admin backend is just a Google Spreadsheet, saving you valuable time developing a boring backend.

Please consider that spreadsheets:

  • typically don’t set data types for columns, but just for cells, making it very unpredictable what you’ll get
  • are not known for their scalability
  • let you do your foreign keys manually

Fair warning: Never YOLO a script like this into production. If you really want to use Google Spreadsheets as your production backend, you should add more checks and safeties (and maybe look into some of Google’s actual database services).

Nevertheless, it’s cool that Google Spreadsheets has an API. In my opinion, using Python to programmatically update parts of a spreadsheet makes way more sense than using a spreadsheet to get data into Python.

Edit: After posting this code snippet, I found an article describing the perfectly named npm module get-sheet-done. With this, you can remove the Flask API from the equation entirely, and interact with the Google Sheets API directly from JavaScript. In this case, your spreadsheet needs to be publicly visible though.