14 Feb 2019
TLDR: In this post I show how to take advantage of Netflix delivering your
plan upgrade before your new payment starts. You could do this manually, but
of course
"it’s more fun to compute".
Netflix has an interesting upgrade flow: Once you upgrade, you get the
upgraded plan for the remainder of the billing period. You only
start paying your new fee at the start of the new period.
However, if you upgrade and then downgrade within the same billing period, you’ll
get the upgraded plan for the remainder of the current billing period. At
the start of the new billing period, you’ll be downgraded to your original
plan again. Of course, there’s nothing stopping you from doing the same thing
again. Therefore, if you repeat this every billing period, you can have the
best plan for the lowest price.
This raises a question: “This can’t be intentional, can it?”.
After I submitted a short bug report, Netflix replied that it is indeed
intended behaviour:
We actually received a similar report previously about this one and [decided]
that this is actually an intended functionality.
So… here’s the code, using Selenium:
import logging
from collections import namedtuple

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

Configuration = namedtuple('Configuration', ['username', 'password'])
config = Configuration(username='username_here',
                       password='password_here')

options = webdriver.ChromeOptions()
options.binary_location = "./headless-chromium"
browser = webdriver.Chrome(executable_path='./chromedriver',
                           chrome_options=options)
browser.implicitly_wait(time_to_wait=10)

browser.get('https://www.netflix.com/ChangePlan')
browser.find_element_by_id('id_userLoginId').send_keys(
    config.username
)
browser.find_element_by_id('id_password').send_keys(
    config.password
)
browser.find_element_by_css_selector(
    'button.btn.login-button.btn-submit.btn-small'
).click()

try:
    message = browser.find_element_by_css_selector(
        'div.ui-message-contents'
    ).text
    logger.info('Page contains infobox (probably stating that Netflix '
                'has already been upgraded this month)')
    logger.info(message)
    logger.info('Nothing left to do')
    quit(0)
except NoSuchElementException:
    # The upgrade has not been done this month yet, because there's no
    # infobox saying so
    current_plan = browser.find_element_by_css_selector(
        'li.selected > div > h2 > div > div > span.plan-name'
    ).text
    logger.info(f'Currently the {current_plan} is selected')
    plans = browser.find_elements_by_css_selector('span.plan-name')
    # Now we click the premium plan (the exact term here may be
    # language dependent)
    for plan in plans:
        if plan.text == 'Premium':
            plan.click()
    browser.find_element_by_css_selector(
        'button.btn.save-plan-button.btn-blue.btn-small'
    ).click()
    browser.find_element_by_css_selector(
        'button.btn.modal-action-button.btn-blue.btn-small'
    ).click()
    logger.info('Upgraded to Premium')

    # Now we downgrade to our original plan
    browser.get('https://www.netflix.com/ChangePlan')
    # The old plan elements are stale after reloading the page, so find them again
    plans = browser.find_elements_by_css_selector('span.plan-name')
    for plan in plans:
        if plan.text == current_plan:
            plan.click()
    browser.find_element_by_css_selector(
        'button.btn.save-plan-button.btn-blue.btn-small'
    ).click()
    browser.find_element_by_css_selector(
        'button.btn.modal-action-button.btn-blue.btn-small'
    ).click()
    logger.info("Downgraded to the original plan again")
Of course this trick has to be deployed to
AWS Lambda. We can’t be bothered to do
this each month 🙈. I am working on that using Selenium and
serverless Chrome.
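To give an idea of where that is heading: the Lambda entry point could look something like the sketch below. This is only a sketch, and it assumes the Selenium flow above has been refactored into a change_plan() function in a hypothetical netflix_upgrade module, with the headless-chromium binary and chromedriver bundled in the deployment package.
import logging

# Hypothetical module containing the Selenium flow from this post,
# refactored into a single change_plan() function.
from netflix_upgrade import change_plan

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def run(event, context):
    # Triggered once per billing period by a CloudWatch Events schedule;
    # the event payload itself is not used.
    logger.info('Starting the upgrade/downgrade cycle')
    change_plan()
    return {'status': 'done'}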
Disclaimer: This code may or may not do what you and I expect. Run it at
your own risk. In the worst case, it may actually upgrade your account,
without doing the downgrade.
25 Nov 2018
TLDR: In this post, I show how to use the Serverless framework to hastily
stitch together two undocumented APIs. Using this
current-year-equivalent-of-a-cronjob, I export my data from Polar Flow to
Runkeeper. The end result is in the
repo polar-flow-to-runkeeper.
The (admittedly first world) problem I faced recently is that I used
Runkeeper in the past, but my running data now gets synced to Polar Flow.
Both accounts work well enough, so I would like them to stay in sync.
Neither service provides a documented API. Like most other
(web) apps, however, both have an undocumented API.
Undocumented APIs
Using undocumented APIs is increasingly an alternative to web scraping,
since most apps and websites now separate content and logic more cleanly.
Content is no longer filled into web pages server-side, but more and more
often fetched and rendered client-side.
Fair warning: code that relies on undocumented APIs is guaranteed to
fail at some point. The fact that these APIs are undocumented means that
they can go away, be replaced, or change behaviour at any time.
Finding out how to call the API is relatively easy. Looking in the
Network tab of the Developer Tools of your favourite browser will most of
the time give you all the answers. Look for requests that are marked as XHR
and JSON.
Authentication can often be replicated by storing the cookies that are
returned by a call to the login page. The requests library has a Session
class which can fully automate this. To authenticate to Polar Flow, all you
need to do is extend requests.Session with a method that logs in to the
service:
from requests import Session


class PolarFlowClient(Session):

    def __init__(self):
        super().__init__()

    def login(self, username, password):
        return self.post('https://flow.polar.com/login',
                         data={"email": username,
                               "password": password,
                               "returnUrl": '/'})
After instantiating PolarFlowClient and calling its login method, you’re
good to submit your API requests.
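For example, a minimal usage sketch with placeholder credentials, using the calendar endpoint that also appears later in this post:
# Minimal usage sketch: log in once, then reuse the session for API calls.
# The credentials below are placeholders.
flow = PolarFlowClient()
flow.login('you@example.com', 'your-password-here')
activities = flow.get('https://flow.polar.com/training/getCalendarEvents',
                      params={'start': '01.01.2018',
                              'end': '31.12.2018'}).json()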
We do something similar for Runkeeper. Check the repository to see the code
for that.
Serverless
The Serverless Framework is a tool for deploying code that runs on-demand.
The central idea is that you slice code you would like to run into functional
units, that can be called individually. This can be a very cost-effective
way to run a service, without having to rent cloud machines 24/7. It also
offers scheduling functionality, which I use here to sync data periodically.
All big cloud providers have products that are compatible with Serverless
functions. For this deployment, I chose
AWS Lambda. Amazon’s documentation will
guide you through authenticating your development machine and installing the
required tooling.
I copied some code from the examples repo provided by the Serverless
Framework. After modifying serverless.yml, it looked like this:
service: polar-flow-to-runkeeper  # How the service will be known
frameworkVersion: ">=1.2.0 <2.0.0"  # Provided by the example ¯\_(ツ)_/¯

plugins:
  - serverless-python-requirements  # Makes sure requirements are installed

provider:
  name: aws  # I use AWS Lambda in this project
  runtime: python3.7  # Python >3.6 for the f-strings :-)
  memorySize: 256  # Overwrite the default memory size. Default is 1024 (MB).
  timeout: 60  # It's important to set this higher than you actually expect
               # the function to run (in seconds).

functions:
  cron:  # 'cron' is the name of the function, you could list more below
    handler: handler.run  # Path to the function
    events:
      - schedule: rate(30 minutes)  # Run every thirty minutes
                                    # (not a suggested workout schedule)
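The handler.run referenced above is just a regular Python function. A minimal sketch of what handler.py could look like, where sync_polar_to_runkeeper is a placeholder for the pipeline described in the next section:
# handler.py -- sketch of the entry point referenced in serverless.yml.
import logging

# Placeholder: the pipeline from the next section, wrapped in a function.
from sync import sync_polar_to_runkeeper

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def run(event, context):
    # Called every thirty minutes by the schedule event defined above.
    logger.info('Starting Polar Flow -> Runkeeper sync')
    sync_polar_to_runkeeper()
    logger.info('Sync finished')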
The pipeline
Syncing is as simple as looping over the entries in Polar Flow and
uploading each one to Runkeeper if it doesn’t exist in our list of already
uploaded entries. That looks something like this:
# Import statements have been omitted.
# The objects flow and runkeeper are authenticated API clients.
# synced_runs is a list we keep in MongoDB (mLab has a free tier).
year = datetime.datetime.now().year
activities = flow.get('https://flow.polar.com/training/getCalendarEvents',
                      params={'start': f'01.01.{year}',
                              'end': f'31.12.{year}'}).json()
activities = filter(lambda x: x['listItemId'] not in synced_runs,
                    activities)

for activity in activities:
    tcx_export = flow.get(
        'https://flow.polar.com/api/export/training/tcx/' +
        str(activity['listItemId'])
    )
    response = runkeeper.post(
        'https://runkeeper.com/trackMultipleFileUpload',
        data={'handleUpload': 'handleUpload'},
        files={'trackFiles': ('import.tcx', tcx_export.text,
                              'application/octet-stream')}
    )
    synced_runs.append(activity['listItemId'])

# Afterwards, synced_runs is put back into the database.
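The bookkeeping around synced_runs is equally unexciting. Below is a minimal sketch of how the list could be loaded and stored with pymongo; the connection string, database, collection and document layout are placeholders, not necessarily what the repository uses.
# Sketch of the synced_runs bookkeeping with pymongo. The connection string,
# database/collection names and document layout are placeholders.
from pymongo import MongoClient

client = MongoClient('mongodb://user:password@ds012345.mlab.com:12345/sync')
collection = client['sync']['state']

# Load the list of already uploaded activities (empty on the first run).
state = collection.find_one({'_id': 'synced_runs'}) or {'ids': []}
synced_runs = state['ids']

# ... the sync loop above runs here ...

# Put the updated list back.
collection.replace_one({'_id': 'synced_runs'},
                       {'_id': 'synced_runs', 'ids': synced_runs},
                       upsert=True)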
Now that everything has been put together, it can be deployed by typing
serverless deploy.
Data quality issues with Runkeeper’s importer
Using Runkeeper’s importer unfortunately means you lose some data quality:
- Looking into the response provided by Runkeeper shows lots of activities
can’t actually be imported (swimming, for example).
- Runkeeper also flattens your heart rate, which sounds really scary, but just
means it sets the heart rate to your average across the entire activity.
- Given the same set of GPS points as Polar Flow, Runkeeper could calculate a
different distance.
Final thoughts
Playing around with Serverless is cool if there’s not much at
stake. The synchronization of Polar Flow and Runkeeper fits comfortably in
Amazon’s free tier. The tooling is intuitive, and there are a lot of
provided examples.
An important concern I have is the cost of using Serverless. It starts
out cheap, but Serverless functions have great potential to become a very
expensive hobby when building larger projects: eventually, you might
want Serverless functions that call other Serverless functions.
You’ll need a very good overview of your system to avoid infinite or
circular call chains; the only exit condition might be your credit card
bottoming out.
16 Aug 2018
TLDR: By default, the options for getting data out of
BlackVue dashcams in open formats are limited. I have created some tools to
improve this. Run pip install blackvue_acc blackvue_gps blackvue_wifi
to get them all.
Recently I bought a BlackVue dashcam, equipped with GPS, an accelerometer and
Wi-Fi. This makes it a nice (and connected) source of data. The default way
this data comes out of the camera is not great though. Some of the
disadvantages are the closed-source apps and desktop software, with some
features even requiring a BlackVue cloud account.
In this post I write about the data in these dashcams, how to get it out,
and how to parse it.
- blackvue_acc: package for extracting and parsing the accelerometer data
- blackvue_gps: package for extracting and parsing the GPS data
- blackvue_wifi: package for interacting with the HTTP API over Wi-Fi
GPS data
All data the dashcam produces is embedded in the mp4 files as data streams.
gandy92 created a script, blackclue, that parses these data streams and
gives the extracted files a logical extension.
The resulting file with the GPS records has the extension .nmea. Its
contents are a unix timestamp in square brackets, followed by an NMEA
‘sentence’. The 1990s internet provides more information about NMEA records
(1) (2). Fortunately, there is a Python package, pynmea2, that can parse
these records. My script (blackvue_gps) first extracts the timestamp with a
regex, then takes the parsed NMEA records that have a latitude and
longitude, and writes the result to a csv.
The code looks somewhat like the code block below
(full source):
# imports and looping are omitted here
unix_ms = re.findall(r'(?!\[)[0-9]*(?=\])', line)[0]
parsed_nmea = pynmea2.parse(line.split(']')[-1])
if hasattr(parsed_nmea, 'latitude') and hasattr(parsed_nmea, 'longitude'):
    lat = parsed_nmea.latitude
    lon = parsed_nmea.longitude
If you just want csv files with your GPS records, run
pip install blackvue_gps
to install the tool and then run
blackvue_gps --to-csv /path/to/your/folder
or
blackvue_gps --to-csv /path/to/your/file.mp4
When visualising the data in this csv file, the high resolution of the GPS
data really stood out. The location is logged with a frequency of about once
a second.
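If you want to take a quick look yourself, a few lines of pandas and matplotlib go a long way. This is only a sketch: it assumes the csv has lat and lon columns, so check the header of your own output first, and adjust the placeholder path.
# Quick-and-dirty plot of one GPS csv produced by blackvue_gps.
# Assumes the csv has 'lat' and 'lon' columns; the path is a placeholder.
import pandas
import matplotlib.pyplot as plt

df = pandas.read_csv('/path/to/your/output.csv')
df.plot(x='lon', y='lat', legend=False)  # roughly the driven track
plt.show()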
Accelerometer data
The previously mentioned blackclue library already adds separation to the
binary .3gf blobs and creates a .3gf.txt. This file is basically a csv,
but separated by a variable number of spaces instead of commas. Python’s
built-in csv module has no support for regex separators, but pandas.read_csv
does, so I used that, realising that it might be a bit overkill for the job.
0 00000000 0064 0002 0012 0 100 2 18
1 00000064 0078 fffa 0028 100 120 -6 40
2 000000dc 0054 0008 000a 220 84 8 10
3 00000153 0082 fffa 000e 339 130 -6 14
4 000001b7 009e fffe 0010 439 158 -2 16
5 0000021b 007a 0004 0004 539 122 4 4
6 00000286 0076 fffc 0010 646 118 -4 16
7 000002ed 0088 0006 002e 749 136 6 46
8 00000351 0078 fffe 0016 849 120 -2 22
The block above contains an example of the first few lines of a .3gf.txt
file. This, too, is high resolution data, with measurements about 10 times per second.
Its columns are:
- a sequence number,
- the milliseconds since the start of the video (in hex and in decimal),
- the forces in three directions as signed 2-byte integers (in hex and in decimal).
By looking at the video associated with the file, it looks like 1 G(-force)
equals 128 as an integer (gravity when stationary). Please remember that this
is not yet confirmed.
The order of the columns seemed to be y, x, z, when examined next to a video
with some action in each direction.
The code looks somewhat like the code block below
(full source):
# imports and looping omitted here
columnnames = ['id', 'ms_since_start_hex', 'y_hex',
               'x_hex', 'z_hex', 'ms_since_start',
               'y_int', 'x_int', 'z_int']
df = pandas.read_csv(filename,
                     sep=r'\s+',
                     names=columnnames)
df['y'] = df['y_int'] / 128  # wild guess factor is high here
df['x'] = df['x_int'] / 128  # 1G is assumed to be 128 as integer
df['z'] = df['z_int'] / 128  # based on what the sensor for y registers
df.to_csv(filename.replace('.3gf.txt', '.acc.csv'))
If you just want csv files with your accelerometer data, run
pip install blackvue_acc
to install the tool and then run
blackvue_acc --to-csv /path/to/your/folder
or
blackvue_acc --to-csv /path/to/your/file.mp4
Connecting over Wi-Fi
A file found on GitHub by johnhamelink had some good pointers about the
location of the HTTP API exposed over Wi-Fi. With this, I wrote a script to
list and download all recordings.
Again, installation is as easy as pip install blackvue_wifi.
Downloading all data from a dashcam connected over Wi-Fi can then be done with:
blackvue_wifi --download-all-recordings ./
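For the curious: under the hood this is plain HTTP. The sketch below shows roughly what blackvue_wifi does; the camera’s default IP address (10.99.77.1) and the blackvue_vod.cgi endpoint are assumptions based on community documentation, so treat them as placeholders.
# Rough sketch of listing and downloading recordings over the camera's
# HTTP API. The IP address and blackvue_vod.cgi endpoint are assumptions;
# adjust them for your setup.
import requests

DASHCAM = 'http://10.99.77.1'

# Ask the camera for a list of recordings.
file_list = requests.get(f'{DASHCAM}/blackvue_vod.cgi').text

# Each line looks roughly like "n:/Record/<name>.mp4,s:<size>".
for line in file_list.splitlines():
    if '/Record/' not in line:
        continue
    path = line.split(',')[0].lstrip('n:')
    video = requests.get(f'{DASHCAM}{path}')
    with open(path.split('/')[-1], 'wb') as f:
        f.write(video.content)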
The dashcam also has upload functionality. This creates the possibility of
extracting the GPS and accelerometer data and saving it on the device itself.
When the video files are then removed from the device to make space for
new recordings, the other data can persist. You could do this with a
permanently connected Raspberry Pi for example.
Conclusion
BlackVue dashcams are a nice source of data, and with some scripting you can
get all of it out in open formats. In a future post, I’ll show ways to combine
the data from these sources in an interactive visualisation.
If you wish to contribute features or bug fixes to the blackvue Python
packages, here’s a list of issues you could work on:
- GeoJSON output for blackvue_gps
- Changing settings of the dashcam with blackvue_wifi
- Unit tests (and maybe tox configuration) for all packages
02 Aug 2018
TLDR: You could use a Google Spreadsheet as the backend of an app, but you
really shouldn’t. Using Python to programmatically update parts of a
spreadsheet makes way more sense than using a spreadsheet to get data in
Python.
Imagine making an app for someone who would like to distribute the schedule
for an event, a schedule that can change often while the event is underway.
You can’t bundle the schedule with the app itself, because users might end
up with outdated data. Probably, you’ll choose to set up a REST API with
some endpoints.
Now you need a way for the event organiser to update the data. And what
administrative interface would be more familiar to an event organiser than a
spreadsheet?
What if I told you that you could use a Google Spreadsheet as the database
backend for your app? With the Python package gsheets and a web framework
like Flask, you can have your API endpoint online in a matter of minutes.
You are not even limited to reading data: gsheets can write to the sheet as
well.
In the example code below, we want to get a list of sessions, with each
session hosting multiple pitches. We do this join by letting columns in the
spreadsheet refer to id columns of rows on a different sheet.
from flask import Flask, Response
from gsheets import Sheets
import json
from datetime import datetime

sheets = Sheets.from_files('client_secret.json', 'storage.json')
spreadsheet = sheets['identifier-of-your-spreadsheet-here']
app = Flask(__name__)


def process_sheet(spreadsheet):
    sessions = spreadsheet.find('Sessions')
    pitches = spreadsheet.find('Pitches')
    sessions_merged = []
    for session in sessions.values()[1:]:
        included_pitches = []
        for pitch in pitches.values():
            if pitch[4] == session[0]:  # nested for loops are a very sad and
                                        # expensive way to join data
                included_pitches.append({"title": pitch[3],
                                         "speaker": pitch[1],
                                         "summary": pitch[2]})
        sessions_merged.append(
            {"name": session[1],
             "start": datetime.strptime(session[3],
                                        '%d-%m-%Y %H:%M').isoformat(),
             "end": datetime.strptime(session[4],
                                      '%d-%m-%Y %H:%M').isoformat(),
             "location": session[5],
             "host": session[2],
             "summaries": included_pitches}
        )
    return sessions_merged


@app.route('/sessions')
def sessions():
    response = Response(json.dumps(process_sheet(spreadsheet)))
    response.headers['Access-Control-Allow-Origin'] = '*'
    return response


if __name__ == "__main__":
    app.run()
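Assuming the snippet above is saved and running locally, a quick smoke test could look like this (Flask’s default port is assumed):
# Quick smoke test against the locally running endpoint.
import requests

sessions = requests.get('http://localhost:5000/sessions').json()
for session in sessions:
    print(session['name'], session['start'], len(session['summaries']))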
Obviously, this code is far too brittle to
be anywhere near a production system. Any wrong input, data type, change of
column order, empty id column, empty row, or other change could crash the
complete API endpoint. The sweet spot for using a script like this is when
making a rapid prototype (e.g. at a hackathon). The admin backend
is just a Google Spreadsheet, saving you valuable time developing a boring
backend.
Please consider that spreadsheets:
- typically don’t set data types for columns, but just for cells, making it
very unpredictable what you’ll get
- are not known for their scalability
- let you do your foreign keys manually
Fair warning: Never YOLO a script like this into production. If you really
want to use Google Spreadsheets as your production backend, you should add
more checks and safeties (and maybe look into some of Google’s actual
database services).
Nevertheless, it’s cool that Google Spreadsheets has an API. In my opinion,
using Python to programmatically update parts of a spreadsheet makes way
more sense than using a spreadsheet to get data into Python.
Edit: After posting this code snippet, I found an article describing the
perfectly named npm module get-sheet-done.
With this, you can remove the Flask API from the equation entirely, and
interact with the Google Sheets API directly from JavaScript. In this case,
your spreadsheet needs to be publicly visible though.