Obfuscation

Wendigo is now in a pretty good place, and everything I’ve tested is in good working order. (I have yet to test encryption or pushing to GitHub in general, but that’s easy enough.) My perfectionism is being triggered because there’s always something I could tune (even when there isn’t), but I’ve decided to move on to writing a generation script that will output a fresh script with garbage variable names and randomized string obfuscation. The main purpose of this is to get around signature analysis, one of the main ways AV tries to identify malware.

The goal is to have a file that doesn’t look anything like its cousins, so it can’t be identified by any of its strings (including variable names, since it’s a script file, although it can be compiled). I’ve checked out some Python obfuscators but haven’t found anything I’m super happy with, so I’m starting by replacing all the variable (and function, etc.) names with a UID ($[0-9][0-9][0-9]) which can be iterated over and replaced with randomly generated strings. This means that I can just run the generation script and have a fresh copy of Wendigo to distribute.

My method for obfuscating strings is also key to stealth. I wanted a way for strings to be in the file and sit in memory without looking suspicious (that is, without matching other copies), while still being easy to decode into the same string across different copies (because they all need to talk to the same repo). My first thought was to hash (or base64) strings into the string I needed (easier than it sounds, just use hash(“wendigo”) as the repo name), but that would still mean that each copy would have the same strings in the file and sitting in memory. I needed something with more collisions, because I need multiple, different strings to decode into the strings that I need. My solution was to interleave random characters between the characters of the string. It’s easy to decode, and it serves all of my purposes: every encoded string is different, but they all decode to the same string. (I could also base64 encode at some point, but why bother? It already looks like base64, any computer should just pass over it, and either way it won’t get past a human, who wouldn’t even need to try to decode it if they’ve got a debugger handy.)
