Sick title. Thanks. (Low-key haven’t actually seen it yet.) (I’ve also been sitting on that for like 2 days.)
Well, it works. Spiderforce successfully spiders a domain (or a list of them) and builds a wordlist from the text it finds. You can find it at github.com/daniellohrey/spiderforce, and here’s the usage message:
usage: spiderforce.py [-h] (-d DOMAIN | -D DOMAINS) [-s SCOPE] [-o OUTSCOPE]
                      [-w WRITE] [-m MAX_DEPTH] [-t THREADS]

Spider a [list of] domain[s] and generate a wordlist based on all text found

optional arguments:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        Domain to spider
  -D DOMAINS, --domains DOMAINS
                        File with list of domains to spider
  -s SCOPE, --scope SCOPE
                        File containing in scope strings defaults to [list of]
                        domain[s] if not specified
  -o OUTSCOPE, --outscope OUTSCOPE
                        File containing out of scope strings
  -w WRITE, --write WRITE
                        File to write wordlist out to instead of stdout
  -m MAX_DEPTH, --max_depth MAX_DEPTH
                        Maximum depth of links to spider
  -t THREADS, --threads THREADS
                        Number of worker threads to use, defaults to 4
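The `(-d DOMAIN | -D DOMAINS)` bit of that usage line is how Python’s argparse renders a required mutually exclusive group: you have to pass exactly one of the two. Here’s a minimal sketch of a parser that would print an equivalent usage message. It’s reconstructed from the help text, not copied from the actual spiderforce.py source, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage message (not the real spiderforce.py code)."""
    parser = argparse.ArgumentParser(
        prog="spiderforce.py",
        description="Spider a [list of] domain[s] and generate a wordlist "
                    "based on all text found",
    )
    # A required mutually exclusive group renders as (-d DOMAIN | -D DOMAINS)
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("-d", "--domain", help="Domain to spider")
    target.add_argument("-D", "--domains", help="File with list of domains to spider")

    parser.add_argument("-s", "--scope",
                        help="File containing in scope strings defaults to "
                             "[list of] domain[s] if not specified")
    parser.add_argument("-o", "--outscope", help="File containing out of scope strings")
    parser.add_argument("-w", "--write",
                        help="File to write wordlist out to instead of stdout")
    # Assumption: depth and thread count are parsed as integers
    parser.add_argument("-m", "--max_depth", type=int,
                        help="Maximum depth of links to spider")
    parser.add_argument("-t", "--threads", type=int, default=4,
                        help="Number of worker threads to use, defaults to 4")
    return parser


args = build_parser().parse_args(["-d", "example.com", "-m", "2"])
print(args.domain, args.max_depth, args.threads)  # example.com 2 4
```

Passing both `-d` and `-D` (or neither) makes argparse print an error and exit, so the script never has to check that itself.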
It’s got some good options (if I do say so myself), as well as some sick code, which you should totally check out. But one of the cool features, in my opinion, is how modular it is. The spiderforce.py script is really just option parsing wrapped around the spider class (and the scope), so if you’d like a little more customization, you can import the class and use it in your own script.
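To give a feel for what that kind of modularity buys you, here’s a minimal sketch of a reusable spider class that another script can import and drive directly. Everything in it is hypothetical (the `Spider` name, its constructor arguments, `crawl`, and the injected `fetch` callable), so check the repo for the real interface. It also assumes substring scoping, treats the start page as depth 0, and runs single-threaded rather than with worker threads. Injecting `fetch` means a script can plug in its own HTTP client or, as below, a fake site for testing.

```python
import re
from collections import deque
from typing import Callable, Iterable


class Spider:
    """Hypothetical reusable spider; not spiderforce's actual class."""

    def __init__(self, fetch: Callable[[str], tuple[str, list[str]]],
                 in_scope: Iterable[str], out_scope: Iterable[str] = (),
                 max_depth: int = 2):
        self.fetch = fetch                # returns (page_text, links) for a URL
        self.in_scope = list(in_scope)
        self.out_scope = list(out_scope)
        self.max_depth = max_depth

    def in_bounds(self, url: str) -> bool:
        # Substring scoping: contains some in-scope string and no out-of-scope string
        return (any(s in url for s in self.in_scope)
                and not any(s in url for s in self.out_scope))

    def crawl(self, start: str) -> set[str]:
        """Breadth-first crawl up to max_depth, collecting every word seen."""
        words: set[str] = set()
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            url, depth = queue.popleft()
            text, links = self.fetch(url)
            words.update(re.findall(r"\w+", text))
            if depth >= self.max_depth:
                continue  # don't follow links past the depth limit
            for link in links:
                if link not in seen and self.in_bounds(link):
                    seen.add(link)
                    queue.append((link, depth + 1))
        return words


# A fake three-page site standing in for real HTTP requests
site = {
    "http://a.com/": ("hello world", ["http://a.com/x", "http://b.com/"]),
    "http://a.com/x": ("secret admin", ["http://a.com/y"]),
    "http://a.com/y": ("deep page", []),
}
spider = Spider(lambda u: site.get(u, ("", [])), in_scope=["a.com"], max_depth=1)
print(sorted(spider.crawl("http://a.com/")))  # ['admin', 'hello', 'secret', 'world']
```

With `max_depth=1` the crawler reads the start page and `/x`, skips the out-of-scope `b.com`, and never reaches `/y`.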
But wait, there’s more. Or, at least, there will be more. Along with some refactoring and design tweaks, I plan to move the wordlist into its own class to abstract it away (so I can make it more efficient without changing much, probably by backing it with a set or some other hash-based structure), and to add an option for word filtering (and mutating). I’ll also change the scoping to optionally use regex instead of just substrings. And I’m sure there are a couple of other features I can’t remember off the top of my head (like verbosity, maybe).
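Here’s a rough sketch of where those plans could go. The `WordList` class, its `keep` filter, and `regex_in_scope` are all hypothetical, and the actual refactor may look nothing like this. A set gives de-duplication plus average constant-time adds and lookups, a filter is just a predicate applied as words come in, and the regex check mirrors substring scoping but lets a pattern anchor on the host, so a lookalike such as `example.com.evil.net` no longer slips through the way it would with a plain `example.com` substring.

```python
import re
from typing import Callable, Iterable, Optional


class WordList:
    """Hypothetical wordlist wrapper backed by a set (no duplicates, fast lookups)."""

    def __init__(self, keep: Optional[Callable[[str], bool]] = None):
        self._words: set[str] = set()
        self._keep = keep  # optional filter predicate; None keeps every word

    def add(self, text: str) -> None:
        # Split text into runs of word characters and keep those the filter accepts
        for word in re.findall(r"\w+", text):
            if self._keep is None or self._keep(word):
                self._words.add(word)

    def __contains__(self, word: str) -> bool:
        return word in self._words

    def __len__(self) -> int:
        return len(self._words)

    def words(self) -> list[str]:
        # Sorted for stable output when writing to a file or stdout
        return sorted(self._words)


def regex_in_scope(url: str, include: Iterable[str], exclude: Iterable[str] = ()) -> bool:
    """Regex scoping: match at least one include pattern and no exclude pattern."""
    return (any(re.search(p, url) for p in include)
            and not any(re.search(p, url) for p in exclude))


wl = WordList(keep=lambda w: len(w) >= 4)
wl.add("the admin panel lives at admin.example.com")
print(wl.words())  # ['admin', 'example', 'lives', 'panel']

scope = [r"^https?://([a-z]+\.)?example\.com/"]
print(regex_in_scope("https://dev.example.com/login", scope))  # True
print(regex_in_scope("https://example.com.evil.net/", scope))  # False
```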
Anyway, it works, so, happy hacking.