Making Sense of 10010 OnionScan Results

A few months ago, Sarah Jamie Lewis released the wonderful OnionScan, a tool for enumerating (and resolving) potential security issues arising from poorly configured Tor Hidden Services. It’s kind of a big deal for people who are interested in that sort of thing.

As cool as OnionScan is, scanning Hidden Services one at a time tends to become rather tedious. Fortunately, Justin Seitz wrote up a nice tutorial on automating OnionScan through a Python wrapper, and being one of those people who are interested in that sort of thing, I set it all up on a dedicated server and left it to run for a few days.

Using Justin’s initial list of 8592 Hidden Services as a starting point, I ended up with 10010 completed scans (which was good) and 10010 distinct JSON files containing the results (which was not so good). “There’s bound to be something interesting in there”, I thought. I could get a rough idea of the state of things by grepping the results files for JSON entries, and even tried throwing EyeWitness at the web and VNC services, but still didn’t really get anywhere. What I really needed was some kind of database.

Introducing onionscan2db…

OnionScan can write its results out to machine-readable JSON files, so parsing them is fairly straightforward. I used Python for no other reason than I like it, and SQLite3 because it’s simple and Python supports it without the need for any additional modules.

The code is probably best described as “functional”. It’s not particularly pretty and there’s definitely room for speeding up the database writes, but it’ll take more than 10,000 JSON-formatted OnionScan results and build an SQLite database that can then be used to do something useful.
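As a rough illustration of the approach (not the actual onionscan2db code or its schema), the sketch below walks a directory of JSON files and stores each top-level key in a generic key/value table. The function name `import_results` and the `scans` and `fields` tables are made up for this example, and no particular OnionScan field names are assumed. Doing all the inserts inside a single transaction is one common way to speed up SQLite writes.

```python
import json
import sqlite3
from pathlib import Path


def import_results(results_dir, db_path):
    """Load every *.json file in results_dir into a small SQLite database.

    Hypothetical schema for illustration only: one row per file in `scans`,
    and one row per top-level JSON key in `fields` (values kept as JSON text).
    Returns the number of newly imported files.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scans "
        "(id INTEGER PRIMARY KEY, filename TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fields "
        "(scan_id INTEGER REFERENCES scans(id), key TEXT, value TEXT)"
    )
    imported = 0
    with conn:  # commit everything as one transaction
        for path in sorted(Path(results_dir).glob("*.json")):
            try:
                report = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, UnicodeDecodeError):
                continue  # skip files that are not valid JSON
            if not isinstance(report, dict):
                continue  # only JSON objects are handled in this sketch
            cur = conn.execute(
                "INSERT OR IGNORE INTO scans (filename) VALUES (?)",
                (path.name,),
            )
            if cur.rowcount == 0:
                continue  # this file was imported on an earlier run
            scan_id = cur.lastrowid
            conn.executemany(
                "INSERT INTO fields (scan_id, key, value) VALUES (?, ?, ?)",
                [(scan_id, key, json.dumps(value)) for key, value in report.items()],
            )
            imported += 1
    conn.close()
    return imported
```

Calling `import_results("results", "scans.db")` (both paths are placeholders) would leave a database that can be queried with ordinary SQL, for example `SELECT value, COUNT(*) FROM fields WHERE key = ? GROUP BY value` for whichever key is of interest.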

onionscan2db

The tool is available from GitHub and can be run with the following command:

python onionscan2db -d <onionscan-results-directory> -o <output-database>

I’ll try to keep it updated in line with OnionScan, but the code is relatively modular and it shouldn’t be too difficult for anyone else to improve the database structure and import functions as necessary.

Thoughts on Running a Tor Exit Node for a Year

I’m a big fan of Tor, both as a concept, in that it allows people to access information that might otherwise be inaccessible*, and as an interesting technical project. In an effort to support the Tor network and to learn more about how it actually works, I’ve been hosting various Tor nodes on various boxes for a few years now, but around this time last year I stepped things up a bit and began running an Exit node that has consistently ranked in the top 100 world-wide in terms of usable bandwidth.

When I mention this to people I tend to get the same questions, so I thought it best to write the answers here, and maybe save a few people (including myself) some time.

Do you need special hardware?

No, not really. The Tor daemon doesn’t really take advantage of multi-core CPUs, so in most cases throwing extra processing power at it won’t give you much of an advantage. I rent a relatively low-end physical server (Celeron G530, 2GB RAM) but I found the biggest limitation to be affordable bandwidth. I have an uncapped 100 Mbit/s line to my server – not blisteringly fast but it’s saturated almost 100% of the time. In a typical month my Exit will shift somewhere around 35TB of traffic, combined upstream and down.
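For a sense of scale, the back-of-the-envelope calculation below works out the most a link of a given speed could carry in one direction over a month. It assumes decimal units (1 TB = 10^12 bytes), a 30-day month, and no protocol overhead; the function name is made up for the example.

```python
def max_monthly_transfer_tb(link_mbit_per_s, days=30):
    """Upper bound on data moved in ONE direction, in decimal terabytes."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8  # bits/s -> bytes/s
    seconds = 86_400 * days
    return bytes_per_second * seconds / 1e12


print(max_monthly_transfer_tb(100))  # 32.4
```

So a 100 Mbit/s line tops out at roughly 32.4 TB per direction per month under those assumptions, which is the ceiling the 35TB combined figure can be compared against.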


What do your hosting company think about that?

They’re ok with it! Not all hosting companies are though, so if you’re thinking of running any kind of Tor node make sure to check first. I’m in the UK, my hosting company are not. Depending on where you, your hosting company, and their data centre are geographically, it’s unlikely to be illegal to run a node, but there’s a good chance it will be against the hosting company’s T&Cs, particularly in the case of Exit nodes.

The Tor Project wiki holds a pretty comprehensive list of good and bad hosting companies and ISPs.

What about abuse reports?

There will be abuse reports. Learn to deal with them – ignoring them altogether is usually a good way to get on the bad side of your hosting company. There are things you can do to cut down on the number of abuse reports you receive; the most effective in my experience is to configure a reduced exit policy, blocking ports commonly used for things like SMTP and BitTorrent**. It’s not perfect, but it has dramatically cut the number of reports I have to deal with – I tend to get about 1 a week on average now.
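To make the idea concrete, here is a small sketch that prints torrc `ExitPolicy` lines rejecting a list of ports and allowing everything else. The ports used (25 for SMTP, 6881-6889 as a range commonly associated with BitTorrent) are illustrative assumptions rather than my actual policy, and the helper name is made up. An allow-list policy that accepts only chosen ports and ends with `reject *:*` is the other common form.

```python
def build_exit_policy(blocked_ports):
    """Return torrc ExitPolicy lines that reject the given ports or ranges.

    blocked_ports is a list of strings such as "25" or "6881-6889".
    Tor uses the first matching rule, so the rejects come before the
    final catch-all accept.
    """
    lines = [f"ExitPolicy reject *:{port}" for port in blocked_ports]
    lines.append("ExitPolicy accept *:*")
    return "\n".join(lines)


# Example ports only: SMTP and a commonly used BitTorrent range.
print(build_exit_policy(["25", "6881-6889"]))
```

The printed lines can be pasted into torrc; check them against the Tor manual before relying on them.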

Can I run a Tor node from home?

You can, but it’s really best not to. That’s especially true for Exit nodes. For one thing, your home broadband connection is probably not fast enough to contribute any meaningful bandwidth. Second, the IP addresses of all Tor Relay (Middle node) and Exit nodes are publicly available, and as a straightforward way of cutting down on the sort of abuse I described above, more and more online services are just blocking all those IP addresses outright. It’s not very subtle but it does work! So you can run a Relay or an Exit from home, but you’ll probably find that sooner or later Netflix will stop working. Your call.

A better option for those who want to contribute to Tor from a home connection is running a Bridge node, or donating directly to an organisation like TorServers.net.

Aren’t you worried about the Police/GCHQ/Mossad/3PLA/etc?

Not especially; I’ve certainly never had any legal troubles because of Tor. By its very nature, though, traffic from a system like Tor is likely to be more interesting than the rest of the internet as far as a nation-state is concerned, and with only about 1000 Exit nodes running, monitoring all of them is well within the capabilities of a reasonably funded SIGINT agency. I assume my Exit (along with all the rest of them) is being monitored, if not actively targeted. Other than hardening the box as far as possible there’s not much more that can be done against an adversary like that. Who knows, maybe one day I’ll end up with some fun malware to analyse.

 

* I’ve done a lot of forensics work in my time and been exposed to all kinds of Bad Stuff as a result. I am by no means naive enough to suggest that systems like Tor don’t help people access Bad Stuff, but I think on balance the positive uses outweigh the bad ones.

** BitTorrent over Tor is a bad idea in general. First, it doesn’t give you any anonymity. Second, it slows the network down for everyone else. I block common BitTorrent ports on my exits. Don’t like it? Run your own.