Markov statistics of the Yahoo leak.

You might be aware of the nice Yahoo leak. Here are Markov stats, trained from the RockYou set. It is nice to see that “markov2” doesn’t overfit.


Markov strength of InfoSecSouthwest2012_Ripe_Hashes

I have toyed with this large list and, like everybody else, tackled the MD5 hashes first. When I have a bit of free time (and heaps of free memory) I run an attack. The final objective is to assess how effective it is to use cracked passwords to attack the remaining ones. One of the attacks I ran is a Markov mode run (using the stats file distributed with jumbo, level 290, 24 chars max). What is interesting about it is that you can then generate nice plots, like this one.

Only it is pretty weird. The humps (blue arrows) are expected. This kind of artifact could come from the concatenation of multiple sources with distinct characteristics, or the prevalence of a pattern that is not common in the training set (organisation names for example).

What is weird is the part in the green circle, which lies between 250 and 290. The drop on the right is easy to explain : I ran only a few wordlist attacks and the Markov attack, and 290 is my cutoff value, so we could expect that with a larger value the curve would be “continuous”. But the drop between 249 and 250 is much harder to explain. I have the following hypotheses :

  • bug in my code (quite likely)
  • somebody generated a large part of the included passwords with “genmkvpwd stats 250 12”

Having no clue about where the passwords come from, I can’t really conclude. But if somebody has better ideas …
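
For readers who want to reproduce this kind of plot, here is a minimal sketch of how a Markov strength histogram could be computed. It assumes a first-order model where each event costs -10 * ln(p), truncated to an integer, and where unseen events get a fixed penalty. The UNSEEN value, the function names and the tiny training list are all hypothetical, and I make no claim that this matches the jumbo stats file exactly.

```python
import math
from collections import Counter, defaultdict

UNSEEN = 100  # assumed penalty for a character or transition absent from training


def cost(count, total):
    """Integer cost of an event seen `count` times out of `total`."""
    if count == 0:
        return UNSEEN
    return int(-10 * math.log(count / total))


def train(passwords):
    """Count first characters and character-to-character transitions."""
    first = Counter()
    trans = defaultdict(Counter)
    for pw in passwords:
        if not pw:
            continue
        first[pw[0]] += 1
        for a, b in zip(pw, pw[1:]):
            trans[a][b] += 1
    return first, trans


def strength(pw, first, trans):
    """Sum the cost of the first character and of every transition."""
    total = cost(first[pw[0]], sum(first.values()))
    for a, b in zip(pw, pw[1:]):
        total += cost(trans[a][b], sum(trans[a].values()))
    return total


def histogram(passwords, first, trans):
    """Number of passwords per strength value, ready for plotting."""
    return Counter(strength(pw, first, trans) for pw in passwords if pw)


# Hypothetical toy data: train on one list, score another.
first, trans = train(["password", "password", "passw0rd"])
hist = histogram(["password", "zzzzzzzz"], first, trans)
```

A sharp step in such a histogram at one specific strength value is the kind of artifact discussed above.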

Dictionary mangling rules update : the Best64 Challenge

The Best64 challenge finished recently. For those who were unaware of its existence, it was about creating a set of 64 mangling rules that were optimized for a given dictionary and a given password file. As I am pretty interested in this topic, I tried to compete with my own set of tools (that are released, by the way). Unfortunately, all my tools are set up to work with John the Ripper, and while the mangling rules are somewhat compatible with Hashcat's, they are not entirely.

It turns out that I managed to get 22194 matches with my scripts. I only played for about 4 hours, but I should have had a considerable advantage as I already had all the scripts ready. It turns out that I would only have reached the 7th spot, IF my rules translated to Hashcat. This means several things :

  • My “base” rules were not comprehensive : looking at the solutions I realized this was indeed the case. The rule set has been fleshed out considerably.
  • Solving the coverage problem with a greedy algorithm works well only with a large set of rules. With only 64 rules a much better solution is required, but my tools were designed for working with much larger data sets. It means that the NP-hardness of the problem is a practical limitation here, when handling millions of rules.
  • The approach of combining a base rule with some characters appended before and/or after is not optimal. This is pretty obvious, but I thought it was a minor concern. It turns out it is not with this challenge.
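
To illustrate the second point, here is a minimal sketch of a greedy selection, assuming each candidate rule has already been mapped to the set of passwords it cracks. The function name, the rule names and the toy data are hypothetical, and this is not the code of my actual tools.

```python
def greedy_rules(coverage, k):
    """Pick up to k rules, each time taking the one that cracks the most
    passwords not already cracked by the rules chosen so far."""
    chosen = []
    cracked = set()
    remaining = dict(coverage)
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda r: len(remaining[r] - cracked))
        gain = remaining[best] - cracked
        if not gain:
            break  # no remaining rule cracks anything new
        chosen.append(best)
        cracked |= gain
        del remaining[best]
    return chosen, cracked


# Hypothetical toy data: password ids cracked by each candidate rule.
coverage = {
    "rule_a": {2, 3, 4, 5},
    "rule_b": {1, 2, 3},
    "rule_c": {4, 5, 6},
}
chosen, cracked = greedy_rules(coverage, 2)
```

With a budget of two rules, the greedy pass starts with rule_a because it cracks the most passwords, and ends up with five cracked passwords, while rule_b plus rule_c would have cracked all six. With a large budget this kind of loss is diluted, with a small one it is not.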

I will try to compute a new rule list with the new “base” rules, but it requires more than a terabyte of disk space right now.

Minus times minus is a plus

Everybody knows that vulnerability counts mean nothing about software security. And I’m sure everybody noticed that open source license comparisons are equally useless. This surely means that comparing open source licenses by vulnerability count is incredibly insightful !

Enjoy, from this pdf :

Win !
Yes the best part is that you can go to 120%. Probably happens with commercial licenses.

Administrative activity audit with auditd and sudo

The hardest events to audit are events originating from administrative accounts, because they have the power to muddy any trace. This post will describe a way to salvage some of them, and to make sure you can trace non-malicious root activity to a specific person. If you do not need that kind of accountability, then I suggest you just log in as root on the servers, as it will decrease the attack surface.

The first action is pretty obvious : you should have a nominative account for all root users, and make sure they can sudo to root (or better, use a simpler tool like calife). Then you should make sure nobody can log in directly as root. On my servers I forbid password authentication and check the content of the authorized_keys file for root. The root login is not disabled because some scripts will need it.

Add this line to /etc/pam.d/common-session :

session required

This will ensure that the login uid is tracked and can be linked to any subsequent action, even after privileges have been elevated.

Install auditd, and configure it as follows :

  • Add this rule to audit.rules, to track all actions by root that could lead to modifications in the filesystem :
-a always,exit -F euid=0 -F perm=wxa -k PCIDSS.10.2.2
  • In auditd.conf, you should make sure you log everything to syslog (you have centralized logging set up, right ?), so make sure those options look like this :
log_format = NOLOG
dispatcher = /sbin/audispd
  • Finally, activate the syslog plugin of audisp (set active=yes in /etc/audisp/plugins.d/syslog.conf)

There are probably a few rules you will need to add in audit.rules, such as the time modification system calls, or file modification in sensitive directories that originate from non root users.

Hopefully this should not catch too much activity, as auditd logs are pretty verbose and can fill hard drives pretty quickly.

Ruby : the best of all worlds

I have recently embarked on a quest to understand why some very smart people would insist that Ruby is awesome. I even bought a book, and started using it seriously on production systems. I believe I can now safely tell you why.

First of all, there is an awesome user community, with a host of expert Rails coders, always attentive to the needs of the people in operations. It is no wonder that Ruby is the language of DevOps ! I’m not sure what I like most about them : their refreshing attitude towards the use of memory (it is so cheap !), insistence on using specific versions of carefully crafted libraries with a stable API (that bundle tool is so easy to use) or testing methodology (there are so many useful tests, it is a pity that production performance is so hard to test for). This reminds me of the good old days of PHP, with its thriving community of smart hackers. The only difference I can think of is that the PHP coders were constrained by the damn sysadmins about memory usage, while the Ruby ones will eagerly put all your computational power to use.

Ruby programs are also incredibly easy to maintain, thanks to the unique features of the object model (singleton methods, mixins), stability of the library APIs, great coding conventions (two words : two tabs), compatibility between interpreter versions, and great code hiding capacities of blocks. It is really easy to spot memory leaks or understand how this cool feature works. To sum it up : its readability is comparable to Perl, with even more syntactic sugar.

Ruby interpreter speed is often described as horrible. Seasoned Ruby coders know the truth about it : it is a debugging feature. Ruby is just like Mac OS before X : tailored to work with the human mind. Who needs multitasking when your brain can only concentrate on a single thing at once ? Who needs speed when your brain only processes information that slowly ? With Ruby you can actually watch the computing taking place.

Modules are awesome too. It is no wonder that every time you want to use a single Rails app you have to install dozens of dependencies, each of them being gems of efficient computing. Never has a dependency management tool been so aptly named.

To be fair, Ruby is just like a baseball bat. Very fun to swing around, but not as nice when you are on the receiving end. I am using Puppet and Snorby for real. Puppet is typical of good Ruby projects : you can’t live without it, it is easy to start using (once you installed it) and horribly slow, to initialize or to run. I was pretty unhappy when the puppet master began to swap and I realized that I could not control Ruby memory usage. The best I can do is restart the rails instances every few requests. Did I mention how slow the initialization is ?

Snorby on the other hand is typical of average Ruby projects : cute looking, recommended as the new hot thing on every website you visit, a pain to install and then a torture to your production systems. Try doing a classification on 1M events, and it will suck up all your memory (it takes 15 minutes to use 4G, which is either really fast or really slow, depending on how you see it). After tuning, it can classify up to 12 events every second. And it needs megabytes and megabytes of memory to do that. There are also all kinds of cool AJAX effects. Except that after a few seconds visiting some pages, my browser starts using 100% CPU and freezes. I know Firefox isn’t the fastest browser on the planet, but the page is displaying almost nothing …

A log file for each virtual host with haproxy and rsyslog

When you run hundreds of web sites, it might be really convenient to store the access logs separately. While it is pretty straightforward to do this with Apache (just log each vhost in a distinct file), it gets more complicated with HAProxy. As a matter of fact, it only logs to syslog, so your syslog server will be required to do the sorting.

Here is a configuration excerpt that logs the host name queried by the client, and rejects requests to sites not present in the websites.lst file.

capture request  header Host len 256
acl h_website hdr(host) -i -f /etc/haproxy/websites.lst
http-request deny if ! h_website

And here is how I do the sorting in rsyslog :

$template HaLogs,"/logs/services/haproxy/%$YEAR:::secpath-replace%/%$MONTH:::secpath-replace%/%$DAY:::secpath-replace%/%syslogfacility-text:::secpath-replace%.%syslogseverity-text:::secpath-replace%.log"
$template HaHostnameLogs,"/logs/services/haproxy/%$YEAR:::secpath-replace%/%$MONTH:::secpath-replace%/%$DAY:::secpath-replace%/%msg:R,ERE,1,BLANK:[0-9]+/[0-9]+ \{([-.A-Za-z0-9]*)--end%/%syslogfacility-text:::secpath-replace%.%syslogseverity-text:::secpath-replace%.log"
if $programname == 'haproxy' and $msg contains '/<NOSRV> ' then -?HaLogs
& ~
if $programname == 'haproxy' then -?HaHostnameLogs
& ~

The first two lines define where the logs are supposed to go. The secpath-replace options are probably overkill, but I’m not confident the data is actually filtered. The HaLogs template stores that day's common messages in /logs/services/haproxy/2011/10/05/, for example. The HaHostnameLogs template uses a regular expression to find the logged host name and uses it to build the path. For, this will be /logs/services/haproxy/2011/10/05/

The first condition detects (very crudely) when a request is denied by HAProxy, and logs it using the first template. That way, malicious scanners will not clutter your syslog server with meaningless directory names. The rest should be neatly sorted.
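
If you want to check the regular expression without touching a live syslog server, the routing logic can be mimicked in a few lines of Python. This is only a sketch of what the two rules do : the function name and its parameters are hypothetical, the sample messages are illustrative lines modelled on the HTTP log format example from the HAProxy documentation, and the secpath-replace filtering is left out.

```python
import re
from datetime import date

# Same expression as in the HaHostnameLogs template: the host name is the
# text captured right after the "N/N {" that precedes the captured headers.
HOST_RE = re.compile(r"[0-9]+/[0-9]+ \{([-.A-Za-z0-9]*)")


def log_path(msg, day, facility, severity, root="/logs/services/haproxy"):
    """Return the file a haproxy message would be routed to."""
    base = f"{root}/{day:%Y/%m/%d}"
    name = f"{facility}.{severity}.log"
    if "/<NOSRV> " in msg:
        # Denied request: common file, no per-host directory.
        return f"{base}/{name}"
    match = HOST_RE.search(msg)
    # Like the BLANK option, fall back to an empty string when nothing matches.
    host = match.group(1) if match else ""
    return f"{base}/{host}/{name}"


# Illustrative messages, not taken from a real server.
served = ('10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1 '
          '10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} '
          '"GET /index.html HTTP/1.1"')
denied = ('10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in http-in/<NOSRV> '
          '-1/-1/-1/-1/0 403 188 - - PR-- 0/0/0/0/0 0/0 {scanner.invalid} '
          '"GET / HTTP/1.1"')

served_path = log_path(served, date(2011, 10, 5), "local0", "info")
denied_path = log_path(denied, date(2011, 10, 5), "local0", "info")
```

The served request ends up in a directory named after the queried host, while the denied one stays in the common file for that day.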