Dictionary mangling rules update: the Best64 Challenge

The Best64 challenge finished recently. For those who were unaware of its existence, it was about creating a set of 64 mangling rules optimized for a given dictionary and a given password file. As I am pretty interested in this topic, I tried to compete with my own set of tools (which are released, by the way). Unfortunately, all my tools are set up to work with John the Ripper, and while its mangling rules are somewhat compatible with Hashcat's, they are not entirely so.

I managed to get 22194 matches with my scripts. I only played for about 4 hours, but I should have had a considerable advantage, as I already had all the scripts ready. Even so, I would only have reached the 7th spot, and that is IF my rules translated to Hashcat. This means several things:

  • My “base” rules were not comprehensive: looking at the solutions, I realized this was indeed the case. The rule set has since been fleshed out considerably.
  • Solving the coverage problem with a greedy algorithm works well only when many rules are allowed. With only 64 rules, a solution much closer to the optimum is required, but my tools were designed to work with much larger data sets; the NP-hardness of the underlying set-cover problem becomes a real practical limitation when handling millions of candidate rules (see the sketch after this list).
  • The approach of combining a base rule with a few characters appended and/or prepended is not optimal. This is pretty obvious, but I thought it was a minor concern; this challenge shows it is not.
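
To make the coverage point concrete: each candidate rule cracks some set of passwords, and choosing the 64 rules that together crack the most is a set-cover instance, which is exactly why a greedy heuristic is tempting. Here is a minimal sketch of that greedy selection in Python; the names and the data layout are illustrative, not those of my actual tools.

def greedy_rule_selection(rule_hits, budget=64):
    """Greedy set cover: at each step, keep the rule that cracks the most
    passwords not already covered by the rules chosen so far.
    rule_hits maps each rule to the set of passwords it cracks."""
    covered, selected = set(), []
    candidates = dict(rule_hits)
    for _ in range(budget):
        best_rule, best_gain = None, 0
        for rule, hits in candidates.items():
            gain = len(hits - covered)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no remaining rule adds anything
            break
        selected.append(best_rule)
        covered |= candidates.pop(best_rule)
    return selected, covered

# Toy example: three rules with overlapping hits and a budget of 2.
hits = {
    ":":      {"password", "monkey"},
    "$1":     {"password1", "monkey1", "dragon1"},
    "$1$2$3": {"password123", "dragon1"},
}
rules, covered = greedy_rule_selection(hits, budget=2)
print(rules, len(covered))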

I will try to compute a new rule list with the new “base” rules, but it requires more than a terabyte of disk space right now.

5 thoughts on “Dictionary mangling rules update: the Best64 Challenge”

  1. A similar method was documented in 2009. It cracked some pretty impressive hashes at the time, most of which had been hit hard by other searchers. It was able to find some pretty long and impressive words (16-, 18-, and 20-character passwords were found). It was never explored in quite as much depth as this series, so it is good to see this design being carried forward. Also, the rules it generated appended/prepended character by character, which is now sub-optimal. That project was done before the JtR rules allowed appending/prepending of strings, so the rules generated by my original code are not very good by today's standards.

    The wiki page was built on John's community wiki to demonstrate this method: http://openwall.info/wiki/john/rules, and the archive of the email post is here: http://www.openwall.com/lists/john-users/2009/07/27/3

    It’s great to see this research continuing to be expanded.

    Jim.

  2. Mine was more simplistic, but I was amazed at how well it worked.

    I did more of a ‘small-bite’ method of attack. I think I also did this with ALL found passwords (so if using rockyou, you would use the 400 MB original non-unique release, not the sorted, uniqued version). It could be done either way, but since I search for prepends and appends and simply want to know how many of each I have, knowing that there were 80 users using password123 and only 1 using password_!@# tells me that the appended 123 is 80 times ‘hotter’ than _!@#. The stats could be collected either way (deduplicated or not); it only matters IF you have the actual duplicate guesses to start with.

    Simply put, I made a few tools and hooked them together with standard *nix sort, cut, awk, sed type tools.

    The first tool loaded the dictionary (I simply used an English dictionary, but others could be used). It then started reading in the cracked passwords and searched each one for the words in the dictionary. I think I had a minimum dictionary-word size (3 or 4 characters). When it found a dictionary word within a password, it would simply output the append and prepend data to stdout, as lines like “app data” or “pre data” (or something like that).
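
    A rough Python sketch of that first pass, for readers who want to see the shape of it (the file names, the 3-character minimum, and the exact output format are assumptions, not the original tool):

    # For each cracked password, find dictionary words inside it and print
    # whatever was prepended/appended around them, one "pre X" / "app X"
    # line per match. Paths and the minimum word length are illustrative.
    import sys

    def load_dictionary(path, min_len=3):
        with open(path, encoding="utf-8", errors="ignore") as fh:
            return {w.strip().lower() for w in fh if len(w.strip()) >= min_len}

    def emit_pre_app(password, words, min_len=3):
        pw = password.lower()
        for start in range(len(pw)):
            for end in range(start + min_len, len(pw) + 1):
                if pw[start:end] in words:
                    if start:
                        print("pre " + password[:start])
                    if end < len(password):
                        print("app " + password[end:])

    dictionary = load_dictionary("english.dic")   # hypothetical dictionary file
    for line in sys.stdin:                        # cracked passwords on stdin
        emit_pre_app(line.rstrip("\n"), dictionary)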

    Then I used grep | cut | sort > new_files to create a new append file and a new prepend file: two separate files of sorted strings, duplicates included.

    I then created a ‘uniq’-like tool to collapse each of these data files. It would walk the sorted file and, if there were 16800 lines of 123 in a row, simply output “16800 123”. This tool was run on the 2 input files, creating 2 new unsorted files that list a count and a unique data item to append/prepend.

    These 2 new files are then sorted numerically, i.e. by the count.
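
    In a scripting language the whole sort / uniq-count / numeric-sort chain collapses into a counter; a small Python equivalent of those two steps (file names assumed) would be:

    # Equivalent of sorting each file, collapsing duplicate lines into
    # "count data" records, then ordering by count. File names are illustrative.
    from collections import Counter

    for kind in ("pre", "app"):
        counts = Counter()
        with open(kind + ".txt", encoding="utf-8", errors="ignore") as fh:
            for line in fh:
                counts[line.rstrip("\n")] += 1
        with open(kind + ".counted", "w") as out:
            # most_common() already gives the 'most likely' ordering.
            for data, n in counts.most_common():
                out.write("%d %s\n" % (n, data))

    The streaming ‘uniq’-style tool described above keeps memory flat because its input is already sorted; the Counter keeps everything in RAM, which is the trade-off for skipping the sort.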

    Then the user would be able to select only the data which was used ‘often enough’, AND also know that the rules are in ‘most likely’ order.

    I then created code to read the ‘top’ item from the prepend file and build a John rule, then the top append and build a rule, then the next prepend, then the next append, and so on. This is the tool that would need ‘fixing’ to generate better rules, now that JtR's rules engine has been enhanced with string operations.
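
    Character by character, that last step amounts to emitting $x for each appended character and ^x for each prepended one (prepends in reverse order, since each ^ pushes onto the front of the word). A sketch of the interleaving, with made-up file names rather than the original tool:

    # Read the counted prepend/append files (lines of "count data", best
    # first) and emit one rule per entry, alternating prepend and append.
    from itertools import zip_longest

    def append_rule(s):
        # $x appends character x, so "123" becomes "$1$2$3".
        return "".join("$" + c for c in s)

    def prepend_rule(s):
        # ^x pushes x onto the front, so "abc" must be emitted as "^c^b^a".
        return "".join("^" + c for c in reversed(s))

    def read_counted(path):
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for line in fh:
                count, _, data = line.rstrip("\n").partition(" ")
                yield data

    for p, a in zip_longest(read_counted("pre.counted"), read_counted("app.counted")):
        if p is not None:
            print(prepend_rule(p))
        if a is not None:
            print(append_rule(a))

    With the newer string operations, each entry could instead become a single rule, something like Az"123" to append a string or A0"abc" to prepend one, if I have the syntax right.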

    So, doing it in this manner, with this process flow, the machine required is not too big. It sounds like your memory requirements were pretty extreme. In mine, the memory requirement was simply loading the dictionary file into memory so that it could be searched quickly.

    Are you planning on continuing with this research, looking for ‘new’ ways to auto-generate?

    • Whether to use a “uniq”-ed password list or the full one depends on what you are looking for. I used the 136 MB sorted list for my tests, but I suppose that is due to a design choice in my tools (to verify later).

      As explained, my tool starts like yours: the first step applies a rule and finds the biggest dictionary word that fits in the resulting password. Contrary to yours, I believe I do not have a minimum size, but I do have a maximum size (I use a rolling hash to check for the largest dictionary word). It can use LARGE dictionaries (the one I use for tests is about 150 MB).
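
      For the curious, the rolling-hash trick looks roughly like this (a Rabin-Karp-style sketch; the constants, the length cap, and the structure are illustrative, not the actual implementation):

      # Find the largest dictionary word inside a password by scanning lengths
      # from the cap downwards, hashing each window incrementally and verifying
      # real matches against the dictionary to rule out hash collisions.
      BASE, MOD = 257, (1 << 61) - 1
      MAX_LEN = 16                  # assumed cap on dictionary-word length

      def poly_hash(s):
          h = 0
          for ch in s:
              h = (h * BASE + ord(ch)) % MOD
          return h

      def build_index(words):
          index = {}                # length -> hash -> set of words
          for w in words:
              if w and len(w) <= MAX_LEN:
                  index.setdefault(len(w), {}).setdefault(poly_hash(w), set()).add(w)
          return index

      def largest_dict_word(password, index):
          for length in sorted(index, reverse=True):          # longest first
              if length > len(password):
                  continue
              buckets = index[length]
              top = pow(BASE, length - 1, MOD)
              h = poly_hash(password[:length])
              for start in range(len(password) - length + 1):
                  sub = password[start:start + length]
                  if h in buckets and sub in buckets[h]:      # hash hit, then verify
                      return sub
                  if start + length < len(password):          # roll the window
                      h = (h - ord(password[start]) * top) % MOD
                      h = (h * BASE + ord(password[start + length])) % MOD
          return None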

      It then differs. Distinct rules can produce the same password, and my goal was to find the “best” series of rules without counting those overlaps twice. This means that I have to keep in memory all the passwords already cracked, plus the set of crackable passwords for each rule. As there are thousands of each, it takes a lot of memory.

      I wrote a tentative genetic-algorithm driven search tool, but it sucks and uses even more memory.

      Right now I am going to focus on creating new “base rules” and on other statistical password generators, such as more precise Markov systems.

      • A couple of differences. I used the non-unique list so that it would give a better ‘approximation’ of how likely any append/prepend would be. Also, in my dictionary search, I did not stop when I found ‘a’ word; I searched the entire ‘valid’ part of the dictionary at every offset within the word. So a word such as this:

        password123 would generate this:
        pre pass
        app word123
        app 123

        Yes, many of these do end up making sort of ‘garbage’ data, but that is why I want the count. If word123 is only seen once, then it is simply removed from the list of acceptable candidates.

        This was one of the reasons why I wanted to output ‘dumb’ lines. There may be 8 or 10 of them coming from a single candidate password.

        I also found that using a ‘real’ dictionary caused the ‘better’ data to bubble up to the top. A password-cracking dictionary usually has a lot of these ‘rules’ already applied to common ‘real’ base words, so those dicts had a tendency to ‘flatten out’ the data, AND they also made the amount of data much larger. Yes, when running, more words WERE found, but the number of rules and the amount of time used were greatly increased.

        Good stuff.
