I have toyed with this large list and, like everybody else, tackled the MD5 hashes first. When I have a bit of free time (and heaps of free memory) I run an attack. The end goal is to assess how effective the cracked passwords are for attacking the remaining ones. One of the attacks I ran was a Markov mode run (using the stats file distributed with jumbo, level 290, 24 chars max). What is interesting about it is that you can then generate nice plots, like this one.
Only this one is pretty weird. The humps (blue arrows) are expected. This kind of artifact could come from the concatenation of multiple sources with distinct characteristics, or from the prevalence of a pattern that is not common in the training set (organisation names, for example).
What is weird is the part in the green circle, which lies between levels 250 and 290. The drop on the right is easy to explain: I ran only a few wordlist attacks and the Markov attack, and 290 is my cutoff value, so we could expect the curve to be “continuous” with a larger value. But the drop between 249 and 250 is much harder to explain. I have the following hypotheses:
- bug in my code (quite likely)
- somebody generated a large part of the included passwords with “genmkvpwd stats 250 12”
Having no clue about where the passwords come from, I can’t really draw a conclusion. But if somebody has better ideas …
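One way to tell the two hypotheses apart might be to split the histogram by password length. Passwords generated with a maximum length of 12 cannot be longer than 12 characters, so a drop at 250 that shows up only in the short group would point toward the second hypothesis, while a drop in both groups would look more like a bug. Here is a minimal sketch, assuming the Markov level of each cracked password has already been computed elsewhere; the function names, the window width and the toy data are hypothetical.

```python
from collections import Counter


def level_histograms(cracked, max_len=12):
    """Count cracked passwords per Markov level, split by length.

    `cracked` is an iterable of (password, level) pairs. The level is
    assumed to be precomputed; this sketch does not compute it.
    """
    short, long_ = Counter(), Counter()
    for password, level in cracked:
        target = short if len(password) <= max_len else long_
        target[level] += 1
    return short, long_


def drop_ratio(hist, cutoff=250, width=5):
    """Ratio of counts just below `cutoff` to counts at or just above it.

    A ratio well above 1 means the histogram drops at `cutoff`.
    """
    below = sum(hist[lvl] for lvl in range(cutoff - width, cutoff))
    above = sum(hist[lvl] for lvl in range(cutoff, cutoff + width))
    return below / above if above else float("inf")


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    sample = [
        ("abc", 248), ("abcd", 249), ("abcde", 249), ("abcdef", 250),
        ("a" * 13, 249), ("b" * 13, 250),
    ]
    short, long_ = level_histograms(sample)
    print("length <= 12:", drop_ratio(short))
    print("length  > 12:", drop_ratio(long_))
```

On real data the window width would need some tuning, and the comparison only makes sense if both groups contain enough passwords around level 250.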