Cracking Passphrases
by Javantea
Nov 30, 2014

Passphrases are commonly used as high-security keys in computer systems because they are easier to remember than random strings of similar strength. But this encourages the lazy use of low-security passphrases as a form of security through obscurity. Password crackers like John the Ripper don't feature passphrase cracking, and oclHashCat also lacks this feature, which makes it less likely that these weak passwords get cracked. As part of my NSEC3 cracking, I wrote a trivial passphrase cracker which can take any wordlist and combine it with itself. While this isn't optimal and the code isn't fast enough to keep even a slow cracker busy, it cracked 6947 difficult passphrases in the first 13 hours of CPU time (10 hours of wall-clock time because the process wasn't running in parallel). It cracked another 5774 passphrases in the 69 hours that followed.

These were not the low-hanging fruit commonly found in the first hours of cracking: all domains of 8 characters (alphanumeric plus hyphen) had already been removed by brute force, and a range of good publicly available dictionaries had been used with John the Ripper's rules turned on. Furthermore, John the Ripper's Markov mode had been used up to level 260 with two different optimized trees. In other words, all other methods had been exhausted (cracking with all other methods took over a month and produced 123988 cracked words).

This shows the true security of passphrases. If a passphrase consists of three words drawn from a dictionary of the 8000 most popular words, the maximum strength, assuming the words were chosen randomly, is 8000*8000*8000 = 512 billion combinations, or about 39 bits. 39 bits of security is not enough to resist hash cracking. Even strong hashes like PBKDF2, bcrypt, and MD5 crypt do not withstand amateur cracking at 39 bits of security.
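The idea of combining a wordlist with itself can be sketched in a few lines of Python. This is only an illustration of the technique, not the cracker used for the results above; the function name combine_wordlist and the four-word stand-in list are made up for the example.

```python
from itertools import islice, product


def combine_wordlist(words, count=2, sep=""):
    """Yield every ordered combination of `count` words from `words`."""
    for combo in product(words, repeat=count):
        yield sep.join(combo)


# Tiny stand-in wordlist; a real run would read one word per line from a file.
words = ["correct", "horse", "battery", "staple"]

# Two-word candidates: 4 * 4 = 16 of them.
candidates = list(combine_wordlist(words, count=2))
print(len(candidates))

# The generator is lazy, so a three-word run can be sampled without
# building the whole list in memory.
print(list(islice(combine_wordlist(words, count=3), 3)))
```

The candidates would normally be piped into a cracker that reads words from standard input, which is why the generator's output rate matters so much.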

oclHashCat md5crypt on 8 strong GPUs can do 27 MH/s. Therefore it would take about 5 hours to try all possible combinations of 3-word passphrases. Similarly, oclHashCat can do 310 kH/s against PBKDF2-HMAC-SHA512 + AES, which would take a single machine 19 days to crack. The strongest, bcrypt, resists for only 164 days.
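The arithmetic behind these estimates is easy to reproduce. The sketch below uses only the figures quoted above (8000 words, 3 words per passphrase, 27 MH/s and 310 kH/s); the function names are illustrative, and bcrypt is left out because no hash rate is quoted for it.

```python
import math


def keyspace(wordlist_size, words_per_phrase):
    """Number of passphrases if every word is chosen independently."""
    return wordlist_size ** words_per_phrase


def bits(combinations):
    """Strength of a uniformly random choice, in bits."""
    return math.log2(combinations)


def seconds_to_exhaust(combinations, hashes_per_second):
    """Worst-case time to try every combination at a fixed rate."""
    return combinations / hashes_per_second


total = keyspace(8000, 3)
print(total, round(bits(total), 1))

# Hash rates as quoted in the text.
rates = [("md5crypt", 27e6), ("PBKDF2-HMAC-SHA512 + AES", 310e3)]
for name, rate in rates:
    secs = seconds_to_exhaust(total, rate)
    print(f"{name}: {secs / 3600:.1f} hours = {secs / 86400:.1f} days")
```

These are worst-case figures; on average a randomly chosen passphrase would be found in half that time.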

The weakness of this passphrase cracker is clearly its speed. It can output 7454980 candidates in 52 seconds, which is far slower than the rate at which crackers of weak hashes consume the wordlist it generates. That corresponds to 143 kW/s (thousand words per second). If you're trying to crack SHA-1 with a shared salt, this will be very slow in comparison to what John the Ripper or oclHashCat can do with a wordlist or a Markov tree.

To improve the quality of this cracker, I am releasing a very strong wordlist based on my AI3 project. The wordlist is not perfect, but is a very good wordlist for cracking. It is far better than /usr/share/dict/words at cracking correct horse battery staple. This can be shown by finding where in the wordlist each word is:

AI3 wordlist
3766:correct
2557:horse
6579:battery
131190:staple
The combined position is 3766*2557*6579*131190 = 8311351738834620

/usr/share/dict/words
43255:correct
87298:horse
19274:battery
187845:staple
The combined position is 43255*87298*19274*187845 = 13671372128414504700

This shows that the AI3 wordlist is at least 1644 times better than /usr/share/dict/words at finding long passphrases. Depending on how you use it, the wordlist could save you years of computation time.
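The comparison above can be reproduced with a short script. This is a sketch: the positions helper assumes a wordlist held in memory as a list with one word per entry, and the numbers are the positions used in the two products quoted above.

```python
import math


def positions(wordlist, phrase_words):
    """Return the 1-based position of each phrase word in `wordlist`."""
    index = {}
    for pos, word in enumerate(wordlist, start=1):
        index.setdefault(word, pos)  # keep the first occurrence only
    return [index[w] for w in phrase_words]


def combined_position(ranks):
    """Product of the word positions: a rough count of the candidates
    tried before the phrase is reached."""
    return math.prod(ranks)


# Positions as used in the two products quoted in the text.
ai3 = combined_position([3766, 2557, 6579, 131190])
system_dict = combined_position([43255, 87298, 19274, 187845])

print(ai3)
print(system_dict)
print(system_dict // ai3)  # how many times fewer candidates AI3 needs
```

With a real wordlist file, positions() would be fed the lines of the file and the four words of the passphrase, which is the same lookup that grep -n performs.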

Important words like nano and nanotechnology are missing from /usr/share/dict/words while nanotechnology and nano show up in AI3's comprehensive wordlist.
36461:nanotechnology
91489:nano
176740:nanotech

Many proper nouns that are not found in /usr/share/dict/words are found near the top of the AI3 wordlist.
1128:Angeles
3366:Seattle
3669:Microsoft
5847:Boeing
6434:Saskatchewan
6488:Cyprus
6651:MTV
6952:USSR
7592:Toyota
7602:Emmy
7622:PDF
9748:Gandhi
10045:Einstein
39304:Starbucks
91081:Yolo

This shows the power of the wordlist in prioritizing commonly-used English words. The wordlist also contains a wealth of foreign names because Wikipedia has a list of translation links on nearly every page.
1596:Catégorie
3229:José
4549:São
6813:André
7442:département
8092:Comté
10020:École
13425:Université
17566:Société
19912:España
24989:musique
25169:Nouveau
26291:Años
27424:Niño
29251:Muñoz
29713:Peña
34729:littérature
39721:Mademoiselle
48847:nouveau
53564:Señor
68776:Chasseurs
86534:Chasse
343830:Boisson
364817:alcools
395149:Alcool
619170:Banane

An example of a passphrase used in the past by a friend of mine can be found in our wordlist:
1300731:Konnichiwa
1055441:Ohayou
1125369:gozaimasu

The time it would take to find this password in the context it was used (WPA, 1336.4 kH/s) is:
1300731*1055441*1125369/1336400 = 1156058822608 seconds = 13380310 days = 36658 years.

This password is strong enough to thwart a strong amateur attacker using this wordlist until much better hardware becomes available or attacks become available against the PBKDF2 used in WPA. But that assumes that the security of the passphrase depends on this one wordlist. A different wordlist that prioritizes Japanese romaji words would crack the hash in seconds or minutes. The wordlist chosen for the attack is therefore of great importance to cracking. To be conservative, all possible wordlists should be considered when choosing a passphrase. By that standard, the KonnichiwaOhayougozaimasu passphrase is actually very weak. Using strong English words with strong foreign words would increase the security, but using weak English words and weak foreign words together still leaves the passphrase woefully inadequate.
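One way to apply the "consider all wordlists" rule is to estimate the search time against each wordlist and keep the smallest result. In the sketch below the AI3 positions and the WPA rate are the ones quoted above; the second wordlist and its positions are entirely made up to stand in for a romaji-focused list.

```python
import math


def search_seconds(ranks, guesses_per_second):
    """Seconds to reach a phrase whose words sit at the given positions."""
    return math.prod(ranks) / guesses_per_second


def weakest_case(ranks_by_wordlist, guesses_per_second):
    """Return (wordlist name, seconds) for the wordlist that finds the
    phrase fastest, i.e. the attacker's best case."""
    times = {
        name: search_seconds(ranks, guesses_per_second)
        for name, ranks in ranks_by_wordlist.items()
    }
    best = min(times, key=times.get)
    return best, times[best]


WPA_RATE = 1336.4e3  # guesses per second, as quoted in the text

ranks_by_wordlist = {
    "AI3": [1300731, 1055441, 1125369],
    # Hypothetical positions in an imagined Japanese-focused wordlist.
    "hypothetical romaji list": [50, 80, 120],
}

ai3_secs = search_seconds(ranks_by_wordlist["AI3"], WPA_RATE)
print(f"AI3: {ai3_secs / 86400 / 365:.0f} years")

name, secs = weakest_case(ranks_by_wordlist, WPA_RATE)
print(f"best attack: {name}, {secs:.2f} seconds")
```

The strength of a passphrase is the minimum over every wordlist an attacker might plausibly try, not the figure for the wordlist that happens to be at hand.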

A person might think that adding strings of pseudorandom letters to the passphrase will help. To test this, I have created five fast passphrase crackers that work as variations on the original passphrase algorithm. The first tests passphrases that start with a word and end with random characters. The second tests passphrases that start with random characters and end with a word. The third is the same as the first but starts with two words. The fourth is the same as the second but ends with two words. The fifth tests passphrases that start with a word, have random characters in the middle, and end with a second word. These attacks were very effective against the com domain hash list.
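The five layouts can be expressed as patterns of word slots (W) and random-character slots (R). The following is a minimal sketch of that idea, not the crackers themselves; the pattern names, the hybrid_candidates function and the tiny alphabet are assumptions made for the example.

```python
from itertools import product

# W = a word from the wordlist, R = a run of random characters.
PATTERNS = {
    "word+random": "WR",
    "random+word": "RW",
    "word+word+random": "WWR",
    "random+word+word": "RWW",
    "word+random+word": "WRW",
}


def hybrid_candidates(words, alphabet, fill_length, pattern):
    """Yield every candidate for one layout, brute-forcing the R slot."""
    fills = ["".join(chars) for chars in product(alphabet, repeat=fill_length)]
    slots = [words if slot == "W" else fills for slot in PATTERNS[pattern]]
    for parts in product(*slots):
        yield "".join(parts)


words = ["red", "fox"]
for pattern in PATTERNS:
    found = list(hybrid_candidates(words, "ab", 1, pattern))
    print(pattern, len(found), found[:4])
```

A real run would use the full hostname alphabet and several fill lengths, so the random part multiplies the keyspace by the alphabet size raised to the fill length.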


After xkcd proposed "correct horse battery staple" in the comic http://xkcd.com/936/ as easy to remember but hard to crack, people started using this method to lengthen their passwords. The method has received mixed reviews. For one, longer passwords made up of words are more secure when used correctly. Unfortunately, the passwords we choose when we use words are less entropic than the passwords we choose when we go with letters or Schneier's first-letter-of-each-word approach. Of course, everything depends on how random you are. For example, if your password is ttbbitfotn, it can be cracked easily. Why? Because William Blake's poem "The Tyger" is incredibly well known, and the Schneier-method rendering of the poem has been added to dictionaries. Also, the password is only 10 characters long. Adding a single long word would make it much harder to crack. But the question is: how much harder is it to crack?
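To see why ttbbitfotn is weak, note that an attacker can apply the same first-letter transformation to any well-known text. The sketch below is illustrative only: first_letters is a made-up helper, and the "dictionary" contains just the opening of the poem.

```python
def first_letters(text):
    """Build a password from the first letter of each word, lowercased."""
    return "".join(
        word[0].lower() for word in text.split() if word[0].isalpha()
    )


# An attacker's dictionary built from well-known lines (one entry here).
famous_lines = [
    "Tyger Tyger, burning bright, In the forests of the night",
]
dictionary = {first_letters(line) for line in famous_lines}

password = "ttbbitfotn"
print(first_letters(famous_lines[0]))
print(password in dictionary)
```

Running every line of every popular poem, song and quotation through this function is cheap, so the method is only as strong as the sentence is obscure.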

Amateurs in the password cracking field have a few tools that are pretty amazing at password cracking. John the Ripper, oclHashCat, rainbowcrack, and a few other tools crack non-random passwords up to 13 characters long at ridiculous speeds. But what about non-random passwords longer than 13 characters? It turns out that most passwords longer than 13 characters are very non-random. A password like correcthorsebatterystaple has very little entropy. How long does it take to crack with John the Ripper, oclHashCat, and rainbowcrack? The answer is that these tools will not crack this password because passphrase cracking (multi-word password cracking) has not yet been implemented. Why? The project is an unwieldy one. Yet tracking the position in a list is pretty easy to implement, so why not just do it? Let's see how far we get.
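One simple way to track position is to treat each passphrase as a number written in base N, where N is the size of the wordlist. A single integer then identifies where the search is, and a run can be resumed from it. This is a sketch of that idea under those assumptions, not a description of how any existing cracker does it; phrase_at and phrases_from are hypothetical names.

```python
def phrase_at(words, count, index):
    """Return the passphrase at `index` in the ordered list of all
    `count`-word phrases (same order as nested loops over `words`)."""
    n = len(words)
    if not 0 <= index < n ** count:
        raise IndexError("index outside the keyspace")
    parts = []
    for _ in range(count):
        index, digit = divmod(index, n)  # peel off the last word first
        parts.append(words[digit])
    return "".join(reversed(parts))


def phrases_from(words, count, start=0):
    """Yield (index, phrase) pairs, resuming from a saved position."""
    for index in range(start, len(words) ** count):
        yield index, phrase_at(words, count, index)


words = ["correct", "horse", "battery", "staple"]

# Resume a four-word search from saved position 27.
for index, phrase in phrases_from(words, 4, start=27):
    print(index, phrase)
    break
```

Saving the last index to disk every so often is enough to make the search restartable, and splitting the index range into blocks is one way to spread the work across processes.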
