936: Password Strength

From The Multilingual Encyclopedia
Jump to navigation Jump to search

<metadesc>Explain xkcd is a wiki dedicated to explaining the webcomic xkcd. Go figure.</metadesc>

Password Strength
To anyone who understands information theory and security and is in an infuriating argument with someone who does not (possibly involving mixed case), I sincerely apologize.
Title text: To anyone who understands information theory and security and is in an infuriating argument with someone who does not (possibly involving mixed case), I sincerely apologize.

Explanation[edit | edit source]

This comic is saying that the password in the top frames "Tr0ub4dor&3" is easier for password cracking software to guess because it has less entropy than "correcthorsebatterystaple" and also more difficult for a human to remember, leading to insecure practices like writing the password down on a post-it attached to the monitor.

In simple cases the entropy of a password is calculated as a^b where a is the number of allowed symbols and b is its length. A dictionary word (however long) has an entropy of around 65000, i.e. 16 bits. A truly random string of length 11 (not like "Tr0ub4dor&3", but more like "J4I/tyJ&Acy") has 94^11 = 72.1 bits. However the comic shows that "Tr0ub4dor&3" has only 28 bits of entropy. Another way of selecting a password is to have 2048 "symbols" (common words) and select only 4 of those symbols. 2048^4 = 44 bits, much better than 28.

It is absolutely true that people make passwords hard to remember because they think they are "safer", and it is certainly true that length, all other things being equal, tends to make for very strong passwords and this can confirmed by using rumkin.com's password strength checker. Even if the individual characters are all limited to [a-z], the exponent implied in "we added another lowercase character, so multiply by 26 again" tends to dominate the results.

In addition to being easier to remember, long strings of lowercase characters are also easier to type on smartphones and soft keyboards.

xkcd's password generation scheme requires the user to have a list of 2048 common words (log2(2048) = 11). For any attack we must assume that the attacker knows our password generation algorithm, but not the exact password. In this case the attacker knows the 2048 words, and knows that we selected 4 words, but not which words. The number of combinations of 4 words from this list of words is (211)4 = 244 bits. For comparison, the entropy offered by Diceware's 7776 word list is 13 bits per word. If the attacker doesn't know the algorithm used, and only knows that lowercase letters are selected, the "common words" password would take even longer to crack than depicted. 25 random lowercase characters would have 117 bits of entropy, vs 44 bits for the common words list.

Example

Below there is a detailed example which shows how different rules of complexity work to generate a password with supposed 44 bits of entropy. The examples of expected passwords were generated in random.org.(*)

If n is the number of symbols and L is the length of the password, then L = 44 / log2(n).

Symbols Number of symbols Minimum length Examples of expected passwords Example of an actual password Actual bits of entropy Comment
a 26 9.3 mdniclapwz jxtvesveiv troubadorx 16+4.7 = 20.7 Extra letter to meet length requirement; log2(26) = 4.7
a 9 36 8.5 qih7cbrmd ewpltiayq tr0ub4d0r 16+3=19 3 = common substitutions in the comic
troubador1 16+3.3=19.3 log2(10) = 3.3
a A 52 7.7 jAwwBYne NeTvgcrq Troubador 16+1=17 1 = caps? in the comic
a & 58 7.5 j.h?nv), c/~/fg\: troubador& 16+4=20 4 = punctuation in the comic
a A 9 62 7.3 cDe8CgAf RONygLMi Tr0ub4d0r 16+1+3=20 1 = caps?; 3 = common substitutions
a 9 & 68 7.2 _@~"#^.2 un$l|!f] tr0ub4d0r& 16+3+4=23 3 = common substitutions; 4 = punctuation
a A 9 & 94 6.7 Re-:aRo ^$rV{3? Tr0ub4d0r& 16+1+3+4=24 1 = caps?; 3 = common substitutions; 4 = punctuation
common words 2048 4 reasonable​retail​sometimes​possibly constant​yield​specify​priority reasonable​retail​sometimes​possibly 11×4=44 Go to random.org and select 4 random integers between 1 and 2048; then go to your list of common words
correct​horse​battery​staple 0 Because of this comic, this password has no entropy
a = lowercase letters
A = uppercase letters
9 = digits
& = the 32 special characters in an American keyboard; Randall assumes only the 16 most common characters are used in practice (4 bits)
(*) The use of random.org explains why jAwwBYne has two consecutive w's, why Re-:aRo has two R's, why _@~"#^.2 has no letters, why ewpltiayq has no numbers, why "constant yield" is part of a password, etc. A human would have attempted at passwords that looked random.

People who don't understand information theory and security[edit | edit source]

The title text likely refers to the fact that this comic could cause people who understand information theory and agree with the message of the comic to get into an infuriating argument with people who do not — and disagree with the comic.

If you're confused, don't worry; you're in good company; even security "experts" don't understand the comic:

  • Bruce Schneier thinks that dictionary attacks make this method "obsolete", despite the comic assuming perfect knowledge of the user's dictionary from the get-go. He advocates his own low-entropy "first letters of common plain English phrases" method instead: Schneier original article and rebuttals: 1 2 3 4 5 6
  • Steve Gibson basically gets it, but calculates entropy incorrectly in order to promote his own method and upper-bound password-checking tool: Steve Gibson Security Now transcript and rebuttal
  • Computer security consultant Mark Burnett almost understands the comic, but then advocates adding numerals and other crud to make passphrases less memorable, missing the point: Analyzing the XKCD Passphrase Comic
  • Ken Grady incorrectly thinks that user-selected sentences like "I have really bright children" have the same entropy as randomly-selected words: Is Your Password Policy Stupid?
  • Diogo Mónica doesn't understand that the words have to be truly random, not user-selected, like "let me in facebook": Password Security: Why the horse battery staple is not correct

Sigh.

Transcript[edit | edit source]

The comic illustrates the relative strength of passwords assuming basic knowledge of the system used to generate them.
A set of boxes is used to indicate how many bits of entropy a section of the password provides.
The comic is laid out with 6 panels arranged in a 3x2 grid.
On each row, the first panel explains the breakdown of a password, the second panel shows how long it would take for a computer to guess, and the third panel provides an example scene showing someone trying to remember the password.
[The password "Tr0ub4dor&3" is shown in the center of the panel. A line from each annotation indicates the word section the comment applies to.]
Uncommon (non-gibberish) base word
[Highlighting the base word - 16 bits of entropy.]
Caps?
[Highlighting the first letter - 1 bit of entropy.]
Common Substitutions
[Highlighting the letters 'a' (substituted by '4') and both 'o's (the first of which is substituted by '0') - 3 bits of entropy.]
Punctuation
[Highlighting the symbol appended to the word - 4 bits of entropy.]
Numeral
[Highlighting the number appended to the word - 3 bits of entropy.]
Order unknown
[Highlighting the appended characters - 1 bit of entropy.]
(You can add a few more bits to account for the fact that this is only one of a few common formats.)
~28 bits of entropy
228 = 3 days at 1000 guesses sec
(Plausible attack on a weak remote web service. Yes, cracking a stolen hash is faster, but it's not what the average user should worry about.)
Difficulty to guess: Easy.
[Cueball stands scratching his head trying to remember the password.]
Cueball: Was it trombone? No, Troubador. And one of the O's was a zero?
Cueball: And there was some symbol...
Difficulty to remember: Hard.
[The passphrase "correct horse battery staple" is shown in the center of the panel.]
Four random common words {Each word has 11 bits of entropy.}
~44 bits of entropy
244 = 550 years at 1000 guesses sec
Difficulty to guess: Hard.
[Cueball is thinking, in his thought bubble a horse is standing to one side talking to an off-screen observer. An arrow points to a staple attached to the side of a battery.]
Horse: That's a battery staple.
Observer: Correct!
Difficulty to remember: You've already memorized it
Through 20 years of effort, we've successfully trained everyone to use passwords that are hard for humans to remember, but easy for computers to guess.

External links[edit | edit source]

Template:Comic discussion

explain xkcd logo This article uses material from the explain xkcd article 936: Password Strength, which is released under the Creative Commons Attribution-ShareAlike 3.0 Unported License (view authors).