bcrypt is from 1999.
It's really sad that the state of the art from 25 years ago is still ahead of common practice, and that you're lucky to find even that in systems today. Most systems still use general-purpose cryptographic hash functions for password storage. Those are explicitly designed with goals that run exactly contrary to the goals of password storage. For a general-purpose cryptographic hash function, being fast and having very low resource requirements is a feature. That is the exact opposite of what you want for password hashing. You want something much slower, to raise the work factor for an attacker, and something that requires a significant amount of memory with pseudo-random access during computation, so that it cannot be implemented cheaply in hardware and many operations cannot be trivially parallelized on the same piece of hardware.
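To make the "slow and memory-hungry" point concrete, here is a minimal sketch using scrypt from Python's standard library. The function name and the cost parameters (n=2**14, r=8, p=1) are illustrative assumptions, not tuned recommendations, and `hashlib.scrypt` is only available when Python is built against a recent enough OpenSSL.

```python
import hashlib
import os
from typing import Optional, Tuple


def hash_password_scrypt(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key). Hypothetical helper, for illustration only."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    # scrypt needs roughly 128 * n * r bytes of memory per hash,
    # so n=2**14 and r=8 costs about 16 MiB for every single guess.
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,
        r=8,
        p=1,
        maxmem=64 * 1024 * 1024,  # allow headroom above the ~16 MiB needed
        dklen=32,
    )
    return salt, key
```

The memory cost is the part a plain fast hash lacks: an attacker has to pay it for every candidate password, which makes cheap massively parallel hardware much less attractive.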
PBKDF2, which simply runs the same general-purpose hash function for multiple rounds on the salted password, only helps with the work factor and fails at everything else. Yet it's considered an acceptable password hashing function, and probably even above-average security.
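For comparison, a small sketch of PBKDF2 via the standard library. The wrapper name is hypothetical. Note that the only cost knob is the iteration count; there is no memory parameter to tune.

```python
import hashlib


def pbkdf2_sha256(password: bytes, salt: bytes, iterations: int) -> bytes:
    # Iterated HMAC-SHA256 over the salted password.
    # More iterations means more CPU time per guess, but memory use stays tiny.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
```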
We should be well beyond using passwords for authentication and should only be using public-key-based systems for remote authentication. Instead we're using an extremely poor authentication mechanism for humans (one which is a security anti-feature and the root of a huge portion of all successful attacks), and we're not even at the point of securing its secrets with long-outdated security from the last millennium.
Please just use argon2, yescrypt, or scrypt. They are actually made for password hashing and use modern cryptography, not something from the year The Matrix was released.
I don't find this too concerning, but libraries should document this and maybe even raise exceptions on data longer than 72 bytes; failing silently is the worst behaviour.
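A rough sketch of what "raise instead of silently truncating" could look like as a guard in front of whatever bcrypt library is in use. The function and constant names are made up for illustration and are not part of any library's API.

```python
BCRYPT_MAX_BYTES = 72  # bcrypt only looks at the first 72 bytes of input


def ensure_fits_bcrypt(password: str) -> bytes:
    """Encode the password and refuse input that bcrypt would truncate."""
    encoded = password.encode("utf-8")
    if len(encoded) > BCRYPT_MAX_BYTES:
        raise ValueError(
            f"password is {len(encoded)} bytes; bcrypt would ignore "
            f"everything after byte {BCRYPT_MAX_BYTES}"
        )
    return encoded
```

The check is on the encoded byte length, not on `len(password)`, because the limit applies to bytes.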
Does this really need yet another blog post? 72 characters is more than enough to be resistant to brute-force attacks, as demonstrated by thousands of data breaches containing bcrypt hashes that remain uncracked (excluding the obvious top 1k passwords and credential stuffing). In my personal opinion, calling it "unsafe" is just fear mongering, especially in conjunction with a recommendation to use Argon2, which is comparatively very new and is probably safe but, once again, does not have the proven track record that bcrypt does.
I agree 72 characters is plenty for most circumstances. However, as the blog points out, this is a byte limit not a character limit.
Some of the family emoji can be > 20 bytes. Some of the profession emoji can be > 17 bytes. If people are using emoji in their passwords, we could quite quickly run out of bytes.
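A quick way to check the byte counts yourself. The example below uses the four-person family emoji (man, woman, girl, boy joined by zero-width joiners), written as escapes so it survives any editor.

```python
# Four emoji code points (4 bytes each in UTF-8) joined by three
# zero-width joiners (3 bytes each in UTF-8).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

encoded = family.encode("utf-8")
print(len(family))   # 7 code points
print(len(encoded))  # 25 bytes

# Three of these rendered "characters" already exceed bcrypt's 72-byte limit.
password = family * 3
print(len(password.encode("utf-8")))  # 75 bytes
```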
I think it’s a limitation worth being aware of, even if “unsafe” is perhaps overstating it.
I still don't see how that's an issue. Yes, a password using a series of ridiculously complicated family emoji will be truncated, but the actual bytes still provide entropy. Just because the data doesn't use pixels when rendered doesn't mean it doesn't increase the search space.
If your password is comprised of three emojis that each take up 24 bytes, then yes, a 72-byte truncation dramatically reduces the search space for a brute force against these hypothetical 24-byte-emoji-only passwords.
There are far fewer possible combinations of any three emojis than there are any 72 ASCII characters.
This is x^3 vs y^72, where x is the total number of distinct emojis and y is the total number of distinct ASCII characters.
24 bytes of data is not 24 bytes of entropy if there are only a couple thousand different possible inputs to produce all of the possible 24-byte sequences produced by those inputs.
For simplicity: picture having only two possible input buttons. Each one produces 1000 bytes of random-looking data, but each one always produces the exact same 1000-byte sequence, respectively. You have a maximum password of 1 button press. The "password" may contain 1000 bytes, but you only have one bit of entropy, because the attacker doesn't need to correctly guess all 1000 bytes, they only need to correctly guess which of the two buttons you pressed.
Of course, in practice, not all emojis are 24 bytes, and I'd assume few people are using emoji-only passwords, but the distinction between bytes of data and bytes of entropy is worth clarifying, regardless.
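To put rough numbers on x^3 vs y^72, here is a back-of-the-envelope calculation. It assumes about 4,000 distinct emojis (a round number picked for illustration, not an exact count) and 95 printable ASCII characters, with every symbol chosen uniformly at random.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    # A uniformly random string of `length` symbols drawn from an alphabet
    # of `alphabet_size` symbols carries length * log2(alphabet_size) bits.
    return length * math.log2(alphabet_size)


emoji_bits = entropy_bits(4000, 3)  # assumed emoji count; roughly 36 bits
ascii_bits = entropy_bits(95, 72)   # printable ASCII; roughly 473 bits

print(round(emoji_bits), round(ascii_bits))
```

Both passwords fill the same 72 bytes, but under these assumptions the emoji one has less than a tenth of the entropy.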
I would argue that a password containing emojis is unlikely to ever be cracked, because no attacker is going to test emojis unless they have some reason to believe you use them in your password.
Attackers don't come up with every entry on the wordlist they throw into hashcat themselves. The attacker's imagination has essentially zero correlation with the contents of their wordlist.
Okay. How many major wordlists include emojis?
Maybe...like...a dozen entries at most across all of them?
Rest assured, the world's intelligence agencies and cybercrime rings aren't just taking vanilla open source wordlists off GitHub and hoping they get lucky.
You don't know what your adversary's wordlist contains, and assuming you do is a recipe for overconfidence.
The hash is 24 bytes. Even without an input character limit, you're likely to find tons of valid aliases for your 1000-character password within the 72-byte password space.
You could always pre-hash the password with sha256 or something similar to guarantee you won't go over the 72 byte limit.
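A minimal sketch of that pre-hash step, using only the standard library. The function name is hypothetical, and the final bcrypt call is left out because it depends on the library in use. The base64 step is there because a raw SHA-256 digest can contain NUL bytes, which some bcrypt implementations treat as the end of the input.

```python
import base64
import hashlib


def prehash_for_bcrypt(password: str) -> bytes:
    """Map a password of any length to 44 ASCII bytes, safely under 72."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()  # 32 raw bytes
    # base64 output never contains NUL bytes; 32 bytes encode to 44 characters.
    return base64.b64encode(digest)
```

With this in front, two long passwords that share their first 72 bytes no longer collapse to the same bcrypt input.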
I don't understand why this isn't a mandatory first step in the bcrypt algorithm itself. Who thought that a 72 byte limit was a good idea?
Does anyone actually use emoji as a password?
I never actually considered it until I read parent, and now I'm gonna try to start using it wherever it's supported, it's genius to use it for passwords as long as it's supported by the platform. Edit: Just to clarify, together with a password manager of course, otherwise I'd never have the patience for it.
yea, me (pls dont crack)
You could ask for your password to be removed from the list: https://github.com/danielmiessler/SecLists/pull/155
Well, if you limit the discussion to passwords, you're right: maybe there's no need to worry, especially when using randomly generated ones (like those from password managers). But if the algorithm is used to check some "composed" credentials (like what happened with Okta last year), then maybe it's worth worrying about, no?
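A sketch of why composed inputs are riskier. The field names, their order, and the example values are assumptions made up for illustration, not details of any real system; the slice simply mimics what a silently truncating bcrypt implementation would actually see.

```python
BCRYPT_MAX_BYTES = 72


def effective_bcrypt_input(user_id: str, username: str, password: str) -> bytes:
    # Hypothetical composed key: everything past byte 72 is never hashed.
    composed = (user_id + username + password).encode("utf-8")
    return composed[:BCRYPT_MAX_BYTES]


user_id = "u" * 20   # made-up 20-byte identifier
username = "n" * 52  # made-up 52-byte username; 20 + 52 = 72

right = effective_bcrypt_input(user_id, username, "correct horse")
wrong = effective_bcrypt_input(user_id, username, "anything else")
print(right == wrong)  # True: the password never reaches the hash
```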
Alternate title:
"Why Python's bcrypt implementation is unsafe for Password Hashing"
*used to be unsafe
Just to note, other implementations have the same design (silently truncating). I've recently found out that htpasswd from the Apache HTTP Server has the same silent behavior.