There has been a lot of discussion recently around the LastPass breach, especially with regards to the number of PBKDF2 iterations applied to the master password to derive the vault encryption key. Other people have already dissected this particular breach, but I want to more generally talk about PBKDF2 iterations and security models. (I’m not going to talk about Argon2 or Bcrypt or any other algorithms).
There are two related reasons for using a password-based key derivation function like PBKDF2. One is to protect password hashes used for login on a website. The other is to derive a cryptographic key from a password to use for encryption. LastPass were actually doing both of these things, but I want to talk about the latter case in this post: using a password as a secret to encrypt data that you want to remain private.
Let’s put passwords completely to one side for a moment and talk about cryptographic keys and security models. If you just care about confidentiality of data, the very best you can achieve is something known as perfect secrecy. If your encryption scheme achieves this definition of security then it is impossible for an attacker with access to the encrypted data to learn anything about either the key or the original message, other than its length and what they could already infer before seeing anything. This is independent of how much money they have, or what insane futuristic computational devices they have. Even the Minds from Iain M. Banks novels couldn’t crack such a code.
Surprisingly, we know how to design such a code, and it’s known as a one-time pad. The only (massive) downside is that the key must be as long as the message and only used once. That turns out to be a bit of a bummer in practice, so we relax the security model so that we can use more efficient schemes. These schemes use a fixed-size key that is much smaller than the message, such as 128 bits. The trade-off for having smaller keys is that we can no longer achieve perfect secrecy. Instead we have a computational definition of security (known as semantic security), where we only consider attackers with “reasonable” (but still vast) computational powers, and we settle for allowing them a tiny probability of success (rather than no chance at all).
For an encryption algorithm like AES, we have generally settled on 128-bit keys as a safe minimum. A 128-bit secret key means that it should take an attacker on the order of 2^128 “operations” to discover the key by brute force guessing. (Actually, we’d expect only 2^127 operations on average). What an “operation” involves will vary depending on the algorithm and other details, but we generally assume they are quite fast and so attackers can perform a lot of them. 128 bits means that finding the key is completely out of reach for an extremely powerful attacker using millions of computers that are vastly more powerful than anything we know of today. It’s overkill security, because we don’t really know what capabilities attackers of the future will possess or what as-yet-unknown weaknesses might exist in the algorithms. People disagree over where the cut-off for feasibility may lie, but most people agree that 128 bits is safely on the other side.
Now back to thinking about passwords. Many passwords chosen by users in the real-world are so weak that they already appear on well-known password lists like “123456” or “password”. It is best to consider such passwords as having no entropy at all: they will be cracked in short order. For better passwords, estimating the entropy is tricky because entropy depends on the process used to come up with the password, rather than the set of characters involved, and tends to be somewhat personal. The best we can do is estimate based on length, character set and so on. For the purposes of this discussion, let’s say a “good” password is 12 characters long, chosen from lower- and upper-case Latin alphabet, plus digits 0-9, plus a couple of special characters like ! and @ thrown in for good measure. That is essentially a 12-character Base64 value, and if chosen perfectly randomly would have about 72 bits of entropy, which is spectacularly good compared to most real passwords. According to Wikipedia, an estimate of real-world user password entropy is around 40 bits.
Now, what if we want that password to have the same difficulty to crack as a 128-bit AES key? Let’s say because we are encrypting a password vault with “military-grade” AES encryption with a key derived from that password. How many iterations of PBKDF2 do we need to use to get our 72-bit password up to 128-bit strength? If we assume that a single iteration of PBKDF2 is roughly equivalent in cost to trying a single AES key, then by increasing the iterations we effectively increase the cost to an attacker as if we had more bits of entropy in the first place. In this simplistic model, we can use 2^56 iterations of PBKDF2 to make brute-forcing our 72-bit password roughly as costly as guessing a 128-bit AES key.
2^56 is 72,057,594,037,927,936 iterations (72 quadrillion).
As far as I’m aware even OWASP don’t recommend 72 quadrillion iterations of PBKDF2. It’s obviously an absurd value. If we used a password with only 40 bits of entropy, we’d need 2^88 iterations to reach the same level, which is about the work done by the entire Bitcoin network in a year. OWASP currently recommend 310,000 iterations, which equates to about an extra “18 bits” of security in this model: for a 40-bit password, this results in about the same security as a DES key, which hasn’t been considered secure for decades.
The point of this is not to convince you to increase your PBKDF2 iterations. It’s not even to get you to switch to a better password hashing algorithm. Instead, it is to point out that there is no sane parameters for password hashing that provide anything like the security levels expected in modern cryptography. A lot of the discourse around password hashing gives the impression that there is some magic number you can pick that actually makes passwords safe to use for this kind of thing. There isn’t. They are not. Either your password has sufficient entropy to resist brute-forcing, in which case it is already probably a cryptographic key, or it doesn’t – in which case it will eventually be cracked no matter how many iterations you apply to it. The entropy of the password itself still matters much more than the hashing parameters.
All this is to say that the point of password hashing is not to prevent brute force attacks. It is instead to buy some time. By slowing down an attacker you hope that you will have enough time to (1) notice you’ve been breached, and (2) change your passwords. That’s all a password hash can ever do. If you are storing user password hashes on a server then this is crucial and you should definitely use as many iterations as your hardware can cope with (and look at things like SCRAM if it can’t cope with many). But if you’re using a password to encrypt some sensitive data that you want to keep secret for a long time, then really your only hope is to use a long and random password that resembles a secret key. And in that case you could just as well use a single iteration of PBKDF2.
Getting back to our discussion of security models, when you use a password for cryptography, you are weakening the model again. First we weakened it from perfect secrecy to computational security. This time it is weakened from a computational security model (based on ideal models of computation) to an economic one, based on extrapolating current most cost-effective cracking setups. You may be fine with relying on such a model, but you should be clear that it is a significant weakening of the standard security definitions.
Addendum: From online discussions, I think some people are wondering why we should care about this relaxation of the security model. After all, if you choose a “good enough” password and use something like Argon2 with recommended parameters, then your password is certainly going to be hard to crack. The reason I think this is important is to do with the number of assumptions you have to make in your security model. The original perfect secrecy model makes no assumptions at all about the attacker. The computational model makes a few about theoretical models of computation and probabilities. But with an economic model, we have to make lots of assumptions about availability of hardware, costs, algorithmic advancements and so on. If you need to keep data safe for 30–50 years or more (i.e., your lifetime), then it becomes really difficult to predict how things might change. We were still in the punchcard era 50 years ago. For example, many current cost estimates for password cracking assume that for a GPU it is expensive to access lots of memory. That is currently true, but doesn’t seem like a fundamental physical limitation, so may well change in the coming decades. New hardware advances are extremely likely, entirely new computational models less so. (If cryptographically-relevant quantum computers become feasible, then it is possible that 256-bit keys will become the standard, which makes PBKDF2 and similar algorithms even less relevant).