One of the criticisms of the JOSE/JWT standards is that they give an attacker too much flexibility, by allowing them to specify how a message should be processed. In particular, the standard “alg” header tells the recipient what cryptographic algorithm was used to sign or encrypt the contents. Letting the attacker chose the algorithm can lead to disastrous security vulnerabilities.
In response to this, some people have suggested alternatives such as removing the alg header and instead only allowing a version and a type from a very limited set. While this seems better, it is actually just a variation on the same mistake. If an attacker can still pick the version then they can still potentially cause mischief.
My opinion is that it is better to remove any indicators from the message about how it should be processed – no “alg”, no purpose indicator, no version numbers. Instead the message should (optionally) allow specifying just one hint: a key id (“kid” header in JOSE). The metadata associated with the key should then tell you how to process messages with that key. For instance, the related JWK spec already allows specifying the algorithm that the key is to be used for. The recipient looks up the key and processes the message according to the metadata.
If you want to change the algorithm then you simply publish a new key with the new algorithm. It is best practice to use a new key when switching algorithms anyway. Even in the case where the old key would be compatible with the new algorithm, security proofs often assume that a key is only used for one algorithm. If you really must use the same key with more than one algorithm, then you could just publish more than one JWK, referencing the same key but with different metadata.
(Update added 26th April 2021): In the context of a protocol like TLS, where parties do not necessarily know each others’ key identifiers ahead of time you can imagine baking the algorithm details into the certificate. For example, suppose that a TLS ciphersuite identifier was baked directly into each certificate. The client says what ciphersuites it supports, the server then responds with a cert that matches one of those ciphersuites and they use the exact ciphersuite in the cert. CAs can then perform useful functions like ensuring deprecated ciphersuites aren’t used in new certificates, or that the same key is not used for multiple ciphersuites in potentially dangerous combinations.
A digression on capability security
I have always found an object-capability (ocap) security analysis to be illuminating, even when not designing an ocap system. For instance, consider these two ways of copying a file on a UNIX system (I can’t remember which paper I read this example in, but it is not my own):
cp from to
cat <from >to
The first (cp) takes an input filename and an output filename and opens the two files for you. The second (cat) instead takes an already opened pair of file descriptors and reads from one and writes to the other. If we think about the permissions that these programs need, the first needs to be able to read and write to any files that you might ask it to, while the second only needs to read and write to the specific file descriptors that you passed it as arguments.
The first approach is much more susceptible to a class of security vulnerabilities known as confused deputy attacks. The original example was a compiler service that accepted programs to compile and an output file to write the compiled object to. By passing in a carefully chosen output file (say /etc/passwd) you can trick the compiler into performing malicious actions that you would not be able to perform by yourself (overwriting sensitive files). Confused deputy attacks are widespread. Both path traversal and CSRF attacks are examples.
By bundling up the permissions to perform an operation with an unforgeable reference to an object to perform those actions on, a capability-based system makes it impossible to even express confused deputy attacks: you cannot forge a writeable reference to a file that you do not yourself have write access to. The same principle is used in CSRF countermeasures (which are in fact largely capability-based) – the anti-CSRF token prevents an attacker forging a request. The ocap literature shows how in this model, secure design closely follows normal best practice software design principles: encapsulation, dependency injection, eliminating global variables and static methods with side-effects, and so on.
So how does this relate to JOSE? By allowing the sender of a message to indicate a key to be used separately from specifying the algorithm to use with that key, JOSE is making exactly the same mistake the UNIX does and opening up implementations to confused deputy attacks. Here the JOSE/JWT library is the deputy and it can become confused about what algorithm to use with what key, just as a compiler can become confused about what files it should be writing to. If we remove the ability for an attacker to specify a key independently of specifying the algorithm, then we eliminate this whole class of attacks.
If we further only allow specifying the key by id, then the key id (“kid” in JOSE) becomes a capability. That capability can either be public (a simple name) or private (an unguessable random string). In either case, an attacker must use one of our set of valid keys and cannot confuse our deputy by creating new keys or specifying incompatible algorithms.