Hot on the heels of my dives into authenticated encryption and key-driven cryptographic agility, I thought I’d lay out a sketch of what I think a simpler core JOSE/JWT spec might look like. We’ll take in misuse-resistant crypto and Macaroons along the way. And cats. But first, a brief diversion into JSON.
As the name suggests, JOSE is built around JSON. JOSE headers are JSON objects and JWT claim sets are also JSON objects. While JSON is relatively simple to parse and easy to read, it is not without downsides. In particular, parsing JSON can be inefficient on constrained devices. So much so that an entire mirror set of standards to JOSE and JWT exists, based around CBOR – an efficient binary format roughly based on JSON. Unfortunately these standards have significantly diverged from JOSE.
This is a shame, as I believe much could have been done to reduce the expense of using JSON on constrained devices. For example, we could restrict the syntax of JSON as used in JOSE to make it easier to parse. JOSE headers and claim sets rarely require elaborate nested structures or need to be long. A few simple rules could define a subset that is much simpler to parse, such as eliminating all whitespace between syntactic elements and limiting the maximum nesting depth of objects and arrays (say to 5 levels, although even 3 would probably be enough). For example, the grammar could say:
- Rank 0 objects and arrays are only allowed to contain strings, numbers, true, false or null.
- Rank 1 objects and arrays are allowed to contain strings, numbers, true, false, null and Rank 0 objects or arrays.
- Rank 2 objects and arrays are allowed to contain strings, numbers, true, false, null, and Rank 1 objects or arrays.
- And so on up to the desired maximum nesting depth.
Such simple restrictions would ensure that the syntax forms a regular language that can be parsed with a regular expression or finite state machine – without a stack and without back-tracking – using only a constant amount of additional memory.
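As a rough illustration of the ranked grammar above, here is a minimal Python sketch that builds one regular expression per maximum rank. The names `container` and `regular_json` are made up for this example, and the string and number patterns follow the standard JSON grammar rather than anything defined in this post. The pattern roughly quadruples in size with each extra rank, so a hand-written state machine would be the more practical choice on a constrained device.

```python
import re

# JSON string and number tokens, following the standard JSON grammar.
STRING = r'"(?:[^"\\\x00-\x1f]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4}))*"'
NUMBER = r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?'
SCALAR = rf'(?:{STRING}|{NUMBER}|true|false|null)'


def container(value: str) -> str:
    """Pattern for an object or array whose members all match `value`.

    No whitespace is allowed anywhere between syntactic elements.
    """
    array = rf'\[(?:{value}(?:,{value})*)?\]'
    member = rf'{STRING}:{value}'
    obj = rf'\{{(?:{member}(?:,{member})*)?\}}'
    return rf'(?:{array}|{obj})'


def regular_json(max_rank: int) -> re.Pattern:
    """Compile a pattern for a top-level object or array of rank <= max_rank."""
    inner = SCALAR  # a Rank 0 container may only hold scalars
    for _ in range(max_rank):
        # Each higher rank may hold scalars or containers of the rank below.
        inner = rf'(?:{SCALAR}|{container(inner)})'
    return re.compile(container(inner))


rank1 = regular_json(1)
print(bool(rank1.fullmatch('{"typ":"cat","aud":["a","b"]}')))  # nested one level: accepted
print(bool(rank1.fullmatch('{"typ": "cat"}')))                 # contains whitespace: rejected
print(bool(rank1.fullmatch('{"a":[[1]]}')))                    # nested too deeply: rejected
```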
Of course, this wouldn’t improve some of the other downsides with JSON, such as its inefficiency in representing binary data, but it might be sufficient to at least let the header and basic structure of a JOSE object be the same on all platforms, rather than having two divergent standards. Anyway, back to the main show. In this proposal I will assume all JSON content is represented as Regular JSON.
Layer 1: Symmetric authenticated tokens
For concreteness in the following discussion, I will call my hypothetical new JOSE-like objects CATs – Cloud API Tokens. Everyone loves cats. And clouds. And cloud cats.
The basic form of a CAT is a chain of blocks of data, serialised into a single string by some encoding method. In this blog I will talk just in terms of the JWE/JWS compact serialisation, where each blob is base64url-encoded and then separated by dots. Other encodings are possible, but we will use the compact serialisation for concreteness.
A CAT always starts with a unique identifier (of at least 64 bits) and a header section, followed by one or more payload (body) sections and finally an authentication tag.
There are numerous reasons for wanting more than one body section, and we will discuss some as we go. One inspiration for this choice is the Macaroons paper that describes how an authorization token can be attenuated after it has been issued by allowing the holder to append (but not remove) additional caveats that restrict the conditions under which the token can be used. This is an incredibly powerful idea, based on a simple mechanism of chained HMAC tags.
We adopt the same approach for authenticating CATs, so that we can support an arbitrary number of body sections and also to allow Macaroon-style caveats to be appended after a token has been constructed. The process for authenticating a CAT given an initial key, k, and a suitable MAC algorithm, is as follows:
- Compute t0 = MAC(k, n) where n is the number of initial elements as a 4-byte little-endian number
- Compute t1 = MAC(t0, id)
- Compute t2 = MAC(t1, header)
- Compute t3 = MAC(t2, body[0])
- Compute t4 = MAC(t3, body[1])
- Compute tn = MAC(t(n-1), body[last])
- Output base64url(id).base64url(header).base64url(body[0]). … .base64url(tn)
Each intermediate tag is used as the key for the next block and then discarded. To verify a CAT you simply repeat the same process and check if the final tags are equal (in constant time). In this way, an almost unlimited number of blocks could be appended and no complex encoding scheme is needed to disambiguate each block (note that we authenticate the raw bytes of each block, and only Base-64 encode afterwards). Macaroon-style caveats can be supported too, at the discretion of the application.
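The chained construction can be sketched in a few lines of Python using HMAC-SHA256. This is only an illustration under some assumptions the post does not pin down: the function names are invented, the count n is taken to cover the id, the header and the initial body blocks, and the verifier is simply handed n as a parameter.

```python
import base64
import hashlib
import hmac
import struct
from typing import List


def b64url(data: bytes) -> str:
    """Unpadded base64url, as in the JOSE compact serialisation."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def chain_tag(key: bytes, blocks: List[bytes], n_initial: int) -> bytes:
    """t0 = MAC(k, n), then each tag keys the MAC over the next raw block."""
    tag = mac(key, struct.pack("<I", n_initial))  # 4-byte little-endian count
    for block in blocks:
        tag = mac(tag, block)
    return tag


def issue_cat(key: bytes, token_id: bytes, header: bytes, bodies: List[bytes]) -> str:
    blocks = [token_id, header, *bodies]
    # Assumption: n counts the id, the header and every initial body block.
    tag = chain_tag(key, blocks, len(blocks))
    return ".".join(b64url(part) for part in [*blocks, tag])


def verify_cat(key: bytes, token: str, n_initial: int) -> bool:
    *blocks, tag = [b64url_decode(part) for part in token.split(".")]
    expected = chain_tag(key, blocks, n_initial)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


demo_key = bytes(32)  # all-zero key, for demonstration only
demo_token = issue_cat(demo_key, b"\x01" * 8, b'{"kid":"k1"}', [b'{"sub":"alice"}'])
print(demo_token)
print(verify_cat(demo_key, demo_token, 3))
```

Note that the MAC is computed over the raw bytes of each block and the base64url encoding is applied only when serialising.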
Update: the original construction was trivially vulnerable to length extension attacks. I’ve updated it to encode the initial number of blocks into the MAC, which prevents this. You could still append caveats after the fact, but the number of initial trusted blocks is now encoded into the MAC computation, preventing confusion between trusted claims and caveats.
I would support just two MAC algorithms initially: “HS256” is HMAC-SHA256 as in JOSE, and “BS256” would be keyed Blake2s-256.
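For reference, both MACs are available in Python's standard library. The sketch below shows one way they might be wired to the two labels; the `MACS` table and function names are hypothetical. Both produce 32-byte tags, which matters for the chained construction because each tag becomes the key for the next block, and keyed Blake2s accepts keys of at most 32 bytes.

```python
import hashlib
import hmac


def hs256(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256, as used by JOSE's HS256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def bs256(key: bytes, data: bytes) -> bytes:
    """Keyed Blake2s with a 256-bit output (key must be at most 32 bytes)."""
    return hashlib.blake2s(data, key=key, digest_size=32).digest()


# Hypothetical lookup table from algorithm label to MAC function.
MACS = {"HS256": hs256, "BS256": bs256}

key = bytes(32)  # all-zero key, for demonstration only
for label, mac_fn in MACS.items():
    first = mac_fn(key, b"block one")
    second = mac_fn(first, b"block two")  # previous tag used as the next key
    print(label, len(second), second.hex())
```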
Layer 2: Symmetric Authenticated Encryption
Those of you who have read my posts on authenticated encryption might wonder why I didn’t start with that. In actual fact, I have done, but let’s just see how encryption is added to the MAC base layer before I reveal the trick. Just a single block can be encrypted, and it must be the last block in the CAT. No further blocks can be appended after an encrypted block, which can be used to prohibit further caveats being appended. The method for appending an encrypted block is as follows:
- Calculate an authentication tag over the plain text of the block as if it was an ordinary block. For example, if the previous authentication tag is T, then we compute T’ = MAC(T, plaintext) to derive the new authentication tag.
- Encrypt the plain text using a stream cipher with an independent key, k2, using the newly computed tag (truncated if necessary) as the nonce/IV: ciphertext = encrypt(key=k2, iv=T’, plaintext).
- Output base64url(ciphertext).base64url(T’), replacing the old tag T.
This scheme is deceptively simple, but is actually a variation on the SIV (Synthetic IV) encryption mode that achieves misuse-resistant authenticated encryption (MRAE). We have simply substituted the chained-MAC construction for the S2V construction used in SIV. (WARNING: this design has not been reviewed by any cryptographers; seek expert advice before deploying any novel crypto design in production, don’t take my word for it).
The presence of the mandatory unique ID field in the token should ensure that the scheme achieves semantic security, as it acts like a nonce. If an ID is accidentally reused then the MRAE properties of the scheme minimise the security loss in that case. In particular, authentication is not compromised at all.
To verify, the recipient decrypts the ciphertext using the authentication tag as the SIV (nonce), then verifies the authentication tag as for an unencrypted CAT. We can see here why appending a new block after the encrypted block is not allowed: doing so would remove the SIV and therefore prevent decryption of the ciphertext. We will see later why this can be quite a useful property.
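The encrypt-then-verify flow can be sketched as follows. Python's standard library has no AES-CTR or XChaCha20, so this example swaps in a stand-in stream cipher built from HMAC-SHA256 in counter mode; that stand-in is purely illustrative and is not A256SIV or XC20SIV. The function names are invented, and, like the design itself, none of this has been reviewed by cryptographers.

```python
import hashlib
import hmac
from typing import Tuple


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """STAND-IN stream cipher (illustration only): XOR with an HMAC-SHA256
    counter-mode keystream. A real CAT would use AES-CTR or XChaCha20."""
    stream = bytearray()
    for counter in range((len(data) + 31) // 32):
        stream += mac(key, iv + counter.to_bytes(4, "big"))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal_last_block(prev_tag: bytes, enc_key: bytes, plaintext: bytes) -> Tuple[bytes, bytes]:
    """Compute T' = MAC(T, plaintext), then encrypt with T' as the IV."""
    tag = mac(prev_tag, plaintext)
    ciphertext = keystream_xor(enc_key, tag, plaintext)
    return ciphertext, tag


def open_last_block(prev_tag: bytes, enc_key: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Decrypt using the tag as the IV, then check the tag over the plaintext."""
    plaintext = keystream_xor(enc_key, tag, ciphertext)
    if not hmac.compare_digest(mac(prev_tag, plaintext), tag):
        raise ValueError("authentication failed")
    return plaintext


prev_tag = mac(bytes(32), b"earlier blocks")  # tag T from the preceding chain
k2 = b"\x01" * 32                             # independent encryption key (demo value)
ct, new_tag = seal_last_block(prev_tag, k2, b'{"secret":"meow"}')
print(open_last_block(prev_tag, k2, ct, new_tag))
```

Because the IV is derived from the plaintext and everything before it, encrypting the same block under the same chain yields the same ciphertext, which is the deterministic behaviour expected of an SIV-style mode.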
The two stream ciphers I would support are AES in CTR mode with a 256-bit key, and XChaCha20, with the labels “A256SIV” and “XC20SIV” respectively.
Layer 3: Key Derivation
The final layer in our cake is how the keys for encryption and MACs get derived from some initial key material. All of the algorithms we have listed so far require a 256-bit key. If you append an encrypted block, then you will need two independent keys – one for the MAC and one for the cipher, for a total of 512 bits of key material.
In principle, any of the JWE key management algorithms could be used with a CAT, but as I described in my blog series, I want all algorithms to support authenticated encryption, which rules out all of the existing public key modes. As it happens, I don’t much like the symmetric modes in JOSE either, so we will consider two new modes, both based on key derivation:
The first algorithm uses the HMAC-based Key Derivation Function (HKDF). A 256-bit master key is expanded into separate MAC and encryption keys using the HKDF-Expand function, with an output size (L) of 32 or 64 octets depending on whether encryption is used. The first 32 octets are used as the MAC key, and the remaining 32 as the encryption key. If a “typ” (type) header is present, then this is fed into HKDF as the “info” argument. This provides a measure of domain separation, allowing the same master key to be used for different types of token while deriving unique keys for each type, helping to strengthen explicit typing protections.
The same hash function used for the MAC is used to instantiate HKDF, i.e., HKDF-HMAC-SHA256 or HKDF-HMAC-Blake2s-256.
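A minimal sketch of this derivation, using HKDF-Expand as defined in RFC 5869 with HMAC-SHA256. The helper name `derive_cat_keys` is made up, and treating a missing “typ” header as an empty “info” value is an assumption of this example.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand from RFC 5869, instantiated with HMAC-SHA256."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_cat_keys(master_key: bytes, typ: Optional[str] = None,
                    encrypt: bool = False) -> Tuple[bytes, Optional[bytes]]:
    """Return (mac_key, enc_key); enc_key is None when no block is encrypted."""
    info = (typ or "").encode("utf-8")  # assumption: absent "typ" means empty info
    okm = hkdf_expand(master_key, info, 64 if encrypt else 32)
    mac_key = okm[:32]
    enc_key = okm[32:] if encrypt else None
    return mac_key, enc_key


master = bytes(32)  # all-zero master key, for demonstration only
mac_key, enc_key = derive_cat_keys(master, typ="access-token", encrypt=True)
print(mac_key.hex())
print(enc_key.hex())
```

Changing the “typ” value changes both derived keys, which is the domain separation described above.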
The second algorithm, NK, uses something like the Noise K one-way pattern to derive keys from the recipient’s public key and the sender’s own long-term key-pair. It provides public key authenticated encryption. A rough sketch of the approach is as follows:
- Generate a fresh ephemeral key pair (es, ep) where es is the secret key and ep is the public key. Set the public key as the “epk” claim in the token header.
- Perform a Diffie-Hellman key agreement between the ephemeral secret key and the recipient’s public key – DH(es, rp). Feed the resulting shared secret into HKDF-Extract using a hash of all 3 public keys as the salt.
- Perform a second Diffie-Hellman between the sender’s secret key and the recipient’s public key, DH(ss, rp) and feed that into HKDF-Extract using the result of step 2 as the salt.
- Now derive the keys as for the HKDF method above, using the result of step 3 as the master key.
(This isn’t exactly the Noise K pattern, so should probably be called something different).
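The steps above can be sketched end to end in Python. Since the standard library has no elliptic-curve Diffie-Hellman, this example uses a toy modular-exponentiation group as a stand-in; it is not secure and a real implementation would use a proper curve such as X25519. The function names, the order in which the three public keys are hashed, and the 32-byte big-endian encoding are all assumptions of this sketch.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# TOY Diffie-Hellman group, for illustration only (NOT secure).
P = 2**255 - 19
G = 2


def keypair() -> Tuple[int, int]:
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)


def dh(secret: int, peer_public: int) -> bytes:
    return pow(peer_public, secret, P).to_bytes(32, "big")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract from RFC 5869: PRK = HMAC(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hash_public_keys(ep: int, sp: int, rp: int) -> bytes:
    # Assumption: ephemeral, sender, recipient order, each as 32 bytes.
    return hashlib.sha256(b"".join(k.to_bytes(32, "big") for k in (ep, sp, rp))).digest()


def sender_master_key(ss: int, sp: int, rp: int) -> Tuple[int, bytes]:
    es, ep = keypair()                                            # step 1
    step2 = hkdf_extract(hash_public_keys(ep, sp, rp), dh(es, rp))  # step 2
    master = hkdf_extract(step2, dh(ss, rp))                      # step 3
    return ep, master  # ep goes in the "epk" header; master feeds HKDF-Expand


def recipient_master_key(rs: int, rp: int, sp: int, ep: int) -> bytes:
    step2 = hkdf_extract(hash_public_keys(ep, sp, rp), dh(rs, ep))
    return hkdf_extract(step2, dh(rs, sp))


sender_secret, sender_public = keypair()
recipient_secret, recipient_public = keypair()
epk, sender_side = sender_master_key(sender_secret, sender_public, recipient_public)
recipient_side = recipient_master_key(recipient_secret, recipient_public, sender_public, epk)
print(sender_side == recipient_side)
```

Both sides arrive at the same master key because each Diffie-Hellman result can be computed from either party's secret key and the other's public key.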
If the recipient then responds with another NK token, but using the ephemeral key it just received rather than the originator’s public key, then we end up with something similar to the Noise KK two-way pattern, which provides much stronger security properties including resistance to key compromise impersonation (KCI) and forward secrecy.
Update: Looking again at NIST’s guidance on Diffie-Hellman key agreement schemes, they describe a similar pattern: the (Cofactor) One-Pass Unified Model, C(1e, 2s, ECC CDH) Scheme. They compute both DH shared secrets and then pass the concatenation of the two into the KDF.
Both key derivation algorithms provide authenticated encryption. Both mix in the type header and so ensure different keys are derived for different purposes (assuming you use explicit typing), providing some security against mix-up attacks. Finally, neither KDF includes any message-specific details in the derivation (apart from the message type), so the same keys will always be derived. This allows a client to cache the derived keys and reuse them for multiple messages, reducing the overhead of public key cryptography when many messages must be sent to the same recipient.
CAT headers are represented as a Regular JSON object, similarly to JOSE. However, all algorithm-related headers are removed (“alg”, “enc”, “zip” etc). Furthermore the X.509-related headers are removed as are all of the JWK-related headers. Only “kid” remains to identify the key, as per my previous advice.
Keys are represented similarly to JWK, with the following changes:
- Secret key material is never mixed with public key material in a single JSON object
- Every key must include the following properties: “kdf”, “mac” and (if using encryption) “enc”. These define the precise algorithm that is to be used with that key.
- No RSA algorithms are defined, so RSA keys are not required.
A key can be represented as a CAT by including all the public claims as one body JSON object and then including the secret key material in an encrypted block.
What can you do with CATs?
CATs as imagined here could be used for many of the same things that JWTs and JOSE are used for now. In many cases a CAT should be more compact than the equivalent JWT, as public key authenticated encryption allows us to use a single CAT where JOSE would require a nested signed-then-encrypted structure. CATs would therefore be suitable as a representation for identity assertions like ID tokens and as authorization tokens such as OAuth access tokens.
The Macaroon-like construction of CATs allows complex delegation patterns to be encoded and enforced. This also enables CATs to be treated as bearer tokens, but then attenuated before being sent to a resource server, effectively acting like a proof-of-possession token. For example, if we had a bearer access token represented as a CAT and wanted to make a request to a HTTP endpoint using it, we could perform the following steps before sending it:
- Append a caveat setting the expiry time to a few seconds in the future.
- Append an encrypted caveat containing a hash of the request itself (or actually just put the full request in there).
This means that the original bearer token is not exposed in the request, only this much more restricted one. The append-only nature of the CAT blocks prevents the new restrictions being removed, so if the token is intercepted, all an attacker could do is replay that exact request for a few seconds. The encrypted payload finalises the token, preventing any further extensions.
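The attenuation step needs only the current tag, not the issuer's key, which a short sketch makes concrete. The caveat formats and function names below are hypothetical, and for brevity both caveats are appended as plain blocks rather than encrypting the last one.

```python
import hashlib
import hmac
import struct
from typing import List, Tuple


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def chain_tag(key: bytes, blocks: List[bytes], n_initial: int) -> bytes:
    tag = mac(key, struct.pack("<I", n_initial))
    for block in blocks:
        tag = mac(tag, block)
    return tag


def attenuate(blocks: List[bytes], tag: bytes, caveat: bytes) -> Tuple[List[bytes], bytes]:
    """Holder side: append a caveat using only the existing tag as the key."""
    return [*blocks, caveat], mac(tag, caveat)


def verify(key: bytes, blocks: List[bytes], tag: bytes, n_initial: int) -> bool:
    """Issuer or resource server side: recompute the whole chain from the root key."""
    return hmac.compare_digest(chain_tag(key, blocks, n_initial), tag)


root_key = bytes(32)  # all-zero key, for demonstration only
issued = [b"\x01" * 8, b'{"kid":"k1"}', b'{"scope":"read"}']
issued_tag = chain_tag(root_key, issued, len(issued))

# The client restricts the token before sending it (hypothetical caveat formats).
request_hash = hashlib.sha256(b"GET /cats/42").hexdigest()
blocks, tag = attenuate(issued, issued_tag, b'{"exp":1700000005}')
blocks, tag = attenuate(blocks, tag, ('{"req":"%s"}' % request_hash).encode("ascii"))

print(verify(root_key, blocks, tag, 3))       # attenuated token verifies
print(verify(root_key, blocks[:-1], tag, 3))  # stripping a caveat breaks the tag
```

The verifier still has to evaluate each caveat (expiry, request hash) after the tag checks out; the MAC chain only guarantees that the caveats have not been removed or altered.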
The main functionality missing from CATs is the ability to sign messages when you want stronger properties such as non-repudiation or 3rd-party verifiability. In most cases, authentication should be sufficient, but in cases where you need stronger properties then nested signed objects could be used – just nest a JWS inside a CAT.
This post is a flight of imagination to see what a redesigned JOSE (and particularly JWE) could look like. This is a strawman proposal to generate ideas, rather than a fully fledged alternative to JOSE. Ideally some of these ideas could be incorporated into JOSE itself. The Macaroon structure might be difficult, but there is no reason why JOSE couldn’t adopt some of the other measures:
- HKDF as a key management algorithm, with type-specific derivation
- Something like the Noise K key agreement pattern for public key authenticated encryption
- SIV encryption modes for misuse-resistant cryptography
- Key-driven cryptographic agility, and deprecation of algorithm-related headers.
Hopefully this has inspired some ideas of your own about how JOSE could be improved. I’d love to hear about them in the comments.