When working with Message Authentication Codes (MACs), you often need to authenticate not just a single string, but multiple fields of data. For example, when creating an authenticated encryption mode by composing a cipher and a MAC (like AES-CBC and HMAC), you need to ensure the MAC covers the IV, associated data, and the ciphertext. But you can’t just throw these three fields into the MAC one after the other and hope for the best, as this often leads to attacks due to ambiguity of where one field ends and the next begins.
One way to solve this is to encode the fields into a single string that is unambiguously formatted. The excellent blog post I just linked to describes how to do that using the encoding defined by PASETO:
tag = hmac(key, encode(iv, associated_data, ciphertext))
In this blog post I’ll describe an alternative approach in which we adjust the MAC interface to natively accept multiple input arguments, so that the API ensures they are processed unambiguously. This new API not only better reflects how MACs are used in practice, making it harder to misuse, but can also result in better efficiency.
SIV mode and the S2V construction
The Synthetic IV (SIV) mode of operation is widely known among cryptographers because it introduced the notion of Misuse-Resistant Authenticated Encryption, which I’m not going to talk about in this post. A little-mentioned part of the spec though is the S2V construction, which converts a MAC that takes a single input (i.e., pretty much all of them) into one that takes a vector of inputs, which is exactly what we want to be able to pass multiple arguments to our MAC safely. When applied to AES-CMAC used in SIV, you can pass up to 127 arguments to the MAC and it will automatically ensure that they are unambiguously encoded.
This means that rather than having to manually encode the IV, associated data, and ciphertext, you can simply pass them as individual arguments:
tag = S2V(key, iv, associated_data, ciphertext)
This is not only more convenient and less error-prone than manually encoding the data first, but it is also more efficient. Any unambiguous encoding scheme naturally results in some expansion of the input data that has to be fed to the MAC (e.g., adding length prefixes or delimiters), whereas S2V instead does some arithmetic in GF(2128) to adjust the MAC tag to ensure correctness. This is very fast and doesn’t require any expansion in the size of input passed to the underlying MAC. Several years ago I wrote a draft explaining how to adapt SIV mode, and S2V, to different MACs and ciphers. However, it failed to generate enough interest to take it further.
Other approaches to multi-input MACs
The S2V approach is clever, but can be a little hard to explain to developers if they’re not already familiar with finite field arithmetic, and there are some sharp edges around constant-time implementation.
Another approach to converting a single-input MAC into a multiple-input MAC comes from Macaroons, a topic I have blogged about several times before. The Macaroons construction allows caveats to be appended to a HMAC-authenticated token after it has been created by using the old MAC tag as the key to authenticate the new caveat. This is secure if the MAC is a strong pseudorandom function (PRF), as HMAC is, and if the space of tags produced by the MAC is a subset of the space of keys. We can use this approach to construct a MAC that takes multiple inputs, as follows:
function cascade_mac(key, inputs...): assert inputs.length > 0 var tag = key for input in inputs: tag = mac(tag, input) // The old tag is used as the key end return tag end
This can be compactly (and beautifully) expressed as a left-fold operation (reduce) in functional programming:
function cascade_mac(key, inputs...) = fold_left(mac, key, inputs)
This approach is quite pleasing to me, and quite simple to explain. But it has a fatal flaw: it is vulnerable to trivial length-extension attacks: anyone that sees the tag can immediately use it as a key to calculate a new tag for a message with an extra input tacked on to the end (thanks to commentators on the CFRG mailing list for clarifying this point). We can fix this by “encrypting” the final tag by passing it through the PRF again with a different key:
function safe_cascade_mac(key1, key2, inputs...): assert inputs.length > 0 var tag = key1 for input in inputs: tag = mac(tag, input) // The old tag is used as the key end return mac(key2, tag) end
This prevents any further extension of the MAC input. With this correction, the approach is secure and quite efficient, much like S2V. It is only efficient, however, if the underlying MAC does not have an expensive key initialisation process—so it would not be at all suitable for use with AES-based PRFs like that used in the original SIV construction.
Relationship to variable-length PRFs
The multi-input MAC approach based on Macaroons bears a striking resemblance to the cascade construction for converting a fixed-length-input PRF into a variable-length-input PRF described in section 6.4.2 of the Boneh & Shoup textbook. In fact, it is identical, except that in this case our original PRF is already variable-length-input, and we instead want to convert it from a single input PRF to a multiple-input PRF.
Just as with the original cascade construction, in our case it yields an algorithm that suffers from length-extension attacks. Re-encrypting the final tag with a second key is essentially applying the NMAC construction (which is what HMAC is based on), which is the standard way to fix the cascade construction to be secure against length-extension attacks.
The fact that we can re-use these basic constructions to adapt a MAC that takes a single (variable-length) input into one that takes multiple inputs is reassuring, and it gives me confidence that the approach is sound. Indeed, with a bit of thought this seems obvious in retrospect.
As well as the cascade construction and NMAC, the Boneh/Shoup book also discusses using CBC mode to convert a fixed-input PRF (block cipher) into a variable-input PRF, as used in CBC-MAC. And then how to re-encrypt the final output again to create ECBC-MAC, which is once again secure against length extension. I wonder if CBC mode could also be effectively applied to creating a secure multiple-input MAC? It seems likely that it could.
The standard definition of a MAC only allows a single input. In order to accept multiple inputs safely, the user has to manually encode them to avoid ambiguity. In this post I’ve described a couple of schemes for converting a single-input MAC into one that takes multiple inputs, and related them to existing standard constructions for creating MACs in the first place. These schemes are much less error-prone and more efficient than manual encoding. If you are designing a new cryptographic library, please consider providing a multiple-input MAC as part of the basic API, and if you’re a cryptographer proposing a new MAC or AEAD mode, please think of your users and make the MAC multi-input from the start.
Thanks for reading, and as always, comments appreciated.