In Part I, I made the argument that even when using public key cryptography you almost always want *authenticated encryption*. In this second part, we’ll look at how you can actually achieve public key authenticated encryption (PKAE) from commonly available building blocks. We will concentrate only on approaches that do not require an interactive protocol. (Updated 12th January 2019 to add a description of a NIST-approved key-agreement mode that achieves PKAE).

*authentication* (noun): an act, process, or method of showing something (such as an identity, a piece of art, or a financial transaction) to be real, true, or genuine

First, a brief digression about the nature of authentication and trust models. What do we mean when we talk about authentication? We often talk about authentication as if it was synonymous with identification, but the two are not identical. When I see an old friend in the street, I know who they are immediately (identification). I do not need to ask to see some proof of identity (authentication).

Authentication is the process of verifying that some *claims* are genuine. When you log in to a website, you claim to be somebody (your username) and you provide credentials (a password) to back up that claim – literally, to lend *credence* to your claim.

When we talk about authenticated encryption, we are interested in authenticating the *content* of the message received, regardless of whether it makes any claims about *who* sent it. Should I *trust* this message? Does it come from a trustworthy source? On first glance it may seem that in order to evaluate the trustworthiness of the source I need to know who the sender is, but this is not actually true.

In symmetric cryptography, all legitimate parties in a conversation share a secret key. Any individual with access to that key can create a message that is indistinguishable from a message made by any other individual in the group. The trust model is implicitly that anyone who has access to the key is trusted; otherwise, don’t give them the key. (Typically we therefore keep the number of parties that have access to any given key as small as possible, preferably just one or two.) Authentication in this case just means ensuring that a message came from *anybody* in the trusted group, but didn’t come from somebody outside the group.
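To make that trust model concrete, here is a minimal sketch using Python's standard `hmac` module. The names `group_key`, `tag` and `verify` are my own inventions for illustration; the point is only that every key holder produces exactly the same tag, so verification tells you "someone in the group", not "which member".

```python
import hashlib
import hmac
import secrets

# Hypothetical shared key: every trusted party holds an identical copy.
group_key = secrets.token_bytes(32)

def tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 authentication tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, candidate: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag(key, message), candidate)

message = b"meet at noon"
from_alice = tag(group_key, message)  # Alice holds group_key
from_bob = tag(group_key, message)    # so does Bob

# The two tags are byte-for-byte identical, so the verifier cannot tell
# which key holder authenticated the message.
print(from_alice == from_bob)
print(verify(group_key, message, from_alice))

# Somebody outside the group has a different key and fails verification.
outsider_key = secrets.token_bytes(32)
print(verify(group_key, message, tag(outsider_key, message)))
```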

In public key cryptography, everyone has their own keys, and so the space of possible trust models becomes much richer. We *could* try to mimic the symmetric situation, for instance by only giving our *public* key to known trusted parties. However, the benefits of public key cryptography are rather lost in this scenario, and the security proofs for public key cryptosystems tend not to consider the secrecy of public keys to be a concern, for obvious reasons. So your “secret public” key might not remain secret for long.

In some cases, we may want to explicitly identify the sender of all messages. Perhaps we want to be able to hold them accountable for their actions, or be able to prove to a 3rd party (such as a judge or jury) who said what. In other cases, we may prefer that this not be possible – we want to know who we are talking to at the time, but would prefer that nobody be able to prove who said what afterwards. Digital signatures are often used in the first case, while the Signal messenger takes the latter approach. It is therefore important to remember that in PKAE there is not a single trust model or single definition of authentication in play, but rather a spectrum of possibilities. In all cases, we are trying to answer the basic question *should I trust this message?*

## Option 1: Combining encryption with digital signatures

Perhaps the most obvious approach is to combine a public key encryption scheme for confidentiality with a digital signature scheme. For instance, we could encrypt a message using RSA–OAEP and then sign it with an RSA signature. But should we sign the message first and then encrypt it, or perhaps encrypt it first and then sign the result? By analogy with the symmetric case, where Encrypt-then-MAC was the right choice, we might think that Encrypt-then-Sign would work. However, this is usually not the right choice for a number of reasons:

- Firstly, if you are using signatures for legal reasons (to use as evidence) then in some jurisdictions a signature over an encrypted ciphertext may not be acceptable.
- Secondly, somebody else can simply *remove* the signature and then add their own to claim that they sent the message. For instance, imagine a competition where the first person to solve a puzzle and send you the solution wins a prize. If you use Encrypt-then-Sign then an attacker could intercept a legitimate submission and change the signature to claim that it came from them, without ever knowing what the correct answer was!
- The *recipient* of a message may be able to find (calculate) a different message and public key pair that would encrypt to the same ciphertext that you signed. They could then claim that you actually sent them this different message rather than the genuine one.

In fact, no naïve combination of encryption and signatures achieves PKAE in general, as shown in this paper from 2001. They claim (page 6) that even if the cipher is IND-CCA secure, the generic composition of Encrypt-then-Sign *may fail to be IND-CCA secure* itself (although the definition of IND-CCA is slightly different for PKAE than it is for normal PK encryption). They offer a secure combination that they call *ESSR – Encrypt Sender-key then Sign Receiver-key*:

- First encrypt the message using the receiver’s public key, but include your own (sender’s) public key in the encrypted message.
- Next, sign the encrypted ciphertext plus the intended receiver’s public key.

This has the property of *binding* the key-pairs used in the construction of the message. An attacker that strips the signature and replaces it with their own will fail because the wrong sender key will be found inside the encrypted ciphertext. Binding the receiver’s public key to the signature prevents the malicious recipient attack mentioned in point 3 above, as they are not able to change their public key after the fact. This latter property is known as *receiver unforgeability* (RUF) and is a relatively strong property related to notions of non-repudiation.

Including the full public keys of both parties would bulk the message somewhat. RSA public keys in particular tend to be quite big, and some post-quantum cryptosystems have even larger public keys. First, note that the recipient’s public key doesn’t actually have to be sent (we can assume the recipient has it), it just has to be included in the signature calculation. Secondly, we can replace the sender’s public key with a secure hash of it in step 1 to reduce the size.
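The binding in ESSR is easiest to see by looking at exactly which bytes get encrypted and which get signed. The sketch below deliberately leaves out the real public key encryption and signature calls (they would come from whatever library you use) and only builds their inputs. The function names, the length-prefixed encoding and the placeholder key bytes are all assumptions of mine, not something taken from the paper.

```python
import hashlib
import hmac

def _lp(data: bytes) -> bytes:
    """Length-prefix a field so concatenated fields cannot be confused."""
    return len(data).to_bytes(4, "big") + data

def essr_plaintext(sender_pub: bytes, message: bytes) -> bytes:
    """Step 1 input: what gets encrypted under the receiver's public key.
    Uses a SHA-256 hash of the sender's key to keep the message small."""
    return hashlib.sha256(sender_pub).digest() + message

def essr_signed_data(ciphertext: bytes, receiver_pub: bytes) -> bytes:
    """Step 2 input: what the sender signs. The receiver's key is covered
    by the signature but does not need to be transmitted."""
    return _lp(ciphertext) + _lp(receiver_pub)

def essr_check_sender(decrypted: bytes, signer_pub: bytes) -> bytes:
    """Receiver side, after decrypting and verifying the signature: check
    that the key that verified the signature is the one named inside."""
    digest, message = decrypted[:32], decrypted[32:]
    expected = hashlib.sha256(signer_pub).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("signer does not match sender bound in ciphertext")
    return message

# Placeholder byte strings standing in for encoded public keys.
alice_pub, mallory_pub, bob_pub = b"alice-pub", b"mallory-pub", b"bob-pub"

inner = essr_plaintext(alice_pub, b"the answer is 42")
print(essr_check_sender(inner, alice_pub))

# Mallory strips Alice's signature and adds her own: the signature now
# verifies under mallory_pub, but the hash inside still names Alice.
try:
    essr_check_sender(inner, mallory_pub)
except ValueError as err:
    print("rejected:", err)
```

Because the signed data also covers the receiver's public key, a signature made for Bob will not verify if a recipient later claims it was addressed to a different key.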

## Option 2: Authenticated Encryption from Diffie-Hellman

While achieving PKAE from separate encryption and signature primitives turns out to be surprisingly difficult, constructing a PKAE from a Diffie-Hellman (DH) key agreement turns out to be surprisingly easy. For instance, if we assume that both parties have known long-term DH key pairs, then we can achieve PKAE by this simple scheme:

- The sender calculates the shared secret using DH between its private key and the recipient’s public key. It then uses this to derive a symmetric authenticated encryption key using a suitable KDF.
- The sender encrypts and authenticates their message using a normal symmetric authenticated encryption scheme (e.g., AES-GCM) and a fresh nonce.
- The recipient repeats step 1 to derive the same symmetric key and uses that to decrypt the message.
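The three steps above can be sketched with nothing but the Python standard library, under some loudly stated assumptions. The DH group is a toy (the Mersenne prime 2^521 − 1 with base 3) chosen only so the example runs anywhere, and the `seal`/`open_` pair is a home-made encrypt-then-MAC stand-in for a real AEAD such as AES-GCM. All function names are mine. Please do not use any of this for real data.

```python
import hashlib
import hmac
import secrets

# TOY parameters, for illustration only: not a vetted DH group.
P = 2**521 - 1
G = 3

def keygen():
    """Return a (private, public) long-term DH key pair."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

def dh(priv: int, peer_pub: int) -> bytes:
    """Raw Diffie-Hellman shared secret as fixed-length bytes."""
    return pow(peer_pub, priv, P).to_bytes(66, "big")

def kdf(secret: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869), zero salt, a single 32-byte output block."""
    prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy encrypt-then-MAC with a fresh random 16-byte nonce (step 2)."""
    enc_key, mac_key = kdf(key, b"enc"), kdf(key, b"mac")
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, stream))
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()

def open_(key: bytes, box: bytes) -> bytes:
    """Verify the tag first, then decrypt; raise ValueError on forgery."""
    enc_key, mac_key = kdf(key, b"enc"), kdf(key, b"mac")
    nonce, ct, tag = box[:16], box[16:-32], box[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))

alice_priv, alice_pub = keygen()
bob_priv, bob_pub = keygen()

# Step 1: Alice derives the key from her private key and Bob's public key.
key_a = kdf(dh(alice_priv, bob_pub), b"pkae-demo")
box = seal(key_a, b"hello Bob")

# Step 3: Bob derives the same key from his private key and Alice's public key.
key_b = kdf(dh(bob_priv, alice_pub), b"pkae-demo")
print(open_(key_b, box))
```

Successful decryption is what authenticates the sender here: only somebody holding Alice's or Bob's private key could have derived the key that produced a valid tag.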

This is quite likely what you would first try if you only knew about Diffie-Hellman and symmetric authenticated encryption. In fact, it was the first thing I *did* try when I first learned about Diffie-Hellman. This is also pretty much exactly what NaCl’s crypto_box function does – first it derives a secret key using the X25519 elliptic curve Diffie-Hellman function, and then it encrypts the message using the XSalsa20-Poly1305 authenticated stream cipher. The paper that describes ESSR also describes a variant of this scheme that it calls *DHETM – Diffie-Hellman Encrypt-then-MAC.*

However, there are some potential drawbacks to this simple scheme:

- It does not achieve the strong property of receiver unforgeability (RUF) as described above — the recipient of a message can construct arbitrary forgeries as they know the same symmetric key.
- The *exact same* encryption key will be derived every time the two parties communicate – Diffie-Hellman is completely deterministic. This means that you need to be very careful to use a fresh nonce each time.
- It suffers from a subtle issue known as Key Compromise Impersonation (KCI): if an attacker compromises your secret key, they can not only pretend to be you when talking to anyone else, they can also pretend to be anyone else when talking to you. This follows directly from the symmetric nature of Diffie-Hellman.
- If *either* the sender’s or receiver’s keys are compromised then all communications between them can be decrypted.

I don’t think any of these are huge problems. Item 1 is sometimes actually an *advantage* rather than a disadvantage, if you care about plausible deniability (as in Signal). RUF is also a very strong property that is often not required – as you trust your recipients. Item 2 is also not a huge issue if you use random nonces and something like XSalsa20 or AES-SIV and rotate your keys reasonably often. We will discuss below ways to improve this.

Items 3 and 4 are potentially quite serious. We would expect that if the recipient’s private key is compromised then messages sent to them can be decrypted (unless we have forward secrecy, but that generally requires an interactive protocol, which we are not considering here), but it is worrying that even if the sender’s key is compromised they can be decrypted too. Preventing KCI requires either an interactive protocol or re-introducing signatures (with the associated complexity and loss of plausible deniability). Frequent key rotation is a mitigation against these issues.
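Items 2 and 3 both fall straight out of the arithmetic, which a few lines of Python can show. As before this is only a sketch: the group (2^521 − 1 with base 3) is a toy I picked so that the standard library suffices, and the variable names are made up.

```python
import secrets

# TOY parameters, for illustration only: not a vetted DH group.
P = 2**521 - 1
G = 3

def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

def dh(priv: int, peer_pub: int) -> int:
    return pow(peer_pub, priv, P)

alice_priv, alice_pub = keygen()
bob_priv, bob_pub = keygen()

# Item 2: static-static DH is deterministic, so every conversation between
# Alice and Bob starts from the same shared secret.
first = dh(alice_priv, bob_pub)
second = dh(alice_priv, bob_pub)
print(first == second)

# Item 3 (KCI): an attacker who steals Alice's private key needs only Bob's
# *public* key to compute the secret Bob would use when writing to Alice,
# and so can forge messages "from Bob" that Alice will accept.
stolen_alice_priv = alice_priv
attacker_secret = dh(stolen_alice_priv, bob_pub)
bobs_secret = dh(bob_priv, alice_pub)
print(attacker_secret == bobs_secret)
```

The same equality explains item 4: whichever private key leaks, the attacker recovers the one secret that protects traffic in both directions.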

There are various ways to address items 2 and 4 without relying on the misuse-resistance of the symmetric cipher mode. But first, let’s look at one popular DH encryption scheme that *doesn’t* achieve PKAE: (EC)IES.

### A false start: the Integrated Encryption Scheme

The Integrated Encryption Scheme (IES), along with its elliptic curve sibling (ECIES), is a popular way of forming a public key encryption scheme from a Diffie-Hellman primitive. For instance, it is included in the JOSE (JWT) standards as “ECDH-ES”. The basic idea is to take the Diffie-Hellman scheme we previously described, but instead of the sender using their own long-term key-pair, they generate a fresh *ephemeral* key pair and use that instead. They then send the ephemeral public key along with the message.

So long as the sender manages to generate fresh random ephemeral keys correctly, then this eliminates problem 2, and partially mitigates problem 4: a fresh symmetric encryption key is derived for each interaction, and compromise of the *sender’s* long-term keys now has no impact on the confidentiality of messages they sent to anyone else.

However, if the sender can generate fresh random keys then they can presumably also generate fresh random nonces, so this is not a huge improvement with respect to item 2 if you use a sensible symmetric encryption scheme that doesn’t fail catastrophically on nonce reuse. Conversely, if you cannot guarantee fresh random nonces, then ephemeral keys do not help much when you cannot reliably generate random keys either: generating the same ephemeral key pair twice is just as catastrophic if you then use a deterministic nonce scheme (e.g. a counter), as the same key will be reused with the same nonces. If you are serious then it is best to generate ephemeral keys *and* random nonces (*and* use a misuse-resistant AE scheme!).

On the other hand, IES gives up sender authentication entirely. The sender’s long-term keys are no longer involved in the encryption process at all. This puts IES and ECIES in the same boat as RSA encryption, and leaves you fiddling with signatures to try and plug the gap. Such a waste when we saw how easily DH can achieve PKAE.
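A sketch of the key derivation makes the missing authentication obvious: the sender-side function takes no sender key at all. As in my other examples the group is a toy (2^521 − 1, base 3), the KDF is a cut-down HKDF, and the names `ies_sender` and `ies_recipient` are hypothetical.

```python
import hashlib
import hmac
import secrets

# TOY parameters, for illustration only: not a vetted DH group.
P = 2**521 - 1
G = 3

def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

def dh(priv: int, peer_pub: int) -> bytes:
    return pow(peer_pub, priv, P).to_bytes(66, "big")

def kdf(secret: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869), zero salt, a single 32-byte output block."""
    prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

def ies_sender(recipient_pub: int):
    """Fresh ephemeral key pair, ephemeral-static DH, then the KDF.
    Note that no long-term sender key appears anywhere."""
    eph_priv, eph_pub = keygen()
    key = kdf(dh(eph_priv, recipient_pub), b"ies-demo")
    return eph_pub, key  # eph_pub is sent with the message; eph_priv is dropped

def ies_recipient(recipient_priv: int, eph_pub: int) -> bytes:
    """Recipient recomputes the key from the ephemeral public key."""
    return kdf(dh(recipient_priv, eph_pub), b"ies-demo")

bob_priv, bob_pub = keygen()
eph_pub_1, key_1 = ies_sender(bob_pub)
eph_pub_2, key_2 = ies_sender(bob_pub)

print(ies_recipient(bob_priv, eph_pub_1) == key_1)  # Bob derives the same key
print(key_1 != key_2)                               # fresh key per message
```

Anybody who knows `bob_pub` can call `ies_sender`, so a valid ciphertext tells Bob nothing about who wrote it.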

### Noise One-Way Handshake Patterns

The Noise protocol framework by Trevor Perrin describes a number of ways of constructing secure authenticated protocols based on Diffie-Hellman primitives. While most of the patterns concern interactive protocols with strong properties such as forward-secrecy, it also describes three one-way patterns that can be used to encrypt a message to a recipient without any online interaction:

- The **N** pattern is basically ECIES: we generate an ephemeral key-pair and then do DH with the recipient’s static public key.
- The **K** and **X** patterns do almost the same as ECIES but then do a *further* DH agreement between the *sender’s* long-term private key and the recipient’s long-term public key. The outputs of *both* DH agreements (ephemeral-static and static-static) are fed into the KDF to derive the message key. The only difference between them is whether they assume the recipient already knows the sender’s public key (K) or it is sent with the message (X).

Noise lists the security properties that are achieved by each pattern. For K and X, we achieve sender and recipient authentication (still with the risk of KCI), with forward secrecy against compromise of the sender’s long-term keys, but no forward secrecy if the recipient’s long-term keys are compromised. (The list of security properties also notes the risk of message replay, but this is true of every scheme we discuss in this post.)
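Here is a rough sketch of how the K pattern feeds both DH outputs into the key derivation. It is loosely modelled on the MixHash and MixKey operations in the Noise specification, but the toy group (2^521 − 1, base 3), the `toy-noise-K` label and the byte encodings are my own assumptions, and I have omitted details such as the prologue, so the output is not compatible with any real Noise implementation.

```python
import hashlib
import hmac
import secrets

# TOY parameters, for illustration only: not a vetted DH group.
P = 2**521 - 1
G = 3

def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

def dh(priv: int, peer_pub: int) -> bytes:
    return pow(peer_pub, priv, P).to_bytes(66, "big")

def enc(pub: int) -> bytes:
    return pub.to_bytes(66, "big")

def mix_hash(h: bytes, data: bytes) -> bytes:
    """Fold public data into the running handshake hash."""
    return hashlib.sha256(h + data).digest()

def mix_key(ck: bytes, dh_output: bytes):
    """Fold a DH output into the chaining key; return (new_ck, cipher_key)."""
    temp = hmac.new(ck, dh_output, hashlib.sha256).digest()
    new_ck = hmac.new(temp, b"\x01", hashlib.sha256).digest()
    k = hmac.new(temp, new_ck + b"\x02", hashlib.sha256).digest()
    return new_ck, k

def k_pattern(es: bytes, ss: bytes, eph_pub: int, sender_pub: int, recipient_pub: int):
    """Derive the message key and transcript hash for a K-style message."""
    h = hashlib.sha256(b"toy-noise-K").digest()
    ck = h
    h = mix_hash(h, enc(sender_pub))     # sender static key, known in advance
    h = mix_hash(h, enc(recipient_pub))  # recipient static key, known in advance
    h = mix_hash(h, enc(eph_pub))        # ephemeral key sent with the message
    ck, k = mix_key(ck, es)              # ephemeral-static DH
    ck, k = mix_key(ck, ss)              # static-static DH
    return k, h

alice_priv, alice_pub = keygen()
bob_priv, bob_pub = keygen()

# Sender (Alice): fresh ephemeral key, then two DH agreements.
eph_priv, eph_pub = keygen()
k_send, h_send = k_pattern(dh(eph_priv, bob_pub), dh(alice_priv, bob_pub),
                           eph_pub, alice_pub, bob_pub)

# Recipient (Bob): the same two secrets, computed with his static private key.
k_recv, h_recv = k_pattern(dh(bob_priv, eph_pub), dh(bob_priv, alice_pub),
                           eph_pub, alice_pub, bob_pub)
print(k_send == k_recv and h_send == h_recv)
```

The transcript hash `h` would then be passed as associated data to the AEAD, so that the ciphertext is tied to every public key involved.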

As can be seen by looking at the entries for other patterns, significantly better security properties can be achieved if we are willing to engage in an interactive protocol. Notice that the K one-way handshake pattern is a prefix of the **KK** two-way handshake, and likewise the X pattern is a prefix of the **IK** pattern (I think one of these must have inconsistent naming). This suggests to me that we could “upgrade” a one-way handshake to a two-way handshake if the recipient happened to be online. Both KK and IK have strong security properties including KCI-resistance and forward secrecy.

### NIST One-Pass Unified Model

NIST’s Special Publication 800-56A (NB: currently unavailable because of the US government shutdown, sigh) defines their recommendations for Diffie-Hellman key agreement schemes. It’s worth a read as it is well written and goes into a lot of detail. The JOSE specs define the ECDH-ES algorithm with reference to the One-Pass Diffie-Hellman scheme defined in section 6.2.2.2 of that document. This is essentially the classic ECIES scheme discussed earlier. However, the document also defines some methods that provide PKAE as we have discussed in this blog series, in particular the One-Pass Unified Model defined in section 6.2.1.2. Alice uses the following steps to send a message to Bob:

- First she generates an ephemeral key-pair and adds the ephemeral public key to the message.
- Then she performs a DH key agreement between her ephemeral private key and Bob’s static public key, just as she would in ECIES.
- Then she performs *another* DH key agreement, this time between her *static* private key and Bob’s public key.
- She then concatenates the shared secrets from steps 2 and 3 into a single secret, Z (which is twice as long, of course).
- She then runs Z through some key derivation function (KDF), together with any context material (typically including all 3 public keys involved) to derive the final symmetric encryption key.
- She can then destroy Z, the shared secrets from steps 2 & 3, and the ephemeral private key from step 1.
- Finally she encrypts her message using the key from step 5 using some symmetric authenticated encryption scheme.

On receiving the message, Bob can reconstruct Z from his own static private key, along with Alice’s static public key and the ephemeral public key she sent in the message. As described, this scheme is broadly the same as the Noise K pattern described above. The main difference is that Noise carefully describes exactly what goes into the context used in step 5 above by building up a hash of important parts of the state as it goes. Noise also provides a protocol view, with the full lifecycle of an ongoing interaction between two parties being considered, while the NIST specification is just concerned with a one-time key agreement process.
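Alice's steps 1 to 6 and Bob's reconstruction of Z can be sketched as follows. The usual caveats apply: the group is a toy (2^521 − 1, base 3), the KDF is a cut-down HKDF rather than one of the KDFs NIST actually specifies, the context encoding is my own choice, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# TOY parameters, for illustration only: not a vetted DH group.
P = 2**521 - 1
G = 3

def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

def dh(priv: int, peer_pub: int) -> bytes:
    return pow(peer_pub, priv, P).to_bytes(66, "big")

def enc(pub: int) -> bytes:
    return pub.to_bytes(66, "big")

def kdf(secret: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869), zero salt, a single 32-byte output block."""
    prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

def sender_derive(alice_priv: int, alice_pub: int, bob_pub: int):
    eph_priv, eph_pub = keygen()                 # step 1: ephemeral key pair
    z_e = dh(eph_priv, bob_pub)                  # step 2: ephemeral-static DH
    z_s = dh(alice_priv, bob_pub)                # step 3: static-static DH
    z = z_e + z_s                                # step 4: Z = Ze || Zs
    context = enc(eph_pub) + enc(alice_pub) + enc(bob_pub)
    key = kdf(z, context)                        # step 5: derive the final key
    # Step 6: drop the intermediate secrets. Python cannot securely wipe
    # memory, so this only removes the references.
    del z, z_e, z_s, eph_priv
    return eph_pub, key

def recipient_derive(bob_priv: int, bob_pub: int, alice_pub: int, eph_pub: int) -> bytes:
    """Bob rebuilds Z from his static private key and the two public keys."""
    z = dh(bob_priv, eph_pub) + dh(bob_priv, alice_pub)
    return kdf(z, enc(eph_pub) + enc(alice_pub) + enc(bob_pub))

alice_priv, alice_pub = keygen()
bob_priv, bob_pub = keygen()
mallory_priv, _ = keygen()

eph_pub, key = sender_derive(alice_priv, alice_pub, bob_pub)
print(recipient_derive(bob_priv, bob_pub, alice_pub, eph_pub) == key)

# Mallory claims to be Alice but only has her own private key, so the
# static-static half of Z differs and Bob derives a different key.
eph_pub_m, key_m = sender_derive(mallory_priv, alice_pub, bob_pub)
print(recipient_derive(bob_priv, bob_pub, alice_pub, eph_pub_m) == key_m)
```

The key from step 5 would then be handed to a symmetric authenticated encryption scheme for step 7; a message from Mallory fails to decrypt because her key does not match the one Bob derives.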

The NIST document also describes schemes in which both parties exchange ephemeral keys, similar to Noise interactive patterns, such as the Full Unified Model in section 6.1.1.2.

## Summary

In this part we have examined some ways to try and achieve public key authenticated encryption (PKAE). We’ve seen how difficult it is to achieve PKAE based on a generic composition of public key encryption and digital signatures. We’ve seen two generic approaches to achieving PKAE – one (ESSR) based on Encrypt-then-Sign with binding of public keys, and another based on static Diffie-Hellman key agreement. We then discussed some Diffie-Hellman variants that achieve stronger properties while still being relatively simple.

In the next part we will reflect on our findings and discuss improvements that could be incorporated into popular standards like JOSE and JWT.