From KEMs to protocols

This is the third part of my series on Key Encapsulation Mechanisms (KEMs) and why you should care about them. Part 1 looked at what a KEM is and the KEM/DEM paradigm for constructing public key encryption schemes. Part 2 looked at cases where the basic KEM abstraction is not sufficient and showed how it can be extended to add support for multiple recipients and sender authentication. At the end of part 2, I promised to write a follow-up about tackling forward-secrecy and replay attacks in the KEM/DEM paradigm, so here it is. In this article we’ll go from simple one-way message encryption to a toy version of the Signal Protocol that provides forward secrecy and strong authentication of two (or more) parties.

WARNING: please pay attention to the word “toy” in the previous sentence. This is a blog post, not a thorough treatment of how to write a real end-to-end encrypted messaging protocol.

The problems to be solved

There were three major issues left unsolved by the previous blog posts, whichh are the following:

  • The confidentiality of messages is protected if only the sender’s long-term secret key is compromised. But if the recipient’s long-term key is compromised then all previous messages to that recipient can be decrypted. In technical terms, we’d say that the encryption lacks forward secrecy. (More specifically, it only has sender-compromise forward secrecy).
  • Messages can be captured and replayed by an attacker. Although the recipient can be assured that a message was originally authored by a trusted party (if an authenticated KEM is used), they have no guarantee that it was that same party that just sent them the message. For example, if the message authorizes the sender’s bank to transfer some money to the attacker, the attacker can capture and replay the message multiple times to get more and more money. In this case, we say that the message authentication doesn’t guarantee freshness.
  • Finally, without signing the entire message content, the encryption schemes seen so far are vulnerable to Key Compromise Impersonation (KCI) attacks: if an attacker compromises your secret key they can not only pretend to be you to other people, but can pretend to be other people when talking to you.

The first two issues are connected with time: in both cases we want the message to be connected to the time that it is created, either to ensure it cannot be replayed later or that it cannot be decrypted later. By solving these problems we will also fix the vulnerability to KCI attacks too. To do this, we’ll move from an offline message encryption scheme to an online interactive protocol.

Varieties of Diffie-Hellman worth wanting*

If you’ve got this far in the series, you probably have a good idea of what Diffie-Hellman (DH) key agreement is all about. Two parties exchange public keys and then perform a DH between their own secret key and the other party’s public key to derive a shared secret that they can then use to encrypt and authenticate messages to each other.

In its most basic form, each party uses their long-lived (“static”) keys for DH, and so always derive the same shared secret. This “static-static” DH key agreement operation provides authentication of both the sender and the recipient, but is vulnerable to replay attacks and doesn’t provide forward secrecy. If we want to ensure that a fresh key is derived every time (as required for a KEM) we can either mix a random value (nonce) into the key derivation step, or we can replace one of the static keys with a fresh temporary (“ephemeral”) key.

This is what the DHIES (Diffie-Hellman Integrated Encryption Scheme) and the elliptic curve variant (ECIES) do: the sender’s static key is replaced with an ephemeral key, producing an ephemeral-static DH agreement. The ephemeral public key is then attached to the message. Ephemeral-static key agreement has forward secrecy against sender key compromise (but not recipient key compromise), but is still vulnerable to replay attacks (unless the recipient remembers all previously seen ephemeral keys). It also lacks sender authentication because the sender’s secret key has been replaced with an ephemeral one that anyone could have generated. This is why the authenticated KEM in part 2 of this blog series used two key agreements: an ephemeral-static one to ensure fresh message keys, and a static-static one to ensure sender authentication.

To prevent replay attacks and ensure forward-secrecy against recipient key compromise, there is another variant of DH we can use: static-ephemeral DH. In this mode, the recipient generates an ephemeral key pair and sends the ephemeral public key to the sender. The sender then performs a DH key agreement between that ephemeral public key and their own static secret key and uses the resulting shared secret to encrypt and authenticate a message to the recipient. The recipient uses the ephemeral secret key and the sender’s static public key to derive the same secret and decrypt the message. The recipient is assured of the identity of the sender and is protected against replay attacks because the sender can only have derived the shared secret after seeing the ephemeral public key. If the recipient deletes the ephemeral secret key immediately after decrypting the message from the sender then this also ensures forward secrecy against recipient key compromise.

This static-ephemeral DH also eliminates KCI attacks. The ephemeral public key effectively acts as a challenge in a challenge-response protocol, ensuring that the sender is positively authenticated.

The security properties of these different variants of Diffie-Hellman, along with a pure ephemeral-ephemeral DH, are summarised in the following table:

DH variantSender authentication?Recipient authentication?Protection from replay attacks?Forward secrecy against sender compromise?Forward secrecy against recipient compromise?Protection from Key Compromise Impersonation?
static-staticYesYesNoNoNoNo
ephemeral-staticNoYesNoYesNoNo
static-ephemeralYesNoYesNoYesYes
ephemeral-ephemeralNoNoYesYesYesn/a
Security properties of different Diffie-Hellman variants

By combining different modes of DH, we can achieve much stronger security properties than a single DH can achieve on its own. If we feed the outputs of each DH into a single secure key derivation function then we can achieve the best security properties of each individual key agreement. The Noise Protocol Framework is an extended study of the security properties of different DH combinations, referred to as “handshake patterns”. In particular, we can see that handshakes that combine an ephemeral-static DH and a static-ephemeral DH achieve the strongest security properties once the handshake completes. By combining these with a static-static DH we can achieve stronger security properties also during the handshake itself (useful if you want to exchange some “early data” during the handshake to save time).

Signed prekeys

We’ve seen that interactive handshake protocols can solve our problems with replay attacks, forward secrecy, and KCI, but how does this relate to KEMs? With a KEM the sender is able to send messages to a recipient just by knowing the recipient’s static public key. So how does a (recipient-generated) ephemeral challenge come into the picture? The sender could first contact the recipient to get an ephemeral challenge, but what if the recipient is offline?

This problem is faced by secure messaging services such as Signal. The solution they came up with is to use signed prekeys. A prekey is simply an ephemeral key pair that the recipient generates ahead of time and uploads to some central server that all users have access to. For example, Bob can generate a bunch of ephemeral key pairs and upload the public key parts to the server while storing the ephemeral secret keys somewhere safe on his local device. If Alice wants to send a message to Bob she contacts the server, which sends her one of Bob’s prekeys (and then deletes it). Alice then uses her KEM to construct a message to Bob, using the prekey as if it was Bob’s static public key. To assure Alice that this prekey really did come from Bob, Bob can sign the prekeys using his static private key. Effectively each prekey is a one-time-use ephemeral certificate.

When Bob receives the message from Alice, he can use the corresponding ephemeral secret key (stored on his device) along with Alice’s public key to decrypt and authenticate the message. He then deletes that ephemeral prekey to prevent it being reused, ensuring protection against replay attacks and forward secrecy.

If Alice is using the ECDH authenticated KEM from the previous blog post, then she will effectively end up doing two key agreement steps:

  • An ephemeral-ephemeral key agreement between a fresh ephemeral secret key generated by the KEM and Bob’s ephemeral prekey.
  • A static-ephemeral key agreement between Alice’s static secret key and Bob’s ephemeral prekey.

If you look back to the table of security properties, this combination ticks all of the boxes apart from recipient authentication—which is handled by the signature on the prekey.

So signed pre-keys together with an authenticated KEM can solve our problems and make a very secure communication system. But what happens if you run out of prekeys? You could just say that Bob can’t receive any more messages until he replenishes the supply, effectively making prekeys work as a form of rate-limiting. Or you could allow Alice to fallback to a normal authenticated KEM with Bob’s static public key. Signal adopts a middle ground and has Bob publish a “semi-static” (or “semi-ephemeral” depending on your point of view) signed prekey along with a bundle of normal prekeys. Alice can then fallback to using this semi-static prekey if there are no ephemeral prekeys left and Bob will rotate this prekey the next time he is online.

Reply-able encryption and ratcheting

Signal only uses prekeys when establishing a new conversation between Alice and Bob. Once they have exchanged messages, they can use the authenticated communication channel they’ve established to exchange fresh ephemeral keys periodically. Signal actually does this on every message that Alice and Bob exchange. When Alice sends a message to Bob, she attaches a fresh ephemeral public key that Bob can use to encrypt his reply. Once she receives this reply from Bob, she discards the old ephemeral secret key and generates a new one. This process ensures that fresh encryption and authentication keys are continually being derived in a process known as ratcheting. (So called because it is impossible to reverse the process to figure out what previous keys were used). You can read more about it in Signal’s excellent documentation.

The ECDH AKEM developed in the last blog post already generates fresh ephemeral keys for each message and attaches them to the ciphertext as the encapsulated key. The authentication provided by the KEM assures Bob that Alice is the source of this ephemeral key, much like the signature in a signed prekey. So in principle, Bob could use this ephemeral public key in the place of Alice’s static public key when constructing a reply.

But the problem is that the AKEM destroys the ephemeral secret key as soon as the message to Bob has been sent. So if Bob replies using that ephemeral key then Alice won’t be able to decrypt Bob’s messages. What we need is some way to keep track of the state of the interaction from one message to the next. Well, this is exactly what the Tag-KEM abstraction allows. Recall that in a Tag-KEM, when Alice wants to send a message to Bob she first calls a key() function with her secret key and Bob’s public key. This returns a data encryption key (DEK) that she can use to encrypt the message and an opaque state object. She then calls a separate auth_encapsulate function passing in this state object and a “tag”, which returns the encapsulated key.

We can adapt the Tag-KEM API so that the encapsulate operation returns a new state along with the encapsulated key. Alice can then keep this state around and use it to process the response from Bob. I’m going to call this notion replyable encryption (abusing the English language a little), and an AKEM that supports replying to a received message a replyable KEM or rKEM. An rKEM acts just like an AKEM for the first message from Alice to Bob, but then allows Bob to reply to that message upgrading the exchange to an interactive handshake providing stronger security properties as discussed in the Noise specification.

Warning: as with some material in the previous blog post, this appears to be an entirely novel construction that hasn’t been studied extensively by cryptographers. The point of these blog posts is to explore interesting perspectives on cryptography. There may be fatal weaknesses in what follows! Use Noise or Signal if you want something for production.

An rKEM has the following operations:

  • rKEM.keygen() generates the long-term public and secret keys for a party.
  • rKEM.begin(mySecretKey, theirPublicKey) starts a conversation between the local party’s secret key and the remote party’s public key. It returns an opaque state object.
  • rKEM.key(state) returns a Data Encryption Key (DEK) to use for sending a message to the other party.
  • rKEM.auth_encap(state, tag) encapsulates the DEK and returns the encapsulated key and a new state object. The new state object encapsulates the ephemeral secret key used by the KEM.
  • rKEM.auth_decap(state, encapsulatedKey, tag) authenticates the encapsulated key and tag based on the current state object and returns the DEK and a new state object (encapsulating the received ephemeral public key), or returns an error if authentication failed.

To send a first message to Bob, Alice carries out the following steps:

var state = rKEM.begin(aliceSecretKey, bobPublicKey);
var dek = rKEM.key(state);
var (ciphertext, tag) = DEM.encrypt(dek, message);
var (encapKey, state) = rKEM.auth_encap(state, tag);
// Alice then sends encapKey, ciphertext, tag to Bob

On the other side, Bob receives the message from Alice and then runs the following operations to decrypt it and send a reply:

var state = rKEM.begin(bobSecretKey, alicePublicKey);
var (dek, state) = rKEM.auth_decap(state, encapKey, tag);
var message = DEM.decrypt(dek, ciphertext, tag);
// Now compose reply
var dek = rKEM.key(state);
var (ciphertext, tag) = DEM.encrypt(dek, reply);
var (encapKey, state) = rKEM.auth_encap(state, tag);
// Send encapKey, ciphertext, tag back to Alice

Alice can then perform the same process on her end using her remembered state from the first message to decrypt the response and generate another reply, and so on. At each step the sender and recipient keys in the state are replaced by fresh ephemeral keys, ensuring that the keys are ratcheted on every new message.

I created a very basic (and probably very insecure!) implementation of this concept on GitHub, along with some implementations of other KEM variants discussed in the articles. You can also see a unit test version of the Alice and Bob conversation with lots of debug logging to see how it all works. This version also partially supports multiple recipients, providing a tantalising glimpse of secure end-to-end encrypted group chat. A fun project would be to flesh out the implementation to keep track of the full Noise handshake state rather than just ephemeral keys, to ensure derived keys depend on a hash of the history of all previous interactions too.

Conclusions

In this post we’ve seen how to solve the thorny problems of forward secrecy, replay protection, and resistance to key compromise impersonation. By first introducing signed prekeys and then adapting our KEM interface to support interactive handshake protocols we can solve all of these problems using the security properties of different Diffie-Hellman key agreement configurations.

It turns out that the Tag-KEM abstraction that was developed for entirely different reasons is always half way to being an interactive protocol by adding the crucial notion of a state. The notion of replyable encryption and rKEMs bridges the gap between a non-interactive KEM and a fully interactive protocol by threading that state into every operation and allowing a new state to be returned. The result transforms the humble KEM into a secure protocol state machine.

I hope you’ve enjoyed this series of articles on KEMs. We started by showing how the KEM/DEM paradigm is the right way to think about public key encryption, and have then incrementally adapted that abstraction to encompass more and more features: sender authentication, multiple recipients, security against insider threats, and finally to interactive protocols that can (in theory) achieve quite strong security properties. It’s been a fun learning exercise for me writing them and I hope you learned something from reading them. I think the notion of KEM as a unifying abstraction for teaching cryptography concepts has a lot to recommend it.

One notable absence from most of these discussions has been that of signatures. I hope to write a blog post about signatures soon, because there are some fascinating connections here that also illuminate good reasons not to use them. Or at least, to use them sparingly in specific circumstances. Until the next time…

Author: Neil Madden

Security Director at ForgeRock. Experienced software engineer with a PhD in computer science. Interested in application security, applied cryptography, logic programming and intelligent agents.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.