Convergence in web and API security

Every now and then technologies that initially appear to be distinct end up converging on a common approach from opposite directions. I believe that something like that is happening right now in approaches to web and API authentication around the use of tokens and cookies.

Continue reading “Convergence in web and API security”

A few comments on ‘age’

Update: I’ve updated the section on Cryptographic Doom at the end of the article after clarifications from the age author. That specific criticism was based on my misreading of the age spec.

Age is a new tool for encrypting files, intended to be a more modern successor to PGP/GPG for file encryption. This is a welcome development, as PGP has definitely been showing its age recently. On the face of it, age looks like a good replacement using modern algorithms. But I have a few concerns about its design.

Authenticated encryption

One of the innovations of age is that it aims to support a streaming authenticated encryption mode. The spec (such as it is) links to Adam Langley’s blog post about streaming encryption, which mentions use-cases like the following:

gpg -d your_archive.tgz.gpg | tar xz

The comments on Hacker News also mentioned cases where people pipe a decrypted file into a shell. Given that age links to this blog post and has implemented a secure streaming AEAD mode, it seems reasonable to suppose that it is intended to be secure in these kinds of use-cases. Each chunk (segment) of ciphertext is authenticated before being decrypted and output, so tar or the shell never sees unauthenticated plaintext and so an attacker can’t tamper with the ciphertext to influence the data being fed into the downstream process. The worst the attacker can do is to truncate the output by corrupting one or more segments causing the decryption to abort halfway (this might still allow significant mischief).
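To make this concrete, here is a minimal sketch of a STREAM-style chunked AEAD of the kind described above. The chunk size, nonce layout, and KDF-free key handling are illustrative only, not the exact construction in the age spec:

```python
# Sketch of chunked streaming AEAD: each chunk is authenticated before any
# plaintext is released, and the nonce encodes both the chunk counter and a
# "last chunk" flag so reordering, truncation and extension are detected.
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

CHUNK_SIZE = 64 * 1024  # illustrative chunk size

def encrypt_stream(key: bytes, plaintext: bytes) -> list:
    aead = ChaCha20Poly1305(key)
    chunks = [plaintext[i:i + CHUNK_SIZE]
              for i in range(0, len(plaintext), CHUNK_SIZE)] or [b""]
    out = []
    for counter, chunk in enumerate(chunks):
        last = counter == len(chunks) - 1
        # 12-byte nonce: 11-byte big-endian counter || 1-byte last-chunk flag.
        nonce = counter.to_bytes(11, "big") + (b"\x01" if last else b"\x00")
        out.append(aead.encrypt(nonce, chunk, None))
    return out

def decrypt_stream(key: bytes, chunks: list) -> bytes:
    aead = ChaCha20Poly1305(key)
    plaintext = b""
    for counter, chunk in enumerate(chunks):
        last = counter == len(chunks) - 1
        nonce = counter.to_bytes(11, "big") + (b"\x01" if last else b"\x00")
        # Authentication failure here raises before any plaintext is emitted,
        # so a downstream consumer (tar, a shell) never sees tampered data.
        plaintext += aead.decrypt(nonce, chunk, None)
    return plaintext
```

Because the nonce is bound to the chunk position, swapping or replaying chunks fails authentication; the residual weakness is exactly the truncation attack noted above.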

What is authenticated encryption?

Age supports a small number of algorithms. You can encrypt with a password using scrypt. In this case a symmetric key is derived from the password and you get symmetric authenticated encryption as the security goal. This doesn’t just mean that the ciphertext is protected from tampering, it also means that the encrypted file must have come from somebody who knows the password. Assuming you chose a strong password and only shared it with people you trust, then successful authenticated decryption provides strong evidence that the file came from a trusted source.
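The password-based case can be sketched in a few lines: derive a key with scrypt, then use a symmetric AEAD. The cost parameters, salt size, and wire format below are illustrative choices of mine, not those mandated by age:

```python
# Password-based authenticated encryption sketch: scrypt KDF + ChaCha20-Poly1305.
# Successful decryption proves the ciphertext came from someone who knows the
# password (spoofing), not just that it wasn't modified in transit (tampering).
import os
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

# Illustrative scrypt cost parameters; real tools should use higher settings.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1

def password_encrypt(password: str, plaintext: bytes) -> bytes:
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt,
                         n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)
    nonce = os.urandom(12)
    ct = ChaCha20Poly1305(key).encrypt(nonce, plaintext, None)
    return salt + nonce + ct

def password_decrypt(password: str, data: bytes) -> bytes:
    salt, nonce, ct = data[:16], data[16:28], data[28:]
    key = hashlib.scrypt(password.encode(), salt=salt,
                         n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)
    # Raises on a wrong password or a modified ciphertext.
    return ChaCha20Poly1305(key).decrypt(nonce, ct, None)
```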

In terms of threat models you could say that authenticated encryption is intended to protect against spoofing threats as well as tampering threats. (The S and T in STRIDE at a very basic level). Unfortunately, the age spec doesn’t document its threat model or the security goals it is intended to achieve so I’m having to read between the lines to work out what was intended.

For public key cryptography, the notion of authenticated encryption becomes more complicated. I wrote a three-part blog post about it. There are public key authenticated encryption modes, such as NaCl’s box, but age doesn’t use one of those and instead opts for unauthenticated ECIES encryption (like JOSE’s ECDH-ES algorithm) using X25519. So while age uses symmetric authenticated encryption for the file contents, the symmetric file key is itself encrypted using an unauthenticated mode. This means an attacker cannot tamper with the encrypted ciphertext, but they can completely replace it with one of their own choosing. What this means is that age is secure against chosen ciphertext attacks from a confidentiality point of view, but it doesn’t actually provide any origin authentication. You can’t establish that the encrypted file came from a trusted party, so it is completely insecure to pipe the output of age into another tool without independently establishing the authenticity of the file.
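The problem is easy to see in a sketch of ephemeral-key ECIES of the sort described above (X25519 + HKDF + AEAD; the KDF details here are mine, not age's exact construction). Encryption uses only the recipient's public key, so any attacker can produce a perfectly valid ciphertext of their own:

```python
# Unauthenticated ECIES sketch: fresh ephemeral key per message, shared secret
# via X25519, symmetric key via HKDF. Confidentiality and integrity of this
# message, but no origin authentication: anyone can run ecies_encrypt.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey, X25519PublicKey)
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def ecies_encrypt(recipient_pub: X25519PublicKey, plaintext: bytes) -> bytes:
    eph = X25519PrivateKey.generate()          # ephemeral, so unauthenticated
    shared = eph.exchange(recipient_pub)
    key = HKDF(hashes.SHA256(), 32, salt=None, info=b"ecies-demo").derive(shared)
    nonce = os.urandom(12)
    eph_pub = eph.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    return eph_pub + nonce + ChaCha20Poly1305(key).encrypt(nonce, plaintext, None)

def ecies_decrypt(recipient_priv: X25519PrivateKey, data: bytes) -> bytes:
    eph_pub, nonce, ct = data[:32], data[32:44], data[44:]
    shared = recipient_priv.exchange(X25519PublicKey.from_public_bytes(eph_pub))
    key = HKDF(hashes.SHA256(), 32, salt=None, info=b"ecies-demo").derive(shared)
    return ChaCha20Poly1305(key).decrypt(nonce, ct, None)
```

Nothing in `ecies_decrypt` tells you who ran `ecies_encrypt`, which is exactly the wholesale-replacement attack described above.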

The age spec explicitly lists signing and support for web of trust as out of scope. Instead it suggests using separate tools such as signify/minisign if you want to be sure of where a file came from. After all, this is the Unix philosophy: small composable tools that do just one thing. But if you click through the NaCl link above, djb links to a definition of the public key authenticated encryption security goal that he uses for box. This paper (which is old enough to drink in the UK) makes it clear that achieving authenticated encryption by generic composition of signatures and public key encryption is surprisingly difficult; more so than in the symmetric setting, as there is no analogue of Encrypt-then-MAC that is generically secure. For symmetric crypto we long ago gave up pretending that generic composition was something end users could get right on their own and opted for dedicated AE modes. For some reason, djb seems to be the only person to have realised that the same applies to public key cryptography.

A second problem with requiring a generic composition of signing and encryption is that it totally kills the streaming use-cases. Either I have to verify a signature over the entire encrypted file before decrypting it, or I have to decrypt to a temporary file that I then verify, or I need to define my own chunked streaming signature verification tool and combine that with age (and hope the chunk sizes line up). Users won’t get this right, so we’ll be right back at streaming unverified plaintext into shell commands.

So how could age fix this? Most importantly, the spec should define its security goals. I believe the correct security goal is authenticated encryption for both symmetric and public key use cases. The simplest way to achieve this would be to swap out the ephemeral X25519 code in favour of encrypting the file key with NaCl’s crypto_box. The sender must supply their own private key during encryption (which could be read out of ~/.age automatically). The recipient either supplies the expected sender’s public key on the command line, or else has a file of trusted senders in their age config directory – perhaps populated by TOFU if you don’t want to get into something like web of trust, but this problem is going to have to be solved somehow.

Unfortunately, as I pointed out on HN, you can’t simply use a static key pair with age to achieve authenticated encryption as age’s key-wrapping algorithm is completely insecure when used in this way (it uses a fixed nonce but isn’t nonce reuse misuse resistant). My overall impression of age is that it uses good algorithms in non-standard ways and then justifies this with ad-hoc reasoning about why it’s safe in this specific implementation. I’d be much happier if it used existing mechanisms from libsodium, which appear to be sufficient to cover all its use-cases.

Cryptographic Doom

The age spec defines a header that lists, for each recipient, a way to derive the file decryption key. Here are some examples from the spec:

-> X25519 8hWaIUmk67IuRZ41zMk2V9f/w3f5qUnXLL7MGPA+zE8tXgpAxKgqyu1jl9I/ATwFgV42ZbNgeAlvCTJ0WgvfEo
-> scrypt GixTkc7+InSPLzPNGU6cFw 18
-> ssh-rsa SkdmSg

In each case we have an algorithm identifier followed by algorithm-specific parameters. For example, in the X25519 case we have an ephemeral public key and then an encrypted file key.

But apart from syntax this header is incredibly close to JOSE! You might as well write it as follows:

{ "alg": "X25519",
  "epk": { ... } }

JOSE even supports multiple recipients (in the lesser used JSON Serialization format) and ECIES with key wrapping. But the cryptographic community has rightly beaten up JOSE for requiring this algorithm header and it led to catastrophic attacks.

Edit: Filippo Valsorda (the author of age) has pointed out that age only uses the algorithm identifier (key type) to match the recipient’s key, not to determine which algorithm to use. Age keys are uniquely linked to an algorithm in exactly the manner I suggest.

I’m fairly sure that age is not vulnerable to the same kind of attacks, but I’m not convinced it never will be. Even if there is no immediate attack, it still violates Moxie’s Cryptographic Doom Principle. Although the age header is protected by a MAC, it cannot verify that MAC until it decrypts the file key. In order to decrypt the file key it trusts the header to tell it what algorithm to use.

Why does this mistake keep being made? As I’ve written before, there are better ways to handle this that systematically avoid these issues.
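The key-driven approach I have in mind can be sketched in a few lines. The names and keystore layout here are hypothetical, purely for illustration; the point is that the algorithm is an immutable attribute of the key, and the message header is never trusted to choose it:

```python
# Key-driven algorithm selection: the algorithm is fixed when the key is
# created. The message header can identify a key, but cannot switch the
# algorithm that key will be used with.
from dataclasses import dataclass

@dataclass(frozen=True)
class Key:
    key_id: str
    algorithm: str   # bound to the key at creation time
    secret: bytes

# Hypothetical local keystore, indexed by key ID.
KEYSTORE = {
    "k1": Key("k1", "chacha20-poly1305", b"\x00" * 32),
}

def select_algorithm(header: dict) -> str:
    key = KEYSTORE[header["kid"]]
    # The header's "alg" field is never used for dispatch; at most it is
    # checked for consistency against what the key itself says.
    if "alg" in header and header["alg"] != key.algorithm:
        raise ValueError("algorithm mismatch for key " + key.key_id)
    return key.algorithm
```

An attacker who rewrites the header's `alg` field then achieves nothing: the worst case is an error, never processing under an attacker-chosen algorithm.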

In summary I think age is interesting and solving a genuine problem. But I think the design could still be improved from where it is today to provide clearer security goals and avoid potential pitfalls in the future.

A logical view of objects

If you spend any time writing code to talk to a database from an object-oriented language, you will come across the notion of an object-relational impedance mismatch. This is the idea that there are fundamental differences between the data models of relational databases and object-oriented programming languages that make it difficult to map between the two. This idea has spawned a large number of object-relational mapping (ORM) tools that attempt to bridge the divide. In this post, I’ll argue that while there are many practical differences between SQL and typical OO languages, there is no fundamental disagreement between the two – so long as we adopt an appropriate logical view of objects.

Continue reading “A logical view of objects”

Re-evaluating Searle’s arguments against machine consciousness

Like many students of artificial intelligence, I have long viewed John Searle’s Chinese Room argument with slight annoyance. It seems so obviously wrong-headed and yet it persists. So I was pleasantly surprised to find myself re-evaluating my interpretation of the argument after watching some old videos of Searle. This is not to say that I suddenly believe it is a good argument, but I now think I understand a little better why Searle makes it.

Continue reading “Re-evaluating Searle’s arguments against machine consciousness”

API Security in Action now in early access

As some of you may already know, I have been working on a book on API security for Manning in my spare time: API Security in Action. The book has now reached a point where the publisher is happy for it to go into early-access release (MEAP, as they call it), so you can now pre-order and download the first 3 chapters on their website. Use discount code fccmadden to get 37% off when ordering.


The book covers the basics of securing remotely accessible APIs (REST) taking a ground-up approach, and then moves on in the later chapters to look at the specifics of securing microservice APIs in Kubernetes and even APIs for the Internet of Things (IoT). It covers lots of things you’d expect, like JSON Web Tokens and OAuth 2, and some things you perhaps wouldn’t, like Waterken-style capability URLs and Macaroons. I’ve also taken an opinionated approach, which will come as no surprise to anybody who knows me. JWT is covered because it’s an important technology, but that doesn’t mean it gets a free pass. I’ve tried to separate the good parts from the bad, and warn you away from the real foot-guns.

The principle I’ve followed so far is that good practical advice (the “in Action” part) requires having a good understanding of how things actually work and what security properties each component provides. So rather than just throwing together a JWT library and an API gateway and showing a few deployment patterns, you’ll learn some of the deeper principles involved first. I start off with some basic secure development techniques and common attacks against REST APIs. We then look at the basic security controls of authentication, rate-limiting, access control, and audit logging. You’ll build primitive (but secure) versions of all these mechanisms from scratch before moving on to look at more fully-featured alternatives as the book progresses.

I hope you take a look. If you do, please leave feedback (good or bad) in the forum or email me personally. It’s been a much harder process writing it than I originally expected, trying to balance the desire to teach everything with a need to keep it simple enough to engage readers. Hopefully I’ve struck the right balance, but let me know either way. And of course, if you spot anything outright wrong then let me know about that too so I can fix it.

Why you really can parse HTML (and anything else) with regular expressions

There is a common story that has played out many times in the history of computer science. A software engineer learns regular expressions for the first time, feels like a god, and then immediately over-applies them to situations they are not great at, like parsing HTML. What then usually happens is that a more experienced developer patiently explains why HTML cannot be parsed with regular expressions, because HTML is (at least) a context-free language, and you should use a proper HTML parser instead. In this post, I’ll explain why the advice is right, but the reasoning is wrong.

Note: in this post, I am talking about regular expressions as defined in theory, not the heavily extended regexes of Perl etc.

Continue reading “Why you really can parse HTML (and anything else) with regular expressions”

Can you ever (safely) include credentials in a URL?

Update: an updated version of the ideas in this blog post appears in chapter 9 of my book. You may also like my proposal: Towards a standard for bearer token URLs.

URLs are a cornerstone of the web, and are the basic means by which content and resources are shared and disseminated. People copy and paste URLs into Slack or WhatsApp to share interesting links. Google crawls the web, discovering and indexing such links. But what happens when the page you want to link is not public and requires credentials in order to view or interact with it? Suddenly a URL is no longer sufficient, unless the recipient happens to already have credentials. Sometimes they do, and everything is fine, but often they do not. If we really do want to give them access, the problem becomes how to securely pass along some credentials with the URL so that they can access the page we have linked.

A commonly desired approach to this problem is to encode the credentials into the URL itself. While convenient, this solution is fraught with dangers and frequently results in credentials being exposed in insecure contexts. In this article, we’ll look at various ways to accomplish this, the ways that things can go wrong, and conclude with a set of guidelines for cases where this can be made secure, and actually improve security overall.

Continue reading “Can you ever (safely) include credentials in a URL?”

Simplifying JOSE

Hot on the heels of my dives into authenticated encryption and key-driven cryptographic agility, I thought I’d lay out a sketch of what I think a simpler core JOSE/JWT spec might look like. We’ll take in misuse-resistant crypto and Macaroons along the way. And cats. But first, a brief diversion into JSON.

Continue reading “Simplifying JOSE”

Public key authenticated encryption and why you want it (Part III)

In Part I, we saw that authenticated encryption is usually the security goal you want in both the symmetric and public key settings. In Part II, we then looked at some ways of achieving public key authenticated encryption (PKAE), and discovered that it is not straightforward to build from separate signing and encryption methods, but it is relatively simple for Diffie-Hellman. In this final part, we will look at how existing standards approach the problem and how they could be improved.

Continue reading “Public key authenticated encryption and why you want it (Part III)”

Public key authenticated encryption and why you want it (Part II)

In Part I, I made the argument that even when using public key cryptography you almost always want authenticated encryption. In this second part, we’ll look at how you can actually achieve public key authenticated encryption (PKAE) from commonly available building blocks. We will concentrate only on approaches that do not require an interactive protocol. (Updated 12th January 2019 to add a description of a NIST-approved key-agreement mode that achieves PKAE).

Continue reading “Public key authenticated encryption and why you want it (Part II)”
