A flowchart for Cache-Control headers

I always struggle to remember what all the HTTP Cache-Control directives mean and when they are used, so I made this little flow chart to remind me. I think it’s roughly correct, but no doubt there are details that I’ve missed. It’s on GitHub so please send corrections.

[Image: Cache-Control flowchart]

A few comments on ‘age’

Update: I’ve updated the section on Cryptographic Doom at the end of the article after clarifications from the age author. That specific criticism was based on my misreading of the age spec.

Age is a new tool for encrypting files, intended to be a more modern successor to PGP/GPG for file encryption. This is a welcome development, as PGP has definitely been showing its age recently. On the face of it, age looks like a good replacement using modern algorithms. But I have a few concerns about its design.

Authenticated encryption

One of the innovations of age is that it aims to support streaming authenticated encryption. The spec (such as it is) links to Adam Langley’s blog post about streaming encryption, which mentions use-cases like the following:

gpg -d your_archive.tgz.gpg | tar xz

The comments on Hacker News also mentioned cases where people pipe a decrypted file into a shell. Given that age links to this blog post and has implemented a secure streaming AEAD mode, it seems reasonable to suppose that it is intended to be secure in these kinds of use-cases. Each chunk (segment) of ciphertext is authenticated before being decrypted and output, so tar or the shell never sees unauthenticated plaintext, and an attacker can’t tamper with the ciphertext to influence the data being fed into the downstream process. The worst the attacker can do is truncate the output by corrupting one or more segments, causing the decryption to abort halfway (which might still allow significant mischief).
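
To make that property concrete, here’s a rough sketch of chunked authenticated decryption using PyNaCl’s SecretBox. To be clear, this is my own simplified illustration, not age’s actual STREAM construction: the 64 KiB chunk size, the counter-derived nonce, and the missing end-of-stream marker are all assumptions made for the sake of the example.

# Simplified illustration of streaming AEAD: every chunk is authenticated
# before any plaintext is released downstream. NOT age's actual format.
import struct
from nacl.secret import SecretBox

PLAINTEXT_CHUNK = 64 * 1024
CIPHERTEXT_CHUNK = PLAINTEXT_CHUNK + 16  # each chunk carries a 16-byte Poly1305 tag

def decrypt_stream(key, infile, outfile):
    box = SecretBox(key)
    counter = 0
    while True:
        chunk = infile.read(CIPHERTEXT_CHUNK)
        if not chunk:
            break
        # Derive the nonce from the chunk index so chunks cannot be reordered.
        nonce = struct.pack(">Q", counter).rjust(SecretBox.NONCE_SIZE, b"\0")
        # decrypt() verifies the tag and raises CryptoError on failure, so
        # tampered data is never written to the output.
        outfile.write(box.decrypt(chunk, nonce))
        counter += 1
    # A real design also needs an explicit final-chunk marker, otherwise an
    # attacker can silently truncate the stream (the mischief noted above).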

What is authenticated encryption?

Age supports a small number of algorithms. You can encrypt with a password using scrypt. In this case a symmetric key is derived from the password and you get symmetric authenticated encryption as the security goal. This doesn’t just mean that the ciphertext is protected from tampering, it also means that the encrypted file must have come from somebody who knows the password. Assuming you chose a strong password and only shared it with people you trust, then successful authenticated decryption provides strong evidence that the file came from a trusted source.
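
In PyNaCl terms, that combination looks roughly like the sketch below. The scrypt parameters and the salt-prefixed output are my own choices for illustration; they are not age’s actual scrypt settings or file format.

# Password-based authenticated encryption: derive a key with scrypt, then
# encrypt with an AEAD (XSalsa20-Poly1305 via SecretBox). Decryption only
# succeeds if the ciphertext was produced by someone who knew the password.
import nacl.pwhash
import nacl.utils
from nacl.secret import SecretBox

def encrypt_with_password(password: bytes, plaintext: bytes) -> bytes:
    salt = nacl.utils.random(nacl.pwhash.scrypt.SALTBYTES)
    key = nacl.pwhash.scrypt.kdf(
        SecretBox.KEY_SIZE, password, salt,
        opslimit=nacl.pwhash.scrypt.OPSLIMIT_INTERACTIVE,
        memlimit=nacl.pwhash.scrypt.MEMLIMIT_INTERACTIVE,
    )
    # encrypt() picks a random nonce and appends an authentication tag.
    return salt + SecretBox(key).encrypt(plaintext)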

In terms of threat models, you could say that authenticated encryption is intended to protect against spoofing threats as well as tampering threats (the S and T in STRIDE, at a very basic level). Unfortunately, the age spec doesn’t document its threat model or the security goals it is intended to achieve, so I’m having to read between the lines to work out what was intended.

For public key cryptography, the notion of authenticated encryption becomes more complicated. I wrote a three-part blog post about it. There are public key authenticated encryption modes, such as NaCl’s box, but age doesn’t use one of those and instead opts for unauthenticated ECIES encryption (like JOSE’s ECDH-ES algorithm) using X25519. So while age uses symmetric authenticated encryption for the file contents, the symmetric file key is itself encrypted using an unauthenticated mode. An attacker cannot tamper with an existing ciphertext, but they can replace it entirely with one of their own choosing. In other words, age is secure against chosen ciphertext attacks from a confidentiality point of view, but it doesn’t provide any origin authentication. You can’t establish that an encrypted file came from a trusted party, so it is completely insecure to pipe the output of age into another tool without independently establishing the authenticity of the file.
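
The distinction is easy to see in PyNaCl, which exposes both modes. This is only a rough analogy (age’s X25519 recipient stanza is its own construction, not libsodium’s SealedBox), but the security properties line up:

# Unauthenticated vs authenticated public key encryption, illustratively.
from nacl.public import PrivateKey, SealedBox, Box

recipient = PrivateKey.generate()
sender = PrivateKey.generate()

# SealedBox: like age's X25519 recipients, anyone at all can produce a valid
# ciphertext for the recipient; decryption says nothing about who sent it.
anonymous_ct = SealedBox(recipient.public_key).encrypt(b"the payload")
SealedBox(recipient).decrypt(anonymous_ct)  # succeeds, sender unknown

# Box (crypto_box): the ciphertext is bound to the sender's static key, so a
# successful decryption also authenticates the origin.
authenticated_ct = Box(sender, recipient.public_key).encrypt(b"the payload")
Box(recipient, sender.public_key).decrypt(authenticated_ct)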

The age spec explicitly lists signing and support for web of trust as out of scope. Instead it suggests using separate tools such as signify/minisign if you want to be sure of where a file came from. After all, this is the Unix philosophy: small composable tools that do just one thing. But if you click through the NaCl link above, djb links to a definition of the public key authenticated encryption security goal that he uses for box. This paper (which is old enough to drink in the UK) makes it clear that achieving authenticated encryption by generic composition of signatures and public key encryption is surprisingly difficult; more so than in the symmetric setting, as there is no analogue of Encrypt-then-MAC that is generically secure. For symmetric crypto we long ago gave up pretending that generic composition was something end users could get right on their own and opted for dedicated AE modes. For some reason, djb seems to be the only person to realise that the same applies to public key cryptography.

A second problem with requiring a generic composition of signing and encryption is that it totally kills the streaming use-cases. Either I have to verify a signature over the entire encrypted file before decrypting it, or I have to decrypt to a temporary file that I then verify, or I need to define my own chunked streaming signature verification tool and combine that with age (and hope the chunk sizes line up). Users won’t get this right, so we’ll be right back at streaming unverified plaintext into shell commands.

So how could age fix this? Most importantly, the spec should define its security goals. I believe the correct security goal is authenticated encryption for both symmetric and public key use cases. The simplest way to achieve this would be to swap out the ephemeral X25519 code in favour of encrypting the file key with NaCl’s crypto_box. The sender must supply their own private key during encryption (which could be read out of ~/.age automatically). The recipient either supplies the expected sender’s public key on the command line, or else has a file of trusted senders in their age config directory – perhaps populated by TOFU if you don’t want to get into something like web of trust, but this problem is going to have to be solved somehow.
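
As a sketch of what that proposal might look like (my own pseudocode, not a patch to age; the framing and helper names are invented for illustration):

# Proposed scheme: wrap a fresh random file key with crypto_box under the
# sender's static key, then encrypt the payload with the file key. The
# recipient must already know, and trust, the sender's public key.
import nacl.utils
from nacl.public import PrivateKey, PublicKey, Box
from nacl.secret import SecretBox

def encrypt_file(sender_sk: PrivateKey, recipient_pk: PublicKey,
                 plaintext: bytes) -> tuple:
    file_key = nacl.utils.random(SecretBox.KEY_SIZE)
    wrapped_key = Box(sender_sk, recipient_pk).encrypt(file_key)
    body = SecretBox(file_key).encrypt(plaintext)
    return bytes(wrapped_key), bytes(body)

def decrypt_file(recipient_sk: PrivateKey, trusted_sender_pk: PublicKey,
                 wrapped_key: bytes, body: bytes) -> bytes:
    # Unwrapping only succeeds if the file key was wrapped by the holder of
    # trusted_sender_pk's private key, giving origin authentication.
    file_key = Box(recipient_sk, trusted_sender_pk).decrypt(wrapped_key)
    return SecretBox(file_key).decrypt(body)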

Unfortunately, as I pointed out on HN, you can’t simply use a static key pair with age to achieve authenticated encryption, as age’s key-wrapping algorithm is completely insecure when used in this way (it uses a fixed nonce, and the underlying cipher is not misuse-resistant against nonce reuse). My overall impression of age is that it uses good algorithms in non-standard ways and then justifies this with ad-hoc reasoning about why it’s safe in this specific implementation. I’d be much happier if it used existing mechanisms from libsodium, which appear to be sufficient to cover all its use-cases.

Cryptographic Doom

The age spec defines a header that lists ways to derive the file decryption key for each recipient. Here are some examples from the spec:

-> X25519 8hWaIUmk67IuRZ41zMk2V9f/w3f5qUnXLL7MGPA+zE8tXgpAxKgqyu1jl9I/ATwFgV42ZbNgeAlvCTJ0WgvfEo
-> scrypt GixTkc7+InSPLzPNGU6cFw 18
kC4zjzi7LRutdBfOlGHCgox8SXgfYxRYhWM1qPs0ca8
-> ssh-rsa SkdmSg
SW+xNSybDWTCkWx20FnCcxlfGC889s2hRxT8+giPH2DQMMFV6DyZpveqXtNwI3ts
5rVkW/7hCBSqEPQwabC6O5ls75uNjeSURwHAaIwtQ6riL9arjVpHMl8O7GWSRnx3
NltQt08ZpBAUkBqq5JKAr20t46ZinEIsD1LsDa2EnJrn0t8Truo2beGwZGkwkE2Y
j8mC2GaqR0gUcpGwIk6QZMxOdxNSOO7jhIC32nt1w2Ep1ftk9wV1sFyQo+YYrzOx
yCDdUwQAu9oM3Ez6AWkmFyG6AvKIny8I4xgJcBt1DEYZcD5PIAt51nRJQcs2/ANP
+Y1rKeTsskMHnlRpOnMlXqoeN6A3xS+EWxFTyg1GREQeaVztuhaL6DVBB22sLskw
XBHq/XlkLWkqoLrQtNOPvLoDO80TKUORVsP1y7OyUPHqUumxj9Mn/QtsZjNCPyKN
ds7P2OLD/Jxq1o1ckzG3uzv8Vb6sqYUPmRvlXyD7/s/FURA1GetBiQEdRM34xbrB

In each case we have an algorithm identifier followed by algorithm-specific parameters. For example, in the X25519 case we have an ephemeral public key and then an encrypted file key.

But apart from syntax, this header is incredibly close to JOSE! You might as well write it as follows:

{ "alg": "X25519",
  "epk": { ... } }

JOSE even supports multiple recipients (in the lesser-used JSON Serialization format) and ECIES with key wrapping. But the cryptographic community has rightly beaten up JOSE for requiring this algorithm header, which has led to catastrophic attacks.

Edit: Filippo Valsorda (the author of age) has pointed out that age only uses the algorithm identifier (key type) to match the recipient’s key, not to determine which algorithm to use. Age keys are uniquely linked to an algorithm in exactly the manner I suggest.

I’m fairly sure that age is not vulnerable to the same kind of attacks, but I’m not convinced it never will be. Even if there is no immediate attack, it still violates Moxie’s Cryptographic Doom Principle. Although the age header is protected by a MAC, it cannot verify that MAC until it decrypts the file key. In order to decrypt the file key it trusts the header to tell it what algorithm to use.

Why does this mistake keep being made? As I’ve written before, there are better ways to handle this that systematically avoid these issues.
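
To sketch the idea (my own illustration of the key-driven approach, not something age or JOSE actually implements): bind the algorithm to the key when the key is created, and let the message header name only the key, never the algorithm.

# Key-driven agility, sketched: the decryption algorithm comes from the
# locally stored key record, never from the attacker-controlled message
# header.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class KeyRecord:
    key_id: str
    algorithm: str   # fixed when the key is generated
    secret: bytes

KEYSTORE: Dict[str, KeyRecord] = {}
DECRYPTORS: Dict[str, Callable[[bytes, bytes], bytes]] = {}

def decrypt_message(key_id: str, ciphertext: bytes) -> bytes:
    key = KEYSTORE[key_id]               # the header only names the key...
    decrypt = DECRYPTORS[key.algorithm]  # ...and the key names the algorithm
    return decrypt(key.secret, ciphertext)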

In summary I think age is interesting and solving a genuine problem. But I think the design could still be improved from where it is today to provide clearer security goals and avoid potential pitfalls in the future.

A logical view of objects

If you spend any time writing code to talk to a database from an object-oriented language, you will come across the notion of an object-relational impedance mismatch. This is the idea that there are fundamental differences between the data models of relational databases and object-oriented programming languages that make it difficult to map between the two. This idea has spawned a large number of object-relational mapping (ORM) tools that attempt to bridge the divide. In this post, I’ll argue that while there are many practical differences between SQL and typical OO languages, there is no fundamental disagreement between the two – so long as we adopt an appropriate logical view of objects.

Continue reading “A logical view of objects”

Re-evaluating Searle’s arguments against machine consciousness

Like many students of artificial intelligence, I have long viewed John Searle’s Chinese Room argument with slight annoyance. It seems so obviously wrong-headed, and yet it persists. So I was pleasantly surprised to find myself re-evaluating my interpretation of the argument after watching some old videos of Searle. This is not to say that I suddenly believe it is a good argument, but I now think I understand a little better why Searle makes it.

Continue reading “Re-evaluating Searle’s arguments against machine consciousness”

API Security in Action now in early access

As some of you may already know, I have been working on a book on API security for Manning in my spare time: API Security in Action. The book has now reached a point where the publisher is happy for it to go into early-access release (MEAP, as they call it), so you can now pre-order and download the first 3 chapters on their website. Use discount code fccmadden to get 37% off when ordering.

[Image: API Security in Action MEAP cover]

The book covers the basics of securing remotely accessible (REST) APIs, taking a ground-up approach, and then moves on in the later chapters to look at the specifics of securing microservice APIs in Kubernetes and even APIs for the Internet of Things (IoT). It covers lots of things you’d expect, like JSON Web Tokens and OAuth 2, and some things you perhaps wouldn’t, like Waterken-style capability URLs and Macaroons. I’ve also taken an opinionated approach, which will come as no surprise to anybody who knows me. JWT is covered because it’s an important technology, but that doesn’t mean it gets a free pass. I’ve tried to separate the good parts from the bad, and warn you away from the real foot-guns.

The principle I’ve followed so far is that good practical advice (the “in Action” part) requires having a good understanding of how things actually work and what security properties each component provides. So rather than just throwing together a JWT library and an API gateway and showing a few deployment patterns, you’ll learn some of the deeper principles involved first. I start off with some basic secure development techniques and common attacks against REST APIs. We then look at the basic security controls of authentication, rate-limiting, access control, and audit logging. You’ll build primitive (but secure) versions of all these mechanisms from scratch before moving on to look at more fully-featured alternatives as the book progresses.

I hope you take a look. If you do, please leave feedback (good or bad) in the forum or email me personally. It’s been a much harder process writing it than I originally expected, trying to balance the desire to teach everything with a need to keep it simple enough to engage readers. Hopefully I’ve struck the right balance, but let me know either way. And of course, if you spot anything outright wrong then let me know about that too so I can fix it.

Why you really can parse HTML (and anything else) with regular expressions

There is a common story that has played out many times in the history of computer science. A software engineer learns regular expressions for the first time, feels like a god, and then immediately over-applies them to situations they are not great at, like parsing HTML. What then usually happens is that a more experienced developer patiently explains why HTML cannot be parsed with regular expressions, because HTML is (at least) a context-free language, and you should use a proper HTML parser instead. In this post, I’ll explain why the advice is right, but the reasoning is wrong.

Note: in this post, I am talking about regular expressions as defined in theory, not the heavily extended regexes of Perl etc.

Continue reading “Why you really can parse HTML (and anything else) with regular expressions”

Simplifying JOSE

Hot on the heels of my dives into authenticated encryption and key-driven cryptographic agility, I thought I’d lay out a sketch of what I think a simpler core JOSE/JWT spec might look like. We’ll take in misuse-resistant crypto and Macaroons along the way. And cats. But first, a brief diversion into JSON.

Continue reading “Simplifying JOSE”