Why you really can parse HTML (and anything else) with regular expressions

There is a common story that has played out many times in the history of computer science. A software engineer learns regular expressions for the first time, feels like a god, and then immediately over-applies them to situations they are not great at, like parsing HTML. What then usually happens is that a more experienced developer patiently explains why HTML cannot be parsed with regular expressions, because HTML is (at least) a context-free language, and you should use a proper HTML parser instead. In this post, I’ll explain why the advice is right, but the reasoning is wrong.

Note: in this post, I am talking about regular expressions as defined in theory, not the heavily extended regexes of Perl etc.

Continue reading “Why you really can parse HTML (and anything else) with regular expressions”

Simplifying JOSE

Hot on the heels of my dives into authenticated encryption and key-driven cryptographic agility, I thought I’d lay out a sketch of what I think a simpler core JOSE/JWT spec might look like. We’ll take in misuse-resistant crypto and Macaroons along the way. And cats. But first, a brief diversion into JSON.

Continue reading “Simplifying JOSE”

Public key authenticated encryption and why you want it (Part III)

In Part I, we saw that authenticated encryption is usually the security goal you want in both the symmetric and public key settings. In Part II, we then looked at some ways of achieving public key authenticated encryption (PKAE), and discovered that it is not straightforward to build from separate signing and encryption methods, but it is relatively simple for Diffie-Hellman. In this final part, we will look at how existing standards approach the problem and how they could be improved.

Continue reading “Public key authenticated encryption and why you want it (Part III)”

A small tweak to checked exceptions

Almost everyone hates Java’s checked exceptions. Even those that still use them will admit that they lead to a lot of boilerplate. Everyone has code along the following lines: Continue reading “A small tweak to checked exceptions”

Moving away from UUIDs

If you need an unguessable random string (for a session cookie or access token, for example), it can be tempting to reach for a random UUID, which looks like this:

 88cf3e49-e28e-4c0e-b95f-6a68a785a89d

This is a 128-bit value formatted as 36 hexadecimal digits separated by hyphens. In Java and most other programming languages, these are very simple to generate:


import java.util.UUID;


String id = UUID.randomUUID().toString();

Under the hood this uses a cryptographically secure pseudorandom number generator (CSPRNG), so the IDs generated are pretty unique. However, there are some downsides to using random UUIDs that make them less useful than they first appear. In this note I will describe them and what I suggest using instead.

Continue reading “Moving away from UUIDs”

So how *do* you validate (NIST) ECDH public keys?

Updated 20th July 2017 to clarify notation for the point of infinity. A previous version used the symbol 0 (zero) rather than O, which may have been confusing

In the wake of the recent critical security vulnerabilities in some JOSE/JWT libraries around ECDH public key validation, a number of implementations scrambled to implement specific validation of public keys to eliminate these attacks. But how do we know whether these checks are sufficient? Is there any guidance on what checks should be performed? The answer is yes, but it can be a bit hard tracking down exactly what validation needs to be done in which cases. For modern elliptic curve schemes like X25519 and Ed25519, there is some debate over whether validation should be performed at all in the basic primitive implementations, as the curve eliminates some of the issues while high-level protocols can be designed to eliminate others. However, for the NIST standard curves used in JOSE, the question is more clear cut: it is absolutely critical that public keys are correctly validated, as evidenced by the linked security alert.

Continue reading “So how *do* you validate (NIST) ECDH public keys?”