Why you really can parse HTML (and anything else) with regular expressions

There is a common story that has played out many times in the history of computer science. A software engineer learns regular expressions for the first time, feels like a god, and then immediately over-applies them to situations they are not great at, like parsing HTML. What then usually happens is that a more experienced developer patiently explains why HTML cannot be parsed with regular expressions, because HTML is (at least) a context-free language, and you should use a proper HTML parser instead. In this post, I’ll explain why the advice is right, but the reasoning is wrong.

Note: in this post, I am talking about regular expressions as defined in theory, not the heavily extended regexes of Perl etc.

Continue reading “Why you really can parse HTML (and anything else) with regular expressions”

Can you ever (safely) include credentials in a URL?

URLs are a cornerstone of the web, and are the basic means by which content and resources are shared and disseminated. People copy and paste URLs into Slack or WhatsApp to share interesting links. Google crawls the web, discovering and indexing such links. But what happens when the page you want to link is not public and requires credentials in order to view or interact with it? Suddenly a URL is no longer sufficient, unless the recipient happens to already have credentials. Sometimes they do, and everything is fine, but often they do not. If we really do want to give them access, the problem becomes how to securely pass along some credentials with the URL so that they can access the page we have linked.

A commonly desired approach to this problem is to encode the credentials into the URL itself. While convenient, this solution is fraught with dangers and frequently results in credentials being exposed in insecure contexts. In this article, we’ll look at various ways to accomplish this, the ways that things can go wrong, and conclude with a set of guidelines for cases where this can be made secure, and actually improve security overall.

Continue reading “Can you ever (safely) include credentials in a URL?”

Simplifying JOSE

Hot on the heels of my dives into authenticated encryption and key-driven cryptographic agility, I thought I’d lay out a sketch of what I think a simpler core JOSE/JWT spec might look like. We’ll take in misuse-resistant crypto and Macaroons along the way. And cats. But first, a brief diversion into JSON.

Continue reading “Simplifying JOSE”

Public key authenticated encryption and why you want it (Part III)

In Part I, we saw that authenticated encryption is usually the security goal you want in both the symmetric and public key settings. In Part II, we then looked at some ways of achieving public key authenticated encryption (PKAE), and discovered that it is not straightforward to build from separate signing and encryption methods, but it is relatively simple for Diffie-Hellman. In this final part, we will look at how existing standards approach the problem and how they could be improved.

Continue reading “Public key authenticated encryption and why you want it (Part III)”

Public key authenticated encryption and why you want it (Part II)

In Part I, I made the argument that even when using public key cryptography you almost always want authenticated encryption. In this second part, we’ll look at how you can actually achieve public key authenticated encryption (PKAE) from commonly available building blocks. We will concentrate only on approaches that do not require an interactive protocol. (Updated 12th January 2019 to add a description of a NIST-approved key-agreement mode that achieves PKAE).

Continue reading “Public key authenticated encryption and why you want it (Part II)”

Public key authenticated encryption and why you want it (Part I)

If you read or watch any recent tutorial on symmetric (or “secret key”) cryptography, one lesson should be clear: in 2018 if you want to encrypt something you’d better use authenticated encryption. This not only hides the content of a message, but also ensures that the message was sent by one of the parties that has access to the shared secret key and that it hasn’t been tampered with. It turns out that without these additional guarantees (integrity and authenticity), the contents of a message often does not remain secret for long either. Continue reading “Public key authenticated encryption and why you want it (Part I)”

Key-driven cryptographic agility

One of the criticisms of the JOSE/JWT standards is that they give an attacker too much flexibility, by allowing them to specify how a message should be processed. In particular, the standard “alg” header tells the recipient what cryptographic algorithm was used to sign or encrypt the contents. Letting the attacker chose the algorithm can lead to disastrous security vulnerabilities. Continue reading “Key-driven cryptographic agility”

A small tweak to checked exceptions

Almost everyone hates Java’s checked exceptions. Even those that still use them will admit that they lead to a lot of boilerplate. Everyone has code along the following lines: Continue reading “A small tweak to checked exceptions”

Moving away from UUIDs

If you need an unguessable random string (for a session cookie or access token, for example), it can be tempting to reach for a random UUID, which looks like this:

 88cf3e49-e28e-4c0e-b95f-6a68a785a89d

This is a 128-bit value formatted as 36 hexadecimal digits separated by hyphens. In Java and most other programming languages, these are very simple to generate:


import java.util.UUID;


String id = UUID.randomUUID().toString();

Under the hood this uses a cryptographically secure pseudorandom number generator (CSPRNG), so the IDs generated are pretty unique. However, there are some downsides to using random UUIDs that make them less useful than they first appear. In this note I will describe them and what I suggest using instead.

Continue reading “Moving away from UUIDs”

Java KeyStores – the gory details

Java KeyStores are used to store key material and associated certificates in an encrypted and integrity protected fashion. Like all things Java, this mechanism is pluggable and so there exist a variety of different options. There are lots of articles out there that describe the different types and how you can initialise them, load keys and certificates, etc. However, there is a lack of detailed technical information about exactly how these keystores store and protect your key material. This post attempts to gather those important details in one place for the most common KeyStores.
Continue reading “Java KeyStores – the gory details”