Why you really can parse HTML (and anything else) with regular expressions

There is a common story that has played out many times in the history of computer science. A software engineer learns regular expressions for the first time, feels like a god, and then immediately over-applies them to situations they are not great at, like parsing HTML. What then usually happens is that a more experienced developer patiently explains why HTML cannot be parsed with regular expressions, because HTML is (at least) a context-free language, and you should use a proper HTML parser instead. In this post, I’ll explain why the advice is right, but the reasoning is wrong.

Note: in this post, I am talking about regular expressions as defined in theory, not the heavily extended regexes of Perl etc.

Continue reading “Why you really can parse HTML (and anything else) with regular expressions”

Can you ever (safely) include credentials in a URL?

URLs are a cornerstone of the web, and are the basic means by which content and resources are shared and disseminated. People copy and paste URLs into Slack or WhatsApp to share interesting links. Google crawls the web, discovering and indexing such links. But what happens when the page you want to link is not public and requires credentials in order to view or interact with it? Suddenly a URL is no longer sufficient, unless the recipient happens to already have credentials. Sometimes they do, and everything is fine, but often they do not. If we really do want to give them access, the problem becomes how to securely pass along some credentials with the URL so that they can access the page we have linked.

A commonly desired approach to this problem is to encode the credentials into the URL itself. While convenient, this solution is fraught with dangers and frequently results in credentials being exposed in insecure contexts. In this article, we’ll look at various ways to accomplish this, the ways that things can go wrong, and conclude with a set of guidelines for cases where this can be made secure, and actually improve security overall.

Continue reading “Can you ever (safely) include credentials in a URL?”

Simplifying JOSE

Hot on the heels of my dives into authenticated encryption and key-driven cryptographic agility, I thought I’d lay out a sketch of what I think a simpler core JOSE/JWT spec might look like. We’ll take in misuse-resistant crypto and Macaroons along the way. And cats. But first, a brief diversion into JSON.

Continue reading “Simplifying JOSE”

Public key authenticated encryption and why you want it (Part III)

In Part I, we saw that authenticated encryption is usually the security goal you want in both the symmetric and public key settings. In Part II, we then looked at some ways of achieving public key authenticated encryption (PKAE), and discovered that it is not straightforward to build from separate signing and encryption methods, but it is relatively simple for Diffie-Hellman. In this final part, we will look at how existing standards approach the problem and how they could be improved.

Continue reading “Public key authenticated encryption and why you want it (Part III)”

Public key authenticated encryption and why you want it (Part II)

In Part I, I made the argument that even when using public key cryptography you almost always want authenticated encryption. In this second part, we’ll look at how you can actually achieve public key authenticated encryption (PKAE) from commonly available building blocks. We will concentrate only on approaches that do not require an interactive protocol. (Updated 12th January 2019 to add a description of a NIST-approved key-agreement mode that achieves PKAE).

Continue reading “Public key authenticated encryption and why you want it (Part II)”

Public key authenticated encryption and why you want it (Part I)

If you read or watch any recent tutorial on symmetric (or “secret key”) cryptography, one lesson should be clear: in 2018 if you want to encrypt something you’d better use authenticated encryption. This not only hides the content of a message, but also ensures that the message was sent by one of the parties that has access to the shared secret key and that it hasn’t been tampered with. It turns out that without these additional guarantees (integrity and authenticity), the contents of a message often does not remain secret for long either. Continue reading “Public key authenticated encryption and why you want it (Part I)”

Key-driven cryptographic agility

One of the criticisms of the JOSE/JWT standards is that they give an attacker too much flexibility, by allowing them to specify how a message should be processed. In particular, the standard “alg” header tells the recipient what cryptographic algorithm was used to sign or encrypt the contents. Letting the attacker chose the algorithm can lead to disastrous security vulnerabilities. Continue reading “Key-driven cryptographic agility”