I was catching up on the always excellent Security. Cryptography. Whatever. podcast, and enjoyed the episode with Colm MacCárthaigh about a bunch of topics around TLS. It’s a great episode that touches a lot of subjects I’m interested in, so go ahead and listen to it if you haven’t already, and definitely subscribe. I want to pick up on one of the topics in the podcast in this article, and discuss part of the OAuth specs that I think deserves to be better known.
One topic that Colm covers is his dislike of mutual TLS (mTLS), in which TLS clients authenticate to the server using client authentication certificates (in addition to the usual server certificate authentication). He also wrote an entertaining Twitter thread about the same topic a few years ago, if you prefer to read the arguments rather than listen (although be warned, it’s about 30 tweets long).
mTLS has become very popular in recent years for securing service-to-service connections, particularly with the rise of service meshes, so this critique of the approach from a very experienced security architect deserves a lot of attention. I’ll try to summarise his arguments against using TLS certificate authentication for clients:
- mTLS authenticates the client at the connection (transport) level, rather than individual requests. Every request that is sent over that channel is then implicitly assumed to originate from that client. This is a problem because there are whole classes of extremely common security vulnerabilities which basically involve the attacker injecting requests of their choosing into a legitimate client’s connection: think SQL injection, XSS, CSRF, SSRF, request smuggling, etc.
- Client certificate authentication fails open: if you don’t configure your server correctly to require a client cert then the server won’t ask for one and the client won’t send one, and (often) nothing will fail. You will just silently not be authenticating.
- TLS allows re-negotiating client certificate authentication at any time, even mid-request, because TLS knows nothing about the boundaries of requests at the application level. So the authentication context can change arbitrarily, whereas your application probably only checks it when a session is initially established. This is largely fixed in TLS 1.3, which removes renegotiation, and protocols like HTTP/2 completely ban any post-handshake re-authentication.
- Parsing and validating X.509 certificates is extraordinarily complex and requires a lot of complex code with subtle security properties. In normal TLS, the server (where most of the stuff you want to protect is) doesn’t have to care about any of this and can largely treat its own certificates as opaque blobs of bytes. But with client certificate authentication the server now has to process whatever crazy cert chain the client decides to send it. This enormously increases the attack surface in the server’s TLS stack.
- Certificate revocation can perhaps best be described as a work in progress, and the solutions that do exist are rarely enforced in practice. As discussed in the podcast, revocation for service-to-service connections is also quite different to normal WebPKI certificate revocation.
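To make the fail-open point in the list above concrete, here is a minimal sketch using Python’s standard `ssl` module. It is an illustration rather than a recommended deployment: the helper names `make_mtls_server_context` and `assert_client_auth_required` are made up for this example, and the file paths are left to the caller. The thing to notice is that a default server-side context does not request a client certificate at all, so requiring one is an explicit step that is easy to leave out.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that rejects clients without a certificate.

    Hypothetical helper for illustration; all paths are supplied by the caller.
    """
    # A default server-side context does NOT ask clients for a certificate.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # The line that is easy to forget. Without it the handshake still
    # succeeds and the connection is simply unauthenticated.
    context.verify_mode = ssl.CERT_REQUIRED
    if client_ca_file is not None:
        # CA(s) that are allowed to issue client certificates.
        context.load_verify_locations(cafile=client_ca_file)
    if certfile is not None:
        # The server's own certificate and private key.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def assert_client_auth_required(context: ssl.SSLContext) -> None:
    """Startup check: refuse to run if client certificates are optional."""
    if context.verify_mode != ssl.CERT_REQUIRED:
        raise RuntimeError("TLS context does not require client certificates")
```

A startup assertion like this does nothing about the other criticisms, but it does turn a silent misconfiguration into a loud one.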
I agree with all of these points: they are important and valid criticisms of mTLS. The vulnerability of service meshes to SSRF attacks is something that I warn about in chapter 10 of my book (and I discuss OAuth mTLS in depth in chapter 11). I also agree with Thomas Ptacek’s point on the podcast that service-to-service mTLS is still quite nice for expressing and enforcing coarse-grained access control policies. But is there a way to use mTLS while avoiding some of these weaknesses?
OAuth and mTLS: better than the sum of their parts
There’s nothing the OAuth world likes more than a new specification. There are literally dozens of RFCs describing extensions and modifications of OAuth, as well as related protocols like OIDC. So I’ll forgive you if you’re not familiar with OAuth mTLS. On a first look, this spec looks like it is primarily concerned with allowing an OAuth client to authenticate to the authorization server (AS) using client certificate authentication. Uh-oh, this sounds like exactly what Colm MacCárthaigh just warned us about, and that’s kinda true. However, there is a second part of the spec where it allows an OAuth access token to be “bound” to a client certificate: the token can then only be used over a TLS connection that has been authenticated with the associated certificate. The two parts are independent: you can use certificate-bound access tokens without having to use mTLS client authentication. Even a public client can use certificate-bound access tokens if you want.
So why is this better than just using mTLS? Firstly, and most importantly, with certificate-bound access tokens (CBATs from now on), the client certificate authentication is an additional security measure and not the sole one. To make an API request to another service you need both the certificate (and private key) and also the access token. The access token is communicated at the application level, such as within an HTTP header, avoiding the layering violations that Colm talks about. An SSRF attacker cannot simply piggyback on the trusted mTLS connection between services but also needs to somehow obtain a valid access token for any API calls they make.
This also makes it a bit less likely that the system will fail open: validating the client certificate and checking the access token will typically be done separately and by different code. You’d need to both misconfigure your TLS stack and also fail to properly validate the access token. At the moment, this is entirely possible because many OAuth resource server implementations do not understand CBATs and so will probably ignore the certificate binding, but this should improve over time, especially as standards like Open Banking mandate use of OAuth mTLS.
Just as significantly, OAuth mTLS drastically simplifies the validation code needed by most servers to process a client certificate. All the certificate checking is done by the AS when it issues the access token, and even this can be much simpler: a self-signed certificate is fine if another authentication mechanism is being used, for example. Downstream services that receive API requests from clients only need to check that the client correctly authenticated at the TLS level and that the SHA-256 hash of the certificate presented matches the hash value stored in the access token metadata. That’s it. No need to validate or even parse the certificate itself, no need to build certificate paths, or any of the other complexities of X.509. If the hash of the certificate matches the one in the access token, and the handshake completed successfully, then it must be the same client.
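As a rough sketch of how small that check is: in the OAuth mTLS spec (RFC 8705) the binding is carried in the token’s `cnf` claim as an `x5t#S256` member, which is the base64url-encoded SHA-256 hash of the DER-encoded certificate. The function names below are invented for this example, and it assumes the access token has already been validated (by introspection or JWT verification) and that your TLS stack can hand you the raw DER bytes of the client certificate.

```python
import base64
import hashlib
import hmac


def cert_thumbprint(der_cert: bytes) -> str:
    """base64url (unpadded) SHA-256 hash of a DER-encoded certificate."""
    digest = hashlib.sha256(der_cert).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def token_bound_to_cert(token_claims: dict, der_cert: bytes) -> bool:
    """Return True only if the token is bound to exactly this certificate.

    token_claims is assumed to be the already-validated token metadata.
    """
    cnf = token_claims.get("cnf")
    if not isinstance(cnf, dict):
        return False  # fail closed: an unbound token is rejected
    expected = cnf.get("x5t#S256")
    if not isinstance(expected, str):
        return False
    # The certificate is never parsed: it is hashed as an opaque blob.
    actual = cert_thumbprint(der_cert)
    return hmac.compare_digest(expected.encode("utf-8"), actual.encode("utf-8"))
```

Note that this sketch deliberately rejects tokens with no binding at all. Whether that is right for you depends on whether every client of the service is expected to use CBATs.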
It’s common for reverse proxies or load balancers, such as service mesh sidecar proxies, to terminate TLS connections on behalf of services, and many of these proxies already support forwarding a SHA-256 hash of any client certificate to the service as an HTTP header. For many services, then, the validation of a CBAT boils down to a simple string equality check. (Just for the love of God, make sure that your proxy is also stripping this header from any incoming requests…)
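Here is a minimal sketch of both halves of that arrangement, under some loud assumptions: the header name `x-client-cert-sha256` is made up (real proxies each use their own), headers are modelled as a plain dict, and the proxy and the AS are assumed to encode the hash the same way (hex versus base64url mismatches are a common source of confusion). It is not how any particular proxy implements this.

```python
import hmac
from typing import Dict, Optional

# Hypothetical header name; check what your proxy actually sends.
CERT_HASH_HEADER = "x-client-cert-sha256"


def proxy_forward_headers(
    incoming: Dict[str, str], verified_cert_hash: Optional[str]
) -> Dict[str, str]:
    """Proxy side: drop any client-supplied copy of the header, then set our own."""
    forwarded = {
        name: value
        for name, value in incoming.items()
        if name.lower() != CERT_HASH_HEADER
    }
    if verified_cert_hash is not None:
        # Only set when the TLS handshake really authenticated a client cert.
        forwarded[CERT_HASH_HEADER] = verified_cert_hash
    return forwarded


def service_accepts(headers: Dict[str, str], token_cert_hash: str) -> bool:
    """Service side: compare the proxy's header with the hash bound to the token."""
    presented = headers.get(CERT_HASH_HEADER)
    if presented is None:
        return False  # no authenticated client certificate on this connection
    return hmac.compare_digest(
        presented.encode("utf-8"), token_cert_hash.encode("utf-8")
    )
```

Without the stripping step in `proxy_forward_headers`, anyone who has stolen an access token could just send the matching header themselves, and the binding would be worthless.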
Finally, using mTLS via CBATs also somewhat improves the revocation situation. As the certificate is useless without an access token, you can use OAuth’s revocation mechanisms to revoke access instead. If you also want to revoke the certificate, then you only need to do this in one place: the AS. I won’t claim that OAuth revocation is perfect by any means, but I would be much more confident in actually deploying it at scale, and many people have.
I think this combination of OAuth and mTLS in the form of CBATs is a nice improvement over either in isolation. It upgrades OAuth access tokens from pure bearer tokens that are relatively easy to steal, and it enormously simplifies validation of client certificates for backend services. For some service-to-service (microservice) deployment scenarios, I think this makes a lot of sense and I’d like to see service mesh authorization move in this direction in future.
I’d love to hear what you think about CBATs and mTLS – please patiently explain what I got wrong in the comments below :-)
(Incidentally, I was heavily involved in the design of mTLS support in my employer’s OAuth implementation, and we support neat things like adding certificate-bindings via Macaroon caveats. I mention this because I’m proud of it, not to sell you stuff).