Can you ever (safely) include credentials in a URL?

Update: an updated version of the ideas in this blog post appears in chapter 9 of my book. You may also like my proposal: Towards a standard for bearer token URLs.

URLs are a cornerstone of the web, and are the basic means by which content and resources are shared and disseminated. People copy and paste URLs into Slack or WhatsApp to share interesting links. Google crawls the web, discovering and indexing such links. But what happens when the page you want to link is not public and requires credentials in order to view or interact with it? Suddenly a URL is no longer sufficient, unless the recipient happens to already have credentials. Sometimes they do, and everything is fine, but often they do not. If we really do want to give them access, the problem becomes how to securely pass along some credentials with the URL so that they can access the page we have linked.

A commonly desired approach to this problem is to encode the credentials into the URL itself. While convenient, this solution is fraught with dangers and frequently results in credentials being exposed in insecure contexts. In this article, we’ll look at various ways to accomplish this, the ways that things can go wrong, and conclude with a set of guidelines for cases where this can be made secure, and actually improve security overall.

What kind of credentials?

The most usual kind of credential to grant access to a protected resource would be a username and password. Way back in the prehistory of the web there was a standard way to encode a username and password into a URL using the following syntax:

http://bob:sekret@example.invalid/some/path

Here “bob” is the username and “sekret” is the password. The way this was supposed to work is that when you clicked on that link, the web browser would automatically submit the username and password using HTTP Basic authentication (or possibly Digest authentication, but we should never talk of that). This turned out to be a terrible idea for many reasons:

  1. Firstly, putting your username and password in plaintext into a URL that can then be copied far and wide is a really quick route to account compromise. Giving your username and password to somebody else just so they can look at one of your cat photos is not the best idea in the world.
  2. Basic authentication just Base64-encodes the credentials, so unless you were using SSL the password would be recoverable by anybody observing the network traffic (see the example after this list). While sites sometimes put SSL on the login page, they often didn’t bother for other parts of the site (relying on time-limited session cookies instead), so sending a username and password to a random URL was quite risky. This is much better now with the widespread adoption of HTTPS.
  3. Due to the way Basic authentication works, after you have logged into one page with it, your browser will helpfully and proactively send the username and password to every other page on the site under the same path. This can leak credentials to unrelated pages.
  4. It was soon realised that using a URL like http://www.google.com:search@example.com was a really easy way to create convincing-looking phishing links.
  5. If the target of the URL is a login endpoint, then the attacker can attempt to get you to click on the link to perform a Login CSRF attack. In this case the honest user ends up with a session cookie for an account under the control of the attacker.
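
To make the second point above concrete, here is roughly what a browser supporting this scheme would send when following the example link (a sketch of just the request line and the relevant header; the Base64 value is simply "bob:sekret" encoded, and is trivially decoded by anyone who can observe the traffic):

GET /some/path HTTP/1.1
Host: example.invalid
Authorization: Basic Ym9iOnNla3JldA==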

For all these reasons, this specific form of URL was deprecated back in 2005, and support within browsers is now patchy: Safari, for instance, will just silently ignore any username:password component when following such a link. Other browsers will tolerate them in some cases, but this varies considerably by browser and often from version to version. Some versions of Chrome refuse to follow such links and instead display a large red phishing warning page.

So if not a username and password, then what kind of credentials could we put in a URL? Ideally you’d want a limited-scope credential that provides only permission to access that one specific URL in just the ways that the sharer of the link intends (e.g., read-only access). This is known as the principle of least authority (POLA). Typically this is done by creating an unguessable bearer token that grants limited access to anybody that holds the token (the token bearer). There are two basic approaches to this:

  • The token on its own is sufficient to grant access. This is known as capability-based security. This is similar to how keys work in the physical world, with the token here acting as the key. Just as with a real key, you can have different keys (capabilities) for different doors (URLs) so that one key does not grant access to everything.
  • The token grants some subset of the permissions of the person/agent who authorised it. Access is granted based on the permissions granted by the token as well as the identity of the user that approved it (the resource owner). This is the model used by OAuth and UMA, and a real-world analogy would be a signed note from a teacher granting a child permission to go and fetch something from a store cupboard at school. The teacher’s signature and identity grant the authority here, and the note describes the scope of the permission being granted.

We will refer to both methods as capability URLs in this article, although strictly speaking that term only applies to the first option.

Why put tokens in URLs at all?

The OAuth approach is pretty popular these days as a way of granting limited access to third parties without giving away your username and password. The capability approach is less common, but has some attractive security properties. In both cases, if we want to follow POLA then we will end up with many fine-grained tokens that grant access to specific resources. We then have to make sure that we use the right token with the right URL, otherwise we might be denied access and also leak security tokens to unauthorized parties.

If we instead use broad-scope tokens and reuse them for many URLs then we run the risk of confused deputy attacks, where an attacker tricks us into adding our token to a request of their choosing. This is the basis of CSRF attacks against session tokens, and the same thing can happen with OAuth tokens if the code that adds access tokens to requests is not carefully written. Often UI code is written to add an access token to every request transparently, as a convenience for the developer. If an attacker can influence any of the requests you make, such as by manipulating URL parameters or the URL fragment, then they may still be able to trick your UI code into adding its token to a request of the attacker’s choosing.

Capability-based systems avoid confused deputy attacks because they fundamentally combine permission to act on something with the ability to name that thing. You can’t name an object you don’t have access to, and you can’t just add a permission token to any request to any resource. In an object-capability programming language, capabilities are references to objects; if you hold a reference to an object you can call the methods on it, and you can’t just create a reference out of nowhere. On the web, we can achieve some of the same advantages by combining unforgeable tokens with URLs. CSRF attacks become impossible as the attacker cannot construct a valid URL in the first place (think of this as a bit like anti-CSRF tokens directly in the URL itself). Phishing becomes pointless as there are no credentials to phish independent of the links they are part of. In certain cases, even XSS attacks become less effective, as we shall see later.

How to do it safely

Many security professionals reading this will baulk at the very idea of putting security tokens into URLs, and with good reason. A lot can go wrong. Let’s look at some ways to add credentials to URLs and how they can go wrong. Firstly, there are some issues that are common to all methods of putting credentials in URLs:

  • URLs may be saved in the history of a browser, making them potentially accessible to the next user of a public terminal. Most browsers support private browsing modes that can be used on public terminals to prevent this, but use of a public terminal is always a risk. If somebody gains access to your computer or phone, then this may also be a threat even on your own device. Using time-limited tokens in URLs can help mitigate these threats.
  • URLs within page content may be cached by the browser and by web proxies. Use of TLS and proper Cache-Control headers can reduce this risk.
  • URLs in the content of an HTML page may be sniffed by other scripts running in the page. Sandboxing untrusted 3rd-party scripts in iframes can help, but a more general technique is to avoid links directly in HTML, as we shall see later.
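
For instance, an untrusted third-party widget might be isolated along these lines (a sketch; the widget URL is made up). Without allow-same-origin the frame runs in an opaque origin, so its scripts cannot read the embedding page's DOM or the capability URLs it contains:

<iframe sandbox="allow-scripts" src="https://widgets.example.net/widget.html"></iframe>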

Tokens in the path or in query parameters

https://example.com/users/Kay2xVuid_tiWxJ5NI0NWmGxQ7Y
https://example.com/users/bob?key=Kay2xVuid_tiWxJ5NI0NWmGxQ7Y

An obvious place to put a token in a URL is either into the path (perhaps using random names for resources) or into the query parameters. This is nice and simple, but suffers from some drawbacks.

Firstly, such URLs might inadvertently leak in the Referer header when loading resources from other sites. This can also happen via the document.referrer field available in Javascript when a third-party resource is loaded in an iframe (ironically, often done to isolate the 3rd-party code). These days the Referrer-Policy header can be used to prevent these kinds of leaks, but browser support is not yet 100%.
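
The Referrer-Policy header mentioned here and the Cache-Control header mentioned earlier are both just HTTP response headers on pages containing capability URLs. A minimal example, with values you may want to adjust for your application:

Referrer-Policy: no-referrer
Cache-Control: no-store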

A second drawback is that URL path elements and query parameters are often logged in the access logs of any servers that the request passes through, making tokens vulnerable to anybody who has access to those logs. The more widespread deployment of HTTPS has reduced this risk, but it is still common to terminate TLS at a load balancer or reverse proxy (or even in the cloud before it ever reaches your own servers). The request may then pass through several other gateways and services before being processed (even more in this age of microservices), all of which might log request parameters.

Tokens in the fragment

One of the best discussions of URLs-as-capabilities was written by Tyler Close in 2007 as part of the documentation of the groundbreaking Waterken server. This in-depth analysis has largely stood the test of time, and seems remarkably prescient when read over a decade later. The conclusion reached then seemed entirely radical and almost unworkable to me when I first read it: the unguessable token should go in the URL fragment component (i.e., after the # character). The fragment has many advantages, as it is not sent to the server and is never included in Referer headers.

A capability URL in this scheme looks like the following (example from the essay):

https://www.example.com/app/#mhbqcmmva5ja3

When followed, the browser will load https://www.example.com/app/, without sending the secret part in the fragment. As the URL that the server sees can be the same for many different security tokens, it can be cached and reused for many requests. Javascript in that page will then retrieve the token from window.location and make an Ajax call with the token as a query parameter to retrieve the actual content.

The Waterken Web-Key flow with the token in the URL fragment. The template is cacheable.
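
A minimal sketch of the client-side half of this flow, using a modern fetch call (the /api/resource path and the renderPage function are placeholders for your application; the original Web-Key design passes the token as a query parameter, as described above):

// Extract the secret token from the fragment (the part after '#')
var token = window.location.hash.substring(1);

// Fetch the actual content, passing the token as a query parameter
fetch('/api/resource?key=' + encodeURIComponent(token))
    .then(function (response) { return response.json(); })
    .then(function (data) { renderPage(data); }); // app-specific rendering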

A solution that didn’t work without Javascript seemed distasteful to me at the time, although Web 2.0 was already in full swing (maybe I was just behind the times). Nowadays, of course, nobody would bat an eyelid at this. This is the way that the vast majority of single-page apps work anyway. Far from being crazy and unworkable, this idea of loading a template and then making Ajax calls to load the actual content has become mainstream.

Non-browser clients, such as mobile apps or service-to-service REST applications, could know about such URLs and automatically send the token in the Authorization header without having to load some Javascript first.

Anybody who is interested in capability security on the web (and you should be) owes it to themselves to read that essay in full, as it has many great insights. Much of what I am writing in this article is really just a light update of that material. Still, in 2019 there are some drawbacks I would mention:

  • While the URL fragment is not sent to the server, if the page that is loaded issues a redirect that does not itself contain a fragment component, then the browser will append the fragment from the original request. This could result in the secret token being inadvertently shared with another site if you are not careful. For example, if the site being accessed also required login with Facebook, then it may redirect to Facebook’s login page, taking the capability token with it. When redirecting you should explicitly include a fragment part to ensure that any existing fragment is removed.
  • The Ajax requests to the server still include the secret as a query parameter, allowing possible leakage in access logs. This also mingles the authorization token with any application parameters. In 2019, I would pass this as an Authorization: Bearer header just like an OAuth access token (see the sketch after this list). It won’t get logged and everything that handles it will know it is a credential and treat it accordingly. (The Bearer auth scheme didn’t exist in 2007.) If the backend server is on a different origin, then you can now use CORS to permit cross-origin requests with an Authorization header.
  • It’s hard for non-browser clients to distinguish capability URLs from other URLs that just happen to contain a fragment component, and they won’t know what to do when presented with a page full of Javascript.
  • I’d worry a bit about just 64 bits of entropy for the unguessable secret part. While this makes for short URLs that are easily transcribed, and is perfectly fine if you have a small site and low request rate limits, in 2019 many sites issue way more than a million tokens (especially if they are this fine-grained) and accept far more than 1000 requests per second. As I have previously discussed, I prefer a higher entropy level for security tokens as a defence in depth. Most URLs are never seen by humans, let alone manually transcribed. But if you do need human-friendly URLs, then you can probably get away with 64 bits and modest rate limiting.
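
As a concrete illustration of the second point above, the same Ajax call with the token moved into an Authorization header might look like the following sketch (the API origin and the renderPage function are again placeholders):

var token = window.location.hash.substring(1);

// Send the token as a Bearer credential rather than a query parameter.
// If api.example.com is a different origin, the server must allow the
// Authorization header via CORS (Access-Control-Allow-Headers: Authorization).
fetch('https://api.example.com/resource', {
    headers: { 'Authorization': 'Bearer ' + token }
})
    .then(function (response) { return response.json(); })
    .then(function (data) { renderPage(data); });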

Tokens in the userinfo component

An intriguing variation is to make use of the userinfo part of the URL that we discussed earlier when talking about HTTP Basic authentication. While the specific username:password form was deprecated, all browsers will still accept and parse a userinfo section and make it available to Javascript, even if they won’t honour the credentials when following the link. Browsers that do still support Basic/Digest auth for this type of link won’t actually send the credentials unless the server first prompts with a WWW-Authenticate header with an auth scheme that the browser supports. So to all intents and purposes, the userinfo component of a URL acts much like the fragment, and has some advantages:

  • Just like a fragment, the userinfo is never included in a Referer header nor in the document.referrer field when loading an iframe (even from the same origin).
  • The userinfo will never be carried over in a redirect as the fragment might be.
  • To discourage phishing, some browsers will not display the userinfo component to the user. This means that your links are not visually cluttered with ugly random strings, yet the links still copy and paste correctly.
  • The token does not interfere with other uses of the fragment. The fragment is often used for application purposes in single-page apps.
  • As the userinfo component is a standard place to put credentials, non-browser clients can easily determine which URLs contain credentials and which do not. Services like Github can easily see if credentials have been accidentally checked into a Git repository and, as the token is associated with a URL, could automatically inform the issuer (perhaps by discovering and calling a standard OAuth token revocation endpoint).

Technically, an implementation of capability URLs using the userinfo component would work nearly the same as for the fragment method used previously. Rather than retrieving the token from the fragment it would be retrieved from the userinfo and then submitted on an Ajax request as the Authorization header. Alas, this does not work for top-level links because most browsers will now completely strip the credentials, not just ignore them. They do this even if the userinfo part doesn’t contain a colon, which is a shame, as this would have been a great way to implement capability URLs.

The process would have worked as follows:

  1. The user clicks on a link like https://lL_tieqNCsQwkUAcxXa9XCsrHAg@example.com/abc
  2. The browser loads https://example.com/abc without sending any credentials (because the server didn’t prompt for any and browsers will ignore it anyway). A static cacheable template is loaded.
  3. Javascript in the loaded template extracts the token from the userinfo segment, for instance via a window.location.username property (hypothetical, as the Location interface does not currently expose one) or by parsing the URL with the URL API.
  4. The Javascript makes an Ajax call to the server passing the token in an Authorization: Bearer header.
The capability URL flow with token in the userinfo component. Again, the template is cacheable.

So if you need to have capability URLs that are directly usable by end users, the best option currently is to follow the Waterken Web-Key approach and put the token in the URL fragment. If you control all the pages from which users will be clicking on links, then you can intercept the click events and rewrite them as form POSTs with the access token in the form body, as in this jQuery example:

$('a').on('click', function(event) {
    event.preventDefault(); // don't let the browser follow the link itself

    // Extract the token from the userinfo component and remove it from the link
    var username = this.username;
    this.username = ''; // Otherwise the browser will ignore the POST!

    // Build a hidden form that POSTs the token in the request body instead
    var form = $('<form></form>');
    form.attr("method", "post");
    form.attr("action", this.href);
    var field = $('<input>');
    field.attr("type", "hidden");
    field.attr("name", "access_token");
    field.attr("value", username);
    form.append(field);

    $(document.body).append(form);
    form.submit();
});

Note that we need to clear the username field in the URL, otherwise most browsers will completely ignore the form post.

This approach only works if you can control all pages that link to your site, which is infeasible for usage on the public internet, bookmarks, etc. However, if you only need to use capability URLs in a REST API with programmatic clients, then the userinfo approach can work well.
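
For example, a programmatic client (a sketch here assuming a recent Node.js with built-in fetch, or a browser context) can pull the token out of the userinfo component with the standard URL API. Note that the fetch specification rejects URLs that still contain embedded credentials, so the token must be stripped out before making the request:

var capUrl = 'https://lL_tieqNCsQwkUAcxXa9XCsrHAg@example.com/abc';

var url = new URL(capUrl);
var token = url.username;  // extract the bearer token
url.username = '';         // remove it from the URL before the request

fetch(url, { headers: { 'Authorization': 'Bearer ' + token } })
    .then(function (response) { return response.json(); })
    .then(function (doc) { console.log(doc); });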

Capability URLs and XSS

When used within an HTML page, capability URLs are vulnerable to XSS attacks. A malicious script that manages to run within the same frame will have full access to the HTML via the DOM, including access to the URLs in any link href and src attributes. Use of iframes to sandbox 3rd party scripts can help, but a successful XSS attack will bypass such measures. Pretty much all client-side authorization techniques are vulnerable to XSS. For example, even cookies marked as HttpOnly are only a partial defence, as the XSS script can just make requests directly from within the browser rather than trying to extract the cookie. However, I think we can do better with capability URLs.

Firstly, note that while the schemes described above can be achieved in Javascript today, in principle browsers could add direct support for the Bearer auth scheme when a token is in a URL, as it is much safer than Basic/Digest auth for all the reasons we’ve discussed. When a user clicks on a link with a token in the userinfo component, the browser could first make a request without any token. If the server responds with a 401 and a WWW-Authenticate header, the browser would then repeat the request with the token in the Authorization header. As with Basic auth, the browser could then remember that this server/path accepts bearer tokens and proactively send them on future requests. However, unlike Basic auth, it would only send the token associated with a specific URL and not reuse a token for different requests. Browsers could also apply additional protections to such URLs. For instance, when such a URL appeared in an anchor or other tag, the browser could strip the sensitive token when the URL is accessed from Javascript via the DOM, limiting the impact of XSS attacks and effectively making such URLs function like HttpOnly cookies. Or perhaps an httponly="true" attribute on a/img/script/form tags could prevent all access to the URL from Javascript (except perhaps the ability to follow the link or submit the form).

We obviously cannot rely on hypothetical future browser support to help us build systems today. However, capability URLs have a distinct advantage over other authorization approaches because they do not need to store sensitive tokens in persistent storage, such as in cookies or in localStorage/sessionStorage. All the state needed to access a resource is encoded into the URL itself. This means that capability URLs can be held by the legitimate Javascript client in local variables, which are much harder for another script to get access to if properly encapsulated. Navigating away from a page will lose those local variables, but navigating back will itself be via a capability URL, allowing them to be recreated. There is still the risk of an XSS script accessing the initial capability URL via window.location, of course. A partial mitigation against this would be for the legitimate script to rewrite window.location to remove the security token after it has read it (for instance, via history.replaceState). As Javascript is not able to access the browser history, this would prevent any subsequent XSS from seeing the security token.
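
A sketch of what those two mitigations might look like together in the fragment-based scheme, keeping the token inside a closure and scrubbing it from the address bar (the /api paths are placeholders; whether this is sufficient is a judgement call for your threat model):

var api = (function () {
    // Capture the token in a variable private to this closure
    var token = window.location.hash.substring(1);

    // Remove the fragment from the current URL so that scripts loaded
    // later cannot read the token from window.location
    history.replaceState(null, '', window.location.pathname + window.location.search);

    // Expose only a function for making authorised API calls
    return {
        get: function (path) {
            return fetch(path, {
                headers: { 'Authorization': 'Bearer ' + token }
            });
        }
    };
})();

// Usage: api.get('/api/resource').then(...)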

These are partial mitigations, and XSS is still a risk. However, it is no more of a risk for capability URLs than for other web authorization technologies. It is therefore, as always, essential to avoid XSS vulnerabilities in the first place, through proper input validation, output escaping, and use of technologies such as Content Security Policy (CSP). Until browsers move to a capability-secure model for Javascript execution, rather than relying on the pitiful same-origin policy, XSS will always be a threat. Hey, a man can wish…

Where do capability URLs come from?

So we’ve seen how we can securely construct capability URLs in principle, and we’ve discussed the security pros and cons. But where do you get the URLs from in the first place? Managing hundreds of individual URLs sounds like a nightmare for the client. Sure, the first link into my application can be a capability URL but what about all the subsequent requests my app needs to make?

To solve this problem we need a shift in mindset. It’s a shift that many people have already made for reasons unrelated to security, and one with a catchy and memorable acronym: HATEOAS – Hypermedia As The Engine Of Application State. I told you it was catchy. Many people, including Roy Fielding, believe this is fundamental to a proper REST architecture. The basic principle is that instead of the client knowing how to construct URLs to access specific functionality on the server, the server instead tells the client how to access these things by presenting it with hyperlinks. For example, if a client performs a search operation and gets back a page of results, rather than the client having to know to append ?page=2 to the query to get the next page, the server can send a link containing the full URL to access the next and previous pages as part of its response, either as a link within the content or as a header (such as the standard Link header).
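
For instance, a search response from such an API might look something like the following sketch, where each link carries its own capability token in the userinfo component (the field names and tokens are made up for illustration; you could equally use the fragment, or a format such as HAL or Link headers):

{
  "results": [ "...first page of results..." ],
  "links": {
    "next": "https://kf92hd7cnq1zx@api.example.com/search?q=cats&page=2",
    "prev": "https://mhbqcmmva5ja3@api.example.com/search?q=cats&page=1"
  }
}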

From a capability point of view, a REST service that follows HATEOAS is a match made in heaven. Just as in an object-capability programming language, where access to one object will present the caller with access to other objects via references in fields and methods, so calling a REST endpoint via a capability URL can present the client with capability URLs to access related resources. One of the things that impressed me most when I first read about object-capability languages was how well the security discipline aligned with good software engineering discipline. The same is true on the web. By following good software engineering discipline to reduce coupling between client and server, you also make it easier to employ good security discipline through fine-grained capability URLs. Rather than being a pain to manage, use of capability URLs becomes completely natural and transparent for the client.

Beyond bearer tokens

If the idea of URLs as bearer credentials still scares you, then there are steps that can be taken to limit the ability of such URLs to be used by anyone. All such methods weaken the usability advantages of capability URLs (i.e., that sharing a link implicitly grants access), so should be carefully considered.

  • You can require a traditional login as well as a capability token to grant access. This can be used to limit sharing of capability URLs to within your own organisation, and also ensures that the user can be positively identified for audit logging purposes. Google Drive’s “shareable links” can work in this way. To access a resource a user would need to present the token associated with a capability URL and also prove their identity, for instance via a traditional session cookie. This is often a good default.
  • You can apply time-limit restrictions to a capability URL, or limit them to only be used from a certain origin, perhaps an IP range or a particular client domain.
  • You can apply proof-of-possession constraints to the token associated with a URL, such as binding it to a TLS client certificate, or to the current TLS channel context in a web browser with TLS token-binding.

If the tokens in your URLs are Macaroons, then you can apply such constraints on-the-fly by appending contextual caveats to the token. For example, before sending the token in an Authorization header, your client might append a caveat that says it is good for the next 5 seconds only, or it might bind it to the TLS channel at that moment. This ability to add contextual caveats means that the original URL can be left as a bearer token that can be easily shared, while the token that is sent over the network on API calls is much more locked-down.
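
A very rough sketch of that pattern follows. The some-macaroon-lib module and its method names are hypothetical placeholders rather than a real API (real libraries such as libmacaroons bindings differ in detail), but the shape of the code is the point:

// Hypothetical macaroon library: the names below are placeholders, not a real API.
var macaroons = require('some-macaroon-lib');

// The bearer token from the capability URL is a serialized macaroon
var original = macaroons.deserialize(token);

// Just before making the request, append contextual caveats to a copy of the
// token; the capability URL itself keeps the original, unrestricted macaroon.
var restricted = original
    .addFirstPartyCaveat('time < ' + new Date(Date.now() + 5000).toISOString())
    .serialize();

fetch(url, { headers: { 'Authorization': 'Bearer ' + restricted } });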

Real-world examples

There are a few real-world examples of capability URLs (or something very like them) being used in the wild. The Waterken server we discussed before was intended for real-world use, but I’m not sure what applications were ever built with it. We’ve already mentioned Google Drive shareable links as an example. The links generated in that case put the unguessable token in the URL path component (it looks like a ~256-bit URL-safe Base64-encoded value):

https://docs.google.com/document/d/xZeWp7dseeITmFLIlw6ZFSqzZw6ZX-QgGSDOBj2rOgyl/edit

Another nice example is Dropbox’s Chooser and Saver APIs. These are an alternative way to provide limited access to a 3rd party app without granting broad access via OAuth. Instead when the app wants to read a file from Dropbox it pops up a Chooser window within Dropbox.

Dropbox Chooser in action

This Dropbox UI acts like a File Open dialog box, and as it is running on Dropbox it can see all of your files. When you select one, Dropbox returns a unique URL that grants read-only (or write-only for Saver) access to that one file, as a capability URL. This is a classic example of capability-based security, and of how good UX can make it easy for users to grant fine-grained access to resources – in this case the user just picks a file, with no fuss or separate up-front consent page. As the file chooser is hosted by Dropbox, the user gets a consistent UI for choosing files in all their apps. Again, Dropbox chose to put the unguessable portion directly in the URL path, this time with what seems to be a 72-bit Base32-encoded token:

https://dl.dropboxusercontent.com/1/view/sinx9jdy0djp534/Public/Top%20Secret.txt

When the Chooser is returning direct access to a file, Dropbox also chose to limit the URL to only work for 4 hours as an additional safeguard, and to prevent applications using this as a way to permanently link to content in a user’s Dropbox (e.g., to display images).

I’d love to hear of any other examples of capability URLs being used in the real world, so please let me know in the comments.

Summary

It is often claimed that putting security credentials into a URL for easy sharing is very risky and should be discouraged. Hopefully in this essay I’ve managed to convince you that this isn’t always the case, and in fact capability URLs have some very nice security properties when done correctly. Hopefully we can revive the Waterken vision of a capability-secure web.

To summarise my recommendations for securely including credentials in a URL:

  • Always use a limited-scope token such as a capability token (key) or limited scope OAuth access token. Ideally the token should only provide access to the one resource named in the URL. Never ever ever put a username and password in a URL.
  • Prefer putting the unguessable token in the userinfo or fragment components of the URL as these are least likely to be accidentally leaked. Use Javascript to retrieve the token and submit to the server in an Authorization header using the Bearer auth scheme.
  • Set suitable Cache-Control and Referrer-Policy headers on resources protected with capability URLs.
  • Use HATEOAS and simple UX design principles to make use of capability URLs natural and secure.
  • Consider binding tokens to other contextual identifiers, such as session cookies or the TLS channel, but consider the usability implications of doing so.

Finally, this is an evolving area. New web security threats and defences emerge all the time, and I don’t claim to know every angle. I’d love to hear feedback from other security professionals, if you love the idea or hate it – leave comments below.

Author: Neil Madden

Founder of Illuminated Security, providing application security and cryptography training courses. Previously Security Architect at ForgeRock. Experienced software engineer with a PhD in computer science. Interested in application security, applied cryptography, logic programming and intelligent agents.
