Secure Cross-Domain Communication in the Browser

The Architecture Journal

by Danny Thorpe

Summary: A shopper can walk into virtually any store and make a purchase with nothing more than a plastic card and photo ID. The shopper and the shopkeeper need not share the same currency, nationality, or language. What they do share is a global communications system and global banking network that allows the shopper to bring their bank services with them wherever they go and provides infrastructure support to the shopkeeper. What if the Internet could provide similar protections and services for Web surfers and site keepers to share information?

Contents

IFrame URL Technique
Hiding Data in Bookmarks
Sender Identification
Sending to the Sender
Stateful Receiver
Application of Ideas
User Empowerment
Acknowledgments
Resources

 

Developing applications that live inside the Web browser is a lot like window shopping on Main Street: lots of stores to choose from, lots of wonderful things to look at in the windows of each store, but you can't get to any of it. Your cruel stepmother, Frau Browser, yanks your leash every time you lean too close to the glass. She says it's for your own good, but you're beginning to wonder if your short leash is more for her convenience than your safety.

Web browsers isolate pages living in different domains to prevent them from peeking at each other's notes about the end user. In the early days of the Internet, this isolation model was fine because few sites placed significant application logic in the browser client, and even those that did were only accessing data from their own server. Each Web server was its own silo, containing only HTML links to content outside itself.

That's not the Internet today. The Internet experience has evolved into aggregating data from multiple domains. This aggregation is driven by user customization of sites as well as sites that add value by bringing together combinations of diverse data sources. In this world, the Web browser's domain isolation model becomes an enormous obstacle hindering client-side Web application development. To avoid this obstacle, Web app designers have been moving more and more application logic to their Web servers, sacrificing server scalability just to get things done. Meanwhile, the end user's 2GHz, 2GB dumb terminal sits idle.

If personal computers were built like a Web browser, you could save your data to disk, but you couldn't use those files with any other application on your machine, or anyone else's machine. If you decided to switch to a different brand of photo editor, you wouldn't be able to edit any of your old photos. If you complained to the makers of your old photo editor, they would sniff and declare "We don't know what that other photo editor might do with your data. Since we don't know or trust that other photo editor, then neither should you! And no, we won't let you use 'your' photos with them, because since we're providing the storage space for those photos, they're really partly our photos."

You couldn't even find your files unless you knew first which application you created them with. "Which photo editor did I use for Stevie's birthday photos? I can't find them!"

And what happens when that tragically hip avant-garde photo editor goes belly up, never to be seen again? It takes all your photos with it!

Sound familiar? It happens to all of us every day using Internet Web sites and Web applications. Domain isolation prevents you from using your music playlists to shop for similar tunes at an independent online store (unrelated to your music player manufacturer) or at a kiosk within a retail store.

Domain isolation also makes it very difficult to build lightweight low-infrastructure Web applications that slice and dice data drawn from diverse data servers within a corporate network. A foo.bar.com subdomain on your internal bar.com corpnet is just as isolated from bar.com and bee.bar.com as it is from external addresses like xyz.com.

Nevertheless, you don't want to just tear down all the walls and pass around posies. The threats to data and personal security that the browser's strict domain isolation policy protects against are real, and nasty. With careful consideration and infrastructure, there can be a happy medium that provides greater benefit to the user while still maintaining the necessary security practices. Users should be in control of when, what, and how much of their information is available to a given Web site. The objective here is not free flow of information in all directions, but freedom for users to use their data where and when it serves their purposes, regardless of where their data resides.

What is needed is a way for the browser to support legitimate cross-domain data access without compromising end user safety and control of their data.

One major step in that direction is the developing standards proposal organized by Ian Hickson to extend XMLHttpRequest to support cross-domain connections using domain-based opt-in/opt-out by the server being requested. (See Resources.) If this survives peer review and if it is implemented by the major browsers, it offers hope of diminishing the cross-domain barrier for legitimate uses, while still protecting against illegitimate uses. Realistically, though, it will be years before this proposal is implemented by the major browsers and ubiquitous in the field.

What can be done now? There are patterns of behavior supported by all the browsers which allow JavaScript code living in one browser domain context to observe changes made by JavaScript living in another domain context within the same browser instance. For example, changes made to the width or height property of an iframe are observable inside as well as outside the iframe. Another example is the iframe.src property. Code outside an iframe cannot read the iframe's src URL property, but it can write to the iframe's src URL. Thus, code outside the iframe can send data into the iframe via the iframe's URL.

This URL technique has been used by Web designers since iframes were first introduced into HTML, but uses are typically primitive, purpose-built, and hastily thrown together. What's worse, passing data through the iframe src URL can create an exploit vector, allowing malicious code to corrupt your Web application state by throwing garbage at your iframe. Any code in any context in the browser can write to the iframe's .src property, and the receiving iframe has no idea where the URL data came from. In most situations, data of unknown origin should never be trusted.

This article will explore the issues and solution techniques of the secure client-side cross-domain data channel developed by the Windows Live Developer Platform group.

IFrame URL Technique

An iframe is an HTML element that encapsulates and displays an entire HTML document inside itself, allowing you to display one HTML document inside another. We'll call the iframe's parent the outer page or host page, and the iframe's content the inner page. The iframe's inner page is specified by assigning a URL to the iframe's src property.

When the iframe's source URL has the same domain name as the outer, host page, JavaScript in the host page can navigate through the iframe's interior DOM and see all of its contents. Conversely, the iframe can navigate up through its parent chain and see all of its DOM siblings in the host page and their properties. However, when the iframe's source URL has a domain different from the host page, the host cannot see the iframe's contents, and the iframe cannot see the host page's contents.
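As a rough illustration of this rule, the isolation check can be approximated by comparing the origins of two URLs. The sketch below is ours, not part of the channel code: the sameOrigin name is made up for the example, it relies on the standard URL class found in current browsers and Node.js, and it assumes the browser compares scheme, host, and port rather than the domain name alone.

```javascript
// Hypothetical helper (not from the article): approximates the browser's
// isolation rule by comparing the origin (scheme + host + port) of two URLs.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Host and inner page in the same domain: each can see the other's DOM.
console.log(sameOrigin("https://foo.com/host.html", "https://foo.com/inner.html")); // true

// Host in foo.com, inner page in bar.com: isolated from each other.
console.log(sameOrigin("https://foo.com/host.html", "https://bar.com/receiver.html")); // false

// A subdomain is isolated from its parent domain as well.
console.log(sameOrigin("https://foo.bar.com/a.html", "https://bar.com/b.html")); // false
```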

Even though the host cannot read the iframe element's src property, it can still write to it. The host page doesn't know what the iframe is currently displaying, but it can force the iframe to display something else.

Each time a new URL is assigned to the iframe's src property, the iframe will go through all the normal steps of loading a page, including firing the onLoad event.

We now have all the pieces required to pass data from the host to the iframe on the URL. (See Figure 1.) The host page in domain foo.com can place a URL-encoded data packet on the end of an existing document URL in the bar.com domain. The data can be carried in the URL as a query parameter using the ? character (https://bar.com/receiver.html?datadatadata) or as a bookmark using the # character (https://bar.com/receiver.html#datadatadata). There's a big difference between these two URL types which we'll explore in a moment.

Figure 1. iframe URL data passing

The host page assigns this URL to the iframe's src property. The iframe loads the page and fires the page's onLoad event handler. The iframe page's onLoad event handler can look at its own URL, find the embedded data packet, and decode it to decide what to do next.
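The two halves of this exchange might look something like the sketch below. It is a minimal illustration under our own assumptions, not the channel library's code: the function names buildMessageUrl and readPayload are hypothetical, the payload is a plain string, and the # form of the URL is used.

```javascript
// Hypothetical helpers; the names and the packet layout are assumptions
// made for this example only.

// Host side: append a URL-encoded payload to the receiver page's URL.
function buildMessageUrl(receiverUrl, payload) {
  return receiverUrl + "#" + encodeURIComponent(payload);
}

// Receiver side: pull the payload back out of the page's own URL.
// Returns null when there is no packet or the packet is malformed.
function readPayload(url) {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) return null;
  try {
    return decodeURIComponent(url.substring(hashIndex + 1));
  } catch (e) {
    return null; // badly encoded data: discard rather than trust it
  }
}

const messageUrl = buildMessageUrl("https://bar.com/receiver.html", "hello world & more");
console.log(messageUrl);              // https://bar.com/receiver.html#hello%20world%20%26%20more
console.log(readPayload(messageUrl)); // hello world & more
```

In a browser, the host page would assign the result of buildMessageUrl to the iframe's src property, and the inner page's onLoad handler would pass its own location (for example window.location.href) to readPayload.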

That's the iframe URL data passing technique at its simplest. The host builds a URL string from a known document URL plus a data payload and assigns it to the src property of the iframe; the iframe "wakes up" in the onLoad event handler and receives the data payload. What more could you ask for?

A lot more, actually. There are many caveats with this simple technique:

· No acknowledgement of receipt: The host page has no idea if the iframe successfully received the data.

· Message overwrites: The host doesn't know when the iframe has finished processing the previous message, so it doesn't know when it's safe to send the next message.

· Capacity limits: A URL can be only so long, and the length limit varies by browser family. Firefox supports URLs as long as 40k or so, but IE sets the limit at less than 4k. Anything longer than that will be truncated or ignored.

· Data has unknown origin: The iframe has no idea who put the data into its URL. The data might be from our friendly foo.com host page, or it might be evil.com lobbing spitballs at bar.com hoping something will stick or blow up.

· No replies: There's no way for script in the iframe to pass data back to the host page.

· Loss of context: Because the page is reloaded with every message, the iframe inner page cannot maintain global state across messages.

Hiding Data in Bookmarks

Should we use ? or # to tack data onto the end of the iframe URL? Though the choice looks innocuous enough on the surface, there are a few significant differences in how the browsers handle URLs with query params versus URLs with bookmarks. Two URLs with the same base path but different query params are treated as different URLs. They will appear separately in the browser history list, will be separate entries in the browser page cache, and will generate separate network requests across the wire.

URL bookmarks were designed to refer to specially marked anchor tags within a page. The browser considers two URLs with the same base path but with different bookmark text after the # char to be the same URL as far as browser history and caches are concerned. The different bookmarks are just pointing to different parts of the same page (URL), but it's the same page nonetheless.

The URLs https://bar.com/page.html#one, https://bar.com/page.html#two, and https://bar.com/page.html#three are considered by the browser to be cache-equivalent to https://bar.com/page.html. If we used query params, the browser would see three different URLs and three different trips across the network wire. Using bookmarks, however, we have at most one trip across the network wire; subsequent requests will be filled from the local browser cache. (See Figure 2.)

Figure 2. Cache equivalence of bookmark URLs

For cases where we need to send a lot of messages across the iframe URL using the same base URL, bookmarks are perfect. The data payloads in the bookmark portion of the URL will not appear in the browser history or browser page cache. What's more, the data payloads will never cross the network wire after the initial page load is cached!
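The cache equivalence described above can be illustrated with a small sketch. The cacheKey function is a hypothetical stand-in for whatever key a browser actually uses for its page cache; we simply assume that key is the URL with everything from the # onward removed.

```javascript
// Hypothetical stand-in for the browser's cache lookup key: the URL with
// the bookmark (everything from # onward) stripped off.
function cacheKey(url) {
  return url.split("#")[0];
}

const bookmarkUrls = [
  "https://bar.com/page.html#one",
  "https://bar.com/page.html#two",
  "https://bar.com/page.html#three",
];
const queryUrls = [
  "https://bar.com/page.html?one",
  "https://bar.com/page.html?two",
  "https://bar.com/page.html?three",
];

// Three bookmark URLs collapse to one cache entry, so at most one network trip.
console.log(new Set(bookmarkUrls.map(cacheKey)).size); // 1

// Three query URLs stay distinct, so three cache entries and three network trips.
console.log(new Set(queryUrls.map(cacheKey)).size); // 3
```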

The data passed between the host page and the iframe cannot be viewed by any other DOM elements on the host page because the iframe is in a different domain context from the host page. The data doesn't appear in the browser cache, and the data doesn't cross the network wire, so it's fair to say that the data packets are observable only by the receiving iframe or other pages served from the bar.com domain.

Sender Identification

Perhaps the biggest security problem with the simple iframe URL data-passing technique is not knowing with confidence where the data came from. Embedding the name of the sender or some form of application ID is no solution, as those can be easily copied by impersonators. What is needed is a way for a message to implicitly identify the sender in such a way that could not be easily copied.

The first solution that pops to mind for most people is to use some form of encryption using keys that only the sender and receiver possess. This would certainly do the job, but it's a rather heavy-handed solution, particularly when JavaScript is involved.

There is another way, which takes advantage of the critical importance of domain name identity in the browser environment. If I can send a secret message to you using your domain name, and I later receive that secret as part of a data packet, I can reasonably deduce that the data packet came from your domain.

The only way for the secret to come from a third-party domain is if your domain has been compromised, the user's browser has been compromised, or my DNS has been compromised. All bets are off if your domain or your browser has been compromised. If DNS poisoning is a real concern, you can use https to validate that the server answering requests for a given domain name is in fact the legitimate server.

If the sender gives a secret to the receiver, and the receiver gives a secret to the sender, and both secrets are carried in every data packet sent across the iframe URL data channel, then both parties can have confidence in the origin of every message. Spitballs thrown in by evil.com can be easily recognized and discarded. This exchange of secrets is inspired by the SSL/https three-phase handshake.

These secrets do not need to be complex or encrypted, since the data packets sent through the iframe URL data channel are not visible to any third party. Random numbers are sufficient as secrets, with one caveat: The JavaScript random-number generator (Math.random()) is not cryptographically strong, so it is a risk for producing predictable number sequences. Firefox provides a cryptographically strong random-number generator (crypto.random()), but IE does not. As a result, in our implementation we opted to generate strong random numbers on the Web server and send them down to the client as needed.
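To make the idea concrete, here is a minimal sketch of a packet that carries both secrets and a receiver that discards anything else. The packet layout, the function names, and the hard-coded secret values are all assumptions made for illustration; in particular, real secrets would come from a strong random source such as the Web server, and this sketch assumes they contain no ":" character.

```javascript
// Hypothetical packet layout: "<hostSecret>:<frameSecret>:<encoded data>".
// hostSecret was handed to the iframe by the host; frameSecret was handed
// to the host by the iframe. Both travel in every packet.
function makePacket(hostSecret, frameSecret, data) {
  return [hostSecret, frameSecret, encodeURIComponent(data)].join(":");
}

// Returns the decoded data only if both secrets match; otherwise null.
function openPacket(packet, hostSecret, frameSecret) {
  const parts = packet.split(":");
  if (parts.length !== 3) return null;
  if (parts[0] !== hostSecret || parts[1] !== frameSecret) return null;
  try {
    return decodeURIComponent(parts[2]);
  } catch (e) {
    return null; // malformed encoding: treat as a spitball
  }
}

// Placeholder secrets for the demo only.
const packet = makePacket("8731", "2954", "contacts: 3");
console.log(packet);                                       // 8731:2954:contacts%3A%203
console.log(openPacket(packet, "8731", "2954"));           // contacts: 3
console.log(openPacket("0000:0000:evil", "8731", "2954")); // null
```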

Sending to the Sender

Most of the problems associated with the iframe URL data passing technique boil down to reply generation. Acknowledging packets requires the receiver to send a reply to the sender. Exchanging secrets requires replies in both directions. Message throttling and breaking large data payloads into multiple smaller messages require receipt acknowledgement.
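Message throttling, for instance, can be sketched as a sender-side queue that keeps at most one message in flight and releases the next only when an acknowledgement arrives. The ThrottledSender class below is a hypothetical illustration, not the channel library's implementation; sendFn stands in for whatever actually writes the URL to the iframe's src property.

```javascript
// Hypothetical sender-side queue: at most one message is in flight at a time.
class ThrottledSender {
  constructor(sendFn) {
    this.sendFn = sendFn; // e.g. a function that assigns a URL to iframe.src
    this.queue = [];
    this.inFlight = false;
  }

  // Queue a message; it goes out immediately only if nothing is in flight.
  send(message) {
    this.queue.push(message);
    this.pump();
  }

  // Call when the receiver acknowledges the message currently in flight.
  acknowledge() {
    this.inFlight = false;
    this.pump();
  }

  pump() {
    if (this.inFlight || this.queue.length === 0) return;
    this.inFlight = true;
    this.sendFn(this.queue.shift());
  }
}

const sent = [];
const sender = new ThrottledSender((message) => sent.push(message));
sender.send("one");
sender.send("two");
sender.send("three");
console.log(sent.join(",")); // one
sender.acknowledge();
console.log(sent.join(",")); // one,two
```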

Figure 3. Message in a Klein Bottle

So, how can the iframe communicate back up to the host page? Not by going up, but by going down. The iframe can't assign to anything in its parent because the iframe and the parent reside in different domain contexts. But the bar.com iframe (A) can contain another iframe (B) and A can assign to B's src property a URL in the domain of the host page (foo.com). The nesting is then: the foo.com host page contains the bar.com iframe (A), which contains the foo.com iframe (B).

Great, but what can that inner iframe do? It can't do much with its parent, the bar.com iframe. But go one more level up and you hit pay dirt: B's parent's parent is the host page in foo.com. B's page is in foo.com, B.parent.parent is in foo.com, so B can access everything in the host page and call JavaScript functions in the host page's context.

The host page can pass data to iframe A by writing a URL to A's src property. A can process the data, and send an acknowledgement to the host by writing a URL to B's src property. B wakes up in its onLoad event and passes the message up to its parent's parent, the host page. Voilà. Round-trip acknowledgement from a series of one-way pipes connected together in a manner that would probably amuse Felix Klein, mathematician and bottle washer.
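The last hop of that round trip, from B up to the host, might look like the sketch below. The relayToHost function and the onChannelReply callback are hypothetical names; we assume the host page defines onChannelReply as a global function. The window object is passed in as a parameter so the same code can be exercised outside a browser with stand-in objects.

```javascript
// Runs in iframe B, a foo.com page nested inside the bar.com iframe A.
// `win` is B's window object; `onChannelReply` is a hypothetical function
// that the host page is assumed to define.
function relayToHost(win, url) {
  const hashIndex = url.indexOf("#");
  const message =
    hashIndex === -1 ? "" : decodeURIComponent(url.substring(hashIndex + 1));
  // B's parent is A (bar.com), which B cannot touch. A's parent is the host
  // page (foo.com), the same domain as B, so the browser permits this call.
  win.parent.parent.onChannelReply(message);
}

// Stand-in objects so the sketch can run outside a browser.
const replies = [];
const fakeHost = { onChannelReply: (message) => replies.push(message) };
const fakeFrameA = { parent: fakeHost };
const fakeFrameB = { parent: fakeFrameA };

relayToHost(fakeFrameB, "https://foo.com/relay.html#ack%3A1");
console.log(replies[0]); // ack:1
```

In a real page, B's onLoad handler would call relayToHost(window, window.location.href).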

Stateful Receiver

To maintain global state in the bar.com context across multiple messages sent to the iframe, use two iframes with bar.com pages. Use one of the iframes as a stateless message receiver, reloading and losing its state with every message received. Place the stateful application logic for the bar.com side of the house in the other iframe. Reduce the messenger iframe page logic to the bare minimum required to pass the received data to the stateful bar.com iframe.

An iframe cannot enumerate the children of its parent to find other bar.com siblings, but it can look up a sibling iframe using window.parent.frames[] if it knows the name of the sibling iframe. Each time it reloads to receive new data on the URL, the messenger iframe can look up its stateful bar.com sibling iframe using window.parent.frames[] and call a function on the stateful iframe to pass the new message data into the stateful iframe. Thus, the bar.com domain context in browser memory can accumulate message chunks across multiple messages to reconstruct a data payload larger than the browser's maximum URL length.
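A minimal sketch of this arrangement follows. Everything named here is an assumption for illustration: the chunk numbering scheme, the function names, and the sibling frame name "state". The chunk size counts characters before URL encoding, so a real implementation would need to leave headroom for the encoded form. Stand-in objects replace the real window so the sketch can run outside a browser.

```javascript
// Sender side: split a payload into pieces small enough for one URL each.
function splitIntoChunks(payload, chunkSize) {
  const chunks = [];
  for (let i = 0; i < payload.length; i += chunkSize) {
    chunks.push(payload.substring(i, i + chunkSize));
  }
  return chunks;
}

// Stateful iframe: accumulates chunks across reloads of the messenger iframe.
function createAssembler() {
  const parts = [];
  return {
    // Returns the full payload once every chunk has arrived, else null.
    receiveChunk(index, total, text) {
      parts[index] = text;
      const received = parts.filter((p) => p !== undefined).length;
      return received === total ? parts.join("") : null;
    },
  };
}

// Messenger iframe: look up the named sibling and hand the chunk over.
// `win` is the messenger's window; `siblingName` is the stateful frame's name.
function forwardChunk(win, siblingName, index, total, text) {
  return win.parent.frames[siblingName].receiveChunk(index, total, text);
}

// Stand-in for the messenger's window, with a sibling frame named "state".
const fakeMessenger = { parent: { frames: { state: createAssembler() } } };
const chunks = splitIntoChunks("abcdefghij", 4); // "abcd", "efgh", "ij"
let result = null;
chunks.forEach((chunk, i) => {
  result = forwardChunk(fakeMessenger, "state", i, chunks.length, chunk);
});
console.log(result); // abcdefghij
```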

Application of Ideas

The Windows Live Developer Platform team has developed these ideas into a JavaScript "channel" library. These cross-domain channels are used in the implementation of the Windows Live Contacts and Windows Live Spaces Web controls (https://dev.live.com), intended to reside on third party Web pages but execute in a secure iframe in the live.com domain context. The controls provide third party sites with user-controlled access to their Windows Live data such as the user's contacts list or Spaces photo albums. The channel object supports sending arbitrarily large data across iframe domain boundaries with receipt acknowledgement, message throttling, message chunking, and sender identification all taking place under the hood.

Our goal is to groom this channel code into a reusable library, available to internal Microsoft partners as well as third party Web developers. While the code is running well in its current contexts, we still have some work to do in the area of self-diagnostics and troubleshooting; when you get the channel endpoints configured correctly, it works great, but it can be a real nightmare to figure out what isn't quite right when you're trying to get it set up the first time. The main obstacle is the browser itself—trying to see what's (not) happening in different domain contexts is a bit of a challenge when the browser won't show you what's on the other side of the wall.

User Empowerment

Hardly 40 years ago, a shopper on Main Street USA had to go to considerable effort to convince a shopkeeper to accept payment. If you didn't have cash (and lots of it), you were most likely out of luck. If you had foreign currency, you'd need to find a big bank in a big city to exchange for local currency. Checks from out of town were rarely accepted, and store credit was offered only to local residents.

Today, shoppers and shopkeepers share a global communications system and global banking network that allows the shopper to bring their bank services with them wherever they go, and helps the shopkeeper make sales they otherwise might miss. The banking network also provides infrastructure support to the shopkeeper, helping with currency conversion, shielding from credit risk and reducing losses due to fraud.

Now, why can't the Internet provide similar protections and empowerments for the wandering Web surfer, and infrastructure services for Web site keepers? Bring your data and experience with you as you move from site to site (the way a charge card brings your banking services with you as you shop), releasing information to the site keepers only at your discretion. The Internet is going to get there; it's just a matter of how well and how soon.

Acknowledgments

Kudos to Scott Isaacs for the original iframe URL data-passing concept. Many thanks to Yaron Goland and Bill Zissimopoulos for their considerable contributions to the early implementations and debugging of the channel code, and to Gabriel Corverra and Koji Kato for their work in the more recent iterations. "It's absolute insanity, but it just might work!"

Resources

XMLHttpRequest 2, Ian Hickson

https://www.mail-archive.com/public-webapi@w3.org/msg00341.html

https://lists.w3.org/Archives/Public/public-webapi/2006Jun/0012.html

Anne van Kesteren's Weblog

https://annevankesteren.nl/2007/02/xxx

About the author

Danny Thorpe is a developer on the Windows Live Developer Platform team. His badge says "Principal SDE," but he prefers "Windows Live Quantum Mechanic," as he spends much of his time coaxing stubborn little bits to migrate across impenetrable barriers. In past lives, he worked on "undisclosed browser technology" at Google, and, before that, he was a Chief Scientist at Borland and Chief Architect of the Delphi compiler. At Borland, he had the good fortune to work under the mentorship of Anders Hejlsberg, Chuck Jazdzewski, Eli Boling, and many other Borland legends. Prior to joining Borland, he was too young to remember much. Check out his blog at https://blogs.msdn.com/dthorpe.

 

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.