Flash Cookies and Privacy II

A detailed technical follow-up to the Flash Cookies and Privacy II paper, describing the mechanisms behind Hulu’s and KISSmetrics’ respawning practices.

I thought I’d take the time to elaborate further on the technical mechanisms described in our Flash Cookies and Privacy II paper, which generated a bit of buzz recently. For background: Chris Hoofnagle, Nathan Good, and I had the honor of supervising Mika Ayenson and Dietrich J. Wambach in replicating our 2009 study, which found that websites were circumventing user choice by deliberately restoring previously deleted HTTP cookies from persistent storage outside the browser’s control (a practice we dubbed ‘respawning’).

In our follow-up study, we found that Hulu was still respawning deleted user cookies using homegrown Flash and Javascript code on Hulu.com. Additionally, Hulu, Spotify, and many others were respawning cookies using code provided by the analytics firm KISSmetrics.* Hiten Shah, the founder of KISSmetrics, initially confirmed that the research on respawning was correct in an interview with Ryan Singel, although he later criticized the findings after a lawsuit was filed.

(*Hulu and KISSmetrics both ceased respawning as of July 29, 2011.)

Background

As much of this research is already discussed in our recent report, I advise readers to become familiar with that before proceeding. I’m providing this writeup as a brief technical addendum for those interested in the underlying mechanisms enabling respawning in the cases we observed.

SECTION 1 is a forensic writeup intended for a technical audience interested in how the respawning works.

SECTION 2 is directed towards policy makers and provides some musings about why these practices could be problematic to user privacy online.

SECTION 3 provides additional data about which other sites may have been engaging in this practice, including 31 other sites that I’ve confirmed were engaging in cache-cookie syncing using KISSmetrics prior to July 29th, and 515 sites still using KISSmetrics in a fashion that indicates they were likely also respawning until this functionality was disabled.

As a former technologist at the FTC Division of Privacy and Identity Protection, I am interested in providing as clear and accurate a picture of these practices as possible. I have no particular interest in the lawsuits surrounding these companies, nor have I ever taken money or provided any advance information that was not otherwise publicly available. (I list all of my funding sources in a disclosure statement on my website.) This work was sponsored exclusively by the National Science Foundation’s Team for Research in Ubiquitous Secure Technology, and I participated as an unpaid advisor to the two undergraduate students performing this work.

1. Technical Mechanisms

As described in the report, we observed two different cookie respawning mechanisms on Hulu.com. The first, enabled by Hulu.com itself, used Flash and HTML5 local storage to provide cookie backup and enable respawning of HTTP cookies. The second, enabled by 3rd party analytics firm KISSmetrics, used Flash, HTML5, and cache-based persistent storage of unique identifiers. It’s important to note that, while KISSmetrics is a 3rd party on Hulu.com, their code actually enables 1st party analytics on their customers’ domains.

Note: All the code linked below (i.e., via pastebin links) was captured prior to July 29, 2011. The line numbers preceding the code snippets refer to line numbers in the pastebin copies.

Hulu Respawning via Flash Shared Objects

When a user visits Hulu.com, the browser loads the Flash files required to play back video on the site. It first loads the Flash file http://www.hulu.com/masthead.swf (source), which in turn loads two more Flash files, http://www.hulu.com/guid.swf?v2 (source) and http://hulu.com/cram.swf (source). The latter two files are responsible for actually setting the Flash Shared Object (FSO), more commonly known as a Flash cookie. Using a simple SWF decompiler like this one, we can view the source code for these files and identify the specific portion of the code responsible for setting the Flash cookie.

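For example, the function getComputerguid() below (line 409) first asks the page for the HTTP cookie named guid (via ExternalInterface.call("Behaviors.getCookie", "guid")). If that cookie is missing, it pulls the value from the Flash Shared Object (getComputerguidFromFSO()) and writes it back into the HTTP cookie (via ExternalInterface.call("Behaviors.setCookie", "guid", _local2)). If the cookie is present, it instead saves a copy into the FSO (setComputerguidInFSO(_local2)). Restoring the HTTP cookie from the FSO copy is the practice we refer to as respawning HTTP cookies from Flash Shared Objects.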

409. function getComputerguid() {
410.     com.ns.utils.ConsoleLogger.getInstance().debug("getComputerguid: Start");
411.     var _local2 = flash.external.ExternalInterface.call("Behaviors.getCookie", "guid");
412.     if (_local2 == undefined) {
413.         _local2 = getComputerguidFromFSO();
414.         if (_local2 != undefined) {
415.             flash.external.ExternalInterface.call("Behaviors.setCookie", "guid", _local2);
416.         }
417.     } else {
418.         setComputerguidInFSO(_local2);
419.     }
420.     com.ns.utils.ConsoleLogger.getInstance().debug("getComputerguid: Done");
421.     return(_local2);
422. }
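
To make the control flow easier to follow, here is a minimal JavaScript re-expression of getComputerguid() that runs outside a browser. It is a sketch, not Hulu’s code: the HTTP cookie jar and BeaconService.sol are replaced by two hypothetical in-memory Maps (httpCookies and flashSharedObject), and the ExternalInterface round trips are omitted.

```javascript
// Hypothetical in-memory stand-ins for the two storage locations. On the page,
// these were the browser's HTTP cookie jar (reached through ExternalInterface
// calls to Behaviors.getCookie / Behaviors.setCookie) and BeaconService.sol.
const httpCookies = new Map();
const flashSharedObject = new Map();

// Same branch structure as getComputerguid() (lines 409-422).
function getComputerGuid() {
  let guid = httpCookies.get("guid");
  if (guid === undefined) {
    // No HTTP cookie (e.g. the user deleted it): fall back to the Flash copy...
    guid = flashSharedObject.get("computerguid");
    if (guid !== undefined) {
      // ...and write it back into the HTTP cookie. This is the respawn.
      httpCookies.set("guid", guid);
    }
  } else {
    // HTTP cookie present: refresh the backup copy in the Flash store.
    flashSharedObject.set("computerguid", guid);
  }
  return guid;
}

// First visit: the site sets a cookie and the SWF backs it up.
httpCookies.set("guid", "abc123");
getComputerGuid();
// The user clears HTTP cookies, but not Flash storage.
httpCookies.clear();
const respawned = getComputerGuid();
console.log(respawned, httpCookies.get("guid")); // abc123 abc123
```

Deleting only the HTTP cookie is not enough; the value comes back on the next call because the Flash copy survives.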

The actual storage of the GUID is handled at line 396 of guid.swf by the function setComputerguidInFSO(value) (i.e., ‘set the computer’s Globally Unique Identifier in the Flash Shared Object’). It stores the identifier taken from the HTTP cookie in the Flash cookie file BeaconService.sol.

396. function setComputerguidInFSO(value) {
397.     com.ns.utils.ConsoleLogger.getInstance().debug("setComputerguidInFSO: Start");
398.     _globalFSO = SharedObject.getLocal("BeaconService", "/");
399.     _globalFSO.data.computerguid = value;
400.     _globalFSO.flush();
401.     com.ns.utils.ConsoleLogger.getInstance().debug("setComputerguidInFSO: Done");
402. }

Hulu Respawning via HTML5 LocalStorage

In addition to loading the Flash files above, the user’s browser loads Javascript code from http://static.huluim.com/system/hulu_107336_0722093143_1.js, which provides the interface for synchronizing cookies between HTTP, Flash, and HTML5 storage. The relevant part of this script (the ‘cram’ module) is shown below: upon loading, it iterates through the available storage backends in order (HTML5 first, then Flash) and uses the first one that is valid.

126. var order = ['html5', 'flash'];
127. var store = null;
128. var self = {
129.     load: function() {
130.         for (var i = 0; i < order.length; i++) {
131.             var method = methods[order[i]];
132.             if (method.valid()) {
133.                 store = new method();
134.                 break;
135.             }
136.         }
137.         document.fire('cram:load');
138.     },

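When a backend is instantiated (via new method()), that object (HTML5 or Flash) becomes the store through which values are saved and read. The backends themselves are defined earlier in the script: line 43 creates the HTML5 backend, line 60 the Internet Explorer userData backend, and line 85 the Flash container abstraction. Each is created with a check that reports whether the corresponding storage is available in the current browser.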

43. var html5 = backend.create(function() {
44.     return window.localStorage && window.localStorage.getItem;
...
60. var userData = backend.create(function() {
61.     return !!window.ActiveXObject && msieVersion() >= 7.0;
...
85. var flash = backend.create(function() {
86.     return window.SWFObject;
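
The selection logic can be sketched as follows, assuming each backend is a pair of a validity check and a factory (our reading of backend.create(...) and method.valid()). The env object is a hypothetical stand-in for window so the sketch runs outside a browser, the memoryStore helper is invented for illustration, and the userData backend and the cram:load event are left out.

```javascript
// Hypothetical storage object; the real backends wrapped localStorage or a SWF.
function memoryStore(name) {
  const data = new Map();
  return { name, get: (k) => data.get(k), set: (k, v) => { data.set(k, v); } };
}

// Each backend: a validity check (like the functions passed to backend.create)
// plus a factory. `env` plays the role of `window`.
const methods = {
  html5: {
    valid: (env) => Boolean(env.localStorage && env.localStorage.getItem),
    create: () => memoryStore("html5"),
  },
  flash: {
    valid: (env) => Boolean(env.SWFObject),
    create: () => memoryStore("flash"),
  },
};

// Same preference order as line 126.
const order = ["html5", "flash"];

function loadStore(env) {
  for (const key of order) {
    if (methods[key].valid(env)) {
      return methods[key].create(); // first valid backend wins; stop looking
    }
  }
  return null; // no persistent backend available
}

// Usage with three hypothetical environments.
const modern = { localStorage: { getItem() {} }, SWFObject: {} };
const flashOnly = { SWFObject: {} };
const bare = {};
console.log(loadStore(modern).name, loadStore(flashOnly).name, loadStore(bare)); // html5 flash null
```

The ordering matters: a browser with HTML5 storage never reaches the Flash backend in this loop, while an older browser with only the Flash plugin still gets a persistent store.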

The resulting HTML5 database is shown below with our GUID stored as the ‘ai’ cookie. (Note: The domain ‘www.hulu.com’ is stored in reverse character order in this DB):

moc.uluh.www.:http:80|_new_visitor_source|%22www.hulu.com%22|0|
moc.uluh.www.:http:80|ai|Z9iGGN1n1-zeVqbgzrlKkl39hiY|0|
moc.uluh.www.:http:80|uq||0|
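
The rows above can be decoded with a few lines of JavaScript. This assumes the layout visible in the dump (scope, key, value, then a trailing flag we ignore) and that the scope is the reversed hostname followed by scheme and port; that is our reading of these three rows, not a documented format.

```javascript
// Decodes one pipe-delimited row. Assumed layout: scope|key|value|flag|,
// where scope = reversed host + ":" + scheme + ":" + port.
function parseRow(row) {
  const [scope, key, value] = row.split("|");
  const [reversedHost, scheme, port] = scope.split(":");
  // "moc.uluh.www." reversed character by character is ".www.hulu.com".
  const host = [...reversedHost].reverse().join("").replace(/^\./, "");
  return { host, scheme, port: Number(port), key, value: decodeURIComponent(value) };
}

const rows = [
  "moc.uluh.www.:http:80|_new_visitor_source|%22www.hulu.com%22|0|",
  "moc.uluh.www.:http:80|ai|Z9iGGN1n1-zeVqbgzrlKkl39hiY|0|",
  "moc.uluh.www.:http:80|uq||0|",
];
for (const r of rows) {
  const { host, key, value } = parseRow(r);
  console.log(host, key, value);
}
```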

KISSmetrics Respawning via Flash

The KISSmetrics code that enables cookie respawning functions much like the Hulu example. Upon visiting a site that uses KISSmetrics, the following script (or one like it) loads: http://doug1izaerwt3.cloudfront.net/5a68d120b211c810289fc36493663648821d58aa.1.js.

Note: the link above appears to be customized for each website using the service. The unique serial number in the filename (5a68d120b211c810289fc36493663648821d58aa) corresponds to Hulu in our example, and the script it points to contains code specific to the Hulu.com website. There is also a more readable cached copy of the script here from July 29, 2011.

This script then loads the Flash object http://doug1izaerwt3.cloudfront.net/fs.swf (source), whose sole purpose is to read and write Flash Shared Objects. The relevant lines are included below.

9. public var storage:SharedObject;
...
23. storage = SharedObject.getLocal(key, "/");
24. ExternalInterface.addCallback("s", setData);
25. ExternalInterface.addCallback("g", getData);

The KISSmetrics Javascript then handles cookie storage and access via the functions ifc, fgc, and fsc. I’ve attempted to guess what the abbreviated function names might mean (thanks to Jonathan Mayer for the hints):

23. ifc #(initializeFlashCookie?)
49. fgc #(flashGetCookie?)
52. fsc #(flashSetCookie?)
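
A sketch of how the page-side helpers might talk to fs.swf, assuming the s and g callbacks take a key (and, for s, a value). Only the names ifc, fgc, fsc, s, and g come from the captured code; makeFakeFsSwf and kmFlash are hypothetical, and the function bodies illustrate our guessed roles rather than the KISSmetrics implementation.

```javascript
// Hypothetical stand-in for the embedded fs.swf movie. In a browser,
// ExternalInterface.addCallback("s", setData) and addCallback("g", getData)
// expose s() and g() as methods on the plugin element; here a Map plays the
// role of the SharedObject's data. Argument shapes are assumed.
function makeFakeFsSwf() {
  const data = new Map();
  return {
    s: (key, value) => { data.set(key, value); },
    g: (key) => data.get(key),
  };
}

// Guessed roles for the abbreviated helpers (bodies invented for illustration).
const kmFlash = {
  movie: null,
  // initializeFlashCookie?: remember the loaded movie
  ifc(movie) { this.movie = movie; },
  // flashGetCookie?: read through the "g" callback
  fgc(name) { return this.movie ? this.movie.g(name) : undefined; },
  // flashSetCookie?: write through the "s" callback
  fsc(name, value) { if (this.movie) this.movie.s(name, value); },
};

kmFlash.ifc(makeFakeFsSwf());
kmFlash.fsc("ai", "Z9iGGN1n1-zeVqbgzrlKkl39hiY");
console.log(kmFlash.fgc("ai")); // Z9iGGN1n1-zeVqbgzrlKkl39hiY
```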

KISSmetrics Respawning via HTML5 LocalStorage

As with Hulu, HTML5 LocalStorage (and Internet Explorer userData) is used to store redundant copies of the user identifier and enable respawning. For example, in the same Javascript we can find functions with abbreviated names like ‘ils’, ‘lss’, and ‘lsg’ (or ‘iud’, ‘uds’, and ‘udg’ for Internet Explorer). The relevant storage calls immediately follow the lines indicated below in the Javascript linked above.

368. ils #(initializeLocalStorage?)
378. lss #(localStorageSet?)
390. lsg #(localStorageGet?)
424. iud #(initializeUserData?)
437. uds #(userDataSet?)
450. udg #(userDataGet?)

Upon loading, the script checks whether Flash is enabled (line 43: KM.fl), loads the Flash object, then fetches the value of the ‘ai’ cookie (or uses the KMCID Javascript variable set by i.js, described below). It then registers an ‘onload’ handler (line 499: a.attachEvent("onload", KM.odr)), which initializes and populates local storage (ils, lss) and Flash storage (fsc).

The actual respawning (or cookie synchronization) happens in the high-level KM.gc and KM.sc functions (gc = getCookie, sc = setCookie). KM.gc (line 676) first calls KM.gdc (which I suspect means getDocumentCookie, i.e., the HTTP cookie). If the HTTP cookie exists, KM.gc copies its value into every backup store listed in KM.cks (line 674: "fsc", "lss", "uds", i.e., flashSetCookie, localStorageSet, userDataSet). If the HTTP cookie is missing, it queries the backups listed in KM.ckg (line 675: "fgc", "lsg", "udg") in turn and, on the first hit, restores the HTTP cookie via KM.sc. In this way, cookies that were never set or have been deleted are repopulated.

674. KM.cks = ["fsc", "lss", "uds"];
675. KM.ckg = ["fgc", "lsg", "udg"];
676. KM.gc = function(a, d) {
677.     var c = KM.gdc(KM.cp + a);
678.     var b;
679.     if (!d) {
680.         if (c) {
681.             for (b = 0; b < KM.cks.length; b++) {
682.                 KM[KM.cks[b]](a, c)
683.             }
684.             return c
685.         }
686.         for (b = 0; b < KM.ckg.length; b++) {
687.             if (c = KM[KM.ckg[b]](a)) {
688.                 break
689.             }
690.         }
691.         if (c) {
692.             KM.sc(a, c);
693.             return c
694.         }
695.     } else {
696.         if (c) {
697.             return c
698.         }
699.     }
700.     return KM.lc[a]
701. };
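
The same logic can be exercised outside a browser. Below, gc keeps the structure of lines 676-701 (with comments), while every storage helper is a hypothetical in-memory stand-in. The "km_" value for KM.cp is inferred from the km_ai cookie name, KM.lc is left empty, and we assume KM.sc simply re-sets the HTTP cookie.

```javascript
// Hypothetical in-memory stand-ins for each storage location.
const stores = {
  http: new Map(),     // document cookies, read by gdc
  flash: new Map(),    // fs.swf SharedObject (fsc/fgc)
  local: new Map(),    // HTML5 localStorage (lss/lsg)
  userData: new Map(), // IE userData (uds/udg)
};

const KM = {
  cp: "km_", // assumed cookie-name prefix, inferred from the km_ai cookie
  lc: {},    // last-resort values; their real source is not in the excerpt
  gdc(name) { return stores.http.get(name); },
  sc(a, v) { stores.http.set(this.cp + a, v); }, // assumption: re-sets HTTP cookie
  fsc(a, v) { stores.flash.set(a, v); },
  fgc(a) { return stores.flash.get(a); },
  lss(a, v) { stores.local.set(a, v); },
  lsg(a) { return stores.local.get(a); },
  uds(a, v) { stores.userData.set(a, v); },
  udg(a) { return stores.userData.get(a); },
  cks: ["fsc", "lss", "uds"],
  ckg: ["fgc", "lsg", "udg"],
  gc(a, d) {
    let c = this.gdc(this.cp + a);
    if (!d) {
      if (c) {
        // HTTP cookie present: copy it into every backup store.
        for (const setter of this.cks) this[setter](a, c);
        return c;
      }
      // HTTP cookie missing: take the first backup that still holds a value...
      for (const getter of this.ckg) {
        if ((c = this[getter](a))) break;
      }
      if (c) {
        this.sc(a, c); // ...and respawn the HTTP cookie from it.
        return c;
      }
    } else if (c) {
      return c; // with d set, only the HTTP cookie is consulted
    }
    return this.lc[a];
  },
};

// First page view: the cookie exists and is backed up everywhere.
stores.http.set("km_ai", "Z9iGGN1n1-zeVqbgzrlKkl39hiY");
KM.gc("ai");
// The user clears cookies and localStorage, but not Flash or userData storage.
stores.http.clear();
stores.local.clear();
console.log(KM.gc("ai"), stores.http.get("km_ai")); // prints the original ID twice
```

A single surviving copy in any backup store is enough to restore the HTTP cookie, and the next call to gc then re-copies the value into every store the user had cleared.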

KISSmetrics Respawning via Cache/ETags

Figure 1. Cached KISSmetrics resource i.js. The KMCID value matches the browser’s ETag and the user’s cookie.

What sets KISSmetrics apart from Hulu with regard to respawning is that, in addition to Flash and HTML5 LocalStorage, KISSmetrics was exploiting the browser cache to store persistent identifiers via cached Javascript and ETags. An ETag is a token that a web server attaches to a resource (such as an image or script); when the browser later requests the same resource, it presents the token back (in the If-None-Match header) so the server can determine whether the resource has changed since it was last fetched. Rather than using ETags simply for cache validation, we found KISSmetrics returning ETag values that reliably matched the unique values in their ‘km_ai’ user cookies.

Specifically, when a user visits any KISSmetrics-enabled website (such as Hulu.com), the following 3rd party script is also loaded: https://i.kissmetrics.com/i.js. Prior to July 29, 2011, the sole function of i.js was to set a unique global identifier variable, KMCID (the entire contents of the file are shown here):

var KMCID='Z9iGGN1n1-zeVqbgzrlKkl39hiY'; if(typeof(_kmil) == 'function')_kmil();

When the user’s browser requested this KISSmetrics URL for the first time, the server generated a random value and returned it both as the KMCID Javascript variable and as a matching ETag header. The KMCID value was then stored in the user’s HTTP, HTML5, and Flash cookies concurrently.

Below we show an example request and reply to i.kissmetrics.com from a fresh browser with an empty cache and all cookies disabled (1st and 3rd party cookies blocked and ‘Private Browsing Mode’ enabled). Note that, even with cookies blocked, subsequent requests to KISSmetrics servers send the user’s unique identifier back as the ETag value in the ‘If-None-Match’ header.

INITIAL REQUEST HEADER:
 GET /i.js HTTP/1.1
 Host: i.kissmetrics.com
INITIAL RESPONSE HEADER:
 Etag: "Z9iGGN1n1-zeVqbgzrlKkl39hiY"
 Expires: Sun, 12 Dec 2038 01:19:31 GMT
 Last-Modified: Wed, 27 Jul 2011 00:19:31 GMT
 Set-Cookie: _km_cid=Z9iGGN1n1-zeVqbgzrlKkl39hiY; expires=Sun, 12 Dec 2038 01:19:31 GMT;path=/;
SUBSEQUENT REQUEST HEADER (PRIVATE BROWSING MODE WITH ALL COOKIES BLOCKED):
 GET /i.js HTTP/1.1
 Host: i.kissmetrics.com
 If-None-Match: "Z9iGGN1n1-zeVqbgzrlKkl39hiY"
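
To illustrate why the ETag alone is enough to re-identify a browser, here is a minimal sketch of a server-side handler for i.js written as a plain function (no real HTTP server). It is not KISSmetrics’ code: handleIjs, accessLog, the Gigaom Referer value, and the ID format (20 random bytes in base64url, which happens to yield 27 characters like the observed values) are all assumptions. The header values mirror the captures above.

```javascript
const crypto = require("crypto");

// Hypothetical server-side record of (identifier, Referer) pairs.
const accessLog = [];

// Sketch of an i.js endpoint that doubles as an identifier store.
// `headers` is a plain object with lowercase header names.
function handleIjs(headers) {
  const referer = headers["referer"] || null;
  const inm = headers["if-none-match"];
  if (inm) {
    // Revalidation: the ETag the browser echoes back is the identifier.
    const id = inm.replace(/^"|"$/g, "");
    accessLog.push({ id, referer });
    // 304 tells the browser to keep its cached i.js, which still sets KMCID = id.
    return { status: 304, headers: { ETag: `"${id}"` }, body: "" };
  }
  // First visit: mint an ID (assumed format; the real method is unknown).
  const id = crypto.randomBytes(20).toString("base64url");
  accessLog.push({ id, referer });
  return {
    status: 200,
    headers: {
      ETag: `"${id}"`,
      Expires: "Sun, 12 Dec 2038 01:19:31 GMT",
      "Set-Cookie": `_km_cid=${id}; expires=Sun, 12 Dec 2038 01:19:31 GMT; path=/`,
    },
    body: `var KMCID='${id}'; if(typeof(_kmil) == 'function')_kmil();`,
  };
}

// Visit 1 (hypothetical Referer): nothing cached yet.
const first = handleIjs({ referer: "http://gigaom.com/" });
// Visit 2 on another site: cookies blocked, but i.js and its ETag are cached.
const second = handleIjs({
  "if-none-match": first.headers.ETag,
  referer: "http://www.spotify.com/us/hello-america/",
});
console.log(second.status, accessLog[0].id === accessLog[1].id); // 304 true
```

The second request carries no cookie at all, yet both log entries share one ID. In this model only clearing the cache, which discards the cached i.js together with its ETag, breaks the link.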

Finally, while the ETag is not strictly necessary since the cached object (i.js) that contains the unique KMCID identifier is set to persist until Dec 12, 2038, the ETag still provides redundancy and beacon-like functions even when cookies are blocked. That is, the transmission of the ETag header, along with the referer URL that is provided with each request, allows KISSmetrics to uniquely track an individual user across multiple domains, even with all cookies blocked and Javascript disabled. We highlight an example of this demonstrating persistent tracking across sites as a user browses from Gigaom to Spotify with Private Browsing enabled and all cookies (1st and 3rd) blocked.

2. Privacy Implications

LACK OF NOTICE

The privacy implications of these practices all center on awareness and user control. There is widespread policy consensus that individuals should be notified of website tracking mechanisms and given the ability to opt out of the practice. However, we find that sites engaging in this practice do not adequately disclose it to consumers. Prior to the release of our paper, KISSmetrics offered neither an opt-out nor any statement about how it engaged in persistent tracking (in fact, it provided very little notice to consumers at all). See Figure 2 below.

Figure 2. Snapshot of KISSmetrics privacy policy as of July 29, 2011.

LACK OF CHOICE

Individuals well versed in these matters have also had the technical option of deleting or blocking cookies and Javascript. All of these control options, whether policy based or technology based, depend upon the consumer taking action to express their pro-privacy preferences. However, the practice of respawning demonstrates a lack of respect for consumer preferences. If consumers’ pro-privacy behaviors can be circumvented, consumers will be left in a technical arms race with companies that have very strong motivations to track users and hide their methods of doing so. In fact, the KISSmetrics CEO’s initial response to this practice was that users should install the AdBlock Plus browser extension if they do not wish to be tracked, something most consumers would not know to do, since these practices are not immediately visible. Moreover, even AdBlock Plus’s default filter settings do not actually block KISSmetrics. A user must be sophisticated enough to discover the plugin, then seek out additional filters to mitigate these practices.

PERSISTENCE OF TRACKING

There are other, more immediate problems. For instance, because Adobe Flash cookies are stored outside the browser, they operate outside the privacy protections of the browser. This means that one cannot protect herself by using different browsers for different activities (for instance, one for banking and another for general web browsing). A Flash cookie acquired while using Firefox is also available to websites when using Internet Explorer.

Similarly, using private browsing mode does not fully insulate the user from tracking. Browser manufacturers are to be praised for their inclusion of private browsing modes but even they have difficulty keeping up with the incentives among advertisers to track individuals. KISSmetrics’ tracking tool is capable of session tracking even where the user has Private Browsing Mode enabled, and is blocking Flash, HTML5 and HTTP cookies. (See Figure 3) To prevent tracking across websites, the user would have to employ all of those pro-privacy steps, and clear the cache after each website visited.

Figure 3. Persistent tracking occurs between websites even with Private Browsing mode enabled and ALL cookies blocked (Flash, 1st party, 3rd party).

3RD PARTY LINKING AND ENHANCEMENT

Finally, websites often promise in their privacy policies that they will not sell data to third parties. However, the user should be concerned about a different practice: the website buying information from third parties. Let me explain: KISSmetrics’ system sets the SAME first-party cookie on every website it partners with, such as Hulu, Spotify, and Spokeo (see Figure 4). Since this unique identifier matches across all the websites that use KISSmetrics, websites could later collude and share, mix, or buy information about a given user, even without KISSmetrics’ active involvement. For example, Hulu.com could approach data provider Spokeo to purchase additional data obtained about user ‘1234.’ Buying additional information about users is a very common practice, and is referred to as “enhancement” or “data appends” in the industry.

Figure 4. Shared 1st party ‘km_ai’ cookies across major websites, including Hulu, Spokeo, and GigaOm
(The above cookies all have the value indicated in the search field: GuTjB90-)
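
The kind of join this enables is trivial. In the sketch below, both datasets, the identifier values, and all field names are hypothetical; the only point is that a shared km_ai value works as a join key between otherwise unrelated sites.

```javascript
// Hypothetical records held by two unrelated sites, both keyed by the shared
// km_ai value (the same ID was set on every customer's domain).
const videoSite = [
  { km_ai: "id-1", email: "throwaway@example.com", watched: ["show-1"] },
  { km_ai: "id-2", email: "other@example.com", watched: [] },
];
const peopleSearchSite = [
  { km_ai: "id-1", realName: "Jane Doe", city: "Oakland" },
];

// "Data append": attach whatever the partner knows about the same ID.
function appendData(ownRecords, partnerRecords) {
  const partnerById = new Map(partnerRecords.map((r) => [r.km_ai, r]));
  return ownRecords.map((r) => ({ ...r, enrichment: partnerById.get(r.km_ai) ?? null }));
}

const enriched = appendData(videoSite, peopleSearchSite);
console.log(enriched[0].email, enriched[0].enrichment.realName); // throwaway@example.com Jane Doe
```

The throwaway email given to the first site does nothing to stop the match, because the join never looks at it.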

This is important because it breaks the trust model enabled by “selective revelation.” Advocates of market-based approaches to privacy have long argued that “privacy is all about trust.” Thus, the user “trusts” certain websites and shares only the amount of information that she is comfortable revealing in that context. For instance, a user may fear that Hulu.com would send spam, and thus provide a throw-away email address when signing up. That is a form of selective revelation that the market is supposed to respect.

However, if websites can simply go to information aggregators, selective revelation is no longer a workable strategy to protect privacy. Sharing any information–even fake information–could enable the website to match up cookies and discover real information that the user “trusted” to some other site. This risk is amplified where users are encouraged to authenticate in order to use a website’s services, such as music or video services like Spotify or Hulu.

UNAVOIDABLE 3RD PARTY TRACKING

As explained above, KISSmetrics uses the same identifier for consumers across the different websites it serves. In addition to data enhancement, this practice may be problematic because it enables KISSmetrics to uniquely track individuals across sites they visit. This makes KISSmetrics’ position more similar to a network advertiser than an analytics provider.

The CEO has responded that KISSmetrics’ same-identifier-across-all-sites design is benign. He states that “internally, these identifiers are instantly translated into unique identifiers for each customer, and KISSmetrics has gone to extensive lengths to avoid linking any information from different customers.” However, as I show in Figure 5 below, unless all logfiles are also instantaneously deleted or truncated, KISSmetrics is likely to receive and store linkable information about users’ browsing activity across all the sites that use its service, even if the user has blocked all cookies.

I don’t have visibility into KISSmetrics’ backend systems and am therefore unable to speak conclusively about their internal practices with regard to ‘instant translation of unique identifiers’ or logfile retention. However, since the unique identifiers are included in the request URL itself rather than in the cookie headers (i.e., the ‘_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o’ parameter), I can observe their transmission to KISSmetrics servers, and I suspect each request generates a log entry on their systems. Unless all log data is immediately deleted or truncated, it’s likely that this cross-domain browsing history is available on their systems, unhashed. I invite KISSmetrics to explain how they avoid this problem if this is incorrect.

----------------------------------------------------------
http://trk.kissmetrics.com/e?URL=http%3A%2F%2Fwww.hulu.com%2F&Referrer=Direct&_n=Visited%20Site&_k=5a68d120b211c810289fc36493663648821d58aa&_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o&_t=1311921533
 Host: trk.kissmetrics.com
 DNT: 1
 Connection: keep-alive
 Referer: http://www.hulu.com/
----------------------------------------------------------
http://trk.kissmetrics.com/e?_n=Viewed%20Home%20Page&_k=5a68d120b211c810289fc36493663648821d58aa&_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o&_t=1311921533
 Host: trk.kissmetrics.com
 DNT: 1
 Connection: keep-alive
 Referer: http://www.hulu.com/
----------------------------------------------------------
http://trk.kissmetrics.com/e?Country=US&_n=InClient%20USA&_k=7a62f3d25e724160c335d397cd7fabfaecc58b19&_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o&_t=1311921553
 Host: trk.kissmetrics.com
 DNT: 1
 Connection: keep-alive
 Referer: http://www.spotify.com/us/hello-america/

Figure 5. Persistent identifier and Referer header sent to KISSmetrics across domains (Hulu/Spotify),
even with Private Browsing mode enabled and all cookies blocked (1st party, 3rd party, Flash).
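
Because the identifier rides in the query string, anyone holding these logs can reassemble the cross-site trail. The sketch below parses the three URLs from Figure 5 with the standard URL API and groups them by _p. Reading _p as the person ID, _k as the customer (site) key, _n as the event name, and _t as a Unix timestamp is our interpretation; it is consistent with the Hulu _k value matching the script filename discussed earlier.

```javascript
// The three beacon requests from Figure 5, each with the Referer it carried.
const beacons = [
  {
    url: "http://trk.kissmetrics.com/e?URL=http%3A%2F%2Fwww.hulu.com%2F&Referrer=Direct&_n=Visited%20Site&_k=5a68d120b211c810289fc36493663648821d58aa&_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o&_t=1311921533",
    referer: "http://www.hulu.com/",
  },
  {
    url: "http://trk.kissmetrics.com/e?_n=Viewed%20Home%20Page&_k=5a68d120b211c810289fc36493663648821d58aa&_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o&_t=1311921533",
    referer: "http://www.hulu.com/",
  },
  {
    url: "http://trk.kissmetrics.com/e?Country=US&_n=InClient%20USA&_k=7a62f3d25e724160c335d397cd7fabfaecc58b19&_p=Y0Lvd5WMVDkIXW-K12Mn8EM8H2o&_t=1311921553",
    referer: "http://www.spotify.com/us/hello-america/",
  },
];

// Groups log lines by the (assumed) person parameter _p.
function linkByPerson(entries) {
  const byPerson = new Map();
  for (const { url, referer } of entries) {
    const params = new URL(url).searchParams; // decodes %20 etc.
    const person = params.get("_p");
    if (!byPerson.has(person)) byPerson.set(person, []);
    byPerson.get(person).push({
      siteKey: params.get("_k"),
      event: params.get("_n"),
      time: Number(params.get("_t")),
      referer,
    });
  }
  return byPerson;
}

const trail = linkByPerson(beacons).get("Y0Lvd5WMVDkIXW-K12Mn8EM8H2o");
for (const step of trail) console.log(step.time, step.event, step.referer);
```

One identifier ties together events reported under two different site keys, which is exactly the linkage the “instant translation” argument would need to rule out at the logging layer.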

3. Prevalence

Finally, I reached out to a few other researchers in this space to see if they had additional data regarding the prevalence of ‘respawning’ on the web. I was fortunate to come across the following data sets, including two that were recorded prior to the publication of our paper.

  • Jonathan Mayer and the Stanford CIS group just released an amazing research tool and example dataset dubbed ‘Fourthparty,’ which allows easy measurement of web content. Using their July 21-23, 2011 crawl of the Alexa Top 10,000 websites (331MB zip), we can identify 31 sites that include the KISSmetrics ‘http://i.kissmetrics.com/i.js’ script, which enabled the cache-based cookie respawning described above. Using this dataset, we can also CONFIRM that every site listed had matching ETag and cookie headers, showing that, at the time of the crawl, these sites were likely engaged in this respawning practice.
  • Samy Kamkar, the author of Evercookie, independently confirmed Flash and HTML5 based respawning on Hulu.com, as well as on 39 other sites, prior to the release of our paper. He has a great writeup on his findings that he released here (though it makes no mention of cache/ETag respawning).
  • The privacy-conscious folks at Abine were kind enough to run a query across their dataset and were able to identify 53 sites using KISSmetrics prior to July 31, 2011.
  • BuiltWith is a site that tracks ‘Web and Internet Technology Usage Statistics’. As of this writing, it shows 424 websites using KISSmetrics. However, its ‘trends’ chart shows a 70% decline in KISSmetrics usage among the Top 10,000 websites since July 2011 (from .14 to .04), so the current penetration is not readily apparent.
  • Finally, Nick Doty and I ran an automated crawl of the Alexa Top 1,000,000 between July 30th and August 2nd and identified 515 websites that were using http://i.kissmetrics.com/i.js, including one ‘.gov’ domain (http://challenge.gov). Note: since KISSmetrics updated their practices on July 29th, we weren’t able to confirm that these sites were still respawning. However, since these sites all reference http://i.kissmetrics.com/i.js, whose sole function was to set a persistent identifier via the browser cache, it’s likely this was happening, with or without the sites’ explicit knowledge.
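
A check like the ETag/cookie comparison mentioned in the Fourthparty item can be expressed as a small predicate over response headers. This is an illustration, not Fourthparty’s code or schema: it assumes headers are available as a plain object with lowercase names and examines only one Set-Cookie value.

```javascript
// Returns true when a response's ETag value is also the value of the cookie
// it sets. Headers: plain object, lowercase names; one Set-Cookie string only.
function etagMatchesCookie(headers) {
  const etag = (headers["etag"] || "")
    .replace(/^W\//, "")     // drop a weak-validator prefix, if any
    .replace(/^"|"$/g, "");  // drop the surrounding quotes
  const pair = (headers["set-cookie"] || "").split(";")[0]; // "name=value"
  const eq = pair.indexOf("=");
  const cookieValue = eq >= 0 ? pair.slice(eq + 1).trim() : "";
  return etag !== "" && etag === cookieValue;
}

// The i.js response headers captured earlier in this post.
const ijsResponse = {
  etag: '"Z9iGGN1n1-zeVqbgzrlKkl39hiY"',
  "set-cookie": "_km_cid=Z9iGGN1n1-zeVqbgzrlKkl39hiY; expires=Sun, 12 Dec 2038 01:19:31 GMT;path=/;",
};
console.log(etagMatchesCookie(ijsResponse)); // true
```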

Corrections?

As a final note, I’d like to invite Hulu, KISSmetrics, or any other site discussed in this post to provide corrections if any of the above statements are incorrect in any way. This writeup is based on external analysis of these platforms without any help from the companies involved, so if there’s more to it, I’d love to help clarify. I’m also happy to provide additional screenshots, logfiles, and raw forensic captures to those interested.