In June 2009, I published a landmark paper with two colleagues at the UC Berkeley School of Information examining the common practices among website operators of collecting, sharing and analyzing users’ data. We compared industry practices with users’ expectations of privacy, identified points of divergence, and made recommendations for changes in industry practice and government regulation.
The goal of the project, called KnowPrivacy, was to influence policy governing the data collection and sharing practices of popular Internet sites. We identified deceptive practices that may be harmful to users’ privacy. The team not only published a paper, but also built a website that illustrated the prevalence of web tracking software among the most visited websites.
We assessed users’ perceptions, expectations and knowledge based on three sources. First, we gathered data from public opinion surveys in previous research by various public policy and polling organizations. Next, we analyzed which practices upset users enough to file complaints with privacy watchdog organizations such as the FTC, the Privacy Rights Clearinghouse, the California Office of Privacy Protection, and TRUSTe. Finally, we looked at popular media to get a sense of what was discussed in articles about Internet privacy. We used this content as a proxy for which issues users are aware of and which they may not know about.
From these various sources of data we identified points of conflict between the privacy expectations of Internet users and the actual practices of website operators.
User Expectations and Knowledge
- Users are concerned about data collection online and want greater control over their personal information.
- Users lack awareness of some data collection practices.
- Users don’t know where to file their complaints.
Website Operator Practices
- Websites collect and analyze data about users, but only offer partial access and control to the users.
- Website policies are unclear about important issues, such as retention and data enhancement.
- Websites claim they do not share user data with third parties; however, they share it with affiliates with whom users often have no relationship.
- Web bug trackers are ubiquitous. Analytics and ad serving companies can track user behavior across large portions of the web.
Web bugs: prevalent
The paper and corresponding website at knowprivacy.org include data and charts that highlight the nearly ubiquitous presence of web bugs (pieces of code that track, monitor and report web usage back to website owners). Web bugs enable third parties to place cookies on a user’s browser and track the user’s navigation across the web.
Web bugs are typically a small graphic embedded in the page, usually an invisible 1-by-1 pixel, and are also called web beacons, clear GIFs, or pixel tags. Ad networks can use web bugs to aggregate information and create a profile of what sites a person has visited. The personal profile of a user is identified by the browser cookie of an ad network, allowing the network to track behavior across sites and over time.
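To make the description above concrete, here is a minimal Python sketch of how one might flag candidate web bugs in a page's HTML. It is an illustration only, not the method used to gather the KnowPrivacy data. The class and function names, the example hosts, and the heuristic itself (an image declared as 1-by-1 or 0-by-0 pixels and served from a different host than the page) are assumptions made for this example. Real trackers can also be loaded by scripts or iframes, which this sketch does not see.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class WebBugFinder(HTMLParser):
    """Collects hosts of <img> tags that look like third-party tracking pixels."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname or ""
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        # Heuristic (assumption): the tag declares a 1x1 or 0x0 pixel size.
        tiny = attr.get("width") in ("0", "1") and attr.get("height") in ("0", "1")
        # Resolve relative and protocol-relative URLs before comparing hosts.
        host = urlparse(urljoin(self.page_url, src)).hostname or ""
        # Naive comparison: any different hostname counts as third party,
        # so a site's own subdomains would also be flagged.
        third_party = host != "" and host != self.page_host
        if tiny and third_party:
            self.candidates.append(host)


def find_web_bugs(page_url, html):
    """Return the hosts serving candidate web bugs found in the given HTML."""
    finder = WebBugFinder(page_url)
    finder.feed(html)
    return finder.candidates


sample = """
<html><body>
<img src="/logo.png" width="200" height="50">
<img src="https://tracker.example/pixel.gif?id=abc" width="1" height="1">
<img src="//ads.example/b.gif" width="1" height="1" />
<img src="/spacer.gif" width="1" height="1">
</body></html>
"""
print(find_web_bugs("https://news.example/story", sample))
```

In this made-up sample, the logo and the first-party spacer image are ignored, and the two tiny images served from other hosts are reported.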
Because these web bugs are invisible, users are unlikely to notice them and cannot be relied upon to regulate this practice. In fact, there are few effective controls for this tracking technology. When this research was conducted, every one of the top 50 websites contained at least one web bug during a one-month period. Some had as many as 100.
Of greater note was the depth of coverage that some tracking companies have. Several of the tracking companies had a web bug on the majority of the top 100 sites. Google, in particular, had extensive coverage – it had a web bug on 92 of the top 100 sites, and on 88% of the total domains reported in the data set of almost 400,000 unique domains.
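Coverage figures like these come down to a simple count: for each tracking company, on how many sites was it observed, and what share of all sites is that? The sketch below shows the arithmetic in Python. The function name, the input shape, and the site and tracker names are hypothetical, and the numbers are invented for illustration; they are not the KnowPrivacy data.

```python
from collections import Counter


def tracker_coverage(observations):
    """Map each tracker to (number of sites, share of sites) where it was seen.

    observations: dict mapping a site name to an iterable of tracker names
    observed on that site.
    """
    counts = Counter()
    for trackers in observations.values():
        # Count a tracker once per site, even if it appears several times.
        counts.update(set(trackers))
    total = len(observations)
    return {name: (n, n / total) for name, n in counts.most_common()}


# Invented example data, for illustration only.
observations = {
    "site-a.example": ["TrackerX", "TrackerY"],
    "site-b.example": ["TrackerX"],
    "site-c.example": ["TrackerX", "TrackerX", "TrackerZ"],
    "site-d.example": [],
}

for name, (sites, share) in tracker_coverage(observations).items():
    print(f"{name}: {sites} of {len(observations)} sites ({share:.0%})")
```

Counting each tracker once per site keeps a site with many bugs from one company from inflating that company's coverage.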
What data is used for
Website operators use information about user behavior for various purposes. They can use the data to develop and improve the website, making it easier to use. They can customize a site to fit individual users’ tastes. An e-commerce site can make product recommendations based on previous purchases, or it can use the information to deliver targeted ads. Many of these uses benefit the visitors to the site and are actively sought by consumers.
Sometimes site operators will rent or sell personal and behavioral data about users to third parties. More often, the operators will share the data with marketing partners or corporate affiliates and subsidiaries. This means that user behavior may be profiled not only by sites visited by a user, but also by any other entities with whom those sites share this information.
The KnowPrivacy project built site profiles for the top fifty most visited sites. Each site’s profile provides:
- types of data collected from users
- general data collection practices
- data sharing practices
- the number of web bugs found on the site in March 2009
Here is Google’s profile, based on our March 2009 findings. Additional profiles are available on the KnowPrivacy website.
The problems with web bugs: pervasive yet invisible
Our analysis of user expectations found that users are concerned about data collection and want greater control over the process, but that they only voice their concerns when they perceive an invasion of privacy. Because web bugs are essentially invisible to users, they are not perceived as a threat, despite the fact that users have little control over the data collection by web bugs.
The KnowPrivacy team recommended that the practice of third-party tracking be made more transparent. It currently operates in a policy loophole in which neither the website nor the tracker is clearly accountable for the data collected. We recommended that websites define the policies of the third-party trackers they allow on their sites or, at a minimum, link to the appropriate policies on the tracking companies’ websites and specify which practices fall under each policy.
We recommended regulation by which third-party trackers must allow users to see all the data that has been collected about them.
The presence and purpose of third-party tracking should also be made more salient in the minds of users. The team recommended that all browser developers provide a Ghostery-like function in their browsers that alerts users to the presence of third-party trackers.
What happened after publication
Following this paper’s release, The Wall Street Journal asked me to conduct additional data research for articles they were writing related to consumers and web privacy. What started as a one-time request for data from the paper turned into a series of articles called What They Know, published from 2010 to 2012.
Additionally, federal regulators took notice. The Federal Trade Commission (FTC) referenced KnowPrivacy’s findings in an agency report, Protecting Consumer Privacy in an Era of Rapid Change: A Proposed Framework for Businesses and Policymakers.