I wrote this essay for a conference hosted by PEN America on the chilling effects of surveillance. I was asked to address what questions researchers should focus on and I discussed the threat posed by stored data and the opportunity for researchers to create new transparency tools. It was originally published here, but you can also read it below!
How do we protect something we can barely see?
As much time as we spend discussing privacy, you would think it’d be easy to define. Yet the more we discuss it, the more it becomes apparent that our definitions of privacy vary widely. For some it means keeping only their deepest secrets safe, while for others any information collected about them without their consent is perceived as a violation. Despite these inconsistencies, most definitions of privacy depend on knowing and controlling what information is collected about us.
Most of the time users don’t realize how much information they are sharing, how it’s stored, or who has access to it. In the analog world, controlling one’s own information was relatively straightforward. Obvious physical and cost barriers limited how quickly and how far information about an individual could be shared; its reach was our personal circle of friends, or maybe a wider community if there were a diligent town gossip. But technology has expanded the reach of information significantly. Now, vast quantities of data are collected about individual users daily, often stored indefinitely in data centers operated by private companies, and available to anyone who is granted (or can forcefully obtain) access. Understanding what kind of data is collected, how it is stored, and who has access are critical components of managing our privacy and avoiding harm. Researchers can play a major role in this process by developing new tools to help increase transparency and create a safer environment for all users.
In the same way that technology can be used to develop new forms of tracking, technology can also provide new avenues for transparency. Scientists have historically developed tools to help them explore difficult-to-observe phenomena. Long ago, the microscope allowed people to observe what was not obvious to the naked eye—a world of small structures that make up everything around us, including the organisms that can sometimes threaten us. This increased understanding of the underlying components generated new ways to respond to and mitigate those harms. Privacy researchers need tools that mimic the capability of the microscope—something that allows us to identify the invisible traces of information that we emit every day.
Transparency in the collection and storage of data is important, but insights into how data is used are even more critical. Most users don’t understand the value of their digital trail; however, the companies who keep databases for commercial purposes are keenly aware of ways they can exploit it. Many of us have a sense that, more and more, every move we make is tracked, both online and off: cloud services mine our email to serve us tailored ads, Staples changes the price of items based on the IP address of the shopper, and some brick-and-mortar stores even use the Wi-Fi signal on our phones to observe our movements in the store. In fact, the bulk of the value is extracted through sophisticated aggregation and mining of users’ information that would be hard for consumers to fathom. Researchers need to help document not just the collection of users’ information, but also its use, to help us understand the harm.
Right now there are very few legal limits on how much data companies can collect, how they secure it, or for how long it’s stored. As a result, data about users is often stored indefinitely on unsecured systems distributed all over the world. The storage of this data over long periods of time is inherently risky and potentially toxic for consumers. Much like nuclear waste, the longer the data is held, the greater the likelihood that some of it will leak out to an unintended recipient. Additionally, the companies storing data face few, if any, repercussions from breaches, leaving no tangible incentive to secure information. The inevitability that data will leak to other, less-savory actors is not obvious to users. Even among privacy experts, who have decently reliable tools that allow them to observe leaks, the impact is often not salient because informational harms may not happen immediately or directly. Just knowing what information was leaked usually won’t be enough to intuitively understand the potential risk. We also need to come up with ways to make this information salient by explaining the potential uses of data in a way that individuals will care about.
Improving the state of privacy is even more complicated now that we have begun to understand the extent of the government’s surveillance capabilities. A year ago, many might have considered commercial privacy issues to be separate from government surveillance; however, more and more it is becoming clear that these two things are deeply interconnected. We now know that a significant portion of the information government intelligence agencies collect comes from commercial cell carriers, email providers, and a number of other digital service providers that make up the digital ecosystem. Many of the advances in private industry are also leveraged by the government, not only in terms of access but also by mimicking private industry’s data mining techniques. Any advances made in government privacy regulation have been undermined by technical advances in accessing data stored by private companies. Some companies have started to push back on government demands or have started offering competing services with better privacy protections for users, such as encrypted email. Yet most users don’t have the sophistication required to adequately evaluate the effectiveness of these measures. Insight and analysis from experts about the privacy and security features of commercial entities will enable market competition by informing users and ultimately encourage faster development of privacy-protecting technologies.
Right now, data collection mechanisms are invisible to most users—only the companies involved and governments (and a handful of sophisticated users) have a good understanding of how digital information is used. Also, as there is no regulatory structure, the same people who understand the power of massive information flows are typically the ones setting the policies on how this data can be used. These actors also have the incentive to innovate new ways to monetize personal information, and researchers have to work hard to keep up with their technological advancements. Because privacy has so many different meanings to different people, it’s hard to pin down a specific political or policy solution that applies to every case. Instead, we should focus on increased transparency and technical understanding so that users are equipped to engage in the debate and set norms for what practices should and shouldn’t be allowed. Ideally, the privacy industry should be innovating at the same speed as the tracking industry, supporting the development of new tools and technologies to reveal the big picture of privacy harms. This will help researchers gain new insights, bring average users into the conversation, and ultimately prevent bad actors from taking advantage of their position in the shadows.