Comments for the Privacy and Civil Liberties Oversight Board

I am speaking today at the PCLOB meeting on Section 215 of the PATRIOT Act and Section 702 of the FISA Amendments Act.  My panel begins at 12:30 and you can watch it live here.

I will be commenting on the role of technology in these programs, focused on how the limits of technology suggest that claims that surveillance programs can avoid targeting Americans are probably overstated.


Watching the Watchers: Increased Transparency and Accountability for NSA Surveillance Programs

Technology has completely changed the nature and extent of government surveillance. It has made it significantly easier and cheaper to surveil specific individuals and has made it possible to conduct large-scale data mining to reveal connections among individuals. As a result of this massive expansion in capability, we need to pay particular attention to how these tools are used. PCLOB needs to put new structures in place to increase the transparency and accountability of the government’s surveillance programs, specifically related to the use of technology.

I will discuss five issues that should be addressed in order to ensure effective oversight of the NSA’s programs. First, geography is not easily determined on the Internet because technical markers of location are unreliable. Second, technical systems do not “know” the law and cannot make judgment calls. Third, lawmakers don’t have a technical background and so aren’t in a good position to provide a “check” on the system. Fourth, technology has made surveillance steadily cheaper and easier to conduct, removing what was once a natural barrier to widespread surveillance. Finally, there are indications that stored data will be shared among intelligence and law enforcement agencies once it has been collected. The technology used to conduct this surveillance has blurred legal boundaries and reduced the cost of surveillance operations so substantially that we need a major overhaul of the regulatory structure. Effective oversight is impossible without a deep understanding of the technology involved in these programs and a regulatory structure that demands accountability and transparency.

THE INTERNET IS WITHOUT GEOGRAPHY

Former National Security Agency director Michael Hayden (whose vision for using information technology was instrumental in creating the current constellation of surveillance programs) recently stated: “Let’s keep in mind that in a global telecommunications infrastructure, geography doesn’t mean what it used to be. Things of a place may not be in a place, and things in a place may not be of a place. The Internet actually lacks geography.”

This comment from General Hayden seems to undermine the premise that surveillance can be conducted without collecting data from American citizens. It’s telling that the man whose vision brought us these programs also had the insight that the political boundaries of traditional surveillance do not easily or obviously extend to the modern telecommunications infrastructure, particularly on the Internet. This indicates a kind of conscious intent that is retroactively being masked by the innocent observation that geography doesn’t exist online. The recent revelations on the nature and extent of NSA surveillance programs have been followed by reassurances that these investigations are focused on foreigners, but for many technical reasons this seems unlikely to be true.

First, as General Hayden pointed out, the Internet is essentially borderless. While there are ways to try to infer geography from the metadata collected (such as telephone prefixes and IP addresses), that information is often unreliable. Internet communications traverse a variety of routes that may make it difficult to trace them back to their origin. For example, Virtual Private Network (VPN) services allow users to mask their true IP addresses by routing traffic via endpoints, sometimes in other countries, before accessing a site. Even if we believe that the NSA is acting in good faith to avoid surveilling US citizens, if it is relying on these rudimentary indicators of location it is unlikely to be entirely succeeding.
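To make the VPN point concrete, here is a minimal sketch of location inference from an IP address. Everything in it is an assumption for illustration: the prefix-to-country table is hypothetical, the addresses come from ranges reserved for documentation, and nothing here reflects how the NSA actually infers location. It only shows that a classifier that sees the VPN exit address, and not the user's own, reports the wrong country.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-country table (assumption). The ranges are
# reserved for documentation, so they belong to no real network.
PREFIX_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
}


def infer_country(ip: str) -> str:
    """Guess a country from the observed source address alone."""
    addr = ipaddress.ip_address(ip)
    for network, country in PREFIX_COUNTRY.items():
        if addr in network:
            return country
    return "unknown"


def observed_address(home_ip: str, vpn_exit_ip: Optional[str] = None) -> str:
    """Return the address an observer sees: the VPN exit if one is used."""
    return vpn_exit_ip if vpn_exit_ip is not None else home_ip


# The same user, physically in the US, connecting directly and via a VPN.
direct = infer_country(observed_address("192.0.2.10"))
via_vpn = infer_country(observed_address("192.0.2.10", vpn_exit_ip="198.51.100.7"))
print(direct, via_vpn)  # US DE
```

The user never moved, yet the second lookup labels the traffic as foreign.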

Some of the released documents have indicated that, in order to improve the selection process, the NSA maintains a database of US citizen “identifiers” (such as phone numbers or Internet identifiers like MAC addresses, IP addresses, and email accounts) that help it target foreign citizens. While this provides a technical workaround for the inherent lack of geography online, it also implies the NSA is maintaining a database of US citizens’ information. This is problematic because it means that the “anonymous metadata” the NSA collects is not truly anonymous; it can be linked up to other specific identifiers at any time (such as name, email address, past location).

Additionally, changes to email addresses on the provider’s end can further complicate the problem. For example, Yahoo recently announced that all users who haven’t visited their accounts in the last year will lose their email addresses and that new users can then re-register those addresses. This proposal has generated a few security concerns, but it also raises a relevant question: what happens if one of those old email addresses was on the NSA’s list? There is some indication in the Inspector General’s report that selectors (e.g. email addresses) are rarely removed once they are set up. What happens to the user who claims osama_bin_sexy@yahoo.com?

There is no obvious technical solution to the problem the NSA faces in surveilling only foreign targets, and nothing in the public documents offers much reassurance that it is trying hard to remedy this problem.

TECHNICAL SYSTEMS DON’T KNOW ABOUT LAWS

A computer does not “know” the law or have the ability to make judgment calls. The technical systems conducting the surveillance are simply tasked with selecting content that the NSA finds relevant to an investigation. However, unlike a human operator, a technical system cannot work with vague concepts. Whether or not the public knows the definition of “relevant,” it is quite possibly defined somewhere in the computer code that determines which bits of data to pull and collect and which to ignore.

What we do know is that “relevance” seems to be interpreted quite broadly, and it can include metadata about Americans who aren’t necessarily linked to a foreign entity. Contact chaining (linking people to “targets of interest” by affiliation) allows people to be identified as “suspects” by virtue of simply having had some exchange with a person of interest.

This raises an interesting question for peer-to-peer software such as Skype and Spotify. For example, if a member of Al Qaeda and I both listen to “Rockwell – Somebody’s Watching Me” at the same time, my software might automatically connect to his computer in order to download the song. Would that make me a suspect? The “guilty by association” philosophy governing contact chaining could implicate innocent parties when their software automatically communicates with peers all over the world, unbeknownst to the user. It’s unclear if being connected to a target through a program like this could result in an innocent user being flagged for suspicious activity.
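Contact chaining can be pictured as a breadth-first search outward from a target. The sketch below is a hedged illustration only: the contact graph, the names in it, and the two-hop limit are all assumptions, not details taken from the released documents. It shows how someone who never communicated with the target gets swept in through an intermediary.

```python
from collections import deque


def contact_chain(graph, target, max_hops):
    """Return {person: hops} for everyone within max_hops of target."""
    hops = {target: 0}
    queue = deque([target])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    return hops


# Hypothetical contact records (assumption). An edge exists whenever two
# endpoints exchanged any traffic, including automatic peer-to-peer
# connections the user never saw.
graph = {
    "target": ["peer_a", "peer_b"],
    "peer_a": ["target", "listener"],
    "peer_b": ["target"],
    "listener": ["peer_a", "friend"],
    "friend": ["listener"],
}

flagged = contact_chain(graph, "target", max_hops=2)
print(flagged)  # {'target': 0, 'peer_a': 1, 'peer_b': 1, 'listener': 2}
```

Here "listener" is flagged at two hops without ever contacting the target, and each additional hop would widen the set further.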

Another public reassurance is that these programs are focused on non-content data collection. Setting aside for a moment the incredible amount of information a person can learn by looking at metadata, it would take very little technical effort to modify these existing programs to collect content as well. The same technology that allows extraction of email “to/from” headers could easily also capture email content. There is no fundamental difference in the amount of intervention required to collect additional information on foreign or domestic communication. Furthermore, detecting that your content was captured is difficult because there is no obvious indicator that a user’s communication was intercepted (unlike in analog postal mail surveillance, where a broken seal might indicate tampering). For example, some of the NSA’s programs collect data “on the wire,” or directly from Internet chokepoints, using what looks much more like a traditional wiretapping device. This is indicated in their PowerPoint slides as “upstream” data collection. While the majority of webmail providers are encrypting our communication using HTTPS (the lock icon in your browser), it turns out that the actual delivery of those emails between mail servers often happens “in the clear” (i.e. without encryption). This makes the content of these messages easy to scoop up directly from the exchange, especially when the system is already equipped to capture the metadata traveling the same channels. In fact, programs like PRISM and PPD20 are necessary only when you can’t get the data directly through the “upstream” data collection efforts.
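The claim that header extraction and content capture are nearly the same operation can be illustrated with Python's standard `email` module. This is a sketch under stated assumptions: the message, the addresses, and both function names are made up, and it says nothing about the actual collection systems. It shows only that any code able to read the “to/from” headers of an unencrypted message already has the body in hand.

```python
from email import message_from_string

# A made-up message as it might appear unencrypted in transit (assumption).
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.org
Subject: lunch

See you at noon.
"""


def extract_metadata(raw):
    """Keep only the to/from headers."""
    msg = message_from_string(raw)
    return {"from": msg["From"], "to": msg["To"]}


def extract_with_content(raw):
    """The same parse, plus one extra line to keep the body."""
    msg = message_from_string(raw)
    record = {"from": msg["From"], "to": msg["To"]}
    record["body"] = msg.get_payload()
    return record


print(extract_metadata(RAW_MESSAGE))
print(extract_with_content(RAW_MESSAGE))
```

The difference between the two functions is a single line, which is the sense in which moving from metadata to content takes very little technical effort.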

These computer systems are tasked to collect emails to and from specific targets, but they likely capture a large amount of less relevant traffic in the process. This makes sense from a technical perspective; anyone trying to build a system to collect this information is going to err on the side of false positives (over-collection) rather than false negatives because you can always delete the data that isn’t appropriate. However, once the information is in hand, there is also no incentive to get rid of it. The programs operate under a paradigm in which, if the NSA can’t conclusively prove that the information it has collected is about a person in the United States, it can store that information indefinitely. This backwards burden of proof results in a system where the information is “relevant” until proven otherwise. Given what we know about the NSA’s philosophy on the geography of the Internet, proving conclusively that data belongs to a US citizen is not going to be simple, and we can assume that a lot of information is stored indefinitely. The nature of the technology used in these programs calls into question the validity of the NSA’s claims that American data can be avoided. The use of technology also indicates that the vague terms the agency uses to explain these programs to the public may, in fact, be better defined than it lets on.
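The effect of this burden of proof can be sketched as a retention rule. The code below is illustrative only: the three categories, the alternative rule, and the record counts are all invented for the example and are not figures from any document. It contrasts the rule described above (keep unless proven US) with a hypothetical reversed rule (keep only if proven foreign).

```python
from enum import Enum


class Origin(Enum):
    PROVEN_US = "proven_us"
    PROVEN_FOREIGN = "proven_foreign"
    UNKNOWN = "unknown"


def retain_unless_proven_us(origin):
    """The rule as the text describes it: keep anything not shown to be US."""
    return origin is not Origin.PROVEN_US


def retain_only_if_proven_foreign(origin):
    """A hypothetical alternative with the burden of proof reversed."""
    return origin is Origin.PROVEN_FOREIGN


# Invented mix of 100 records (assumption): location is hard to prove,
# so most records fall into UNKNOWN.
records = (
    [Origin.PROVEN_US] * 5
    + [Origin.PROVEN_FOREIGN] * 15
    + [Origin.UNKNOWN] * 80
)

kept_current = sum(retain_unless_proven_us(o) for o in records)
kept_alternative = sum(retain_only_if_proven_foreign(o) for o in records)
print(kept_current, kept_alternative)  # 95 15
```

Under these made-up numbers, the outcome is driven almost entirely by how the undetermined records are treated, which is why the default matters so much when geography is hard to establish.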

LAWMAKERS DON’T KNOW ABOUT TECHNOLOGY

Much as a computer doesn’t understand the law, the law doesn’t understand technology. Technology-specific policy is vulnerable to obsolescence when new technology causes a major paradigm shift. Furthermore, those in charge of writing and implementing the regulations governing technology and technological surveillance don’t seem to understand the nature of the technology involved. For example, Congress has repeatedly declined to prohibit the bulk collection of records. One might interpret this to mean that lawmakers condone this activity, or that they feel the trade-off is justified. However, there’s a different interpretation: that they don’t have enough information to make these decisions (and judging from the public statements by many policy makers about how they didn’t understand the scale of these programs, this seems to hold true). The technical underpinnings of these surveillance programs can be confusing, and the regulatory process also lacks a feedback loop that would allow lawmakers (FISC or Congress) to contextualize the breadth of these programs. Metrics on these programs (for example, the number of citizens improperly targeted by the 215 program) should be provided to the decision-making bodies.

Additionally, lawmakers may not completely understand the scale of data collection or the powerful insights that can be drawn from querying databases. “Big data” processing creates capabilities in linking and correlation that weren’t possible even 10 years ago. The raw intelligence used is beyond what even most expert technologists understand. On a daily basis, we’re informed of breakthroughs in de-anonymization and behavioral profiling technologies that allow sensitive inferences about individuals based on just a few points of information.

The “checks” in this system consist mostly of getting permission from individuals (FISA judges) who aren’t required to have any technical background. I suspect there are gaps in their knowledge of the technical processes involved that compromise their ability to be good referees.

THE COST OF SURVEILLANCE

What we have learned about the NSA’s capabilities suggests a move toward programmatic, automated surveillance that was previously unfathomable due to limitations of computing speed, scale, and cost. Technical advances have both reduced the financial barriers to surveillance and increased the NSA’s capacity to conduct large-scale operations. The NSA’s arrangement with just a few key telecom providers enables the collection of phone records for over 300 million Americans without the need to set up an individual pen register or trap-and-trace device for each person. The NSA Inspector General report states “one of the most effective means [for carrying out foreign intelligence activities] is to partner with commercial entities to obtain access to information that wouldn’t otherwise be available.”

PRISM provides access to the contents of all emails, voice communications, and documents privately stored by a handful of cloud services such as Gmail, Facebook, AOL, and Skype. As a result, the NSA doesn’t bear the cost of collecting or storing data and no longer has to interact directly with its targets. Recent documents indicate that the cost of the programs described above totaled roughly $140 million over the four years from 2002 to 2006, just a minuscule portion of the NSA’s approximately $10 billion annual budget.
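The scale of that cost is easy to check with the two figures quoted above. The sketch assumes the $140 million was spread evenly across the four years, which the source does not state, and treats the $10 billion budget as exact for simplicity.

```python
# Figures quoted in the text; the even per-year split is an assumption.
program_cost_total = 140_000_000
years = 4
annual_budget = 10_000_000_000

per_year = program_cost_total / years
share = per_year / annual_budget

print(f"${per_year:,.0f} per year, {share:.2%} of the annual budget")
# $35,000,000 per year, 0.35% of the annual budget
```

On these numbers the programs cost about a third of one percent of a single year's budget.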

The extent to which technology has reduced the time and cost necessary to conduct surveillance should play an important role as people consider the balance of power and the intrusiveness of these programs. These operations are only going to get cheaper and more sophisticated over time, and we need to remember that this is a trend with a firm lower bound: once the cost of surveillance reaches zero, we will be left with our outdated laws as the only protection.

MISSION CREEP

The data that has already been collected continues to pose a threat to Americans’ civil rights because the rules governing the use of stored data can change and/or expand. The current standard is that data can be retained until “proven innocent.” That means that unless the government can show that a communication is not relevant, the NSA is permitted to keep it for further analysis. If you think of the NSA as a miner panning for gold in a river, these programs are the sieves he uses to pull out the rocks that have the potential to be valuable. The sieve will capture pieces that are relevant, but also many that aren’t. However, under the current rules, the gold miner retains all nuggets unless he can prove they aren’t gold. Data that is collected “incidentally” can be retained if the collector can’t conclusively prove it IS American communication. This creates a giant cache of data being stored without any evidence that it’s necessary or useful to national security.

Once data is collected and stored, there are few limitations on how it is shared among the intelligence community. For example, if information pertaining to criminal activity is encountered during the routine over-collection, it is shared with law enforcement (under 702 Section 5). Given that, it doesn’t seem like a stretch to guess that this standard will extend to information collected under the 215 Program as well. In fact, the NSA noted in a memorandum that the General Counsel of the CIA and DOD has informed the NSA that, in the future, other CIA/DOD entities may wish to obtain and analyze this metadata using the same rules.

Retention of this information poses a threat to the civil liberties of those caught up in the fray and the scale of this program will undoubtedly expand. The information collected under NSA surveillance programs has value to other agencies and we can already see it being shared among the intelligence community. This creates a significantly different paradigm than the initial “foreign intelligence collection” apparatus.

CONCLUSION

The NSA’s claims that its technical surveillance programs are focused on foreigners, and that they minimize the amount of data collected, are not well supported by the known technical details of the programs. The inherent lack of geography on the Internet makes it difficult to design a program that can accurately track only foreigners, and the design of the program allows the agency to store data it can’t prove it doesn’t need. This trove of data is increasingly of interest to the other intelligence agencies, and the lawmakers in place to serve as a “check” on the program don’t have the technical expertise necessary to effectively regulate it.

The Privacy and Civil Liberties Oversight Board needs the resources to do the due diligence of getting under the hood to see what happens to US citizens’ data. Just from scratching the surface, it is apparent that these programs are highly technical and that the NSA is using domain-specific language (such as identifiers and metadata) that policy-makers don’t completely understand. For this to work, PCLOB will need to hire technologists with clearance to work full-time auditing both the NSA’s legal and technical operations. Yet, historically, neither the resources nor the political interest for this initiative have materialized. As a result, it’s worrisome that the PCLOB will only have the capacity for a cursory review of these programs based on self-reported statements from the NSA.

In the worst-case scenario, PCLOB could generate a report that essentially gives the NSA “a pass,” not because it has conducted a sound analysis of the effect these programs have on Americans, but rather because of limited technical knowledge and resources.

These comments were officially submitted to the Federal Register on July 31.