PRISM: Solving for X


Figure 1: PRISM

I thought it would be a fun exercise to describe PRISM based on information publicly available through the press, private companies, and the DNI. Specifically, how would this system look if we took all the statements made at face value? This might be a stretch, but it seems like a worthwhile exercise, not unlike solving a multivariate equation in which one or more of the variables are unknown.

Of the 4 programs we’ve learned about, PRISM is potentially the least troubling with respect to its legality and the type/volume of information collected, but it is also the most technically puzzling. There have been many theories on the architecture of PRISM and I’ve been inundated with requests to help press/advocates understand it, so here goes.

Figure 2: PRISM Collection Details

Based on the released PowerPoint slides and official public statements, we assume the following:

PRISM could be:

  • An established “legal,” contractual relationship between the named companies and the U.S. government (“legal” in quotes — I’m not touching that one)
  • A process for requesting (through email, an API, or sneakernet), transferring (e.g. via SFTP or some other mechanism), and ingesting responses from the named companies to the NSA

PRISM probably is NOT:

  • A blanket ingestion of all data flowing to/from the provider (although there are indications that much of the metadata is collected via other programs)
  • Direct access (“root”) to the named companies’ servers or infrastructure
  • A singular architecture or design for surveillance (Figure 1 outlines the superset of features available, but these ‘vary by provider’)

With all that in mind, here is a guess as to how the process could work:

0. The government first establishes a relationship with the company, negotiating for access to its users’ data. Then a process is set up with each company, which entails identifying individual points of contact (with clearance?), the types of data available, and transfer procedures/protocols (e.g. SFTP) for each instance (Figure 3 shows the timeline).
1. An NSA analyst at his desk makes a request to the company. His request is potentially based on intel from one of the other programs, including metadata surveillance programs reported in addition to FAIRVIEW/BLARNEY in Figure 3. For example, metadata intel might reveal suspicious activity from an IP address of a presumed suspect to Gmail. The analyst then asks Google for the emails originating from that address.
2. Each request is purportedly vetted by the FBI in order to ensure that it’s not overly inclusive of U.S. citizens’ information.
3. The request then goes to the named company for processing, either through a traditional legal process or a potentially automated process (similar to the various secure LEA portals).
4. The company determines how to fulfill this request and provides the resultant data to the agency — this is where things get interesting.
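
To make the guessed flow above a bit more concrete, here is a minimal Python sketch of steps 0 through 4 as a simple pipeline. Everything in it is hypothetical: the names (`Request`, `Stage`, `process`), the `us_person_share` estimate, and the `max_us_share` cutoff are illustrative assumptions of mine, not anything taken from the slides or the public statements.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    DRAFTED = auto()    # step 1: analyst writes the request
    VETTED = auto()     # step 2: review for over-inclusion of U.S. persons
    SENT = auto()       # step 3: request delivered to the company
    FULFILLED = auto()  # step 4: company returns the data
    REJECTED = auto()


@dataclass
class Request:
    provider: str
    selector: str            # e.g. an IP address or account name
    us_person_share: float   # hypothetical estimate between 0.0 and 1.0
    stage: Stage = Stage.DRAFTED
    history: list = field(default_factory=list)

    def advance(self, stage):
        # Record where the request has been, then move it forward.
        self.history.append(self.stage)
        self.stage = stage


def process(request, agreements, max_us_share=0.5):
    """Walk one request through the guessed workflow.

    `agreements` maps provider name -> transfer method (step 0).
    `max_us_share` is an assumed vetting cutoff, not a documented rule.
    """
    # Step 0: without an established relationship there is no request path.
    if request.provider not in agreements:
        request.advance(Stage.REJECTED)
        return request
    # Step 2: reject requests judged overly inclusive of U.S. persons.
    if request.us_person_share > max_us_share:
        request.advance(Stage.REJECTED)
        return request
    request.advance(Stage.VETTED)
    # Step 3: hand the request to the company.
    request.advance(Stage.SENT)
    # Step 4: the company fulfills it using the agreed transfer method.
    request.advance(Stage.FULFILLED)
    return request


agreements = {"ExampleMail": "sftp"}  # made-up provider name
done = process(Request("ExampleMail", "203.0.113.7", 0.1), agreements)
print(done.stage.name, [s.name for s in done.history])
```

The point of the sketch is only that each stage is a separate gate: a request with no provider agreement, or one that fails vetting, never reaches the company at all.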

Figure 3: NSA INTEGRATION TIMELINE

Each company would digest this request according to its specific agreement with the government. Google has said that no boxes are maintained onsite, but rather that it uploads information to the NSA server. However, we have some indication there might be ‘deeper’ integration for other companies. It’s conceivable that some providers (say Microsoft/Skype) have installed a box onsite through which the NSA can mirror/divert traffic. Specifically, an onsite box would allow the government to conduct real-time voice intercepts. (One note about the Google scenario: if the NSA doesn’t want Google to know the target of its investigation, then it’s likely that either the initial query needs to be padded with additional requests, resulting in an overbroad request (e.g. all communications between these 500 IP addresses in Pakistan), or an employee or contractor with security clearance performs the queries.)
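
The padding idea in the note about the Google scenario can be sketched in a few lines of Python. This is purely an illustration under my own assumptions: `pad_request` and `extract_target` are hypothetical names, the addresses are documentation-range placeholders, and nothing in the public record says requests are actually built this way.

```python
import random


def pad_request(target, decoy_pool, k, rng=None):
    """Hide `target` among k - 1 decoy selectors drawn from `decoy_pool`."""
    rng = rng or random.Random()
    candidates = [d for d in decoy_pool if d != target]
    if len(candidates) < k - 1:
        raise ValueError("not enough decoys to reach k selectors")
    batch = rng.sample(candidates, k - 1) + [target]
    rng.shuffle(batch)  # the target's position must not give it away
    return batch


def extract_target(responses, target):
    """Keep only the target's records from the provider's response."""
    return responses.get(target, [])


# Placeholder addresses only; the target is mixed in with 9 decoys.
pool = [f"198.51.100.{i}" for i in range(1, 50)]
batch = pad_request("203.0.113.7", pool, k=10)
print(len(batch), "203.0.113.7" in batch)
```

With k selectors in the batch, the provider has a 1-in-k chance of guessing the real target, but the requester receives data on k - 1 uninvolved selectors. That trade-off is exactly the overbreadth problem described above.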

Figure 4: UPSTREAM COLLECTION

There are still some very important open questions about how this system works, such as

  • How effective is the 51% test?
  • Are company employees/contractors trusted with the potential target of an NSA investigation or is the data padded/obfuscated in some way?
  • How are the queries monitored for accountability and data minimization?
  • How are things like Skype voice communications intercepted? (Skype chats are easy since they’re stored unencrypted on Skype’s servers. Are calls diverted to supernodes that Microsoft or the NSA control to be recorded? What about historic calls?)
  • How are these systems kept secure, and how are incidents like Google’s Aurora prevented?

That’s my analysis.  Thanks to semipr0 for the graphics wizardry and helping me think through this.
Any feedback welcome — contact me here.
Someone translated our PRISM graphic in Iran (though sadly I can’t read/write Farsi).