Threat Models for Differential Privacy

differential privacy blog series banner

This post is part of a series on differential privacy. Learn more and browse all the posts published to date on the differential privacy blog series page in NIST’s Privacy Engineering Collaboration Space.

It’s not so simple to deploy a practical system that satisfies differential privacy. Our example in the last post was a simple Python program that adds Laplace noise to a function computed over the sensitive data. For this to work in practice, we’d need to collect all of the sensitive data on one server to run our program.

What if that server gets hacked? Differential privacy provides no protection in this case—it only protects the output of our program!

When deploying differentially private systems, it’s important to consider the threat model—that is, what kind of adversaries we want the system to protect against. If the threat model includes adversaries who might compromise the server holding the sensitive data, then we need to modify the system to protect against this kind of attack.

The design of systems for differential privacy thus needs to consider both privacy and security. As described in our last post, privacy refers to controlling what can be inferred from a data release. We can think of security as an orthogonal issue: security mechanisms control who is allowed to access a piece of data, but make no guarantees about what can be learned from that data.

Central Differential Privacy

The most commonly-used threat model in differential privacy research is called the central model of differential privacy (or simply, “central differential privacy”).

The key component of the central model is a trusted data curator. Each individual submits their sensitive data to the data curator, who stores all of the data in a central location (i.e. on a single server). The data curator is trusted: we assume they will not look at the sensitive data directly, will not share it with anyone, and cannot be compromised by any other adversary. In other words, with this model we assume that the server holding the sensitive data cannot be hacked.

In the central model, we typically add noise to query results, as we did in our Laplace mechanism example. The advantage of this model is that it allows algorithms to add the smallest possible amount of noise, and therefore produce results with the maximum accuracy allowed under differential privacy. The figure below demonstrates this process. We place the privacy barrier in between the trusted data curator and the analyst; to the right of the privacy barrier, only differentially private results can be viewed, so the analyst does not need to be trusted.

Privacy Blog:  Figure 1: Central Model of Differential Privacy

Figure 1: Central Model of Differential Privacy

The disadvantage of the central model is that it requires a trusted data curator—and many data curators are not considered trustworthy! In fact, lack of trust in the data collector is often a primary motivation for the use of differential privacy.

Local Differential Privacy

The local model of differential privacy addresses the security issue in the central model by eliminating the trusted data curator. Each individual adds noise to their own data before sending it to the data curator. This means that the data curator never sees the sensitive data, so does not need to be trusted. The figure below demonstrates the local model; here, the privacy barrier stands between each data owner and the (untrusted) data curator.

Privacy Blog:  Figure 2: Local Model of Differential Privacy

Figure 2: Local Model of Differential Privacy

The local model of differential privacy avoids the security issues of the central model—if the data curator’s server is hacked, the hackers only see noisy data that already satisfies differential privacy. This is a major benefit of the local model, and it’s why the local model was adopted in Google’s RAPPOR system [1] and Apple’s data collection system [2].

The drawback? The local model produces less accurate answers than the central model. In the local model, each individual adds enough noise to satisfy differential privacy; thus the total noise for all participants is much larger than the single noise sample used in the central model.

As a result, the local model is only useful for queries with a very strong “signal.” Apple’s system, for example, uses the local model to estimate the popularity of emojis, but the results are only useful for the most popular emojis (i.e. where the “signal” is strongest). The local model is typically not used for more complex queries, like those used in the U.S. Census [3] or applications like machine learning.

Hybrid Models

The central and local models have their advantages and drawbacks, so achieving the best of both is a natural goal and an active area of research.

One alternative approach is the shuffling model, implemented in a system called Prochlo [4]. The shuffling model includes an untrusted data curator and individual data contributors, and adds a set of partially trusted shufflers. In this model, each individual adds a small amount of noise to their own data, and then submits it to the shuffler, which adds additional noise before forwarding batches of data to the data curator. The idea is that shufflers are unlikely to collude with the data curator or each other, so the small amount of noise added by individuals is sufficient to guarantee privacy. Each shuffler operates on a batch of inputs—in the same way as the central model—so a small amount of additional noise guarantees privacy for the whole batch. The shuffling model is a compromise between the local and central models: it allows adding less noise than the local model, but requires more noise than the central model.

Another possibility is to combine differential privacy with techniques from cryptography, such as secure multiparty computation (MPC) or fully homomorphic encryption (FHE). FHE allows computing on encrypted data without decrypting it first, and MPC allows a group of parties to securely compute functions over distributed inputs without revealing the inputs. Computing a differentially private function using secure computation is a promising way to achieve the accuracy of the central model with the security benefits of the local model. In this approach, the use of secure computation eliminates the need for a trusted data curator. Recent work [5] demonstrates the promise of combining MPC and differential privacy, and achieves most of the benefits of both the central and local models. In most cases, secure computation is several orders of magnitude slower than native execution, which is often impractical for large datasets or complex queries. Secure computation is an active area of research, however, and its performance is improving quickly. Stay tuned for a more complete blog post on this topic later in the series.

Coming up Next

In our next post, we’ll look at our first open-source tools for enforcing differential privacy. We’ll explore tools that can be deployed to answer counting queries on real databases, including tools accessible to non-experts and ones that can scale to very large databases, like the ones used by the U.S. Census Bureau to count people with differential privacy!

References

[1] Erlingsson, Úlfar, Vasyl Pihur, and Aleksandra Korolova. “Rappor: Randomized aggregatable privacy-preserving ordinal response.” In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1054-1067. 2014.

[2] Apple Inc. “Apple Differential Privacy Technical Overview.” Accessed 7/31/2020.  https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf

[3] Garfinkel, Simson L., John M. Abowd, and Sarah Powazek. “Issues encountered deploying differential privacy.” In Proceedings of the 2018 Workshop on Privacy in the Electronic Society, pp. 133-137. 2018.

[4] Bittau, Andrea, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. “Prochlo: Strong privacy for analytics in the crowd.” In Proceedings of the 26th Symposium on Operating Systems Principles, pp. 441-459. 2017.

[5] Roy Chowdhury, Amrita, Chenghong Wang, Xi He, Ashwin Machanavajjhala, and Somesh Jha. “Cryptε: Crypto-Assisted Differential Privacy on Untrusted Servers.” In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 603-619. 2020.

Leave a Reply

Your email address will not be published. Required fields are marked *