Privacy is an essentially contested concept [[?PRIVACY-CONTESTED]]. Its debated meaning renders its support problematic in the context of a standards-setting process grounded in consensus [[?W3C-PROCESS]], in seeking out technical solutions grounded in shared requirements, and in addressing the needs of a worldwide constituency. This document provides definitions for privacy and related concepts that are suitable for a global audience, that can provide building blocks for privacy threat modelling, and that can guide the development of the Web as a trustworthy platform. In the spirit of building a much-needed bridge between technology and policy, this document is written under the expectation that it can apply to both.
Privacy is essential to trust, and trust is a cornerstone value of the Web [[?RFC8890]]. In much of everyday life, people have little difficulty assessing whether a given flow of information constitutes a violation of privacy or not [[?NYT-PRIVACY]]. However, in the digital space, users struggle to understand how their data may flow between contexts and how such flows may affect them, not just immediately but at a much later time and in completely different situations. Some actors then seize upon this confusion in order to extract and exploit [=personal data=] at unprecedented scale.
The goal of this document is to define all the terms that may prove useful in developing technology and policy that relate to privacy and [=personal data=]. It additionally provides a toolbox to support the common need that is privacy threat modelling, the frequent debate over consent, and the under-developed set of issues in privacy that are of a collective, relational nature.
[=Personal data=] is a regulated object, and this document naturally recognises the jurisdictional primacy of existing data protection regimes. However, the global nature of the Web means that, as we develop technology, we benefit from shared concepts that guide the evolution of the Web as a system built for its users [[?RFC8890]]. A clear and well-defined view of privacy on the Web, grounded in an up-to-date understanding of the state of the art, can hopefully help the Web's constituencies thrive across jurisdictional disparity, with the shared understanding that the law is a floor, not a ceiling.
This section provides a number of elementary building blocks from which to establish a shared understanding of privacy. Some of the definitions below build atop the work in Tracking Preference Expression (DNT) [[tracking-dnt]].
A user (also person or data subject) is any natural person.
We define personal data as any information relating to a [=person=] such that:
Data is permanently de-identified when there exists a high level of confidence that no human subject of the data can be identified, directly or indirectly (e.g., via association with an identifier, user agent, or device), by that data alone or in combination with other retained or available information, including as being part of a group. Note that further considerations relating to groups are covered in the Collective Issues in Privacy section.
Data is pseudonymous when:
This can ensure that [=pseudonymous data=] is used in a manner that provides a minimum degree of governance such that technical and procedural means to guarantee the maintenance of pseudonymity are preserved. Note that [=pseudonymity=], on its own, is not sufficient to render [=data processing=] [=appropriate=].
A vulnerable person is a [=person=] who, at least in the [=context=] of the [=processing=] being discussed, is unable to exercise sufficient self-determination for any consent they may provide to be receivable. This includes, for example, children, employees with respect to their employers, people in some situations of intellectual or psychological impairment, or refugees.
A party is a [=person=], a legal entity, or a set of legal entities that share common owners, common controllers, and a group identity that is readily evident to the [=user=] without them needing to consult additional material, typically through common branding.
The first party is a [=party=] with which the [=user=] intends to interact. Merely hovering over, muting, pausing, or closing a given piece of content does not constitute a [=user=]'s intent to interact with another party, nor does the simple fact of loading a [=party=] embedded in the one with which the user intends to interact. In cases of clear and conspicuous joint branding, there can be multiple [=first parties=]. The [=first party=] is necessarily a [=data controller=] of the data processing that takes place as a consequence of a [=user=] interacting with it.
A third party is any [=party=] other than the [=user=], the [=first party=], or a [=service provider=] acting on behalf of either the [=user=] or the [=first party=].
A service provider or data processor is considered to be the same [=party=] as the entity contracting it to perform the relevant [=processing=] if it:
A data controller is a [=party=] that determines the [=means=] and [=purposes=] of data processing. Any [=party=] that is not a [=service provider=] is a [=data controller=].
The Vegas Rule is a simple implementation of privacy in which "what happens with the [=first party=] stays with the [=first party=]." Put differently, it describes a situation in which the [=first party=] is the only [=data controller=]. Note that, while enforcing the [=Vegas Rule=] provides a rule of thumb describing a necessary baseline for [=appropriate=] [=data processing=], it is not always sufficient to guarantee [=appropriate=] [=processing=] since the [=first party=] can [=process=] data [=inappropriately=].
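The party definitions above and the [=Vegas Rule=] can be sketched in code. The following is a minimal illustration, not a normative algorithm: the `Role` and `Party` types and the `satisfies_vegas_rule` function are hypothetical names introduced here, and the sketch assumes that each party's role has already been determined by other means.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FIRST_PARTY = "first party"
    SERVICE_PROVIDER = "service provider"
    THIRD_PARTY = "third party"


@dataclass(frozen=True)
class Party:
    name: str
    role: Role

    @property
    def is_data_controller(self) -> bool:
        # Any party that is not a service provider is a data controller.
        return self.role is not Role.SERVICE_PROVIDER


def satisfies_vegas_rule(parties: list[Party]) -> bool:
    """Return True when every data controller involved is a first party."""
    controllers = [p for p in parties if p.is_data_controller]
    return bool(controllers) and all(
        p.role is Role.FIRST_PARTY for p in controllers
    )


# Hypothetical parties, for illustration only.
site = Party("news.example", Role.FIRST_PARTY)
cdn = Party("cdn.example", Role.SERVICE_PROVIDER)
ads = Party("ads.example", Role.THIRD_PARTY)

print(satisfies_vegas_rule([site, cdn]))       # True
print(satisfies_vegas_rule([site, cdn, ads]))  # False
```

As the note above points out, a `True` result only establishes the baseline: the [=first party=] may still [=process=] data [=inappropriately=].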
A [=party=] processes data if it carries out operations on [=personal data=], whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, [=sharing=], dissemination or otherwise making available, [=selling=], alignment or combination, restriction, erasure or destruction.
A [=party=] shares data if it provides it to any other [=party=]. Note that, under this definition, a [=party=] that provides data to its own [=service providers=] is not [=sharing=] it.
A [=party=] sells data when it [=shares=] it in exchange for consideration, monetary or otherwise.
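The distinction between [=sharing=] and [=selling=] can be expressed as two small predicates. This is only a sketch under simplifying assumptions: the `Transfer` type and its fields are hypothetical, and a real system would need a far richer notion of what counts as consideration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    # Party on whose behalf the recipient acts as a service provider, if any.
    recipient_acts_for: Optional[str] = None
    # Anything of value received in exchange, monetary or otherwise.
    consideration: Optional[str] = None


def is_sharing(t: Transfer) -> bool:
    """Data is shared when it is provided to any other party."""
    if t.recipient == t.sender:
        return False
    # A service provider counts as the same party as the entity it acts for,
    # so providing data to one's own service provider is not sharing.
    return t.recipient_acts_for != t.sender


def is_selling(t: Transfer) -> bool:
    """Data is sold when it is shared in exchange for consideration."""
    return is_sharing(t) and t.consideration is not None


# Hypothetical transfers, for illustration only.
to_own_host = Transfer("shop.example", "host.example",
                       recipient_acts_for="shop.example")
to_partner = Transfer("shop.example", "partner.example")
to_broker = Transfer("shop.example", "broker.example",
                     consideration="ad credits")

print(is_sharing(to_own_host), is_selling(to_own_host))  # False False
print(is_sharing(to_partner), is_selling(to_partner))    # True False
print(is_sharing(to_broker), is_selling(to_broker))      # True True
```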
The purpose of a given [=processing=] of data is an anticipated, intended, or planned outcome of this [=processing=] which is achieved or aimed for within a given [=context=]. A [=purpose=], when described, should be specific enough to be actionable by someone familiar with the relevant [=context=] (i.e., they could independently determine [=means=] that reasonably correspond to an implementation of the [=purpose=]).
The means are the general method of [=data processing=] through which a given [=purpose=] is implemented, in a given [=context=], considered at a relatively abstract level and not necessarily all the way down to implementation details. Example: the user will have their preferences restored (purpose) by looking up their identifier in a preferences store (means).
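The example above can be made concrete with a short sketch. The store contents, the identifier format, and the `restore_preferences` function are all hypothetical; the point is only that the [=purpose=] names the outcome while the [=means=] names the general method used to reach it.

```python
# Hypothetical in-memory preferences store, keyed by user identifier.
PREFERENCES_STORE: dict[str, dict[str, str]] = {
    "user-123": {"theme": "dark", "language": "en"},
}

DEFAULT_PREFERENCES: dict[str, str] = {"theme": "light", "language": "en"}


def restore_preferences(user_id: str) -> dict[str, str]:
    """Purpose: the user has their preferences restored.

    Means: look up the user's identifier in a preferences store.
    """
    # Return a copy so that callers cannot mutate the store by accident.
    return dict(PREFERENCES_STORE.get(user_id, DEFAULT_PREFERENCES))


print(restore_preferences("user-123"))  # {'theme': 'dark', 'language': 'en'}
print(restore_preferences("unknown"))   # {'theme': 'light', 'language': 'en'}
```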
A context is a physical or digital environment that a [=person=] interacts with for a purpose of their own (that they typically share with other [=people=] who interact with the same environment).
A [=context=] can be further described through:
A [=context=] carries context-relative informational norms that determine whether a given [=data processing=] is appropriate (if the norms are adhered to) or inappropriate (when the norms are violated). A norm violation can be for instance the exfiltration of [=personal data=] from a context or the lack of respect for [=transmission principles=]. When [=norms=] are respected in a given [=context=], we can say that contextual integrity is maintained; otherwise that it is violated ([[?PRIVACY-IN-CONTEXT]], [[?PRIVACY-AS-CI]]).
We define privacy as a right to [=appropriate=] [=data processing=]. A privacy violation is, correspondingly, [=inappropriate=] [=data processing=] [[?PRIVACY-IN-CONTEXT]].
Note that a [=first party=] can comprise multiple [=contexts=] if it is large enough that [=people=] would interact with it for more than one [=purpose=]. [=Sharing=] [=personal data=] across [=contexts=] is, in the overwhelming majority of cases, [=inappropriate=].
Your cute little pup uses Poodle Naps to find comfortable places to snooze, and Poodle Fetch to locate the best sticks. Napping and fetching are different [=contexts=] with different norms, and sharing data between these contexts is a [=privacy violation=] despite the shared ownership of Naps and Fetch by the Poodle conglomerate.
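The Poodle example can be sketched as a check of a data flow against the norms of its [=context=]. This is a deliberately crude model: real norms are social expectations that cannot be fully enumerated, and the sketch treats every cross-context flow as [=inappropriate=] even though the text only says this holds in the overwhelming majority of cases. The `Context` and `Flow` types and the norm strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    name: str
    # Allowed (attribute, transmission principle) pairs within this context.
    norms: frozenset[tuple[str, str]]


@dataclass(frozen=True)
class Flow:
    attribute: str  # kind of personal data being processed
    from_context: str
    to_context: str
    transmission_principle: str


def is_appropriate(flow: Flow, contexts: dict[str, Context]) -> bool:
    """Return True when the flow adheres to the norms of its context."""
    # Simplification: data leaving its context is treated as exfiltration.
    if flow.from_context != flow.to_context:
        return False
    context = contexts.get(flow.from_context)
    if context is None:
        return False
    return (flow.attribute, flow.transmission_principle) in context.norms


# Hypothetical norms for the two Poodle services.
contexts = {
    "naps": Context("naps", frozenset({("location", "suggest nap spots")})),
    "fetch": Context("fetch", frozenset({("location", "locate sticks")})),
}

within_naps = Flow("location", "naps", "naps", "suggest nap spots")
naps_to_fetch = Flow("location", "naps", "fetch", "locate sticks")

print(is_appropriate(within_naps, contexts))    # True
print(is_appropriate(naps_to_fetch, contexts))  # False
```

Note that common ownership by the Poodle conglomerate plays no part in the check: only the [=context=] and its norms matter.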
Colloquially, tracking is understood to be any kind of [=inappropriate=] data collection.
Additionally, privacy labour is the practice of having a [=person=] carry out the work of ensuring that the [=data processing=] of which they are the subject is [=appropriate=], instead of having the [=parties=] bear responsibility for that work, as they ought to.
The user agent acts as an intermediary between a [=user=] and the web. The [=user agent=] is not a [=context=] in that it is expected to coincide with the [=subject=] and operate exclusively in the [=subject=]'s interest. It is not the [=first party=]. The [=user agent=] serves the [=user=] in a relationship of fiduciary agency: it always puts the [=user=]'s interest first, up to and including, on occasion, protecting the [=user=] from themselves by preventing them from carrying out a harmful decision, or at the very least by speed-bumping it [[?FIDUCIARY-UA]]. For example, the [=user agent=] will make it difficult for the [=user=] to connect to a site the authenticity of which is hard to ascertain, will double-check that the user really intends to expose a sensitive device capability, or will prevent the [=user=] from consenting to permanent monitoring of their behaviour. Its fiduciary duties include [[?TAKING-TRUST-SERIOUSLY]]:
These duties ensure the [=user agent=] will care for the user. It is important to note that there is a subtle difference between care and data paternalism which is that the latter purports to help in part by removing agency ("don't worry about it, so long as your data is with us it's safe, you don't need to know what we do with it, it's all good because we're good people") whereas care aims to support people by enhancing their agency and sovereignty.
Privacy threat models for the Web exist ([[?TRACKING-PREVENTION-POLICY]], [[?ANTI-TRACKING-POLICY]], [[?PRIVACY-THREAT]]). While good within the scope of what they intend to do, they tend to be very specific to cross-context tracking as seen by the browser, which is but one problem. Privacy threat modelling could benefit from having more of a toolbox to reuse in different situations.
This document provides building blocks for the creation of privacy threat models on the Web. Note that privacy threat models differ from security threat models in one important respect: all [=parties=] are potential threats, even when they are not rogue. In fact the [=first party=] is typically considered to be the primary threat, the one against which the bulk of mitigation techniques are directed.
The most important building blocks for a privacy threat model are those that define a [=context=]: [=actors=], [=attributes=], and [=transmission principles=], as well as the [=context-relative informational norms=]. Collectively, these define the expectations that obtain in a given [=context=]. Based on these expectations, it becomes possible to ask questions about the ways in which they could fail and how to prevent such failures. What happens if the [=subject=] is a [=vulnerable person=]? How well does the context fare if expectations of consent are undermined by manipulative techniques ([[?DIGITAL-MARKET-MANIPULATION]], [[?PRIVACY-BEHAVIOR]])? If the [=transmission principles=] are based on sharing data with parties while telling them that they cannot use it, what confidence do we have that they are following the rules?
When assessing privacy threats, it is not necessary to establish harms since breaking the [=context=]'s [=norms=] is [=inappropriate=] in all cases. However, consideration of harms can inform which issues to prioritise. Where individual privacy harms are to be defined, the definitive source is Privacy Harms [[?PRIVACY-HARMS]].
A [=person=]'s identity is the set of characteristics that define them. In computer systems, [=identity=] is typically attached to a means of denotation that makes it easier for an automated system to recognise the user, an identifier of some type.
A [=person=]'s characteristics, and therefore their [=identity=], are entirely [=context=]-dependent. Recognising a [=person=] in distinct [=contexts=] can at times be [=appropriate=] but this relies on an understanding of applicable [=norms=] and will likely require compartmentalisation. (For example, if you meet your therapist at a cocktail party, you expect them to have rather different discussion topics with you than they usually would, and possibly even to pretend they do not know you.) As a result, automating the recognition of a [=person=]'s identity across different [=contexts=] is [=inappropriate=]. This is particularly true for [=vulnerable=] people as recognising them in different [=contexts=] may force their vulnerability into the open.
[=Identity=] being by nature fragmented, [=user agents=] must work in support of [=users=] having different identities in different [=contexts=] with respect to all [=parties=] (including the user agent vendor) and to prevent their recognition through other means, where possible.
A keystone principle of the Web is trust [[RFC8890]]. An important part of trust is to ensure that [=data=] collected for a [=purpose=] that matches a clearly delineated [=user=] feature is not then used for additional secondary [=purposes=]. Email is often used for login and communication [=purposes=], including essential transactional interactions. Using emails for cross-context recognition [=purposes=] is therefore not only [=inappropriate=] but also threatens key messaging infrastructure. Future iterations of the Web platform should ensure that users can log into sites and receive communication without relying on email at all.
A [=person=]'s autonomy is their ability to make decisions of their own volition, without undue influence from other parties. People have limited intellectual resources and time with which to weigh decisions, and by necessity rely on shortcuts when making decisions. This makes their privacy preferences malleable [[?PRIVACY-BEHAVIOR]] and susceptible to manipulation [[?DIGITAL-MARKET-MANIPULATION]]. A [=person=]'s [=autonomy=] is enhanced by a system or device when that system offers a shortcut that aligns more with what that [=person=] would have decided given arbitrary amounts of time and relatively unfettered intellectual ability; and [=autonomy=] is decreased when a similar shortcut goes against decisions made under ideal conditions.
Affordances and interactions that decrease [=autonomy=] are known as dark patterns. A [=dark pattern=] does not have to be intentional; the deceptive effect is sufficient to define it ([[?DARK-PATTERNS]], [[?DARK-PATTERN-DARK]]).
Because we are all subject to motivated reasoning, the design of defaults and affordances that may impact [=user=] [=autonomy=] should be the subject of independent scrutiny. Implementers are enjoined to be particularly cautious to avoid slipping into [=data paternalism=].
Given the sheer volume of potential [=data=]-related decisions in today's data economy, complete informational self-determination is impossible. This fact, however, should not be confused with the contention that privacy is dead. Careful design of our technological infrastructure can ensure that [=users=]' [=autonomy=] as pertaining to their own [=data=] is enhanced through [=appropriate=] defaults and choice architectures.
In the 1970s, the Fair Information Practices or FIPs were elaborated in support of individual [=autonomy=] in the face of growing concerns with databases. The [=FIPs=] assume that there is sufficiently little [=data processing=] taking place that any [=person=] will be able to carry out sufficient diligence to enable [=autonomy=] in their decision-making. Since they entirely offload the [=privacy labour=] to [=users=] and assume perfect, unfettered [=autonomy=], the [=FIPs=] do not forbid specific types of [=data processing=] but only place them under different procedural requirements. Such an approach is [=appropriate=] for [=parties=] that are processing data in the 1970s.
One notable issue with procedural approaches to privacy is that they tend to have the same requirements in situations where the [=user=] finds themselves in a significant asymmetry of power with a [=party=] (for instance the [=user=] of an essential service provided by a monopolistic platform) and those where [=user=] and [=parties=] are very much on equal footing, or even where the [=user=] may have greater power, as is the case with small businesses operating in a competitive environment. Such approaches further do not consider cases in which one [=party=] may coerce other [=parties=] into facilitating its [=inappropriate=] practices, as is often the case with dominant players in advertising [[?CONSENT-LACKEYS]] or in content aggregation [[?CAT]].
Reference to the [=FIPs=] survives to this day. They are often referenced as transparency and choice, which, in today's digital environment, is often a strong indication that [=inappropriate=] [=processing=] is being described.
Different procedural mechanisms exist to enable [=people=] to control the [=processing=] done to their [=data=]. Mechanisms that increase the number of [=purposes=] for which their [=data=] is being [=processed=] are referred to as opt-in or consent; mechanisms that decrease this number of [=purposes=] are known as opt-out.
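Read literally, these definitions treat [=opt-in=] and [=opt-out=] as operations on the set of [=purposes=] for which a [=person=]'s [=data=] is [=processed=]. The sketch below illustrates that reading only; the function names and purpose labels are hypothetical.

```python
def opt_in(active: frozenset[str], purpose: str) -> frozenset[str]:
    """Opt-in (consent) can only grow the set of active purposes."""
    return active | {purpose}


def opt_out(active: frozenset[str], purpose: str) -> frozenset[str]:
    """Opt-out can only shrink the set of active purposes."""
    return active - {purpose}


# Hypothetical default purposes for some context.
defaults = frozenset({"service delivery", "fraud prevention"})

after_consent = opt_in(defaults, "personalisation")
after_objection = opt_out(after_consent, "personalisation")

print(len(defaults), len(after_consent), len(after_objection))  # 2 3 2
```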
When deployed thoughtfully, these mechanisms can enhance [=people=]'s [=autonomy=]. Often, however, they are used as a way to avoid putting in the difficult work of deciding which types of [=processing=] are [=appropriate=] and which are not, offloading [=privacy labour=] to the [=user=].
Privacy regulatory regimes are often anchored at extremes: either they default to allowing only very few strictly essential [=purposes=] such that many [=parties=] will have to resort to [=consent=], habituating [=people=] to ignore legal prompts and incentivising [=dark patterns=], or, conversely, they default to forbidding only very few, particularly egregious [=purposes=], such that [=people=] will have to perform the [=privacy labour=] to [=opt out=] in every [=context=] in order to produce [=appropriate=] [=processing=].
An approach that is more aligned with the expectation that the Web should provide a trustworthy, [=person=]-centric environment is to establish a regime consisting of three privacy tiers:
When an [=opt-out=] mechanism exists, it should preferably be complemented by a global opt-out mechanism. The function of a [=global opt-out=] mechanism is to rectify the automation asymmetry whereby service providers can automate [=data processing=] but [=people=] have to take manual action. A good example of a [=global opt-out=] mechanism is the Global Privacy Control [[?GPC]].
Conceptually, a [=global opt-out=] mechanism is an automaton operating as part of the [=user agent=], which is to say that it is equivalent to a robot that would carry out the [=user=]'s bidding by pressing an [=opt-out=] button with every interaction that the [=user=] has with a site, or that would more generally convey an expression of the [=user=]'s rights in a relevant jurisdiction. (For instance, under [[?GDPR]], the [=user=] may be conveying objections to [=processing=] based on legitimate interest or the withdrawal of [=consent=] to specific [=purposes=].) It should be noted that, since a [=global opt-out=] signal is reaffirmed automatically with every [=user=] interaction, it will take precedence in terms of specificity over any manner of blanket [=consent=] that a site may obtain, unless that [=consent=] is directly attached to an interaction (e.g., terms specified on a form upon submission).
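The precedence described above can be sketched from the point of view of a site receiving a request. The Global Privacy Control signal is carried in the `Sec-GPC` request header with the value `1`; everything else here (the function names, the two boolean consent flags, and the reduction of the decision to a single purpose) is a simplifying assumption, and the actual legal effect of the signal depends on the jurisdiction.

```python
def has_global_opt_out(headers: dict[str, str]) -> bool:
    """Return True when the request carries a GPC signal (Sec-GPC: 1)."""
    # HTTP header field names are case-insensitive.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("sec-gpc") == "1"


def may_process(
    headers: dict[str, str],
    blanket_consent: bool,
    interaction_consent: bool,
) -> bool:
    """Decide whether an opt-out-able purpose may proceed for this request."""
    # Consent directly attached to this interaction is the most specific.
    if interaction_consent:
        return True
    # The global opt-out is reaffirmed with every request, so it takes
    # precedence over any blanket consent obtained earlier.
    if has_global_opt_out(headers):
        return False
    return blanket_consent


gpc_request = {"Sec-GPC": "1"}
plain_request = {"Accept": "text/html"}

print(may_process(gpc_request, blanket_consent=True, interaction_consent=False))    # False
print(may_process(gpc_request, blanket_consent=True, interaction_consent=True))     # True
print(may_process(plain_request, blanket_consent=True, interaction_consent=False))  # True
```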
When designing Web technology, we naturally pay attention to potential impacts on the [=person=] using the Web through their [=user agent=]. In addition to potential individual harms we also pay heed to collective effects that emerge from the accumulation of individual actions as influenced by entities and the structure of technology.
Note that in evaluating impact, we deliberately ignore what implementers or specifiers may have intended and only focus on outcomes. This framing is known as POSIWID, or "the Purpose Of a System Is What It Does".
The collective problem of privacy is known as legibility. [=Legibility=] concerns population-level [=data processing=] that may impact populations or individuals, including in ways that [=people=] could not control even under the optimistic assumptions of the [=FIPs=]. For example, based on population-level analysis, a company may know that site.example is predominantly visited by [=people=] of a given race or gender, and decide not to run its job ads there. Visitors to that page are implicitly having their [=data=] processed in [=inappropriate=] ways, with no way to discover the discrimination or seek relief [[?DEMOCRATIC-DATA]].
What we consider is therefore not just the relation between the people who expose themselves and the entities that invite that disclosure [[?RELATIONAL-TURN]], but also between the people who expose themselves and those who do not but may find themselves recognised as such indirectly anyway. One key understanding here is that such relations may persist even when data is [=permanently de-identified=].
[=Legibility=] practices can be legitimate or illegitimate depending on the [=context=] and on the [=norms=] that apply in that [=context=]. Typically, a [=legibility=] practice may be [=legitimate=] if it is managed through an acceptable process of collective [=governance=]. For example, it is often considered [=legitimate=] for a government, under the control of its citizens, to maintain a database of license plates for the [=purpose=] of enforcing the rules of the road. It would be [=illegitimate=] to observe the same license plates near places of worship to build a database of religious identity.
[=Legibility=] is often used to order information about the world. This can notably create problems of [=reflexivity=] and of [=autonomy=].
Problems of reflexivity occur when the ordering of information about the world used to produce [=legibility=] finds itself changing the way in which the world operates. This can produce self-reinforcing loops that can have deleterious effects both individual and collective [[?SEEING-LIKE-A-STATE]].
Issues of [=autonomy=] occur depending on the manner in which [=legibility=] is implemented. When [=legibility=] is used to order the world following rules set by the [=user=] or following methods subject to public scrutiny and [=governance=] models with strong checks and balances (such as a newspaper's editorial decisions), then it will enhance [=user=] [=autonomy=] and tend to be [=legitimate=]. When it is done in the [=user=]'s stead and without [=governance=], it decreases [=user=] [=autonomy=] and tends to be [=illegitimate=].
Data governance refers to the rules and processes for how [=data=] is [=processed=] in any given [=context=]. How data is governed describes who has power to make decisions over [=data=] and how [[?DATA-FUTURES-GLOSSARY]].
In general, collective issues in [=data=] require collective solutions. The proper goal of [=data governance=] at the standards-setting level is the development of structural controls in [=user agents=] and the provision of institutions that can handle population-level problems in [=data=]. [=Governance=] will often struggle to achieve its goals if it works primarily by increasing individual control over [=data=]. A collective approach reduces the cost of control.
Collecting data at large scales can have significant pro-social outcomes. Problems tend to emerge when entities take part in dual-use collection in which [=data=] is [=processed=] for collective benefit but also for [=self-dealing=] [=purposes=] that may degrade welfare. The [=self-dealing=] [=purposes=] will be justified as bankrolling the pro-social outcomes, which, absent collective oversight, cannot be considered to support claims to [=legitimacy=] for such [=legibility=]. It is vital for standards-setting organisations to establish not just purely technical devices but techno-social systems that can govern data at scale.