Asset-Level AI Signaling: DRM 2.0 the IETF Should Avoid Entirely

In recent months, the debate over how AI systems should interact with online content has intensified. Much of the discussion has focused on how creators can express “machine-readable” preferences about AI training and text-and-data mining (TDM). What seems, at first glance, like a debate about metadata quickly becomes something else when one looks at the broader system of copyright, technical enforcement, and academic research. The IETF has a legitimate and important role to play in defining technical vocabularies and interoperability mechanisms. However, what it should not do is help build rights-management infrastructure such as asset-level signaling. The current AI-Pref charter should state clearly that asset-level AI signaling is out of scope. The danger is not abstract: if the IETF adopts asset-level AI signaling, it will help create a new rights-management system that reshapes the web, overrides user access, and restricts openness for everyone.

A recent paper, Contractual Override: How Private Contracts Undermine the Goals of the Copyright Act, illustrates how fair use is being overridden through various mechanisms. Dave Hansen, Yuanxiao Xu, and Rachael Samberg chronicle how private licensing terms, Terms of Service, and technical restrictions now routinely override statutory rights such as fair use. They show how activities that Congress deliberately protected (library preservation, accessibility work, and, critically, text-and-data mining) are increasingly constrained not by law but by contract. This erosion of public rights through private governance is the backdrop against which asset-level AI signaling must be understood.

Although the paper focuses on U.S. copyright law, the underlying problem is not uniquely American. The Internet is a global commons, and research practices, especially those that rely on large-scale datasets, transcend national borders. When private ordering overrides public rights in one jurisdiction, its effects spill across the entire networked ecosystem. The erosion of fair-use-like exceptions, whether through contract or through technical design, therefore threatens not only U.S. researchers but the global capacity to conduct meaningful inquiry. Seen in this light, asset-level AI signaling is not merely a local policy choice; it is a structural decision with worldwide consequences.

To appreciate just how consequential these technical choices are, consider the centrality of TDM in contemporary research as described in the Contractual Override paper. Scholars now routinely rely on computational tools to sift through large, unstructured datasets to reveal patterns that would otherwise remain hidden. As the paper notes, TDM has been used to analyze racial disparities in police body-camera footage, to trace the evolution of gender representation in fiction, and to study how violence against women is discussed in digital public spheres. Some of this work can be accomplished with relatively simple algorithms, such as counting the frequency of a word or measuring sentiment based on linguistic proximity. But as research questions become more complex, scholars increasingly rely on machine learning techniques that require training a model on one corpus in order to analyze another. A study identifying how race or gender is represented in film and television, for example, must train a classifier before analyzing thousands of titles. (See Contractual Override for the references behind these examples.)
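To make the two styles of analysis concrete, here is a minimal Python sketch using only the standard library. The toy corpora and labels are invented for illustration and are not drawn from the studies the paper cites; real research operates over millions of works. The first half counts word frequencies across a corpus; the second trains a bare-bones naive Bayes classifier on one labeled set and applies it to unseen text, mirroring the train-on-one-corpus, analyze-another pattern.

```python
# Minimal sketch of two styles of TDM; corpora and labels are invented.
from collections import Counter
import math

corpus = [
    "the detective questioned the witness",
    "the witness described the detective",
]

# Style 1: simple frequency counting across a corpus.
freq = Counter(word for doc in corpus for word in doc.split())
print(freq.most_common(3))

# Style 2: train a classifier on one labeled corpus, then apply it to
# new material. Here, multinomial naive Bayes with add-one smoothing.
train = [
    ("the hero saves the day", "positive"),
    ("the villain wins again", "negative"),
]

class NaiveBayes:
    def __init__(self, examples):
        self.word_counts = {}        # label -> Counter of words
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            words = text.split()
            self.label_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(words)
            self.vocab.update(words)

    def predict(self, text):
        total = sum(self.label_counts.values())
        scores = {}
        for label, counter in self.word_counts.items():
            # log prior + smoothed log likelihood of each word
            score = math.log(self.label_counts[label] / total)
            denom = sum(counter.values()) + len(self.vocab)
            for word in text.split():
                score += math.log((counter[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

model = NaiveBayes(train)
print(model.predict("the hero wins the day"))  # classify unseen text
```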

This kind of inquiry is not optional or ornamental. It is foundational to understanding inequality, representation, and social change. And, as the Contractual Override paper explains, it is impossible without access to large and diverse corpora. TDM research often involves millions of works from hundreds of publishers. Many rightsholders cannot be identified; many others are unreachable or unwilling to grant permission. If scholars could only conduct TDM on “safe” materials such as public-domain works or licensed subsets of content, the scope of inquiry would collapse. Research would be restricted to the perspectives of the already-powerful, amplifying existing biases and systematically excluding works by marginalized creators or smaller publishers.

This is precisely why fair use is essential, and why the EU explicitly denied rightsholders any opt-out from the research exception in Article 3 of Directive (EU) 2019/790 on Copyright in the Digital Single Market (the CDSM Directive). Courts have repeatedly held that copying works for the purpose of TDM is a transformative act. These decisions make clear that analyzing patterns across a corpus does not substitute for the underlying works; rather, it enriches public knowledge. Even the U.S. Copyright Office acknowledges that non-generative AI models have “produced miracles” in science and medicine, and that generative systems can offer similarly broad public benefits when used responsibly.

Asset-level AI signaling threatens this entire structure. Embedding a training permission or prohibition in the metadata of every file creates an environment in which technical enforcement, not the law, decides what can be studied. The problem is not that rightsholders will express preferences; it is that, once encoded at the file level, those preferences will be treated as authoritative by software systems. Crawlers, search engines, generative models, and even institutional research tools will be designed to interpret these signals as enforceable directives. And regulators and courts are likely to follow suit, effectively repealing fair use and Article 3 by way of a technological end run. This is already happening in the EU, where the AI Act’s Code of Practice would require signatories to follow IETF protocols for crawler behavior. What is today a voluntary signal becomes tomorrow’s technical obligation. The result is that entire classes of research, particularly those examining race, gender, inequality, and public discourse, will be significantly restricted or effectively impossible.
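To see how a per-asset signal turns into a de facto enforcement rule, consider the following Python sketch. The “Content-Usage” header name and the “train-ai=n” token are loosely modeled on proposals under discussion but are invented for this illustration; nothing here reflects an adopted IETF specification.

```python
# Hypothetical illustration only: the "Content-Usage" header and the
# "train-ai=n" token are invented for this sketch, not a real standard.
from urllib.request import urlopen

def asset_allows_training(url: str) -> bool:
    """Fetch a single asset and read its (hypothetical) per-file signal."""
    with urlopen(url) as response:
        signal = response.headers.get("Content-Usage", "")
    # The software sees only a flag. It cannot distinguish a fair-use TDM
    # study of, say, gender representation from any other use, so it
    # simply drops the asset for everyone.
    return "train-ai=n" not in signal

def build_research_corpus(urls: list[str]) -> list[str]:
    # Every file becomes its own legal boundary, evaluated one by one.
    return [u for u in urls if asset_allows_training(u)]
```

The point of the sketch is the shape of the logic: a boolean check per file, with no slot anywhere for a statutory exception.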

This is not speculation. It is what DRM accomplished over the past two decades. DRM’s primary harm was not that it existed, but that it overrode exceptions. Technical locks made it impossible to exercise rights that copyright law expressly granted. Libraries could not preserve e-books; blind readers could not circumvent restrictions for accessibility; scientists could not analyze proprietary audiovisual materials. Even when fair use clearly permitted the underlying activity, technical measures prevented it. Asset-level AI signaling replays this history, but with far more sweeping implications.

The institutional venue matters. The IETF is a technical standards body. Its mission, according to RFC 3935, is “to make the Internet work better,” and its work has “traditionally focused on the technical aspects of the Internet.” The RFC emphasizes engineering, interoperability, and rough consensus among technically skilled participants. It does not mention copyright. It does not discuss content governance. It does not describe itself as a venue for designing rights-management systems. Embedding copyright preferences into protocols would expand the IETF’s role in ways that are incompatible with its mandate. It would shift the organization from designing interoperable mechanisms to designing legal and governance systems.

Advocates of asset-level signaling sometimes argue that it merely expresses creator intent. But technical signaling at this granularity does not remain a matter of intent. It becomes infrastructure. And infrastructure is sticky: once standardized, it becomes embedded in software, enforced by platforms, and interpreted by tools that cannot distinguish between lawful and unlawful uses. The harm is not only legal; it is epistemic. It reshapes what researchers can study, what questions they can ask, and whose stories they can tell.

This is why preferences should be signaled at the level of location. Location-based signaling reflects how websites are actually administered, limits the risk of fine-grained enforcement, and preserves the structural flexibility required for fair use. It does not transform every JPEG, PDF, and paragraph into its own legal boundary. It does not fracture the web into millions of micro-regimes with potentially conflicting rules.
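For contrast with the per-asset sketch above, here is what a location-level check looks like with Python’s standard robots.txt parser; the bot name and site URL are placeholders. One administrative file answers for whole paths, rather than every file carrying its own rule.

```python
# One site-level file governs entire paths; no per-file metadata needed.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.org/robots.txt")
parser.read()  # a single fetch covers the whole site

# A crawler asks one question per location, so site administrators,
# not individual assets, remain the unit of governance.
if parser.can_fetch("ExampleResearchBot", "https://example.org/archive/"):
    print("crawling permitted for this location")
else:
    print("crawling disallowed for this location")
```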

At a moment when courts are reaffirming that AI-related copying for TDM can constitute fair use, rightsholders are increasingly turning to private contracts and technical restrictions to achieve outcomes they cannot win through law. This is not an accident. It is a pattern. When the legal system affirms public rights, private actors shift the battlefield to architecture. The IETF should not assist in that shift.

The Internet needs what might be called User Rights Oriented infrastructure: architectural space that enables research, accessibility, innovation, and critical inquiry. Asset-level AI signaling does the opposite. It narrows that space, not by changing the law, but by changing the environment in which the law operates.

The IETF should not help build DRM 2.0. For the health of scholarship, public knowledge, and the open Internet, asset-level signaling must remain outside the protocol layer. The risks are too great, the costs too invisible, and the consequences too enduring.

ABOUT THE AUTHOR
Farzaneh Badii
Digital Medusa is a boutique advisory providing digital governance research and advocacy services. It is the brainchild of Farzaneh Badi[e]i. Digital Medusa’s mission is to provide objective and alternative digital governance narratives.