No One Should Control the Internet After AI: Freedom to Build Cleopatra GPT

As a result of the genuine technical problems that some AI bots have created for networks, we increasingly hear questions about who “controls” the Internet after AI, a question Paul Keller addresses in this blog, and about who should decide how online content may be used, particularly for LLMs and other AI products.

The question of “who controls the Internet after AI” reflects a fundamental disagreement on what the Internet is meant to be. The Internet is not something to be controlled; it is designed to resist centralized authority. It is supposed to be decentralized, end-to-end, and permissionless. While it is true that parts of today’s Internet are becoming increasingly consolidated, responding to this trend by replacing open infrastructure with permission-based access regimes does not solve the problem; it deepens it. When control shifts from technical limits to institutional discretion, agency is constrained by gatekeepers. Rather than countering consolidation, such approaches risk entrenching it by making participation in data reuse contingent on organizational power, legal capacity, and geopolitical position.

Most of the proposed solutions try to respond to one underlying dynamic: large-scale data collection for AI development. When you ask an AI chatbot about Cleopatra, the Egyptian queen, its response partly depends on data gathered from across the web by automated crawlers and scrapers. This practice raises a number of issues, but two stand out in particular: intellectual property rights and strain on network infrastructure.

One thing I need to clarify from the outset is that the problem is not differentiated access as such, but what is being differentiated. If differentiation is used to offer a genuinely new service, such as higher reliability, structured delivery, real-time feeds, or operational guarantees, then it constitutes an infrastructure product. Wikimedia Enterprise is an example of this model: it provides an additional, enterprise-grade access layer while preserving open dumps and public interfaces for anyone who wishes to use them (even in bulk). But when differentiation is used not to expand capacity but to gate access to public content itself, without providing a new technical layer or capacity for bulk access, it ceases to be access management and becomes a form of permission and control.

Our search for solutions in this space has created many debates. One such solution is pay-per-crawl, which I have already criticized in this blog: it threatens the open web and permissionless innovation by creating a tollbooth that primarily benefits large intermediaries and AI companies, monetizes content scraping, and formalizes surveillance. Most importantly, not everyone has access to the payment mechanisms needed to participate in such systems.

Another proposal comes from Open Future. In his article “Abundance vs. Scarcity: Who Controls the Internet After AI?” Paul Keller frames the challenge as a choice between two visions of control. In a companion report on cultural heritage data, Open Future proposes “conditional” bulk access as a pragmatic solution. The report states:

“Bulk access is different. While the matrix indicates that bulk access could, in principle, be offered under either controlled or conditional terms, conditional access provides a materially better fit with institutional sustainability and stewardship responsibilities grounded in the value of mutuality.”

See the matrix taken from the report. 

This blog examines a fundamental problem with Keller’s suggestion: the differentiated access that Open Future is proposing is no longer access; it is permission rebranded. I don’t know whether this solution is as problematic as pay-per-crawl or other proposals, but this pragmatic approach unfortunately creates more gatekeepers for the Internet and AI, not more freedom to build.

The Cleopatra GPT Example

Marwa is an open-source developer in Egypt. She wants to build Cleopatra GPT: a multilingual educational model trained on digitized museum catalogues, papyri metadata, and scholarly descriptions of ancient Egypt. Much of this material is held by European cultural institutions and advertised as “European open cultural heritage data.”

Marwa does not overload servers. She uses automated scripts with carefully set rate limits, mirrors, and cached datasets. She does what technically competent developers can do today: large-scale, asynchronous data collection.
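To make that concrete, here is a minimal sketch of the kind of polite, asynchronous collection Marwa might run. It is not her actual code: the URLs, contact address, and rate values are hypothetical, and it assumes the aiohttp library is available. The point is that the crawl caps its own concurrency and pacing so it imposes no meaningful load on the institution’s servers.

```python
import asyncio

import aiohttp

# Hypothetical politeness settings: real values would follow each
# institution's published crawl guidance.
RATE_LIMIT_SECONDS = 1.0   # minimum pause per request slot
MAX_CONCURRENCY = 2        # never more than two requests in flight

async def fetch(session: aiohttp.ClientSession, url: str,
                slots: asyncio.Semaphore) -> tuple[str, str]:
    """Fetch one catalogue page while respecting the concurrency cap and pacing."""
    async with slots:
        async with session.get(url) as resp:
            resp.raise_for_status()
            body = await resp.text()
        # Hold the slot briefly so the overall request rate stays bounded.
        await asyncio.sleep(RATE_LIMIT_SECONDS)
    return url, body

async def crawl(urls: list[str]) -> list[tuple[str, str]]:
    slots = asyncio.Semaphore(MAX_CONCURRENCY)
    # Identify the crawler and a contact address (hypothetical values).
    headers = {"User-Agent": "cleopatra-gpt-crawler (contact: marwa@example.org)"}
    async with aiohttp.ClientSession(headers=headers) as session:
        return await asyncio.gather(*(fetch(session, u, slots) for u in urls))

if __name__ == "__main__":
    # Placeholder URLs standing in for digitized museum catalogue pages.
    catalogue_urls = [
        "https://example.org/collection/item/1",
        "https://example.org/collection/item/2",
    ]
    for url, body in asyncio.run(crawl(catalogue_urls)):
        print(url, len(body), "bytes")
```

At these settings the crawl never exceeds roughly two requests per second, the sort of load an ordinary reader clicking through a catalogue might generate.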

Under Open Future’s differentiated access model, however, Marwa is classified as a “bulk user” not because she overwhelms the network and causes infrastructure harm, but because her work involves automated, large-scale AI reuse. The paper defines “bulk / collection-level access” as access via “data dumps/snapshots, batched exports, or object-store access optimized for high-throughput reuse (useful for AI training and other research workflows).” Marwa’s access is most probably going to be conditional: “Conditional: Access is granted to users or for uses that meet published criteria (e.g., accredited research organizations, public-interest projects). Other types of uses— including large-scale or commercial uses—may require case-by-case approval and may be made conditional on specific terms, including payment of usage fees and reporting obligations.”

Since Marwa does large-scale data collection, she might be steered away from public interfaces, asked to use dedicated bulk channels, and potentially required to register, justify her use, or accept contractual terms designed for institutional actors.

Under this model, a developer building an AI tool about Egyptian history must negotiate with European institutions that hold digitized collections, not because of any copyright claim or new product, but purely because those institutions control the technical infrastructure of access. The historical irony is obvious, but the governance problem is universal: permission systems based on institutional custody create arbitrary barriers regardless of who is trying to access what.

Three Problems with the “Conditional Access” Model

Open Future argues that unrestricted access risks turning cultural commons into “a one-way input for private model development.” Their solution: conditional access for bulk use. But this creates three fundamental problems.

Problem 1: Who Gets Caught in the “Bulk Use” Net?

The framework treats “bulk access” as a proxy for “corporate extraction,” but this is structurally flawed.

Marwa does large-scale AI development. She could be doing it for commercial or non-commercial reasons, but she is not a big tech company. Should Marwa go through authentication, contract negotiation, and processes that, in practice, only large corporations can handle?

Many legitimate users need computational-scale access: academic researchers building training datasets, educators, and organizations and open-source developers in the global majority. Open Future claims that their “conditional access [framework] allows institutions to differentiate between non-commercial and large-scale commercial uses” (p.10). However, the line between commercial and non-commercial uses can be very fuzzy. Many such efforts can end up classified as “bulk users” and steered toward permission systems, even though they cause no infrastructure harm and serve public-interest purposes.

This problem is not new. Creative Commons encountered precisely this ambiguity with its “NonCommercial” (NC) licenses: despite extensive guidance, the boundary between commercial and non-commercial use proved too fuzzy to apply consistently, generating legal uncertainty and chilling legitimate reuse. The same structural flaw reappears in Open Future’s model. By tying access conditions to intent rather than demonstrable technical impact, the framework risks reproducing the very governance failures that licensing regimes exposed. 

Problem 2: Institutional Capacity and Geopolitical Barriers

The report promises that “economic returns help cover the costs of digitization and data preparation, subsidising free use for research and other non-commercial purposes.” Yet this presumes that all researchers can, in practice, enter and sustain the required institutional negotiations. What if Marwa’s research institution cannot carry out such negotiations? What if the European institution declines to engage because it is unclear whether her institution, by virtue of its location, is subject to sanctions or export-control restrictions? Comparable dynamics are well documented in sanctions compliance, export controls, cloud services, and cross-border research governance, where risk-averse institutions routinely deny lawful access not because it is prohibited, but because assessing compliance is too costly or uncertain. What if her institution also lacks the legal resources to review, negotiate, and sign bespoke access agreements?

The framework assumes everyone operates with equivalent institutional capacity. This assumption systematically disadvantages small institutions without legal departments and anyone in countries facing sanctions or regulatory barriers.

Permission systems don’t just add friction; they create structural exclusion for those who lack institutional backing.

Problem 3: The Surveillance Infrastructure Required

Open Future admits that “This may require technical means to limit large-scale access via individual-item pages and/or strong incentives for bulk users to do ‘the right thing’.” But “technical means to limit large-scale access” could mean tracking users and requiring them to authenticate themselves (and, in some extreme cases, even identify themselves) in order to distinguish bulk from individual access, in a way not too dissimilar to the privacy-intrusive tracking of website visitors that many big tech companies engage in. The result is surveillance of researchers and developers, and the normalization of monitoring and control as prerequisites for accessing public domain materials.

Why This Matters

Gated access, a model designed to address corporate extraction, ends up governing civic and permissionless innovation instead. It turns openness into a permission system for anyone who wants to operate at computational scale. When you must register, authenticate, accept institutional terms, and submit to monitoring, you are not accessing materials. You are requesting access to materials that someone else controls. The language of “differentiated access” obscures what is actually happening: the construction of permission systems and access-control mechanisms for public domain materials.

Rather than tying large-scale access to institutional status, intent, or discretionary approval, access governance should be grounded in demonstrable technical impact. Controls should respond to measurable load on infrastructure, such as bandwidth, concurrency, or compute usage, rather than to who the user is or how the data might be used. Transparent, resource-based thresholds, automated rate tiers, and cost-recovery pricing linked to operational costs can protect systems without transforming access into a licensing regime. By separating infrastructure stewardship from user classification, such a model preserves openness while avoiding the structural exclusions, compliance chilling, and institutional gatekeeping that permission-based frameworks inevitably produce.
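As an illustration, here is a minimal sketch (in Python, with purely hypothetical tier names and threshold numbers) of what resource-based governance can look like in practice: the server meters measurable load with a token bucket per access channel and throttles on that basis alone, without asking who the requester is or what they intend to do with the data.

```python
import time
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Meters measurable load (requests over time), not identity or intent."""
    rate: float      # tokens replenished per second
    capacity: float  # maximum burst size
    tokens: float = 0.0
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical tiers: the thresholds describe load on the infrastructure,
# not who the requester is or what they plan to build.
TIERS = {
    "item_pages": TokenBucket(rate=1.0, capacity=10),      # ordinary interactive browsing
    "bulk_channel": TokenBucket(rate=50.0, capacity=500),   # high-throughput, cost-recovery priced
}

def handle_request(channel: str, cost: float = 1.0) -> int:
    """Return an HTTP-style status: 200 within the channel's resource budget, 429 beyond it."""
    return 200 if TIERS[channel].allow(cost) else 429

if __name__ == "__main__":
    # Twelve rapid requests against a ten-request burst budget: the last ones are throttled.
    print([handle_request("item_pages") for _ in range(12)])
```

Whether a request is admitted depends only on measured consumption against a published budget; nothing in the decision path asks who Marwa is, where she works, or what she intends to train.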

Permissionless innovation was never a bug. It was what made the Internet generative, what allowed unanticipated uses, what enabled people without institutional backing to build things that changed the world. The question is not “who should control the Internet after AI.” The question is: why are we accepting the premise that the Internet needs controllers? No one should control the Internet after AI. We should be dismantling gatekeepers, not building new ones.
