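This essay explores the parallels between the questions raised by today's artificial intelligence industry and those raised by the Aaron Swartz case, highlighting how companies appropriate copyrighted material to develop AI and the continuing dispute over access to knowledge.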

Schneier on Security

AI and the Corporate Capture of Knowledge

More than a decade after Aaron Swartz’s death, the United States is still living inside the contradiction that destroyed him.

Swartz believed that knowledge, especially publicly funded knowledge, should be freely accessible. Acting on that, he downloaded thousands of academic articles from the JSTOR archive with the intention of making them publicly available. For this, the federal government charged him with a felony and threatened decades in prison. After two years of prosecutorial pressure, Swartz died by suicide on Jan. 11, 2013.

The still-unresolved questions raised by his case have resurfaced in today’s debates over artificial intelligence, copyright and the ultimate control of knowledge.

At the time of Swartz’s prosecution, vast amounts of research were funded by taxpayers, conducted at public institutions and intended to advance public understanding. But access to that research was, and still is, locked behind expensive paywalls. People are unable to read work they helped fund without paying private journals and research websites.

Swartz considered this hoarding of knowledge to be neither accidental nor inevitable. It was the result of legal, economic and political choices. His actions challenged those choices directly. And for that, the government treated him as a criminal.

Today’s AI arms race involves a far more expansive, profit-driven form of information appropriation. The tech giants ingest vast amounts of copyrighted material: books, journalism, academic papers, art, music and personal writing. This data is scraped at industrial scale, often without consent, compensation or transparency, and then used to train large AI models.

AI companies then sell their proprietary systems, built on public and private knowledge, back to the people who funded it. But this time, the government’s response has been markedly different. There are no criminal prosecutions, no threats of decades-long prison sentences. Lawsuits proceed slowly, enforcement remains uncertain and policymakers signal caution, given AI’s perceived economic and strategic importance. Copyright infringement is reframed as an unfortunate but necessary step toward “innovation.”

Recent developments underscore this imbalance. In 2025, Anthropic reached a settlement with publishers over allegations that its AI systems were trained on copyrighted books without authorization. The agreement reportedly valued infringement at roughly $3,000 per book across an estimated 500,000 works, coming at a cost of over $1.5 billion. Plagiarism disputes between artists and accused infringers routinely settle for hundreds of thousands, or even millions, of dollars when prominent works are involved. Scholars estimate Anthropic avoided over $1 trillion in liability costs. For well-capitalized AI firms, such settlements are likely being factored in as a predictable cost of doing business.

As AI becomes a larger part of America’s economy, one can see the writing on the wall. Judges will twist themselves into knots to justify an innovative technology premised on literally stealing the works of artists, poets, musicians, all of academia and the internet, and vast expanses of literature. But if Swartz’s actions were criminal, it is worth asking: What standard are we now applying to AI companies?

The question is not simply whether copyright law applies to AI. It is why the law appears to operate so differently depending on who is doing the extracting and for what purpose.

The stakes extend beyond copyright law or past injustices. They concern who controls the infrastructure of knowledge going forward and what that control means for democratic participation, accountability and public trust.

Systems trained on vast bodies of publicly funded research are increasingly becoming the primary way people learn about science, law, medicine and public policy. As search, synthesis and explanation are mediated through AI models, control over training data and infrastructure translates into control over what questions can be asked, what answers are surfaced, and whose expertise is treated as authoritative. If public knowledge is absorbed into proprietary systems that the public cannot inspect, audit or meaningfully challenge, then access to information is no longer governed by democratic norms but by corporate priorities.

Like the early internet, AI is often described as a democratizing force. But also like the internet, AI’s current trajectory suggests something closer to consolidation. Control over data, models and computational infrastructure is concentrated in the hands of a small number of powerful tech companies. They will decide who gets access to knowledge, under what conditions and at what price.

Swartz’s fight was not simply about access, but about whether knowledge should be governed by openness or corporate capture, and who that knowledge is ultimately for. He understood that access to knowledge is a prerequisite for democracy. A society cannot meaningfully debate policy, science or justice if information is locked away behind paywalls or controlled by proprietary algorithms. If we allow AI companies to profit from mass appropriation while claiming immunity, we are choosing a future in which access to knowledge is governed by corporate power rather than democratic values.

How we treat knowledge—who may access it, who may profit from it and who is punished for sharing it—has become a test of our democratic commitments. We should be honest about what those choices say about us.

This essay was written with J. B. Branch, and originally appeared in the San Francisco Chronicle.

Tags: Aaron Swartz, AI, copyright, LLM

Posted on January 16, 2026 at 9:44 AM

About Bruce Schneier

I am a public-interest technologist, working at the intersection of security, technology, and people. I've been writing about security issues on my blog since 2004, and in my monthly newsletter since 1998. I'm a fellow and lecturer at Harvard's Kennedy School, a board member of EFF, and the Chief of Security Architecture at Inrupt, Inc. This personal website expresses the opinions of none of those organizations.
