Hacker News · 3 months ago

Why India’s plan to make AI companies pay for training data should go global

A license fee for the use of copyrighted data can compensate creators and help AI companies avoid lengthy legal fights.

By Javaid Iqbal Sofi

Javaid Iqbal Sofi is an artificial intelligence researcher and policy analyst at Virginia Tech.

India recently released a draft proposal requiring artificial intelligence companies to pay royalties when they use copyrighted work from the country to train their models. If enacted, the law could reshape how Meta, Google, OpenAI, and other big tech firms operate in one of their biggest markets.

With the world’s largest population, India has leverage that few other countries have. It is the second-biggest market for OpenAI’s ChatGPT after the U.S. It is one of the fastest-growing markets for Perplexity’s AI search engine, and it has the largest user base for WhatsApp and Facebook, where Meta is rolling out its AI tools. Microsoft, Google, and Amazon recently announced some $67 billion in AI infrastructure investments in the country.

India is therefore justified in demanding payment for its copyrighted data. Tech companies “will have to fit those payments into their deployment models — or give up this massive, lucrative market, and all of the scale advantages that being part of it confers,” James Grimmelmann, a professor of digital and information law at Cornell University, told Rest of World.

India’s linguistic diversity is another reason why AI companies need to treat the country differently, Grimmelmann said. The government is keen to develop multilingual large language models that can cater to the specific needs of businesses and individuals, which means companies need local data that belongs to local creators.

India isn’t the only country thinking about a fee. Brazil’s new AI bill also has a provision that mandates compensation for copyright holders when their data is used for training. The bill is awaiting a final vote.

As AI models are adopted more widely, dozens of cases have been filed against tech firms in the U.S. and elsewhere for using copyrighted material — including journalism, literature, music, photography, and film — without the consent of creators. In India, the ANI news agency has sued OpenAI for copyright violations, while writers in Singapore have pushed back against a government proposal to let AI companies train on their work without compensation.

Tech companies have generally put forward the argument of “fair use,” which permits use of copyrighted material, without consent, for purposes such as teaching or research. It is the model that the U.S. favors, even as Anthropic agreed to pay $1.5 billion to a group of authors to settle a copyright infringement lawsuit. Europe’s opt-out system places the burden on creators to police companies: track use, send notices, and hope for compliance.

Both the U.S. fair-use model and European opt-outs rely on companies voluntarily disclosing the data they use. Yet companies are increasingly opaque about their training data, according to an index that tracks the transparency of foundation models.

“This is robust evidence that market incentives are insufficient to increase transparency for most companies, including on training data,” Rishi Bommasani, a senior researcher at the Stanford Institute for Human-Centered AI and one of the authors of the report, told Rest of World.

India’s hybrid framework proposes companies pay a mandatory blanket license fee — a percentage of their global revenue — for using copyrighted materials to train their AI models. It also recommends the establishment of an agency to collect the license fee and distribute it to registered creators.

The proposal has had pushback even within India. Nasscom, the tech industry lobbying body, formally dissented, saying mandatory licensing would slow innovation, and that India should adopt the U.S. approach of allowing training on lawfully accessed content. It is also likely “to do more harm than good to the small creators it is supposed to protect,” Rahul Matthan, a partner and head of the technology practice at law firm Trilegal, wrote on his website.

The proposed payment model is “deeply flawed,” wrote Matthan, a former adviser to the government. Big, established artists are likely to receive a disproportionately large share of the license fee, while small creators “would have to settle for a pittance.” The proposal prohibits opt-outs, so small creators cannot withhold their works from being used for AI training, he noted.

Rather than focus on the training data, it would be more effective to focus on the outputs that the models generate, Matthan wrote. “If it can be shown that an AI system, in response to a prompt, has reproduced a substantial portion of a copyrighted work, that would be clear evidence of copyright infringement … and entitle the author to appropriate legal remedies.”

But litigation is expensive, and can drag on for years. Remember the lawsuit filed by the Authors Guild against Google? The company had scanned more than 20 million books without permission, and the case dragged on for more than a decade. Mandatory licensing provides certainty: Companies know what they owe, and creators know they will be paid. AI companies also wish to avoid lengthy legal fights, and are inking licensing deals with major publications and creators. India’s preemptive approach creates a framework before the lawsuits pile up.

With tech companies having already made massive financial commitments in India, they cannot afford to walk away. Once they adjust their business models to accommodate the payment framework, extending the practice to smaller countries can become routine. Countries with valuable training data but less market power can simply adapt India’s framework — much like they did with the European Union’s GDPR privacy law.

To be sure, mandatory licensing doesn’t solve every problem. Figuring out how much each individual work contributed to a model’s output is difficult, Grimmelmann said. Implementation also has real challenges. “It requires government administrative capacity — it’s actually a bigger involvement by the state than litigation would be,” he said.

Yet for all its flaws, this is a proactive and feasible solution to the question of fair compensation for creative work. If India stands up to AI firms, as it once did in refusing Facebook’s Free Basics scheme, other countries may well follow suit. The outcome in New Delhi and Brasília will determine whether smaller countries that lack the scale of India and Brazil adopt mandatory licensing, or get stuck choosing between costly litigation and opt-outs that have already failed elsewhere.
