EU Countries Will Require AI Tools Like ChatGPT To State Their Data Sources

Makers of artificial-intelligence tools such as ChatGPT would be required to disclose copyrighted material used in building their systems, according to a new draft of European Union legislation slated to be the West's first comprehensive set of rules governing the rollout of AI.

Such an obligation would give publishers and content creators a new weapon to seek a share of profits when their works are used as source material for AI-generated content by tools like ChatGPT. The issue has been one of the thorniest commercial questions to emerge amid a frenzy of AI-powered tools being launched or tested by the likes of Microsoft Corp. and Google owner Alphabet Inc. 

European Parliament negotiators plan to insert the new language, reviewed by The Wall Street Journal, into an EU bill that is making its way toward passage, adding measures aimed at regulating a number of aspects of AI. The EU bill is at the forefront of a global push by policy makers to lay down rules for the development and use of AI.

While the drafts of both the bill and the amendments aren't final, they reflect late-stage agreement among members of the European Parliament. EU lawmakers and member states aim to negotiate and pass a final version of the bill later this year.

Europe has been an early mover in attempts to rein in tech companies and write regulation to address the world's fast-changing technology landscape. Rules made in Brussels have often spread worldwide, sometimes by setting legal precedents that other nations follow. Big tech companies have often adapted their global practices to EU rules so as not to operate too differently across markets.

Interest in AI has exploded since the release last fall of ChatGPT, an open-ended tool developed by OpenAI and backed by Microsoft. It has sparked an arms race with Google and other companies to roll out what are called generative AI tools, which can mimic human creative output to generate their own content.

Under the new provisions being added to the EU's AI bill, developers of generative AI models will have to publish a "sufficiently detailed summary" of the copyrighted material they used to build them, the draft says. The latest generative AI models are trained to create their own content, such as marketing copy, photorealistic images or pop songs, by ingesting billions of existing texts, images, videos or music clips.

The question of how to handle the use of copyrighted material in building generative AI has been controversial since ChatGPT became a viral sensation. The ability of the tools to synthesize huge bodies of creative or intellectual work and then create ostensibly original legal contracts, marketing copy, video clips or computer code has led writers, visual artists and others to complain and demand compensation.

The new provisions could give a boost to publishing executives and creative workers who have been trying to determine the extent to which their content has been used to train generative AI tools, and how they can be compensated.

Big publishers have called for such guardrails to ensure, among other things, that they get paid for material used by AI. The News Media Alliance, a publishing trade group, is among those asking for such compensation. News Corp, owner of The Wall Street Journal, has also been outspoken on the issue.

At the core of the debate is whether AI companies have the legal right to scrape content off the internet to feed their models. The amendments to the EU’s bill wouldn’t resolve that debate, but would provide information to rights holders and policy makers to inform it.

“This opens the door for right holders,” said Dragoș Tudorache, a Romanian member of the European Parliament who co-leads the body’s work on the AI Act. “Our goal is to increase accountability, transparency and scrutiny of these models.”

U.S. officials are looking at what kinds of curbs are needed for potentially risky AI models. The U.K. has issued a white paper for regulating AI, and China’s top internet regulator earlier this month proposed rules for tools such as ChatGPT.

Last month, Italy temporarily banned ChatGPT on privacy grounds.

The EU's bill was introduced in 2021 and focused almost exclusively on possible uses of AI tools, saving its toughest rules for what the EU deems to be risky applications of AI, such as banning most police use of facial-recognition technology.

But the explosion of new AI tools since the release of ChatGPT late last year has pushed EU legislators to play catch-up and explore what new, across-the-board rules might be needed for generative AI and other models trained on huge data sets.

Other provisions the EU is adding to the bill include requirements that developers of generative AI models, such as the GPT-4 model behind ChatGPT, design them with adequate safeguards against their being coaxed into creating content that violates EU laws, according to the draft text. Such illegal content could include child pornography or, in some EU countries, denial of the Holocaust.

Further provisions would require makers of what are known as foundation models, or AI trained on huge data sets, to demonstrate that they are identifying and mitigating reasonably foreseeable risks, such as risks to health and safety or to the rule of law, the draft shows.

This article first appeared in The Wall Street Journal.
