US Congressman Adam Schiff has tabled a bill proposing that tech firms training AI models disclose the use of any copyrighted data. It follows criticism from artists and creators over the unfair use of their work by tech firms raking in billions. Can this stifle innovation? Mint explains.
What does the US bill propose?
Titled the Generative AI Copyright Disclosure Act of 2024, the bill proposes that tech firms building AI models disclose the sources of their data. So far, companies such as OpenAI, Microsoft, Google and Meta have built large AI models trained on trillions of data points, but disclosure of that training data has been an issue. The bill asks companies to disclose, to a centralized ‘register’, the use of any copyrighted data in training their AI models, 30 days prior to the introduction of the respective model or product. A penalty of “not less than $5,000” has been proposed in the bill, a move that has been welcomed so far.
So no copyrighted work can be used?
The bill does not prohibit the use of copyrighted work in training AI models, but instead seeks to compensate original creators for their work being used in large-scale, commercial AI products. Creators globally, including The New York Times, have filed lawsuits against the likes of OpenAI, alleging the use of copyrighted work. Experts say the bill is a positive move and could help establish a uniform commercial model for everyone to follow globally. This could offer further clarity to enterprises on using AI models commercially, which may increase AI adoption going forward.
If it becomes law, will it set a precedent?
The bill will now be voted on in both chambers of the US Congress. If passed, it would become law once it receives the US President’s signature. However, even before it becomes law, policymakers in India believe the proposals set a strong precedent that can help India shape a regulatory model for AI development across various domains.
How will it impact AI innovation?
Most stakeholders in the AI ecosystem do not see the bill as stifling innovation. Rather than creating approval bottlenecks for tech companies, experts say the proposed Act could create a benchmarking process that discloses how fairly an AI model has been trained. So far, training AI models remains a ‘black box’, with little definition or disclosure of details. The bill could also allow creators to be fairly compensated for their work, and let Big Tech train models without worrying about lawsuits.
Are there alternatives to copyrighted data?
Adobe, for its generative AI platform Firefly, has used only data that it owns or has licensed. While this shows there are alternatives to using copyrighted data for AI training, general-purpose AI models require wider datasets. As models get bigger, it will become harder for tech firms to avoid copyrighted data while training their models, which in turn will be used in future commercial AI products. Copyright issues have so far kept generative AI’s enterprise adoption limited, something that could now open up.