OpenAI is offering news publishers as little as $1 million to license their content for training its large language models (LLMs).
The company is also reportedly in talks with about a dozen other publishers to avoid lawsuits over copyright infringement.
This comes after several reports of media organizations and artists accusing AI companies of copyright infringement. They allege that AI companies are using public archives of news articles to train their LLMs without the publishers' knowledge.
The amount is too small
A Silicon Angle report notes that while this amount may seem too low given the rise of the company's LLM-powered chatbot, ChatGPT, it all comes down to the nature of the agreement made between the parties.
According to The Information, the amount is too low even for small news publishers. As a result, it could hinder OpenAI's efforts.
Last December, OpenAI was reported to have signed a deal with Axel Springer, the German publisher behind media brands such as Politico and Business Insider.
Details of the deal are still sketchy, but the total value is believed to be in the tens of millions of dollars, according to executives cited by The Information.
More AI companies will follow
Other AI companies are also reportedly trying to negotiate with news publishers to use their articles for LLM training.
Apple, for example, is trying to catch up with OpenAI and Google in generative AI, and it is also trying to strike deals with news publishers, according to one executive quoted by The Information.
The company also reportedly wants the rights to use their content “more broadly” than its competitors and is offering more money to news publishers than OpenAI.
Sources close to the development say Apple wants to make extensive use of the content in “future AI products in any way the company deems necessary.”
The company has reportedly floated multiyear deals worth at least $50 million to the publishers of outlets including NBC News, Vogue, The New Yorker, The Daily Beast and Better Homes and Gardens.
There is no free lunch for AI
LLMs are pre-trained on huge amounts of data, but that data is not free. Everything has a price tag, including the data used to train an LLM. Recently, media organizations including the New York Times, Reuters, CNN, and Vox Media blocked OpenAI and Microsoft from accessing their data.
Last December, the Times sued OpenAI and Microsoft, accusing the two tech giants of using its copyrighted content to train their models.
That's not all. Reddit Inc. has cracked down on companies that use its content to train their LLMs. Popular authors have also joined forces to file a lawsuit against an AI company that used their books for LLM training.
According to Silicon Angle, “LLM training is very expensive.”
Beyond data costs
The cost of LLM training goes beyond data availability. According to Forbes, “Thousands of graphics processing units (GPUs) are required to provide the parallel processing power needed to process the large datasets that these models train on.”
GPUs alone cost millions of dollars. Forbes cites a technical overview of OpenAI's GPT-3 language model, which estimated that each training run requires at least $5 million worth of GPUs. More training runs will be required, further increasing costs.