OpenAI’s deals with publishers could cause trouble for competitors

The legal battle between OpenAI and The New York Times over data used to train its AI models may be ongoing. But OpenAI is pressing ahead with deals with other publishers, including some of France's and Spain's largest news publishers.

OpenAI on Wednesday announced that it has signed deals with Le Monde and Prisa Media to bring French and Spanish news content to OpenAI's ChatGPT chatbot. In a blog post, OpenAI said the partnership will put the organizations' current events coverage, from brands like El País, Cinco Días, As and El Huffpost, in front of ChatGPT users where it makes sense. The content will also add to OpenAI's ever-growing volume of training data.

OpenAI writes:

Over the coming months, ChatGPT users will be able to interact with relevant news content from these publishers through select summaries with attribution and enhanced links to the original articles, giving users the ability to access additional information or related articles from their news sites… We are continually making improvements to ChatGPT and are supporting the essential role of the news industry in delivering real-time, authoritative information to users.

OpenAI has announced licensing agreements with a handful of content providers to date. Now seems like a good time to take stock:

Stock library Shutterstock (for images, videos and music training data)
The Associated Press
Axel Springer (owner of Politico and Business Insider, among others)
Le Monde
Prisa Media

How much is OpenAI paying in each case? Well, it isn't saying, at least not publicly. But we can estimate.

The Information reported in January that OpenAI was offering publishers between $1 million and $5 million per year for access to their archives to train its GenAI models. That doesn't tell us much about the Shutterstock partnership. But when it comes to article licensing, and assuming The Information's reporting is accurate and those figures haven't changed since then, OpenAI is spending between $4 million and $20 million a year on news across its four news publisher deals (The Associated Press, Axel Springer, Le Monde and Prisa Media).

That might be pennies for OpenAI, whose war chest tops $11 billion and whose annual revenue recently topped $2 billion (per the Financial Times). But as Hunter Walk, partner at Homebrew and co-founder of Screendoor, recently mused, it's substantial enough to potentially edge out AI rivals who are also pursuing licensing deals.

Walk writes on his blog:

[If] experimentation is gated by licensing deals costing nine figures, we're doing innovation a disservice… Cutting checks to training data “owners” creates a huge barrier to entry for challengers. If Google, OpenAI and other large tech companies can establish a sufficiently high cost, they implicitly prevent future competition.

It's debatable whether there's a barrier to entry today. Many, if not most, AI vendors have chosen to risk the wrath of IP owners by opting not to license the data on which they train their AI models. There's evidence, for example, that the art-generating platform Midjourney is training on stills from Disney films, and Midjourney has no deal with Disney.

The tougher question to grapple with is: Should licensing simply be the cost of doing business and experimenting in the AI space?

Walk would argue against this. He advocates for a regulator-imposed “safe harbor” that would protect any AI vendor, as well as small startups and researchers, from legal liability as long as they adhere to certain transparency and ethical standards.

Interestingly, the U.K. recently tried to codify something along these lines, exempting the use of text and data mining for AI training from copyright considerations as long as it's for research purposes. But those efforts ultimately fell through.

I'm not sure I'd go as far as Walk's “safe harbor” proposal, given the impact AI threatens to have on an already destabilized news industry. A recent model from The Atlantic found that if a search engine like Google integrated AI into search, it could answer a user's query 75% of the time without requiring a click-through to a website.

But perhaps there's room for carve-outs.

Publishers should be paid, and paid fairly. But isn't there an outcome where they get paid and challengers to AI incumbents, as well as academics, get access to the same data as those incumbents? I should think so. Grants are one option. Larger VC checks are another.

I can't say I have the answer, especially given that the courts have yet to decide whether, and to what extent, fair use shields AI vendors from copyright claims. But it's vital that we figure these things out. Otherwise, the industry could well end up in a situation where the academic brain drain continues unabated and only a few powerful companies have access to vast pools of valuable training data.

This article was originally published at techcrunch.com