Automattic, the company behind WordPress and Tumblr, is discussing a data and content deal with MidJourney and OpenAI.

The news was first reported by 404 Media, based on information from an unnamed source inside Automattic. It suggests that an agreement between Automattic and these AI companies could be close at hand.

This follows rumors circulating on Tumblr about a potential deal with MidJourney that could introduce a new revenue stream for the platform.

404 Media says the deal process has been messy so far. That includes a partially failed data transfer to OpenAI and MidJourney which, in the words of one of Tumblr’s product managers, contained:

“Private posts on public blogs, posts on deleted or suspended blogs, unanswered asks (normally these are not public until they’re answered), private answers (these only show up to the receiver and are not public), posts which are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know).”

The implications of this remain unclear, and further details of the deal are forthcoming.

The gold rush for AI training data moves up a notch

And just like that, the gold rush for AI training data has moved up a gear.

Yes, generative AI firms have always needed vast quantities of data. The crucial difference now is that this data isn’t coming for free.

Just days ago, Reddit reportedly discussed licensing its vast trove of user-generated content to an as-yet-unnamed AI company, in a deal that could be worth around $60 million annually. The news comes as Reddit gears up for a public offering in March, aiming for a valuation near $5 billion.

This potential licensing agreement fits a growing trend among tech firms to secure legitimate data use agreements, especially in the face of mounting copyright risks. Ongoing legal battles, such as The New York Times’ lawsuit against OpenAI and Microsoft, have dialed up the urgency for content deals.

Automattic’s move to negotiate with AI firms raises questions about the use of user-generated content for AI training. The company has reportedly announced plans to introduce a new feature that lets users opt out of having their data shared with third parties, including AI firms.

Automattic has been quick to affirm its commitment to working with AI firms that respect community values, including attribution, opt-outs, and control over data.

In a public statement published following 404 Media’s report, the company said, “We currently block, by default, major AI platform crawlers, including ones from the biggest tech companies, and update our lists as new ones launch,” and that it “will share only public content that’s hosted on WordPress.com and Tumblr from sites that haven’t opted out.”

It continues, “We are also working directly with select AI firms so long as their plans align with what our community cares about: attribution, opt-outs, and control.”

However, it seems that opting out of having your data used for AI training might penalize your accounts.

A new, yet-to-be-posted FAQ titled “What happens when you opt out?” states, “If you opt out from the beginning, we are going to block crawlers from accessing your content by adding your site to a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt out and ask that their content be removed from past sources and future training.”

We’re now living in a world where anything you’ve posted on the web could be sold for AI training purposes, if it isn’t taken for free, that is.

As AI evolves, the debate over data use and privacy will likely intensify.

Companies that own data goldmines stand to win big, but at what cost to the average web user?

The post OpenAI and MidJourney wish to buy WordPress and Tumblr data appeared first on DailyAI.

This article was originally published at dailyai.com