ChatGPT has taken the world by storm. Within two months of its release it reached 100 million active users, making it the fastest-growing consumer application ever launched. Users are drawn to the tool’s advanced capabilities – and concerned about its potential to cause disruption in various sectors.

A much less discussed implication is the privacy risks ChatGPT poses to each and every one of us. Just yesterday, Google unveiled its own conversational AI called Bard, and others will surely follow. Technology companies working on AI have well and truly entered an arms race.

The problem is it’s fuelled by our personal data.



300 billion words. How many are yours?

ChatGPT is underpinned by a large language model that requires massive amounts of data to function and improve. The more data the model is trained on, the better it becomes at detecting patterns, anticipating what will come next and generating plausible text.

OpenAI, the company behind ChatGPT, fed the tool some 300 billion words systematically scraped from the internet: books, articles, websites and posts – including personal information obtained without consent.

If you’ve ever written a blog post or product review, or commented on an article online, there’s a good chance this information was consumed by ChatGPT.

So why is that a problem?

The data collection used to train ChatGPT is problematic for several reasons.

First, none of us were asked whether OpenAI could use our data. This is a clear violation of privacy, especially when data are sensitive and can be used to identify us, our family members, or our location.

Even when data are publicly available, their use can breach what we call contextual integrity. This is a fundamental principle in legal discussions of privacy. It requires that individuals’ information is not revealed outside of the context in which it was originally produced.

Also, OpenAI offers no procedures for individuals to check whether the company stores their personal information, or to request it be deleted. This is a guaranteed right under the European General Data Protection Regulation (GDPR) – although it’s still being debated whether ChatGPT is compliant with GDPR requirements.

This “right to be forgotten” is especially vital in cases where the data is inaccurate or misleading, which appears to be a regular occurrence with ChatGPT.

Moreover, the scraped data ChatGPT was trained on can be proprietary or copyrighted. For instance, when I prompted it, the tool produced the first few passages from Joseph Heller’s book Catch-22 – a copyrighted text.

ChatGPT doesn’t necessarily consider copyright protection when generating outputs.

Finally, OpenAI did not pay for the data it scraped from the internet. The individuals, website owners and companies that produced it were not compensated. This is particularly noteworthy considering OpenAI was recently valued at US$29 billion, more than double its value in 2021.

OpenAI has also just announced ChatGPT Plus, a paid subscription plan that will offer customers ongoing access to the tool, faster response times and priority access to new features. This plan will contribute to expected revenue of $1 billion by 2024.

None of this would have been possible without data – our data – collected and used without our permission.

A flimsy privacy policy

Another privacy risk involves the data provided to ChatGPT in the form of user prompts. When we ask the tool to answer questions or perform tasks, we may inadvertently hand over sensitive information and put it in the public domain.

For instance, an attorney may prompt the tool to review a draft divorce agreement, or a programmer may ask it to check a piece of code. The agreement and code, in addition to the outputted essays, are now part of ChatGPT’s database. This means they can be used to further train the tool, and be included in responses to other people’s prompts.

Beyond this, OpenAI gathers a broad scope of other user information. According to the company’s privacy policy, it collects users’ IP address, browser type and settings, and data on users’ interactions with the site – including the type of content users engage with, features they use and actions they take.

It also collects information about users’ browsing activities over time and across websites. Alarmingly, OpenAI states it may share users’ personal information with unspecified third parties, without informing them, to meet its business objectives.



Time to rein it in?

Some experts believe ChatGPT is a tipping point for AI – a realisation of technological development that can revolutionise the way we work, learn, write and even think. Its potential benefits notwithstanding, we must remember OpenAI is a private, for-profit company whose interests and commercial imperatives do not necessarily align with broader societal needs.

The privacy risks that come attached to ChatGPT should sound a warning. And as consumers of a growing number of AI technologies, we should be extremely careful about what information we share with such tools.


This article was originally published at theconversation.com