A new generation of artificial intelligence (AI) models can produce “creative” images on demand from a text prompt. The likes of Imagen, MidJourney, and DALL-E 2 are beginning to change the way creative content is made, with implications for copyright and intellectual property.

While the output of these models is often striking, it’s hard to know exactly how they produce their results. Last week, researchers in the US made the intriguing claim that the DALL-E 2 model might have invented its own secret language to talk about objects.

By prompting DALL-E 2 to create images containing text captions, then feeding the resulting (gibberish) captions back into the system, the researchers concluded DALL-E 2 thinks Vicootes means “vegetables”, while Wa ch zod rea refers to “sea creatures that a whale might eat”.

These claims are fascinating, and if true, could have important security and interpretability implications for this kind of large AI model. So what exactly is going on?

Does DALL-E 2 have a secret language?

DALL-E 2 probably doesn’t have a “secret language”. It might be more accurate to say it has its own vocabulary – but even then we can’t know for sure.

First of all, at this stage it’s very hard to verify any claims about DALL-E 2 and other large AI models, because only a handful of researchers and creative practitioners have access to them. Any images that are publicly shared (on Twitter, for example) should be taken with a fairly large grain of salt, because they have been “cherry-picked” by a human from among many output images generated by the AI.

Even those with access can only use these models in limited ways. For example, DALL-E 2 users can generate or modify images, but can’t (yet) interact with the AI system more deeply, such as by modifying the behind-the-scenes code. This means “explainable AI” methods for understanding how these systems work can’t be applied, and systematically investigating their behaviour is difficult.

What’s happening then?

One possibility is that the “gibberish” phrases are related to words from non-English languages. For instance, Apoploe, which seems to create images of birds, is similar to the Latin Apodidae, the scientific name of a family of bird species.

This seems like a plausible explanation: DALL-E 2 was trained on a very wide range of data scraped from the internet, which included many non-English words.

Similar things have happened before: large natural language AI models have coincidentally learned to write computer code without deliberate training.

Is it all about the tokens?

One point that supports this theory is the fact that AI language models don’t read text the way you and I do. Instead, they break input text up into “tokens” before processing it.

Different “tokenization” approaches have different results. Treating each word as a token seems like an intuitive approach, but causes trouble when identical tokens have different meanings (like how “match” means different things when you’re playing tennis and when you’re starting a fire).

On the other hand, treating each character as a token produces a smaller number of possible tokens, but each one conveys much less meaningful information.

DALL-E 2 (and other models) use an in-between approach called byte-pair encoding (BPE). Inspecting the BPE representations for some of the gibberish words suggests this could be an important factor in understanding the “secret language”.
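To make this concrete, here is a minimal sketch of how BPE learns its subword vocabulary, following the classic merge-based algorithm: repeatedly find the most frequent adjacent pair of symbols and fuse it into a new symbol. The toy corpus, word frequencies and merge count below are invented for illustration; DALL-E 2’s real tokenizer uses a far larger vocabulary learned from web-scale text.

```python
# A toy sketch of byte-pair encoding (BPE). The corpus, frequencies and
# number of merges are invented for illustration only.
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count how often each adjacent pair of symbols occurs in the corpus."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the chosen symbol pair into one new symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Each word is a sequence of space-separated character symbols,
# paired with its frequency in the (toy) corpus.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

for step in range(4):
    pair_counts = get_pair_counts(vocab)
    best = max(pair_counts, key=pair_counts.get)  # most frequent pair wins
    vocab = merge_pair(best, vocab)
    print(f"merge {step + 1}: {best} -> {''.join(best)}")
```

After enough merges, common character sequences become single tokens, and any unseen string – including gibberish – still gets segmented into subword tokens the model has seen during training. This is why a model like DALL-E 2 never encounters a truly “unknown” word.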

Not the entire picture

The “secret language” could also just be an example of the “garbage in, garbage out” principle. DALL-E 2 can’t say “I don’t know what you’re talking about”, so it will always generate some kind of image from the given input text.

Either way, none of these options is a complete explanation of what’s happening. For instance, removing individual characters from gibberish words appears to corrupt the generated images in very specific ways. And it seems individual gibberish words don’t necessarily combine to produce coherent compound images (as they would if there were really a secret “language” under the covers).
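One way to probe this behaviour yourself – a sketch, assuming you have the Hugging Face transformers library installed – is to inspect how CLIP’s publicly released BPE tokenizer (a close relative of the text processing used in DALL-E 2) segments a gibberish phrase before and after a single character is removed. The exact token splits depend on the learned vocabulary, so your output may differ; the point is that a one-character edit can reshuffle the entire token sequence, which would help explain why small edits corrupt the generated images.

```python
# Sketch: how deleting one character can reshuffle BPE token boundaries.
# Assumes the Hugging Face `transformers` package is installed; the
# checkpoint below is OpenAI's publicly released CLIP tokenizer.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# "Apoploe vesrreaitais" is one of the gibberish phrases reported by the
# researchers; the second variant simply drops one character ("p").
for phrase in ["Apoploe vesrreaitais", "Apoloe vesrreaitais"]:
    print(phrase, "->", tokenizer.tokenize(phrase))
```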

Why this is important

Beyond intellectual curiosity, you might be wondering whether any of this actually matters.

The answer is yes. DALL-E 2’s “secret language” is an example of an “adversarial attack” against a machine learning system: a way to break the intended behaviour of the system by intentionally choosing inputs the AI doesn’t handle well.

One reason adversarial attacks are concerning is that they challenge our confidence in the model. If the AI interprets gibberish words in unintended ways, it might also interpret meaningful words in unintended ways.

Adversarial attacks also raise security concerns. DALL-E 2 filters input text to prevent users from generating harmful or abusive content, but a “secret language” of gibberish words might allow users to bypass these filters.

Recent research has discovered adversarial “trigger phrases” for some language AI models – short nonsense phrases such as “zoning tapping fiennes” that can reliably trigger the models to spew out racist, harmful or biased content. This research is part of the ongoing effort to understand and control how complex deep learning systems learn from data.

Finally, phenomena like DALL-E 2’s “secret language” raise interpretability concerns. We want these models to behave as a human expects, but seeing structured output in response to gibberish confounds our expectations.

Shining a light on existing concerns

You may recall the hullabaloo in 2017 over some Facebook chat-bots that “invented their own language”. The present situation is similar in that the results are concerning – but not in the “Skynet is coming to take over the world” sense.

Instead, DALL-E 2’s “secret language” highlights existing concerns about the robustness, security, and interpretability of deep learning systems.

Until these systems are more widely available – and in particular, until users from a broader range of non-English cultural backgrounds can use them – we won’t be able to really know what is going on.

In the meantime, however, if you’d like to try generating some of your own AI images you can check out a freely available smaller model, DALL-E mini. Just be careful which words you use to prompt the model (English or gibberish – your call).


This article was originally published at theconversation.com