Large language models, such as those that power popular artificial intelligence chatbots like ChatGPT, are incredibly complex. Although these models are used as tools in many areas, such as customer support, code generation, and language translation, scientists still don’t fully understand how they work.

To better understand what is going on under the hood, researchers at MIT and elsewhere examined the mechanisms at work when these massive machine-learning models retrieve stored knowledge.

They found a surprising result: large language models (LLMs) often use a very simple linear function to retrieve and decode stored facts. Moreover, the model uses the same decoding function for similar kinds of facts. Linear functions, equations with only two variables and no exponents, capture the straightforward, straight-line relationship between two variables.

The researchers showed that, by identifying linear functions for different facts, they can probe the model to see what it knows about new topics, and where within the model that knowledge is stored.

Using a technique they developed to estimate these simple functions, the researchers found that even when a model answers a question incorrectly, it has often still stored the correct information. In the future, scientists could use such an approach to find and correct falsehoods inside the model, which could reduce a model’s tendency to sometimes give incorrect or nonsensical answers.

“Even though these models are really complicated, nonlinear functions that are trained on lots of data and are very hard to understand, there are sometimes really simple mechanisms working inside them. This is one instance of that,” says Evan Hernandez, an electrical engineering and computer science (EECS) graduate student and co-lead author of a paper detailing these findings.

Hernandez wrote the paper with co-lead author Arnab Sharma, a computer science doctoral student at Northeastern University; his advisor Jacob Andreas, an associate professor of EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author David Bau, an assistant professor of computer science at Northeastern; and others at MIT, Harvard University, and the Israel Institute of Technology. The research will be presented at the International Conference on Learning Representations.

Finding facts

Most large language models, also called transformer models, are neural networks. Neural networks are loosely based on the human brain and contain billions of interconnected nodes, or neurons, grouped into many layers that encode and process data.

Much of the knowledge stored in a transformer can be represented as relations that connect subjects and objects. For example, “Miles Davis plays the trumpet” is a relation that connects the subject Miles Davis to the object trumpet.
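To make that setup concrete, here is a minimal sketch in Python, with made-up example facts rather than any dataset from the paper, of how such knowledge can be written down as subject-relation-object triples and grouped by relation:

```python
from collections import defaultdict

# Hypothetical example facts, written as (subject, relation, object) triples.
facts = [
    ("Miles Davis", "plays instrument", "trumpet"),
    ("Norway", "capital city", "Oslo"),
    ("Bill Bradley", "attended university", "Princeton"),
]

# Group facts by relation; the findings described below suggest that each
# relation ends up with its own decoding function.
facts_by_relation = defaultdict(list)
for subject, relation, obj in facts:
    facts_by_relation[relation].append((subject, obj))
```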

As a transformer gains more knowledge, it stores additional facts about a given subject across multiple layers. When a user asks about that subject, the model must decode the most relevant fact in order to respond to the query.

If someone prompts a transformer by saying, “Miles Davis plays the . . .” the model should answer “trumpet,” not “Illinois” (the state where Miles Davis was born).

“Somewhere in the network’s computation, there has to be a mechanism that looks up the fact that Miles Davis plays the trumpet, then pulls out that information and helps generate the next word. We wanted to understand what that mechanism was,” says Hernandez.

The researchers conducted a series of experiments probing LLMs and found that, although the models are extremely complex, they decode relational information using a simple linear function. Each function is specific to the type of fact being retrieved.

For example, the transformer would use one decoding function any time it wants to output the instrument a person plays, and a different function each time it wants to output the state in which a person was born.
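As a rough illustration of that idea, the sketch below uses toy dimensions and random stand-in tensors (not the researchers’ code or any real model weights) to show how a different linear (affine) map could be applied to the same subject representation depending on which kind of fact is being decoded:

```python
import numpy as np

hidden_dim = 8  # toy dimensionality; real models use thousands of dimensions
rng = np.random.default_rng(0)

# Hidden state of the subject ("Miles Davis") at some intermediate layer (made up here).
subject_repr = rng.normal(size=hidden_dim)

# One affine map (W, b) per relation: W @ s + b approximates the decoded object.
relation_functions = {
    "plays_instrument": (rng.normal(size=(hidden_dim, hidden_dim)), rng.normal(size=hidden_dim)),
    "born_in_state": (rng.normal(size=(hidden_dim, hidden_dim)), rng.normal(size=hidden_dim)),
}

def decode(relation, subject):
    """Apply the relation-specific linear (affine) function to a subject representation."""
    W, b = relation_functions[relation]
    return W @ subject + b

# The same subject is decoded with different functions for different kinds of facts.
instrument_vec = decode("plays_instrument", subject_repr)  # should point toward "trumpet"
birth_state_vec = decode("born_in_state", subject_repr)    # should point toward "Illinois"
```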

The researchers developed a method to estimate these simple functions, and then computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band.”
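One simple way to fit such a map, shown below purely for illustration, is least-squares regression from subject representations to object representations over a handful of known facts in one relation. The vectors here are random stand-ins, and the paper’s own estimation procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_facts = 8, 20  # toy sizes

S = rng.normal(size=(n_facts, d))  # subject hidden states for one relation (stand-ins)
O = rng.normal(size=(n_facts, d))  # corresponding object hidden states (stand-ins)

# Append a constant column so the fit includes a bias term: O ≈ S @ W + b.
S_aug = np.hstack([S, np.ones((n_facts, 1))])
coef, *_ = np.linalg.lstsq(S_aug, O, rcond=None)
W, b = coef[:-1], coef[-1]

new_subject = rng.normal(size=d)
predicted_object = new_subject @ W + b  # decoded estimate for an unseen subject
```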

While there could be an infinite number of possible relations, the researchers chose to study this specific subset because they are representative of the kinds of facts that can be written in this way.

They tested each function by changing the subject to see if it could recover the correct object information. For instance, the function for “capital city of a country” should retrieve Oslo if the subject is Norway and London if the subject is England.
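That test can be pictured roughly as the loop below, with toy vectors and a random stand-in for an estimated function (so the printed numbers are meaningless): swap in different subjects, decode with the relation’s function, and check whether the result lands nearest the correct object.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

# Stand-in representations for subjects and candidate objects of one relation.
subjects = {"Norway": rng.normal(size=d), "England": rng.normal(size=d)}
objects = {"Oslo": rng.normal(size=d), "London": rng.normal(size=d), "Paris": rng.normal(size=d)}
ground_truth = {"Norway": "Oslo", "England": "London"}

# Random stand-in for an estimated "capital city of a country" function.
W, b = rng.normal(size=(d, d)), rng.normal(size=d)

def nearest_object(vec):
    """Return the candidate object whose representation is most similar to vec."""
    sims = {name: vec @ o / (np.linalg.norm(vec) * np.linalg.norm(o)) for name, o in objects.items()}
    return max(sims, key=sims.get)

hits = sum(nearest_object(W @ s + b) == ground_truth[name] for name, s in subjects.items())
print(f"correct retrievals: {hits}/{len(subjects)}")
```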

Functions retrieved the correct information more than 60 percent of the time, showing that some information in a transformer is encoded and retrieved in this way.

“But not everything is linearly encoded. For some facts, even though the model knows them and will predict text that is consistent with those facts, we can’t find linear functions for them. This suggests that the model is doing something more intricate to store that information,” he says.

Visualizing a model’s knowledge

They also used the functions to determine what a model believes is true about different subjects.

In one experiment, they began with the prompt “Bill Bradley was a” and used the decoding functions for “plays sports” and “attended college” to see if the model knows that Senator Bradley was a basketball player who attended Princeton.

“We can show that, even though the model chooses to focus on other information when it produces text, it does still encode all of that information,” says Hernandez.

They used this probing technique to produce what they call an “attribute lens,” a grid that visualizes where specific information about a particular relation is stored within the transformer’s many layers.

Attribute lenses can be generated automatically, providing a streamlined way for researchers to learn more about a model. This visualization tool could enable scientists and engineers to correct stored knowledge and help prevent an AI chatbot from giving false information.
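A rough sketch of what such a lens computes might look like the following, using hypothetical hidden states and a random stand-in for a relation function (the released tool will differ): apply the relation’s function to the hidden state at every layer and token position, and record how strongly the target attribute is decoded.

```python
import numpy as np

rng = np.random.default_rng(3)
num_layers, num_tokens, d = 6, 4, 8  # toy sizes

# Hypothetical hidden states for a prompt, indexed [layer, token, hidden_dim].
hidden_states = rng.normal(size=(num_layers, num_tokens, d))

# Random stand-ins for an estimated relation function and the target attribute's vector.
W, b = rng.normal(size=(d, d)), rng.normal(size=d)
attribute_repr = rng.normal(size=d)

def decodability(h):
    """Cosine similarity between the decoded vector and the attribute representation."""
    v = W @ h + b
    return float(v @ attribute_repr / (np.linalg.norm(v) * np.linalg.norm(attribute_repr)))

# The grid shows, layer by layer and token by token, where the attribute is decodable.
lens_grid = np.array([[decodability(hidden_states[layer, token]) for token in range(num_tokens)]
                      for layer in range(num_layers)])
print(lens_grid.round(2))
```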

In the future, Hernandez and his colleagues want to better understand what happens when facts are not stored linearly. They also want to run experiments with larger models and study the precision of linear decoding functions.

“This is exciting work that uncovers a missing piece in our understanding of how large language models recall factual knowledge during inference. Previous work has shown that LLMs build information-rich representations of given subjects, from which specific attributes are extracted during inference. This work shows that the complex nonlinear computation of LLMs for attribute extraction can be well-approximated with a simple linear function,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work.

This research was supported, in part, by Open Philanthropy, the Israeli Science Foundation, and an Early Career Faculty Fellowship from the Azrieli Foundation.

This article was originally published at news.mit.edu