Researchers at Meta and the University of California San Diego (UCSD) have developed ToolVerifier, a method that improves how LLMs select and interact with software tools.

For LLMs to become useful as general assistants or agents, they need to be taught how to use a variety of tools or APIs. While fine-tuning an LLM to use a specific tool works, the real challenge is getting an LLM to interact with new tools without the need for fine-tuning or few-shot demonstrations.

When two tools are very similar, it can be particularly difficult for the LLM to choose the right one to achieve its goal. The current practice of providing a few examples for each tool can also take up a large portion of the context window available to an LLM.

ToolVerifier is a self-verification method that allows the LLM to ask itself questions to work out which tool to use and what parameters to pass to it.

To support the LLM, ToolVerifier first selects the most suitable tool from a library of options and then generates the corresponding parameters. At each of these steps, it generates verification questions to evaluate the choice and differentiate between similar candidate tools.

Here is an example from the paper showing the process of tool selection and parameter verification.

ToolVerifier first identifies the top two candidate tools and generates a verification question. The answer to the question leads to the final tool selection. The same method is used to generate parameters. Source: arXiv
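The select-then-verify loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the prompt wording, the `fake_llm` stub, and the tool catalog are all invented for demonstration, and any prompt-to-text callable could stand in for the model.

```python
# Minimal sketch of ToolVerifier-style tool selection with self-verification.
# `llm` is any prompt -> text callable; the scripted stub below stands in
# for a real model. Prompts and tool names are illustrative assumptions.

def select_tool(llm, task, tools):
    """Shortlist the top two tools, generate a verification question,
    answer it, and use the answer to make the final choice."""
    catalog = "\n".join(f"- {t['name']}: {t['description']}" for t in tools)
    shortlist = llm(
        f"Task: {task}\nTools:\n{catalog}\n"
        "Name the two most suitable tools, comma-separated:"
    ).split(",")
    a, b = (s.strip() for s in shortlist[:2])
    question = llm(
        f"Write one question that distinguishes {a} from {b} for: {task}"
    )
    answer = llm(question)  # the model answers its own question
    return llm(
        f"Task: {task}\nQ: {question}\nA: {answer}\n"
        f"Final choice ({a} or {b}):"
    ).strip()


# Scripted stand-in for an LLM, keyed on prompt content.
def fake_llm(prompt):
    if "comma-separated" in prompt:
        return "book_flight, book_hotel"
    if "Write one question" in prompt:
        return "Does the task involve air travel or accommodation?"
    if "Final choice" in prompt:
        return "book_flight"
    return "Air travel."

tools = [
    {"name": "book_flight", "description": "Reserve airline tickets."},
    {"name": "book_hotel", "description": "Reserve hotel rooms."},
    {"name": "get_weather", "description": "Fetch a weather forecast."},
]
choice = select_tool(fake_llm, "Fly me to Paris next Monday", tools)
print(choice)  # book_flight
```

The same question-and-answer pattern is then repeated for each parameter of the chosen tool.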

ToolVerifier was trained on a dataset consisting of a list of synthetic tools, including travel, banking, and calendar tools, along with their descriptions. It was trained to select the appropriate tool based on the name and description alone.

After training on tool selection and parameter verification, the researchers tested ToolVerifier on four tasks from the ToolBench benchmark, which required Llama 2 70B to interact with 17 previously unseen tools.

The results published in the paper report that the ToolVerifier method delivered “an average improvement of 22% over few-shot baselines, even in scenarios where the distinctions between candidate tools are finely nuanced.”

Percentage (%) success rate on the Weather, Booking, Home, and Cat tasks from the ToolBench benchmark, comparing models with and without ToolVerifier. Source: arXiv

The results show that ToolVerifier significantly improves an LLM's tool selection and parameter generation accuracy. The method has only been trained and tested on single-tool interactions, not on chaining multiple tools together, but it is still promising.

Tool-augmented LLMs are an exciting development in the use of AI as a generalized agent. Once LLMs learn to use multiple tools to achieve a goal, they will be even more useful to us than they already are.

A future where an AI assistant books a flight, coordinates a meeting, or does your grocery shopping for you doesn't seem far off.

This article was originally published at dailyai.com