San Francisco-based AI startup Anthropic has released its latest LLMs, the Claude 3 family of models.

Claude 3 is available in three variations: Haiku, Sonnet and Opus. For the less poetic among us, that means small, medium and large. Claude 3 Opus is Anthropic’s most advanced model and the first in the industry to claim to beat OpenAI’s GPT-4 across a range of benchmarks.

GPT-4 has long been the gold standard that AI companies measure their LLM performance against. Words like “close” or “almost” have often been used in these comparisons, but Anthropic can finally claim to outperform GPT-4.

Here are the benchmark numbers for Claude 3 compared with GPT-4, GPT-3.5, Gemini Ultra and Gemini Pro.

Claude 3 benchmark numbers compared with GPT-4, GPT-3.5, Gemini Ultra and Gemini Pro. Source: Anthropic

It is worth noting that the GPT-4 numbers above are those provided by OpenAI in its technical report ahead of the release of GPT-4. The Claude 3 model card acknowledges that higher scores have been reported for GPT-4 Turbo.

Still, the Claude 3 Opus figures are a big deal. Despite the inevitable arguments over how the company arrived at these numbers, Anthropic says that Claude 3 Opus represents “higher intelligence than any other model available.”

The Claude 3 Opus API costs $15 per million input tokens and $75 per million output tokens. That’s a lot compared with GPT-4 Turbo, which costs $10 and $30 respectively. Claude 3 Sonnet ($3 / $15) and Claude 3 Haiku ($0.25 / $1.25) offer really good value for money when you look at the performance specs of those smaller models.
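To make the price gap concrete, here is a minimal Python sketch that turns the per-million-token prices quoted above into a per-request cost. The request size (10,000 input tokens and 1,000 output tokens) is a made-up example, and the `PRICES` table and `request_cost` helper are illustrative names, not part of any vendor SDK.

```python
# Prices in USD per million tokens, as quoted in this article: (input, output).
PRICES = {
    "Claude 3 Opus": (15.00, 75.00),
    "Claude 3 Sonnet": (3.00, 15.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "GPT-4 Turbo": (10.00, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 10,000-token prompt with a 1,000-token reply.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 1_000):.5f}")
```

For that hypothetical request, Opus works out to about $0.225 against $0.13 for GPT-4 Turbo, while Sonnet and Haiku come in at about $0.045 and $0.00375.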

If you want to try Claude 3 for free, you can do so on Anthropic’s chatbot once its servers have recovered from the rush of traffic. It is powered by Claude 3 Sonnet, with paying Pro users getting access to Opus.

Claude 3 models aren’t fully multimodal, but they have impressive vision capabilities. They can’t generate an image for you, but the benchmarks show that Opus is good at analyzing photos, charts, graphs, and technical diagrams.

Claude 3 vision capabilities compared with GPT-4V, Gemini Ultra and Gemini Pro. Source: Anthropic

According to Anthropic, the Claude 3 models are capable of accepting inputs of more than 1 million tokens. However, for most users, the context window is limited to 200,000 tokens for now. That’s still a lot more than the 128k context of GPT-4 Turbo.

A large context window is only useful when coupled with good recall, and Anthropic claims that Opus delivers “near-perfect recall with over 99% accuracy.”

Something interesting happened during the Claude 3 Opus “needle in a haystack” recall test. When asked a question that could only be answered if it had spotted the inserted “needle” sentence, it indicated that it understood it was being tested. Impressive and a little bit scary.

Claude 3 Opus realized that it was being tested. Source: X
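For readers unfamiliar with the “needle in a haystack” setup, the following Python sketch shows the general idea: hide one sentence at different depths in a long filler text and check whether the model’s answer contains the hidden fact. It is a simplified illustration, not Anthropic’s actual evaluation. The needle sentence, the filler text and the `stub_model` function (a stand-in for a real LLM call) are all invented for this example.

```python
from typing import Callable, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    """Insert the needle sentence at a relative depth (0.0 = start, 1.0 = end)."""
    position = int(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def recall_rate(
    ask_model: Callable[[str], str],
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: List[float],
) -> float:
    """Return the fraction of depths at which the answer contains the expected fact."""
    hits = 0
    for depth in depths:
        prompt = build_haystack(filler, needle, depth) + "\n\n" + question
        if expected.lower() in ask_model(prompt).lower():
            hits += 1
    return hits / len(depths)


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call: it 'answers' only if the fact is in the prompt."""
    return "marmalade" if "marmalade" in prompt else "unknown"


filler_sentences = [f"Filler sentence number {i}." for i in range(50)]
score = recall_rate(
    stub_model,
    filler_sentences,
    needle="The secret code word is marmalade.",
    question="What is the secret code word?",
    expected="marmalade",
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
print(score)  # 1.0 for the stub, which always finds the needle
```

A real run would swap `stub_model` for an API call and use a haystack that fills most of the context window; the scoring loop stays the same.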

Anthropic is a big proponent of what it calls “Constitutional AI,” which aims to improve the safety and transparency of its models. In Claude 2, this focus on safety resulted in a lot of requests that were actually harmless being refused.

Claude 3 is better at understanding the nuances of prompts and at deciding what does and doesn’t conflict with Anthropic’s guardrails. Claude 3 also achieves much better accuracy and fewer hallucinations compared with Claude 2.1.

An example of a prompt that Claude 2.1 refuses to answer, while Claude 3 recognizes it as safe.

Some AI pessimists claim that we’re heading into an AI winter and that LLM performance is plateauing, but Anthropic disagrees. The company doesn’t believe that “model intelligence is anywhere near its limits.”

There are plans to deliver several interesting upgrades to Claude 3 in the future, including enhanced agentic capabilities such as tool use, as well as interactive coding (REPL).

Due to the high prices, the initial market for Claude 3 Opus may be niche research or professional applications. Given their pricing and performance, Sonnet and Haiku are likely to see the widest adoption for now.

Will we see OpenAI drop its prices? With OpenAI under pressure at the top of the benchmarks, we must be getting very close to a GPT-5 announcement.
