xAI previews Grok-1.5 and introduces a new benchmark called RealWorldQA

Elon Musk’s xAI has unveiled Grok-1.5, a multimodal AI model designed to beat the competition in understanding real-world scenarios.

The new Grok-1.5 follows in the footsteps of models such as GPT-4V and adds visual processing to analyze everything from documents and charts to graphs, screenshots and photos.

Grok-1.5 also makes gains in text, coding and math tasks, scoring 50.6% on the MATH benchmark, 90% on GSM8K and 74.1% on HumanEval.

This places Grok-1.5 firmly in the LLM heavyweight category, with average scores slightly below those of Gemini 1.5 Pro, GPT-4 and Claude 3 Opus.

Grok-1.5’s competitive text, math and coding benchmarks. Source: xAI

Grok-1.5 also offers long-context understanding of up to 128,000 tokens, a 16-fold increase over its predecessor, though this falls well short of the context lengths touted by Claude 3 Opus and Gemini 1.5 Pro.

The Needle in a Haystack (NIAH) evaluation, which tests whether a model can retrieve a specific piece of text buried in a long context, showed that Grok-1.5 can locate embedded text in contexts up to 128,000 tokens long.
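For readers unfamiliar with the test, a NIAH run is commonly set up by hiding a "needle" fact at varying depths inside long filler text, then checking whether the model retrieves it. The sketch below illustrates that general idea only. It is not xAI's evaluation harness, which the announcement does not describe. `toy_model` is a hypothetical stand-in for a real model call, and context length is measured in filler sentences rather than tokens for simplicity.

```python
import re
from typing import Callable


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at relative position `depth` (0.0 = start, 1.0 = end) of the filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    return " ".join(filler[:index] + [needle] + filler[index:])


def run_niah(
    model: Callable[[str], str],
    filler: list[str],
    needle: str,
    question: str,
    expected: str,
    lengths: list[int],
    depths: list[float],
) -> dict[tuple[int, float], bool]:
    """Sweep context length (in sentences) and needle depth; record whether the answer was retrieved."""
    results = {}
    for n in lengths:
        for depth in depths:
            context = build_haystack(filler[:n], needle, depth)
            prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
            results[(n, depth)] = expected.lower() in model(prompt).lower()
    return results


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: a regex lookup of the needle."""
    match = re.search(r"magic number is (\d+)", prompt)
    return match.group(1) if match else "I don't know."


filler = [f"Filler sentence number {i} about nothing in particular." for i in range(1000)]
results = run_niah(
    toy_model,
    filler,
    needle="The magic number is 4217.",
    question="What is the magic number?",
    expected="4217",
    lengths=[100, 500, 1000],
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
print(all(results.values()))  # True for the toy model
```

In a real evaluation, each (length, depth) cell would be filled by querying the model under test. The resulting grid of passes and failures is what NIAH heatmaps typically visualize.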

However, it is Grok-1.5’s vision capabilities that xAI is most excited about.

Demos show Grok-1.5 converting flowcharts into Python code, writing bedtime stories inspired by children’s drawings, creating CSV records from screenshots and even “extending” memes.

Grok-1.5 leads on some established benchmarks, such as MathVista and TextVQA, and scores highest on RealWorldQA, a benchmark newly created by xAI.

Grok-1.5’s impressive vision benchmarks. Source: xAI

Under the hood, Grok-1.5 is built on a custom distributed training framework that lets the xAI team prototype ideas and train new architectures at scale with minimal effort.

xAI was founded last year and counts some of the world’s leading AI researchers among its staff, with the extremely ambitious goal of “understanding the universe.”

So far, that effort has produced the fun and edgy Grok-1, which has told people how to synthesize narcotics and has criticized Musk and Tesla.

Grok is also connected to X’s database of posts.

Musk’s xAI project challenges the predominantly closed-source generative AI ecosystem by releasing its models under genuine open-source licenses.

Together with Meta, which likewise aims to go against the grain of the competition, xAI’s open approach could become a thorn in the side of OpenAI, Microsoft, Anthropic and Google as they try to monetize their models.

RealWorldQA

In the Grok-1.5 preview, xAI also unveiled RealWorldQA, a new benchmark consisting of over 700 images, each accompanied by a question and a verifiable answer.

The dataset mainly consists of anonymized images captured from vehicles and other real-world situations.

The RealWorldQA dataset is used to evaluate the spatial understanding of Grok-1.5 and other multimodal AI models, an area in which xAI felt existing benchmarks were lacking.
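Because every RealWorldQA item pairs an image and a question with a verifiable answer, a model can be scored by comparing its answers against the references. The sketch below assumes simple normalized exact-match accuracy. That metric, the `QAItem` schema, the file names and the sample questions are all hypothetical illustrations, not xAI's published scoring method or actual dataset entries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAItem:
    """One benchmark item: an image, a question about it, and a verifiable answer (hypothetical schema)."""
    image_path: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Trim whitespace, lowercase, and drop trailing periods so 'B.' matches 'b'."""
    return text.strip().lower().rstrip(".")


def accuracy(model: Callable[[str, str], str], items: list[QAItem]) -> float:
    """Fraction of items where the model's normalized answer exactly matches the reference."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        normalize(model(item.image_path, item.question)) == normalize(item.answer)
        for item in items
    )
    return correct / len(items)


# Hypothetical items and canned model outputs, for illustration only.
items = [
    QAItem("car_001.jpg", "Is the traffic light ahead red or green?", "Red"),
    QAItem("car_002.jpg", "How many lanes does this road have?", "2"),
    QAItem("car_003.jpg", "Is there a pedestrian on the crosswalk? A. Yes B. No", "B"),
]
canned = {"car_001.jpg": "red.", "car_002.jpg": "3", "car_003.jpg": " b "}
score = accuracy(lambda image, question: canned[image], items)
print(f"{score:.3f}")  # 0.667
```

Exact matching only works when answers are short and unambiguous, which is the point of requiring verifiable answers. Free-form responses would need a more forgiving comparison.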

Grok-1.5 outperforms the competition on RealWorldQA, and it will be interesting to see whether the benchmark catches on.

Even if Grok-1.5 cannot understand the universe, it should take its place as another top model in an ever-expanding field.

It also suggests that generative AI in its current form is approaching the limits of its capabilities, although perhaps not for long.

This article was originally published at dailyai.com


About The Author

MyAiQ
