OpenAI has unveiled Sora, a state-of-the-art text-to-video (TTV) model that generates realistic videos of up to 60 seconds from a user's text prompt.

We’ve seen big advancements in AI video generation recently. Last month we were excited when Google gave us a demo of Lumiere, its TTV model that generates 5-second video clips with excellent coherence and movement.

Just a few weeks later, the impressive demo videos generated by Sora already make Google’s Lumiere look quite quaint.

Sora generates high-fidelity video that can include multiple scenes with simulated camera panning while adhering closely to complex prompts. It can even generate images, extend videos forward or backward in time, and generate a video using an image as a prompt.

Some of Sora’s most impressive performance lies in things we take for granted when watching a video but that are difficult for AI to get right.

Here’s an example of a video Sora generated from the prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

This short clip demonstrates a few key features of Sora that make it truly special.

  • The prompt was pretty complex and the generated video closely adhered to it.
  • Sora maintains character coherence. Even when the character disappears from a frame and reappears, the character’s appearance stays consistent.
  • Sora retains object permanence. An object in a scene is retained in later frames while panning or during scene changes.
  • The generated video reveals an accurate understanding of physics and changes to the environment. The lighting, shadows, and footprints within the salt pan are great examples of this.

Sora doesn’t just understand what the words in the prompt mean; it understands how those objects interact with one another in the physical world.

Here’s another great example of the impressive video Sora can generate.

The prompt for this video was: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”

A step closer to AGI

We may be blown away by the videos, but it is this understanding of the physical world that OpenAI is particularly excited about.

In the Sora blog post, the company said “Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.”

Several researchers believe that embodied AI is necessary to achieve artificial general intelligence (AGI). Embedding AI in a robot that can sense and explore a physical environment is one way to achieve this, but it comes with a number of practical challenges.

Sora was trained on an enormous amount of video and image data, which OpenAI says is responsible for the emergent capabilities the model displays in simulating aspects of people, animals, and environments from the physical world.

OpenAI says that Sora wasn’t explicitly trained on the physics of 3D objects but that the emergent abilities are “purely phenomena of scale”.

This means that Sora could eventually be used to accurately simulate a digital world that an AI could interact with, without the need for it to be embodied in a physical device like a robot.

In a more simplistic way, this is what Chinese researchers are trying to achieve with their AI toddler called Tong Tong.

For now, we’ll have to be satisfied with the demo videos OpenAI provided. Sora is only being made available to red teamers and a select group of visual artists, designers, and filmmakers to get feedback and check the alignment of the model.

Once Sora is released publicly, might we see SAG-AFTRA movie industry workers dust off their picket signs?

This article was originally published at dailyai.com