Last week, OpenAI shocked the world when it announced Sora, its text-to-video solution. OpenAI gave everyone a taste of the functionality, with numerous examples of the creative output, including the prompts used to generate such results. To say the examples were groundbreaking feels like an understatement.
While it has yet to be released, Sora's capabilities appear to be a huge step beyond everything seen so far in the text-to-video space. Text-to-image platforms are now capable of producing realistic, well-composed images, but video has proved elusive. While much has improved in the year since the nightmare-inducing AI-generated video of Will Smith eating spaghetti, nothing has come close to producing anything cinematic.
AI video players like Runway and Pika have started producing strong video assets, though with limitations. Generated clips are typically limited to 3-4 seconds and amount to little more than animating a still image. With some prompting and extension tricks, these videos can be stretched to around 16 seconds. More often than not, however, they remain a single continuous shot.
Sora looks set to shatter these expectations. Firstly, Sora can reportedly generate videos up to 60 seconds long. Impressive as that is, it pales in comparison to Sora's true showcase: complex camera movement. A prime example is the 30 Year Old Space Man video, which highlights Sora's ability to cut between shots and scenes to tell a story.
That capacity for complex camera movement suggests Sora has some understanding of cinematic language. AI video has, for the most part, been an entertaining exploration of what is possible, but one confined to subtle animation and limited motion. Sora, by contrast, is the first sign that AI could produce cinema-quality content.
This raises a huge question about what the future of entertainment will look like. There will still be a huge need for human-made film and television, but the possibility is emerging that people could have AI produce the content they want to watch. Rather than facing choice paralysis across all the streaming options, people could simply prompt an AI to generate something they feel like watching in that moment. Someone with 45 minutes free who is in the mood for a sci-fi fantasy? Simply prompt the AI to create a 45-minute sci-fi fantasy, and you have your own unique piece of content to watch. Include more details to make it more interesting, or even write yourself into the story if you're so inclined.
Time will tell whether Sora can create the kind of content shown so far. But as with the Google Gemini reveal late last year, even if Sora cannot yet produce exactly what was demonstrated, it is only a matter of time before it can.
The likes of Midjourney, the dominant leader in realism for text-to-image, are working on their own video solutions. Based on OpenAI's announcement, however, Midjourney could find it hard to compete. But if it can get close to Sora, the world of entertainment could see a huge shakeup in the coming years. While last year's Hollywood strikes highlighted concern about AI being used to recreate actors without their consent, Sora hints at a world of completely AI-created actors for cinema. The concept of an AI influencer is the first step toward a true virtual celebrity.
