AI Video Is Here: The Good, the Bad, the Fingers That Haunt Us


Welcome to a Consequence Chat, where Consequence staffers debate the biggest stories in pop culture. Today, we look at Sora, the new text-to-video generative AI software. The transcript below has been edited for clarity and length.


Wren Graves (Features Editor): Today’s subject: Sora. Not to be confused with the fish people from Zelda. Not how Mario feels after missing a jump. Sora, the new text-to-video generative program from OpenAI. It’s not available for public use, but OpenAI has released almost 50 sample clips, ranging from about 10 seconds to a minute. This stuff was at the heart of last year’s film strikes, and we’re starting to understand why it has the potential to change Hollywood. With this technology and others like it, we could be on the cusp of a whole new era.

Sora is a text-to-video generator supported by a large language model. OpenAI says it is designed to be a “world simulator”, and while it doesn’t appear ready for heavy metaverse lifting, it’s already interesting. Users can write prompts like, “Cat wakes up her sleepy human,” plus descriptors — photorealistic, or animated, or black and white — and it will pop out a clip. It doesn’t seem to be a clip you can edit, the way you can with OpenAI’s still image generator, DALL-E. But still, a whole clip, and as far as we can tell, generated much, much faster than humans could build the graphics by hand.

We’ll talk about the clips themselves in just a bit. But it was trained on a large data set, and they’re not telling us much about that data. As with ChatGPT and DALL-E, it probably includes some copyrighted work. To discuss with me whether that’s actually a big deal is Liz Shannon Miller. Liz, what are some of the legal and ethical issues with the way the data set for generative AI is collected?

Liz: It just sucks. It sucks, sucks, sucks. It’s like, what? Why? Why?

Wren: Stay tuned for our fair and balanced look at AI.

Preview clip from openai.com/sora. Prompt: “Historical footage of California during the gold rush.”

Liz: To start, “generative” is not a fully descriptive term, because what it’s doing is closer to remixing. It’s taking stuff that came before, and it’s remixing it into something new. Call it “plagiarism software,” perhaps.

I’m not a lawyer, and I’m not one hundred percent sure about the current legal status of these initiatives. But for a sign of how ethically messed up things are at this point, take Andreessen Horowitz, a famous or infamous VC firm in Silicon Valley, which wrote to the US Copyright Office in November, “The bottom line is this: imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.”

What they’re saying is, if the creators of AI models have to care about copyright, it’s going to be bad for them. And that’s because they need copyrighted material for the AI models to work. And copyrighted material does not belong to AI companies as a training tool. It belongs to the creators. That’s why we created this whole system. Copyright is a fundamental part of how, oh, everything works! That’s really damning for me.

But I do feel like we want to talk about some of the ways in which it’s not the worst thing that’s ever happened to society.

Wren: Okay, then, let’s talk about Sora itself. The good, the bad, and the fingers that will haunt us forever.

Preview clip from openai.com/sora. Prompt begins: “Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle…”

Liz: The good is that these videos do look good. The one thing I’d be interested in knowing is if you can program Sora to add a flaw, like, a 35-millimeter cigarette burn in the corner. How would it handle that?

Wren: That is the kind of thing that Sora’s sister program, DALL-E, handles well. Aesthetic cues — photorealistic, black and white, tintype — are something DALL-E has been really good at, and Sora seems to follow suit.

I would say Sora’s photorealism only works when there is not a lot of movement in the frame, or it’s pretty far away. The stuff that looks the best is the animation: the paper craft, the stop motion aesthetics. And animation tends to minimize the traditional problems with generative AI, like the fingers and teeth.

OpenAI has acknowledged the problems, posting clips on the preview site where things went wrong. One prompt was for five gray wolf pups frolicking and chasing each other around. And in the clip, the pups morph, spawn, and multiply like an adorable eldritch horror.

In the drone-style establishing shots, it’s really common for people to disappear, or meld into a nearby horse, or just kind of drop out of the screen. And the motions of photorealistic people don’t look realistic. No matter what genre they’re going for, as soon as one of the generated people turns their neck, they’re starring in The Exorcist.

Preview clip from openai.com/sora. Prompt: “Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass.”

Liz: There are always growing pains with this stuff — Disney was incorporating CGI into its animation as far back as the ’80s; there are some shots in The Great Mouse Detective, for example, that are computer generated. In The Hunchback of Notre Dame, they have crowd shots where they basically copy-and-paste chunks of crowd and scatter them around to expand it and make it look bigger. And it actually looks okay, but if you’re looking for it, you can spot the repetition. These days, though, you really can’t.

Wren: Yeah, I don’t think Sora is going to be replacing live-action crowd shots soon, but it might be an improvement on some VFX in terms of how quickly it can be created. A video game designer once told me that the hardest thing to produce with computer generated imagery is smoke in front of a mirror. You’ve got dynamic obscurity with the smoke, and then reflections, which force the computer to generate a completely separate frame of reference that moves with the viewer.

Sora does not seem to have that issue. It has one prompt for the window of a train traveling through the Tokyo suburbs, photorealistic. You can see outside the window and the reflections of people inside the train at the same time, with varying degrees of opacity. This is hard to film with a camera, because you have to take the camera reflections out later, and it’s too time-consuming to produce this entirely with VFX. Sora seems to be able to do this much faster.

Preview clip from openai.com/sora. Prompt: “Reflections in the window of a train traveling through the Tokyo suburbs.”

So there are problems, some of which will get worked out in future updates, and some of which, like the moral issues, may never get worked out. But already, in some cases, it seems much, much faster than cracking the whip over 40 computer programmers for 20 hours a day for four weeks straight trying to get the VFX right.

When we’re figuring out the impact of a new technology, I try to ignore punditry and look at the major stakeholders. The people who have money and jobs on the line, what are they doing? Well, Microsoft invested $1 billion in OpenAI, the company behind Sora, back in 2019. That’s a big investment. Then, OpenAI launched ChatGPT, the chatbot based on their large language model, in November of 2022. And it very quickly took off. According to a study by the investment bank UBS, ChatGPT had a hundred million active users two months after launch. That is a record for consumer applications. It is one of the most rapidly adopted technologies in human history.

For Sora specifically, we don’t have a lot of information. But we can infer something about the potential for disruption from last year’s Hollywood strikes, where one of the biggest sticking points was AI. Clearly, those stakeholders are concerned. The people who have the most to gain and the most to lose are acting like AI products could change the world.

Liz: I don’t know if that’s an argument for AI as much as it is something to fight against.

Wren: So let’s talk about some of the people fighting. Last year, the Writers Guild of America launched a strike in May, and the actors in SAG-AFTRA followed in July. Liz, you covered those strikes closely, you interviewed actors on the picket line, you reported on the deals. Um, could you please remind us why they went on strike and what they got out of the dispute?

Preview clip from openai.com/sora. Prompt: “Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.”

Liz: Here’s what the writers initially proposed at the beginning of the strike, when it came to the discussion of artificial intelligence: “Regulate use of artificial intelligence on guild protected projects, which are covered by the Minimum Basic Agreement (MBA). AI can’t rewrite or write literary material, can’t be used as source material, and MBA-covered material can’t be used to train AI.” The other side of the bargaining table, the Alliance of Motion Picture and Television Producers (AMPTP), rejected the proposal and countered by offering “annual meetings to discuss advancements in technology.” So, the “Fuck you” response.

The final contract, when negotiations were all done, ended up having a lot more protections in place for writers. Especially the part where, essentially, AI can’t be a writer on a project produced by a major Hollywood studio, and you can’t just hire a human to rewrite your bad AI script into something workable.

The actors had an even higher stake in this. Prior to last year, actors were being told, “Go into this room, get scanned, we’re not going to tell you what happens to your scan, and we get to use it forever. And that’s just part of your contract.”

With the new SAG-AFTRA deal, they didn’t ban the use of AI to alter performances, or even replace performances, any of that. What they did bargain for is that if your image is going to be used and altered, you have to be consulted. You have to approve, and you get paid. That’s a big deal.

And it happened because the Writers Guild and SAG-AFTRA are very conscious of the fact that if they’re not ahead on technology, they could be screwed in a couple of years.

Wren: Or sooner. As we discussed, we’re likely to see this kind of technology used on VFX as soon as it’s publicly available. CGI is expensive, VFX artists have to work insane hours, and this is gonna be a cheap alternative that probably puts some of those artists out of a job.

Preview clip from openai.com/sora. Prompt: “A flock of paper airplanes flutters through a dense jungle, weaving around trees as if they were migrating birds.”

Liz: Where else do you think it might pop up in the short term?

Wren: I think we’re gonna see Sora in things like corporate training videos, which already can’t get much less human. You could see it in some animated shorts, though it looks hard to edit and you’d probably have to adapt the script to what Sora generates.

More positively, I think we’ll get moments of animated fancy in a live-action film for much cheaper than we’ve ever been able to do it before. Something like the Paddington films, or Finding Neverland, could be possible on a Sundance movie budget. People who combine live action and animation will have cheaper tools to do the things that they want to do.

And then I think the most obvious use case is just going to be memes on TikTok and YouTube. Self-referential inside jokes based on self-referential inside jokes that no one who remembers 9/11 has a hope of understanding.

Now granted, we’re culture critics. We’re not forecasters. But I do want to talk about the future and how this kind of technology could develop. In his Garbage Day newsletter, Ryan Broderick compared generative AI to farm-to-table food. With the farm-to-table movement, there was a hope that low transportation costs and really fresh produce would cut down on prices and make quality ingredients more accessible. Kind of the opposite happened. Farm-to-table restaurants are not cheap, and it’s the highly processed, low-quality ingredients that are usually the most affordable. And it’s easy to imagine something like this happening with AI.

Let me sketch out one scenario. Some streaming services, like Tubi or Pluto, start dumping new 100-episode shows left and right, anime and soap operas with massive storylines and no writers room or animation costs. Maybe it looks like shit, but the story is there, and you can do 100-episode seasons. And meanwhile, HBO is hand-crafting eight-episode seasons, while advertising them as made by hand by real humans with practical effects and no CGI.

Preview clip from openai.com/sora. Prompt begins: “A cat waking up its sleeping owner demanding breakfast…”

Maybe those AI-supported streamers are free with ads, and maybe the premium streamers with real humans are charging $30 a month. So, Lizstradamus, do you think a situation like that is feasible, or likely? And how do you think generative video will impact pop culture?

Liz: I mean, right now we live in an era where all of the TV we’re watching is made carefully by hand. We’ve never had more options for both new and archival content to watch, and everyone’s still just watching Suits or Bluey.

Hub Intel did a study in 2023, and one of the things they came up with was the number of viewers who will say, “I’ll only try a new show if I’m confident I’ll like it,” is three times the number of viewers who say “I’ll give any show that looks interesting a try.” I think that’s a factor in the surge towards old library shows like Suits, just by virtue of the fact that the show has seven seasons and it was made in an era when I think there was a lot more faith in the quality of television.

Even if it’s the most amazing story ever told, dumping a hundred episodes of some anime that’s been generated with AI… I don’t know if that will cross the threshold. I think people will just be like, “Oh, I could watch Avatar: The Last Airbender again. That sounds good.”

Wren: I never thought about the downside of having so much good content, but I guess you’re right. If you do watch something bad, the opportunity cost has never been higher. And in response, perhaps a lot of people are just watching things they know are good. If the AI stuff is bad, perhaps that will accelerate that trend.

Liz: Coming to this topic as a creative person, I find it really frustrating. I think the novelty of AI has fueled so much of the buzz around these videos, obfuscating any of the actual reasons why people consume pop culture. It’s deeply frustrating to me that we keep overlooking the whole reason why we watch things and we read things in the first place.

Wren: The human element, you mean?

Liz: Yes, humans. Sometimes we suck. For example, we created AI. But you know, sometimes we do good stuff, too. Stuff worth celebrating.
