The year AI video goes mainstream?
Thoughts on Veo 2, Google's great advantage, & data pipelines
Most signs indicate 2025 will be a strange year. Among other reasons, I expect it will be the year AI video becomes mainstream. The year AI video goes from a novelty to something we see frequently.
So far AI filmmaking has been a sideshow, happening at the weird kids' table. To the extent that AI video has reached a larger audience, it's been as a novelty. 'Look what the AI guy made!' or 'AI can do that?'
That was 2024.
In 2025, AI video use cases will hit the mainstream: AI commercials, AI news, AI vloggers, etc. You'll see shorter, formulaic formats created pretty effectively with generative AI, and it won't even be that weird to watch. Will we see compelling AI films, though? That's an open question. Practical media use cases are easier for a number of reasons; narrative film is decidedly harder. I'll be monitoring them all with alacrity this year.
Today, I’ll be writing about just a few things:
Google Veo 2 is very good
Which video model company has the most money and GPUs?
Are you unwittingly training AI models through a data-capture pipeline?
An AI film I like
A common thread through these, I realize now, is that Google looks very well positioned to have a breakout year in AI video.
The Google Model - Veo 2
I’m not one to declare “everything just changed!” after the release of every new video model. I’m on record about this. But everything did just inch forward noticeably.
Google’s new video model Veo 2 is very impressive. In particular, the people it generates are very impressive.
Veo 2 feels decidedly different from other video models. Like it's pulling from a different part of the video genome. In fact, it probably is (more on that later). The human figures look more realistic, remain more consistent, and move more naturally. Their faces even seem more emotive. Veo 2's people are just better than the outputs from other video models.
The above example did not use reference images or a complex workflow. It was created with text alone. Below is another example of Veo 2's subtle touch with human expressions and movements, from director Verena Puhm. And notice this is a group of people, not just one.
The training set behind Veo 2 isn't public, but it's clear Google has done something different. I suspect that difference is YouTube.
It’s a big deal that Veo 2 can generate such realistic people moving in such realistic ways with only a text prompt. I’m writing an essay about how text descriptions are too narrow a window of control for AI models. And I think they are. You can’t get what you want, how you want. But the Veo 2 model is proof you can still get great outputs with just words.
Now there are some obvious shortcomings in Veo 2. After all, it is still a gated research product. The interface is limited. And you can’t bring in outside images to use as reference images. I’ve heard from Google employees they are working on both. The second issue seems like a deliberate deepfake-control measure (Google is very cautious about these things).
This is all to say, Google may catch up quickly in the video model race. When we talk about AI video models, we mention Runway, Kling, OpenAI. We don't think about Google. But we probably should. Google has more money than most nations, and they have the world's largest video database in YouTube. In terms of available resources, they have more than any other company working on this technology.
Speaking of, what is the current state of the video model arms race?
Video model leader board
The Model Wars is the exhausting, never-ending contest to be crowned the internet's 'best AI model'. My opinion is that AI models will all reach a similar quality and there will be no best model. AI video generation will become a commodity like cameras, which means the best work will be made by the best people, not by the best technology. When all the technology becomes equally good and equally available, the difference will necessarily be the human inputs. David Fincher and Michael Bay both shoot on RED cameras, to decidedly different results.
Who Will Win the Model Wars is still a fun game to play, especially at the start of a new year. So let's play it.
I recently watched a video from Minh Do looking at the resources of each AI video model company, and it got me thinking. There are three things to consider in this game: money, data, and GPUs. Money is funding, data is the training sets, and GPUs (graphics processing units) are the very expensive hardware required to train and run high-quality AI models. The more you have of each, the better your AI models. Supposedly.
So let’s take a progress report of each AI company’s position.
AI video model startups
First we have the AI startups focused solely on video models. These companies have built very impressive technology, but as we'll see, they are mice among giants.
Pika Labs
Pika Labs makes the Pika 2.0 video model.
Pika recently raised $80 million. They've been shifting their focus away from 'best model' and toward meme effects, like turning everything into cake.
Luma Labs
Luma Labs makes the Luma Dream Machine video model.
Luma raised $90 million in December 2024. They've been punching well above their weight compared to the others. Dream Machine is an excellent model that many filmmakers prefer.
RunwayML
Runway makes the Runway Gen-3 video model, which likes to crash my browser. They also have a suite of specialized AI tools for film production.
Runway raised $150 million in 2024, and they are potentially raising another $450 million. They are focused on being 'the cinematic video model', the one for serious filmmakers.
Chinese AI companies
Then we have the Chinese companies, which are decidedly bigger, better capitalized, and increasingly taking American market share.
MiniMax
MiniMax makes the Hailuo video model. Beyond video, MiniMax makes many other AI products, like language models.
MiniMax recently raised $600 million, largely from Chinese tech giant Alibaba.
Kuaishou
Kuaishou makes the Kling video model. Kuaishou is not a pure AI company; they're also a Chinese social video platform. You could think of them as a Snapchat or TikTok.
Kuaishou is a publicly traded company worth about $22 billion, with around $7 billion in cash. It's not clear how much of that they will divert into their Kling video model. Anecdotally, Kling is a very popular model among AI filmmakers.
Industrial AI companies
Then we have the American AI titans. These are companies with enormous amounts of money and compute in the form of GPUs.
OpenAI
OpenAI makes the Sora video model. They also make several other AI products like ChatGPT.
OpenAI raised $6.6 billion in October, and they likely have near-unlimited access to more money. It's unclear how much of OpenAI's resources will go into Sora, which seems to play second fiddle to their language models.
OpenAI borrows Microsoft's GPUs, which number around 150,000. From the available data, that's the second-most in the world.
Meta
Meta makes the currently unreleased Movie Gen video model.
Meta is a $1.6 trillion company with about $70 billion in cash. Video models don't seem to be a priority for Meta. They haven't released their video model for public use or signaled a plan to do so.
Meta also has more Nvidia GPUs than any other company on the planet, numbering around 350,000.
Google
Google makes the Veo 2 model, as well as many other AI products.
Depending on the day, Google is a $2.4 trillion company with about $110 billion in cash. They own ~50,000 Nvidia GPUs, but they also manufacture their own custom AI chips called TPUs (tensor processing units). So really, they have the most compute of anyone.
Training data
We also need to look at training data. Most of the companies listed have to buy, borrow, license, or steal their datasets. A company like OpenAI or Runway, for example, has to find third-party data to train its models on.
Google, Meta, and Kuaishou don't. Google owns the largest video library in the world, YouTube. Meta owns two very large video libraries in Instagram and Facebook. And Kuaishou is itself a video platform. It's unconfirmed, but quite likely, that these companies are using their own data to train their video models.
In short, Google and Meta have the most money, the best datasets, and the most compute. If those things matter materially, they’ll likely make the best video models.
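For a quick side-by-side, here is a tally of the rough figures above in one place. A back-of-the-envelope sketch only: it mixes funding raised with cash on hand, just as the numbers appear in this piece, and none of it is an official figure.

```python
# Rough tally of the approximate figures cited above (not official filings).
companies = [
    # (name, money in $B: funding raised or cash on hand,
    #  GPUs in thousands, owns a first-party video library?)
    ("Pika Labs", 0.08,  None, False),
    ("Luma Labs", 0.09,  None, False),
    ("Runway",    0.15,  None, False),  # plus a potential $450M round
    ("MiniMax",   0.60,  None, False),
    ("Kuaishou",  7.0,   None, True),   # ~$7B cash; owns its video platform
    ("OpenAI",    6.6,   150,  False),  # borrows Microsoft's ~150,000 GPUs
    ("Meta",      70.0,  350,  True),   # Instagram and Facebook
    ("Google",    110.0, 50,   True),   # YouTube, plus in-house TPUs
]

for name, money, gpus, video in sorted(companies, key=lambda c: -c[1]):
    gpu_str = f"~{gpus},000 GPUs" if gpus else "GPUs undisclosed"
    print(f"{name:<10} ${money:>6.2f}B  {gpu_str:<16} first-party video: {video}")
```

Sorted by money, Google and Meta land at the top, which matches the conclusion above.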
Do you know where your data is?
Speaking of data, do you know who has your data? Probably more people than you think.
And do you know what data you produce that’s valuable? Again, probably more than you would think.
Most companies want to train AI models on your data. Sometimes you can opt out. Sometimes you can't. A lot of this practice is hidden. We can return to Google for an example of how large companies use their scale to train AI models. The famous Google reCAPTCHA is an informative case.
For a while now, we humans have humiliated ourselves with the ritual of proving to computers that we're human. Like patronized kindergarteners, we identified which box had a dog in it. Or a bus. Or the color red. But then it got harder, and less patronizing. And now captchas are plain annoying. Difficult, even.
This entire captcha process is actually an AI training pipeline. Google's reCAPTCHA system, for example, uses the responses as datasets to train AI models. As the models become smarter, the captchas become harder. From Google's website: "Every time our CAPTCHAs are solved, that human effort helps digitize text, annotate images, and build machine learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems."
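To make the mechanics concrete, here is a minimal sketch of how a captcha-style labeling pipeline could work. To be clear, this is hypothetical and illustrative; Google has not published reCAPTCHA's internals, and every name here is made up. The classic design mixes tiles with known answers (to verify you're human) into the same challenge as unknown tiles (to harvest labels):

```python
# Hypothetical sketch of a captcha-style labeling pipeline.
# Each challenge pairs tiles with known labels (to verify the user)
# with unknown tiles (to collect new labels). All names are illustrative.
from collections import Counter, defaultdict

CONSENSUS_VOTES = 5  # agreeing votes needed before trusting a crowd label

class CaptchaPipeline:
    def __init__(self):
        self.votes = defaultdict(Counter)  # tile_id -> {label: vote count}
        self.training_set = []             # finished (tile_id, label) pairs

    def grade(self, known_answers, user_answers):
        """Verify the user against tiles whose labels we already know."""
        return all(
            user_answers.get(tile) == label
            for tile, label in known_answers.items()
        )

    def record(self, unknown_answers):
        """Treat answers on unknown tiles as votes toward new labels."""
        for tile, label in unknown_answers.items():
            self.votes[tile][label] += 1
            top_label, count = self.votes[tile].most_common(1)[0]
            if count >= CONSENSUS_VOTES:
                # Enough humans agree: emit a labeled training example.
                self.training_set.append((tile, top_label))
                del self.votes[tile]

pipeline = CaptchaPipeline()
# One known tile ("tile_1" is a bus) mixed with one unknown tile.
if pipeline.grade({"tile_1": "bus"}, {"tile_1": "bus", "tile_9": "bus"}):
    pipeline.record({"tile_9": "bus"})  # verified human; keep their label
```

The elegant part of the design is that verification and labeling happen in the same click: your answers on the known tiles prove you're human, and your answers on the unknown tiles become votes toward new training labels. And as the resulting models improve, they can solve the old challenges, so the challenges have to get harder.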
I discuss this process more in an upcoming piece I’m writing called Sending Sora to Film School.
An AI film I like
In my last post I was going to link to a 10-minute AI film by the filmmaker Kavan the Kid. But the night before I published, the video was suddenly taken down from every website (YouTube, X, Reddit, LinkedIn, Instagram) for copyright infringement. I learned it was a swift, orchestrated action by Warner Brothers. It was, after all, a 10-minute Batman fan film.
Fair enough. So I included a different film in the post.
That takedown felt significant, though. The video was so good it genuinely could have been confused for a real piece of Batman content. That's the message I took away from it. I've been following Kavan the Kid since. He released a new film that is only 90 seconds long. It falls in the category of 'very impressive visuals'. I don't think it has a particularly strong story like the Batman piece did, but it will be of interest to anyone who wants to see the current cutting edge of quality.
Also of interest, it was done with… Veo 2!
An AI film message board
One more small item.
As much as I like new technology, I am an old-school internet user. I love forums, message boards, blogs. In that spirit, I am helping a Los Angeles AI group start a message board for AI filmmaking. It's a public place to post questions, tips, job listings, etc. In my mind, it's a place for people interested in AI filmmaking to meet online.
For anyone interested, check out the AI filmmaking forums.