
A new artificial intelligence startup founded by the creators of the world’s most widely used computer vision library has emerged from stealth with technology that generates up to five minutes of realistic, human-centric video, significantly exceeding the capabilities of rivals including OpenAI’s Sora and Google’s Veo.
CraftStory launched on Tuesday with $2 million in funding and is introducing Model 2.0, a video generation system that addresses one of the most significant limitations plaguing the nascent AI video industry: duration. While OpenAI’s Sora 2 tops out at 25 seconds and most competing models produce clips of 10 seconds or less, CraftStory’s system can generate continuous, coherent video performances that run as long as a typical YouTube tutorial or product demonstration.
The breakthrough could unlock significant commercial value for enterprises struggling to scale video production for training, marketing, and customer education, a market where short AI-generated clips, despite their visual sophistication, are proving insufficient.
"If you actually try to create a video with one of these video generation systems, you often find that you want to implement a particular creative vision, and no matter how detailed your instructions are, the system will basically ignore part of them," Victor Erukhimov, founder and CEO of CraftStory, said in an exclusive interview with VentureBeat. "We’ve developed a system that can generate videos basically as long as you need them."
How parallel processing solves the long-video problem
CraftStory’s breakthrough rests on what the company calls a parallelized diffusion architecture, a fundamentally different approach to how AI models generate video compared with the sequential methods most competitors use.
Traditional video generation models run diffusion algorithms on ever-larger three-dimensional volumes, where time is the third axis. Generating longer videos therefore requires proportionally larger networks, more training data, and significantly more computational resources.
CraftStory instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, connected by bidirectional constraints. "The second half of the video can also influence the first half," Erukhimov explained. "This is very important. If you generate one piece at a time, artifacts that appear in the first part propagate and accumulate in the second part."
Rather than generating eight seconds and stitching on additional segments, CraftStory’s system processes all five minutes at once through an interconnected diffusion process.
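CraftStory has not published the details of its architecture, so the following is only a toy numerical analogy, not the company’s method. It reduces each video segment to a single number representing its error (the per-segment error values and all function names are hypothetical) and compares two strategies: chaining segments one after another, where each segment inherits the drift of the one before it, versus generating all segments at once, each anchored to the same reference, and then repeatedly averaging every segment with both of its neighbors as a stand-in for bidirectional constraints.

```python
"""Toy analogy (NOT CraftStory's implementation): how per-segment errors
behave under sequential chaining vs. parallel generation in which
neighboring segments are coupled in both directions."""

from typing import List


def sequential_drift(segment_errors: List[float]) -> List[float]:
    """Each segment is conditioned on the end of the previous one, so its
    own error is added on top of everything generated before it."""
    drift: List[float] = []
    state = 0.0
    for err in segment_errors:
        state += err  # inherited error plus this segment's own error
        drift.append(state)
    return drift


def parallel_drift(segment_errors: List[float], iterations: int = 20) -> List[float]:
    """Each segment is anchored to the same reference, so it starts with only
    its own error. Repeated neighbor averaging then acts as a bidirectional
    constraint: every segment is pulled toward BOTH its predecessor and its
    successor, so later segments can influence earlier ones."""
    state = list(segment_errors)
    n = len(state)
    for _ in range(iterations):
        new_state = []
        for k in range(n):
            lo, hi = max(0, k - 1), min(n, k + 2)  # self plus available neighbors
            window = state[lo:hi]
            new_state.append(sum(window) / len(window))
        state = new_state  # all segments are updated at once (Jacobi-style)
    return state


def max_seam(values: List[float]) -> float:
    """Largest jump between adjacent segments, i.e. the most visible 'seam'."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Hypothetical per-segment errors, e.g. drift in a character's appearance.
    errors = [0.10, 0.05, 0.10, 0.08, 0.02, 0.10, 0.09, 0.07]
    seq = sequential_drift(errors)
    par = parallel_drift(errors)
    print(f"sequential: worst drift {max(map(abs, seq)):.2f}")
    print(f"parallel:   worst drift {max(map(abs, par)):.2f}, "
          f"worst seam {max_seam(par):.3f} (was {max_seam(errors):.3f})")
```

In the chained version, errors only flow forward and pile up, so the last segment carries the sum of every earlier mistake. In the coupled version, each update is an average, so no segment drifts beyond the largest single-segment error, seams shrink, and changing the final segment also shifts the first one, the kind of two-way influence Erukhimov describes.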
Importantly, CraftStory trained its model on its own footage, rather than relying solely on videos collected from the internet. The company hired a studio to film the actors using a high frame rate camera system that captures crisp detail even with fast-moving elements like fingers, avoiding the motion blur inherent in standard 30 frames per second YouTube clips.
"What we’ve shown is that you don’t need a lot of data or a big training budget to create high-quality videos," Erukhimov said. "You just need high-quality data."
Model 2.0 currently operates as a video-to-video system. Users upload a still image to be animated along with a "driving video" of a person whose movements the AI reproduces. CraftStory offers preset driving videos shot by professional actors, who receive a revenue share when their motion data is used. Users can also upload their own footage.
The system produces a 30-second low-resolution clip in about 15 minutes. An advanced lip-sync system synchronizes mouth movements with a script or audio track, and gesture alignment algorithms ensure that body language matches the rhythm and emotional tone of the speech.
A $2 million bet in a billion-dollar fight
CraftStory is funded almost entirely by Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. That modest sum stands in stark contrast to the billions of dollars flowing into competing efforts: OpenAI raised more than $6 billion in its latest funding round alone.
Erukhimov rejected the idea that massive capital is a prerequisite for success. "I don’t necessarily subscribe to the thesis that compute is the path to success," he said. "Compute definitely helps. But at the end of the day, raising $1 billion on a PowerPoint deck doesn’t make anyone happy, neither the founders nor the investors."
Filev defended the David vs. Goliath approach. "When you invest in a startup, you are essentially betting on people," he said in an interview with VentureBeat. "In the words of Margaret Mead, never underestimate what a small group of thoughtful and dedicated engineers and scientists can build."
He argued that CraftStory benefits from a focused strategy. "The major labs are racing to build general-purpose video foundation models," Filev said. "CraftStory is riding that wave while going deep on a specific format: long-form, engaging, human-centric video."
Why computer vision expertise is important for AI video generation
Erukhimov’s credibility stems from his deep roots in computer vision rather than in the transformer architectures that have dominated recent AI advances. He was an early contributor to OpenCV, the open-source computer vision library that has become the de facto standard for computer vision applications, with 84,000 stars on GitHub.
When Intel cut support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the express goal of maintaining and growing the library. The company significantly expanded OpenCV and focused on automotive safety systems before Intel acquired it in 2016.
Filev said this background is precisely what gives Erukhimov an advantage in video generation. "What people sometimes overlook is that generative AI video isn’t just about the generative part. It’s about understanding motion, facial dynamics, temporal coherence, and how humans actually move," Filev said. "Victor has spent his career working on exactly those problems."
Targeting enterprise training videos and product demos
While much of the public excitement around AI video generation has focused on consumer-facing creative tools, CraftStory is clearly pursuing an enterprise-focused strategy.
"We definitely think about B2B more than consumers," Erukhimov said. "We want to enable companies, especially software companies, to create cool training videos and product videos."
The logic is simple. Corporate training, product tutorials, and customer education videos often take several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain complex product features.
"If you want longer-format videos, you should come to us," Erukhimov said. "We can create coherent, high-quality videos up to five minutes long."
Filev agreed with this assessment. "One of the major gaps in this market is the lack of models that can produce coherent video over longer sequences, and that is very important for practical use," he said. "If you’re creating a commercial for your company, a 10-second video isn’t enough, no matter how good it looks. You need 30 seconds. You need two minutes. You need more."
The company expects to reduce costs for its customers. Filev suggested: "Small business owners can now create content in minutes that previously cost $20,000 and took two months to produce."
CraftStory is also courting creative agencies that produce video content for enterprise clients, with a value proposition centered on cost and speed. Rather than managing expensive multi-day shoots, agencies can record actors on camera and turn that footage into finished AI videos.
The next major item on CraftStory’s roadmap is a text-to-video model that will let users generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the popular "walk and talk" format common in luxury advertising.
Where CraftStory fits into a fragmented competitive landscape
CraftStory enters a crowded and rapidly evolving market. OpenAI’s Sora 2, although not yet released to the public, has generated considerable buzz. Google’s Veo models are advancing rapidly. Runway, Pika, and Stability AI all offer video generation tools with varying capabilities.
Erukhimov acknowledged the competitive pressure but stressed that CraftStory serves a clear niche focused on human-centric video. He pointed to rapid innovation and market capture, rather than technological moats, as the company’s key strategy.
Filev sees the market fragmenting into distinct segments, with big technology companies acting as "providers of powerful, general-purpose generative model APIs" while specialized players like CraftStory focus on specific use cases. "If the major companies are building engines, CraftStory is building the production studios and assembly lines on top of them," he said.
Model 2.0 is available now at app.craftstory.com/model-2.0, and the company is offering early access to users and businesses interested in testing the technology. Whether a leanly funded startup can win meaningful market share against deep-pocketed incumbents remains to be seen, but Erukhimov is characteristically confident about the opportunity ahead.
"AI-generated video will soon become the primary way companies tell their stories," he said.
