This webpage demonstrates our long narrative video generation capabilities through three main sections:
Showcasing our core method that uses interleaved visual embeddings to generate coherent cooking videos with strong temporal consistency.
Demonstrating an alternative approach using FLUX.1-Schnell for high-quality frame generation guided by textual descriptions.
Exploring broader applications beyond cooking, including a Tesla car advertisement, a Batman vs. Ironman movie trailer, and a Netflix-style animal documentary trailer.
Our Long Narrative Video Director uses interleaved generated visual embeddings as conditions for video generation. Because the visual embeddings stay aligned throughout the generation process, the resulting cooking demonstrations are more realistic and maintain better temporal coherence and visual consistency across steps.
This approach uses generated captions to create keyframes with a state-of-the-art text-to-image model (FLUX.1-Schnell). The individual frames are more aesthetically pleasing, but visual consistency between steps may be weaker. The result emphasizes artistic quality while staying closely aligned with the textual descriptions.
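The difference between the two strategies above lies in what each video clip is conditioned on. The following is a minimal sketch of that difference, not the actual implementation: `Step`, `ClipRequest`, `plan_with_embeddings`, `plan_with_keyframes`, and `fake_text_to_image` are hypothetical names introduced only for illustration, and the text-to-image call is a stub standing in for FLUX.1-Schnell.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Embedding = List[float]  # hypothetical stand-in for a generated visual embedding
Image = bytes            # hypothetical stand-in for a rendered keyframe


@dataclass
class Step:
    caption: str          # generated text for one step of the narrative
    embedding: Embedding  # generated visual embedding for the same step


@dataclass
class ClipRequest:
    step_index: int
    caption: str
    condition: Union[Embedding, Image]  # what the video generator is conditioned on


def plan_with_embeddings(steps: Sequence[Step]) -> List[ClipRequest]:
    """First strategy: condition each clip directly on the generated embedding."""
    return [ClipRequest(i, s.caption, s.embedding) for i, s in enumerate(steps)]


def plan_with_keyframes(
    steps: Sequence[Step], text_to_image: Callable[[str], Image]
) -> List[ClipRequest]:
    """Second strategy: render a keyframe from each caption, then condition on it."""
    return [
        ClipRequest(i, s.caption, text_to_image(s.caption))
        for i, s in enumerate(steps)
    ]


def fake_text_to_image(caption: str) -> Image:
    # Stub only: a real pipeline would call a text-to-image model here.
    return ("keyframe:" + caption).encode()


steps = [
    Step("Season the steak", [0.1, 0.2]),
    Step("Sear the steak", [0.3, 0.4]),
]
embedding_plan = plan_with_embeddings(steps)
keyframe_plan = plan_with_keyframes(steps, fake_text_to_image)
```

In the first plan, consecutive clips share a conditioning signal produced by one model over the whole sequence, which is consistent with the stronger cross-step consistency described above. In the second plan, each keyframe depends only on its own caption, so frames can look polished individually while drifting from one another.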
This section provides three more demos that show the potential of our pipeline in broader scenarios.
Tesla Car Advertisement
Batman vs. Ironman Movie Trailer
Netflix Animal TV Series Trailer
Tesla Car Product Video
Reference: https://www.youtube.com/watch?v=sV5MwVYQwS8
Inspired by BMW commercials, this AI-generated advertisement demonstrates our model's ability to analyze reference content and create original narratives. The entire process, from storyline development to keyframe and video clip generation, is AI-powered. The final video is enhanced using CapCut for dynamic pacing, with some sequences sped up or shortened for engagement. Only the audio and the final logo clip are added manually, and basic editing takes approximately 30 minutes.
Batman vs. Ironman Movie Trailer
Reference Video
This demo explores a potential future direction for our model: generating long narrative movie trailers. While current capabilities are limited, this concept trailer illustrates our vision for multi-character interactions and style transfer between franchises such as DC and Marvel. We generate candidate material and manually select the clips with the highest-quality motion and the closest alignment with the narrative instructions. The final video is enhanced using CapCut for dynamic pacing, with some sequences sped up or shortened for engagement. Only the audio and the final logo clip are added manually, and basic editing takes approximately 60 minutes. The video demonstrates the possibilities for future developments in long-form narrative generation.
Netflix Animal TV Series Trailer
Reference: https://www.youtube.com/watch?v=YqcOaxJCZr8
This demo showcases our model's ability to generate nature documentary-style content inspired by Netflix's ANIMAL TV series. The AI system generates diverse wildlife scenes with natural behaviors and environments, demonstrating potential for educational and entertainment content creation. We curate the generated clips that best show realistic animal movements and environmental interactions and assemble them into a compelling narrative. The final video is enhanced using CapCut for professional pacing and transitions. Only the background music and narration are added manually, and basic editing takes approximately 45 minutes. This demo highlights the possibility of AI-assisted nature documentary production that keeps the authentic feel of wildlife filmmaking.