PaperCodex

Multimodal Foundation Models

Step-Video-T2V: Generate High-Quality, Long-Form Videos from Text in English and Chinese

Step-Video-T2V is a state-of-the-art open-source text-to-video foundation model developed by StepFun AI. With 30 billion parameters and the ability to…

12/27/2025 | Multimodal Foundation Models, Text-to-Video Generation, Video Diffusion Models
Copyright © 2026 PaperCodex.