PaperCodex

RFBNet: High-Accuracy, Real-Time Object Detection Without Heavy Backbones 1422

When building real-world computer vision systems—whether for autonomous drones, industrial inspection, or mobile apps—one of the toughest trade-offs is between…

12/27/2025 | Edge AI, Object Detection, Real-Time Inference

3DDFA_V2: Real-Time, CPU-Efficient 3D Face Alignment for Video and Edge Applications 3081

If you’re building applications that require real-time 3D facial understanding—like video conferencing enhancements, augmented reality filters, biometric verification, or character…

12/27/2025 | 3D Face Alignment, Dense Facial Landmark Estimation, Real-time Face Tracking

Bunny: High-Performance Multimodal AI Without the Heavy Compute Burden 1046

Multimodal Large Language Models (MLLMs) are transforming how machines understand and reason about visual content. Yet, their adoption remains out…

12/27/2025 | Efficient Inference, Multimodal Reasoning, Vision-Language Modeling

Step-Video-T2V: Generate High-Quality, Long-Form Videos from Text in English and Chinese 3139

Step-Video-T2V is a state-of-the-art open-source text-to-video foundation model developed by StepFun AI. With 30 billion parameters and the ability to…

12/27/2025 | Multimodal Foundation Models, Text-to-Video Generation, Video Diffusion Models

GCNet: Boost Vision Models with Lightweight Global Context for Better Accuracy and Efficiency 1217

If you’ve worked on computer vision tasks like object detection or instance segmentation, you’ve likely encountered the challenge of modeling…

12/27/2025 | Global Context Modeling, Instance Segmentation, Object Detection

GCOPTER: Real-Time, High-Fidelity Multicopter Trajectory Planning with Geometric and Dynamic Constraints 1105

Autonomous multicopters—whether used in drone racing, delivery, inspection, or swarm coordination—face a persistent challenge: generating trajectories that are simultaneously smooth,…

12/27/2025 | Aerial Robotics, Motion Planning, Trajectory Optimization

LightningDiT: Break the Reconstruction-Generation Trade-Off with 21.8x Faster, SOTA Image Diffusion 1315

Latent diffusion models (LDMs) have become a cornerstone of modern high-fidelity image generation. However, a persistent challenge has limited their…

12/27/2025 | Diffusion Transformers, Image Generation, Latent Diffusion Models

PRIME: Boost LLM Reasoning with Token-Level Rewards—No Step-by-Step Labels Needed 1783

If you’re working to improve large language models (LLMs) on hard reasoning tasks—like math problem solving or competitive programming—you’ve likely…

12/27/2025 | Code Generation, Mathematical Reasoning, Reinforcement Learning

GANformer: Compositional, Controllable Image Generation with Fewer Training Steps 1342

Traditional generative adversarial networks (GANs) often act as “black boxes”—they produce compelling images but offer little insight into how those…

12/27/2025 | Compositional Scene Modeling, Image Generation, Layout-to-image Synthesis

FlagEmbedding: High-Performance, Task-Aware Text Embeddings for Multilingual RAG and Semantic Search 10677

Modern AI applications—from customer support chatbots to enterprise knowledge retrieval—rely heavily on high-quality text embeddings to power semantic search and…

12/27/2025 | Retrieval-Augmented Generation, Semantic Search, Text Embedding