If you’ve worked on computer vision tasks like object detection or instance segmentation, you’ve likely encountered the challenge of modeling…
GCOPTER: Real-Time, High-Fidelity Multicopter Trajectory Planning with Geometric and Dynamic Constraints 1105
Autonomous multicopters—whether used in drone racing, delivery, inspection, or swarm coordination—face a persistent challenge: generating trajectories that are simultaneously smooth,…
LightningDiT: Break the Reconstruction-Generation Trade-Off with 21.8x Faster, SOTA Image Diffusion 1315
Latent diffusion models (LDMs) have become a cornerstone of modern high-fidelity image generation. However, a persistent challenge has limited their…
PRIME: Boost LLM Reasoning with Token-Level Rewards—No Step-by-Step Labels Needed 1783
If you’re working to improve large language models (LLMs) on hard reasoning tasks—like math problem solving or competitive programming—you’ve likely…
GANformer: Compositional, Controllable Image Generation with Fewer Training Steps 1342
Traditional generative adversarial networks (GANs) often act as “black boxes”—they produce compelling images but offer little insight into how those…
FlagEmbedding: High-Performance, Task-Aware Text Embeddings for Multilingual RAG and Semantic Search 10677
Modern AI applications—from customer support chatbots to enterprise knowledge retrieval—rely heavily on high-quality text embeddings to power semantic search and…
MacNet: Scale Multi-Agent LLM Collaboration Beyond Linear Workflows with Custom Topologies 27867
Traditional multi-agent systems powered by large language models (LLMs) often follow rigid, sequential workflows—like a single assembly line where each…
Tree of Thoughts: Unlock Strategic Reasoning in LLMs for Complex Problem Solving 5714
Large language models (LLMs) have transformed how we approach tasks ranging from coding assistance to content generation. Yet, their standard…
RPG-DiffusionMaster: Generate Complex, Compositional Images from Text—No Retraining Needed 1823
Text-to-image generation has made remarkable strides, yet even state-of-the-art models like DALL·E 3 or Stable Diffusion XL (SDXL) often stumble…
InternVideo: Build Powerful Video-Language AI Without Massive Compute or Data 2131
Building capable video-language AI systems has long been a resource-intensive endeavor—requiring vast video datasets, weeks of training on dozens of…