PaperCodex

AgentVerse: Build Collaborative LLM Agent Teams for Real Tasks or Behavioral Simulation 4884

In today’s AI landscape, single-agent systems—powered by large language models (LLMs)—often hit a ceiling when tackling complex, multi-step problems. What…

12/19/2025 Behavioral Simulation, Multi-agent Collaboration, Task Automation

Align Anything: The First Open Framework for Aligning Any-to-Any Multimodal Models with Human Intent 4562

As AI systems grow more capable across diverse data types—text, images, audio, and video—the challenge of aligning them with human…

12/19/2025 Instruction Tuning, Multimodal Alignment, Reinforcement Learning From Human Feedback

SPHINX-X: Build Scalable Multimodal AI Faster with Unified Training, Diverse Data, and Flexible Model Sizes 2794

SPHINX-X is a next-generation family of Multimodal Large Language Models (MLLMs) designed to streamline the development, training, and deployment of…

12/19/2025 Document Intelligence, Multimodal Understanding, Vision-Language Modeling

Xorbits: Scale Pandas and NumPy Workflows to Clusters—With Just One Line of Code 1199

Data scientists and machine learning engineers routinely rely on pandas and NumPy for data wrangling, exploration, and modeling. These libraries…

12/19/2025 Data Preprocessing, Distributed Computing, Scalable Machine Learning

DyVal: Dynamic, Contamination-Free Evaluation of LLM Reasoning Capabilities 2726

Evaluating large language models (LLMs) has become increasingly challenging. Traditional benchmarks—like MMLU, GSM8K, or Big-Bench Hard—are static, fixed in complexity,…

12/19/2025 Dynamic Benchmarking, LLM Robustness Testing, Reasoning Evaluation

Caption Anything: Interactive, Multimodal Image Captioning Controlled by You 1770

Traditional image captioning systems produce static, one-size-fits-all descriptions—often generic, inflexible, and disconnected from actual user intent. What if you could…

12/19/2025 Image Captioning, Multimodal Control, Vision-Language Modeling

OmniParser V2: One Unified Model for Text Spotting, Table Recognition, and Document Understanding 1800

In today’s data-driven world, businesses and researchers routinely process documents—scanned invoices, forms, tables, and receipts—to extract structured information. Traditionally, this…

12/19/2025 Document Understanding, Multimodal Document Processing, Visual Text Parsing

ManimML: Animate Machine Learning Architectures Directly from Code—No Design Skills Needed 3269

As machine learning models grow increasingly complex—from deep convolutional networks to attention-based architectures—the ability to clearly communicate how they work…

12/19/2025 Educational Animation, Model Explanation, Neural Network Visualization

Code-Optimise: Boost Code Correctness and Runtime Efficiency Without Trade-offs 2692

Modern code language models (CLMs) excel at generating functionally correct programs—but often at the cost of runtime efficiency. Conversely, efforts…

12/19/2025 Code Generation, Model Optimization, Preference-based Learning

FederatedScope-LLM: Collaboratively Fine-Tune Large Language Models Without Sharing Private Data 1491

In today’s data-sensitive world, organizations increasingly want to harness the power of large language models (LLMs) while complying with strict…

12/19/2025 Federated Learning, Parameter-Efficient Fine-Tuning, Privacy-Preserving NLP