PaperCodex

Sa2VA: Unified Vision-Language Model for Accurate Referring Video Object Segmentation from Natural Language

Sa2VA represents a significant leap forward in multimodal AI by seamlessly integrating the strengths of SAM2—Meta’s state-of-the-art video object segmentation…

12/27/2025 | Multimodal Grounding, Referring Video Object Segmentation, Vision-language Modeling

Classiq: Accelerate Quantum Algorithm Development with High-Level Abstraction and Automated Circuit Synthesis

Quantum computing holds immense promise—but building, optimizing, and executing quantum circuits remains a formidable challenge for most developers, researchers, and…

12/27/2025 | Quantum Algorithm Design, Quantum Machine Learning, Quantum State Preparation

Loghub: Real-World System Log Datasets to Power AI-Driven Log Analytics and Research

In the world of software systems—whether they’re cloud-native applications, distributed infrastructures, or legacy enterprise platforms—logs are the lifeblood of observability.…

12/26/2025 | Anomaly Detection, Failure Prediction, Log Parsing

Mini-Omni2: Unified Vision, Speech, and Text Interaction Without External ASR/TTS Pipelines

In today’s open-source AI landscape, building truly multimodal applications often means stitching together separate models for vision, speech recognition (ASR),…

12/26/2025 | End-to-end Voice Assistant, Multimodal Understanding, Speech-to-speech Interaction

DeepCode: Turn Research Papers and Text into Production-Ready Code—Faster Than Human Experts

Imagine being able to feed a research paper, a technical specification, or even a rough product description into a system—and…

12/26/2025 | Agentic AI, Code Generation, Research Reproduction

aiXcoder-7B: High-Accuracy Code Completion in a Lightweight 7B Model for Real-Time Developer Workflows

aiXcoder-7B is a 7-billion-parameter open-source large language model (LLM) purpose-built for code processing. Unlike larger models that trade inference speed…

12/26/2025 | Code Completion, Code Generation, Fill-in-the-middle

Mini-Omni: Real-Time, End-to-End Speech AI Without ASR or TTS Latency

In today’s landscape of conversational AI, most voice-enabled systems rely on a pipeline of separate components: automatic speech recognition (ASR)…

12/26/2025 | End-to-end Voice Interaction, Real-time Conversational AI, Speech-to-speech Synthesis

Puppeteer: Dynamic Multi-Agent Orchestration for Efficient, Adaptive LLM Collaboration

Managing complex tasks with large language models (LLMs) often hits a ceiling: while single models excel at narrow tasks, scaling…

12/26/2025 | Dynamic Orchestration, Multi-agent Systems, Reinforcement Learning For LLMs

Elixir: Train Large Language Models Efficiently on Small GPU Clusters Without Expert-Level Tuning

Training large language models (LLMs) has traditionally been the domain of well-resourced AI labs with access to massive GPU clusters…

12/26/2025 | Distributed Deep Learning, Large Language Model Training, Memory-efficient Training

UniLM: One Model for Both Understanding and Generating Natural Language

In the evolving landscape of natural language processing (NLP), teams often find themselves juggling separate models—one for understanding tasks like…

12/26/2025 | Natural Language Generation, Natural Language Understanding, Sequence-to-sequence Modeling