PaperCodex

MegActor: Realistic Portrait Animation from Raw Video Without Identity Leakage or Background Noise 900

MegActor represents a significant leap forward in portrait animation by directly leveraging raw driving videos—rather than simplified proxies like facial…

01/13/2026Facial Expression Transfer, Multimodal Diffusion Models, Portrait Animation

Chat-UniVi: One Unified Model for Image and Video Understanding—No More Separate Systems Needed 939

In today’s AI landscape, multimodal systems that understand both images and videos are increasingly essential—but most solutions force you to…

01/13/2026Multimodal Understanding, Video Question Answering, Visual Reasoning

TF-ICON: Training-Free Cross-Domain Image Composition Without Retraining or Fine-Tuning 821

Image composition—seamlessly inserting a user-provided object into a new visual context—is a long-standing challenge in computer vision and generative AI.…

01/13/2026Cross-domain Image Generation, Diffusion Model Editing, Image Composition

$FacTool: Automatically Detect Factual Errors in LLM Outputs Across Code, Math, QA, and Scientific Writing$

FacTool: Automatically Detect Factual Errors in LLM Outputs Across Code, Math, QA, and Scientific Writing 899

Large language models (LLMs) like ChatGPT and GPT-4 have transformed how we generate text, write code, solve math problems, and…

01/13/2026Factuality Detection, Hallucination Detection, LLM Evaluation

HQTrack: High-Quality Video Object Tracking and Segmentation Without Complex Tricks 752

HQTrack is a powerful and practical framework designed to solve a persistent challenge in computer vision: accurately tracking and segmenting…

01/13/2026Instance Segmentation, Multi-Object Tracking, Video Object Segmentation

GraphGPT: Enable LLMs to Understand Graphs Without Task-Specific Labels 781

In today’s data-driven world, much of the most valuable information isn’t stored in tables or plain text—it lives in relationships.…

01/13/2026Graph Reasoning, Link Prediction, Node Classification

DISC-FinLLM: A Specialized Chinese Financial LLM for Accurate, Context-Aware Financial Intelligence 818

If you’re building AI-powered tools for the Chinese financial sector—whether for banking, fintech, investment research, or regulatory compliance—you’ve likely run…

01/13/2026Financial Large Language Model, Financial Text Analysis, Retrieval-Augmented Generation

UniTS: One Model to Rule All Time Series Tasks – Forecasting, Classification, Imputation & Anomaly Detection Unified 570

Time series data is everywhere—in financial markets, wearable health monitors, industrial sensors, and smart infrastructure. Yet, despite its ubiquity, building…

01/13/2026Anomaly Detection, Time Series Classification, Time-series Forecasting

RAG Foundry (RAG-FiT): Build, Train, and Evaluate Domain-Specific RAG Systems Without the Complexity 750

Building effective Retrieval-Augmented Generation (RAG) systems is notoriously difficult. Practitioners must juggle data preparation, retrieval integration, prompt engineering, model fine-tuning,…

01/13/2026Parameter-Efficient Fine-Tuning, RAG Evaluation, Retrieval-Augmented Generation

Agentic Reasoning: Supercharge LLM Reasoning with Tool-Augmented Deep Research and Structured Memory 699

Large language models (LLMs) have made remarkable strides in generating coherent, context-aware responses. Yet, when it comes to complex, multi-step…

01/13/2026Deep Research, Structured Reasoning, Tool-augmented Reasoning