PaperCodex

Meta-Transformer: One Unified Model for 12 Modalities—No Paired Data Needed

In today’s AI landscape, building systems that understand multiple types of data—text, images, audio, video, time series, and more—is increasingly…

12/17/2025 Foundation Model, Multimodal Learning, Representation Learning
MergeKit: Build Powerful, Multitask LLMs by Merging Models—No Retraining Needed

In today’s fast-moving landscape of open-source large language models (LLMs), developers and researchers are increasingly faced with a dilemma: dozens…

12/17/2025 Model Merging
MedRAX: Unified AI Agent for Complex Chest X-ray Reasoning Without Retraining

In clinical radiology, interpreting chest X-rays (CXRs) demands more than just identifying abnormalities—it requires synthesizing visual findings, clinical context, patient…

12/17/2025 Chest X-ray Interpretation, Medical Image Reasoning, Multimodal Clinical AI
HierSpeech++: Human-Level Zero-Shot Speech Synthesis with Fast Inference and High Fidelity

In the rapidly evolving field of speech synthesis, achieving natural-sounding, speaker-consistent voice generation without speaker-specific training data has long been…

12/17/2025 Speech Super-Resolution, Voice Conversion, Zero-shot Text-to-Speech
FlashRAG: A Modular, Lightweight Toolkit for Reproducible and Efficient Retrieval-Augmented Generation Research

Retrieval-Augmented Generation (RAG) has emerged as a cornerstone technique for enhancing the factual grounding, knowledge scope, and reasoning capabilities of…

12/17/2025 Multimodal RAG, Reasoning-Augmented QA, Retrieval-Augmented Generation
HunyuanVideo: Open-Source, High-Fidelity Video Generation That Rivals Closed Models

HunyuanVideo is a groundbreaking open-source video foundation model developed by Tencent, designed to deliver professional-grade video generation capabilities without the…

12/17/2025 Image-to-video Generation, Multimodal Video Synthesis, Text-to-Video Generation
FireRedASR: Industrial-Grade Mandarin Speech Recognition with SOTA Accuracy and LLM Integration

FireRedASR is an open-source, industrial-grade automatic speech recognition (ASR) system specifically engineered for Mandarin Chinese—but with strong capabilities in Chinese…

12/17/2025 Automatic Speech Recognition, LLM-Integrated Speech Processing, Multilingual ASR
UltraRAG: Build Adaptive, Multimodal RAG Systems Without Writing Complex Code

Retrieval-Augmented Generation (RAG) has become a cornerstone technique for grounding large language models (LLMs) in real-world knowledge. However, building effective…

12/16/2025 Adaptive Knowledge Integration, Multimodal Reasoning, Retrieval-Augmented Generation
VLMEvalKit: One-Command Evaluation for 200+ Vision-Language Models Across 80+ Benchmarks

Evaluating large vision-language models (LVLMs) used to be a fragmented, time-consuming chore—juggling dozens of benchmark repositories, writing custom data loaders,…

12/16/2025 Benchmarking, Multi-modal Evaluation, Vision-Language Modeling
HunFlair: State-of-the-Art Biomedical Named Entity Recognition with Just Four Lines of Code

Biomedical text is dense with critical information—gene names, chemical compounds, diseases, species—but extracting that information manually is time-consuming and error-prone.…

12/15/2025 Biomedical Text Mining, Named Entity Recognition, Sequence Labeling

Copyright © 2026 PaperCodex.