In today’s AI landscape, building systems that understand multiple types of data—text, images, audio, video, time series, and more—is increasingly…
MergeKit: Build Powerful, Multitask LLMs by Merging Models—No Retraining Needed 6574
In today’s fast-moving landscape of open-source large language models (LLMs), developers and researchers are increasingly faced with a dilemma: dozens…
MedRAX: Unified AI Agent for Complex Chest X-ray Reasoning Without Retraining 1048
In clinical radiology, interpreting chest X-rays (CXRs) demands more than just identifying abnormalities—it requires synthesizing visual findings, clinical context, patient…
HierSpeech++: Human-Level Zero-Shot Speech Synthesis with Fast Inference and High Fidelity 1232
In the rapidly evolving field of speech synthesis, achieving natural-sounding, speaker-consistent voice generation without speaker-specific training data has long been…
FlashRAG: A Modular, Lightweight Toolkit for Reproducible and Efficient Retrieval-Augmented Generation Research 3208
Retrieval-Augmented Generation (RAG) has emerged as a cornerstone technique for enhancing the factual grounding, knowledge scope, and reasoning capabilities of…
HunyuanVideo: Open-Source, High-Fidelity Video Generation That Rivals Closed Models 11437
HunyuanVideo is a groundbreaking open-source video foundation model developed by Tencent, designed to deliver professional-grade video generation capabilities without the…
FireRedASR: Industrial-Grade Mandarin Speech Recognition with SOTA Accuracy and LLM Integration 1658
FireRedASR is an open-source, industrial-grade automatic speech recognition (ASR) system specifically engineered for Mandarin Chinese—but with strong capabilities in Chinese…
UltraRAG: Build Adaptive, Multimodal RAG Systems Without Writing Complex Code 2325
Retrieval-Augmented Generation (RAG) has become a cornerstone technique for grounding large language models (LLMs) in real-world knowledge. However, building effective…
VLMEvalKit: One-Command Evaluation for 200+ Vision-Language Models Across 80+ Benchmarks 3536
Evaluating large vision-language models (LVLMs) used to be a fragmented, time-consuming chore—juggling dozens of benchmark repositories, writing custom data loaders,…
HunFlair: State-of-the-Art Biomedical Named Entity Recognition with Just Four Lines of Code 14333
Biomedical text is dense with critical information—gene names, chemical compounds, diseases, species—but extracting that information manually is time-consuming and error-prone.…