Time series forecasting powers critical decisions across industries—from predicting electricity demand and traffic congestion to estimating disease spread and stock…
CKnowEdit: Fix Chinese Linguistic, Factual & Logical Errors in LLMs Without Retraining 2667
Large language models (LLMs) have made remarkable progress in multilingual understanding—but their performance in Chinese remains uneven, especially when it…
FastViT: Achieve State-of-the-Art Speed and Accuracy for Vision Tasks on Mobile and Edge Devices 1974
FastViT is a high-performance hybrid vision transformer designed to deliver exceptional speed and accuracy—especially on resource-constrained platforms like mobile phones…
iTransformer: Invert Your Time Series Forecasting Architecture for Better Scalability, Generalization, and Simplicity 1824
Time series forecasting is a foundational task across finance, energy, logistics, and digital platforms—yet traditional Transformer-based models often struggle with…
InternLM-XComposer: Generate Rich Text-Image Content and Understand High-Res Visuals with Open, Commercially Free AI 2909
For technical decision makers evaluating multimodal AI, choosing between closed-source APIs and open alternatives often means trading off control,…
Show-1: High-Quality, Efficient Text-to-Video Generation with Precise Prompt Alignment 1133
Text-to-video generation has rapidly evolved, yet technical teams still face a persistent trade-off: high-quality outputs often come at prohibitive computational…
TinyLlama: A Fast, Efficient 1.1B Open Language Model for Edge Deployment and Speculative Decoding 8770
TinyLlama is a compact yet powerful open-source language model with just 1.1 billion parameters—but trained on an impressive 3 trillion…
SLAM3R: Real-Time Dense 3D Reconstruction from Monocular Video—No Camera Calibration Needed 1045
Introducing SLAM3R—a cutting-edge, end-to-end system that reconstructs high-quality, dense 3D scenes in real time using only a monocular RGB video…
3DGUT: Real-Time 3D Reconstruction That Handles Distorted Cameras and Reflections Without Sacrificing Speed 1743
3D Gaussian Splatting (3DGS) revolutionized real-time 3D scene reconstruction by delivering photorealistic quality at high frame rates on consumer GPUs.…
LLaVA-CoT: Step-by-Step Visual Reasoning for Reliable, Explainable Multimodal AI 2108
Most vision-language models (VLMs) today can describe what’s in an image—but they often falter when asked to reason about it.…