PaperCodex

Steel-LLM: A High-Performance, Fully Transparent 1B Chinese LLM Built from Scratch for Resource-Constrained Teams

Steel-LLM is a 1-billion-parameter open-source language model developed entirely from scratch with a strong focus on Chinese language understanding and…

01/13/2026 | Chinese Language Modeling, Instruction Fine-tuning, Open-source LLM

LLMDet: Open-Vocabulary Object Detection Powered by Large Language Models for Real-World Flexibility

Imagine building a vision system that can detect not just pre-defined classes like “car” or “dog,” but any object described…

01/13/2026 | Open-vocabulary Object Detection, Vision-Language Modeling, Zero-shot Object Detection

FaceBoxes: Real-Time CPU Face Detection with High Accuracy and Scale Invariance

In many real-world applications—ranging from video conferencing and surveillance to edge-based biometric systems—face detection must run quickly, reliably, and without…

01/13/2026 | Edge AI, Face Detection, Real-time Object Detection

LiteFlowNet: High-Accuracy Optical Flow Estimation with a Lightweight, Fast CNN for Real-World Applications

Optical flow estimation—the task of predicting per-pixel motion between consecutive video frames—is foundational in computer vision applications ranging from autonomous…

01/13/2026 | Motion Analysis, Optical Flow Estimation, Video Understanding

FCHD: Fast, Accurate Head Detection for Crowded Scenes Without Heavy Compute

Detecting human heads in dense, real-world environments—like subway platforms, concerts, or retail stores—is a surprisingly tough problem in computer vision.…

01/13/2026 | Crowd Analysis, Head Detection, Object Detection

PytorchInsight: Boost CNN Performance with Lightweight, Plug-and-Play Attention Modules for Vision Tasks

PytorchInsight is a practical, research-oriented PyTorch library designed to accelerate deep learning development—especially for computer vision practitioners who need reliable,…

01/13/2026 | CNN Attention Mechanisms, Image Classification, Object Detection

Pixel-in-Pixel Net: Fast, Accurate Facial Landmark Detection for Real-World Applications

Facial landmark detection—the task of locating key points on a human face like eyes, nose, and mouth—powers countless applications, from…

01/13/2026 | Cross-domain Generalization, Facial Landmark Detection, Real-time Face Tracking

AANet: Real-Time Stereo Matching Without 3D Convolutions for Autonomous Systems and Embedded Vision

Stereo matching—the task of estimating depth from a pair of rectified images—is foundational in applications like autonomous driving, robotics, and…

01/13/2026 | Depth Estimation, Efficient Deep Learning, Stereo Matching

HandRefiner: Fix AI-Generated Hand Errors Without Retraining Your Model

Diffusion models like Stable Diffusion and SDXL have revolutionized AI image generation—but they still stumble on one persistent, high-visibility flaw:…

01/13/2026 | Conditional Generation, Hand Pose Correction, Image Inpainting

OpenOCR: High-Accuracy, Efficient OCR for English and Chinese Text in Real-World Applications

OpenOCR is a general-purpose optical character recognition (OCR) system developed by the FVL Laboratory at Fudan University. Designed with both…

01/13/2026 | Multilingual OCR, Optical Character Recognition, Scene Text Recognition