Steel-LLM is a 1-billion-parameter open-source language model developed entirely from scratch with a strong focus on Chinese language understanding and…
LLMDet: Open-Vocabulary Object Detection Powered by Large Language Models for Real-World Flexibility
Imagine building a vision system that can detect not just pre-defined classes like “car” or “dog,” but any object described…
FaceBoxes: Real-Time CPU Face Detection with High Accuracy and Scale Invariance
In many real-world applications—ranging from video conferencing and surveillance to edge-based biometric systems—face detection must run quickly, reliably, and without…
LiteFlowNet: High-Accuracy Optical Flow Estimation with a Lightweight, Fast CNN for Real-World Applications
Optical flow estimation—the task of predicting per-pixel motion between consecutive video frames—is foundational in computer vision applications ranging from autonomous…
FCHD: Fast, Accurate Head Detection for Crowded Scenes Without Heavy Compute
Detecting human heads in dense, real-world environments—like subway platforms, concerts, or retail stores—is a surprisingly tough problem in computer vision.…
PytorchInsight: Boost CNN Performance with Lightweight, Plug-and-Play Attention Modules for Vision Tasks
PytorchInsight is a practical, research-oriented PyTorch library designed to accelerate deep learning development—especially for computer vision practitioners who need reliable,…
Pixel-in-Pixel Net: Fast, Accurate Facial Landmark Detection for Real-World Applications
Facial landmark detection—the task of locating key points on a human face like eyes, nose, and mouth—powers countless applications, from…
AANet: Real-Time Stereo Matching Without 3D Convolutions for Autonomous Systems and Embedded Vision
Stereo matching—the task of estimating depth from a pair of rectified images—is foundational in applications like autonomous driving, robotics, and…
HandRefiner: Fix AI-Generated Hand Errors Without Retraining Your Model
Diffusion models like Stable Diffusion and SDXL have revolutionized AI image generation—but they still stumble on one persistent, high-visibility flaw:…
OpenOCR: High-Accuracy, Efficient OCR for English and Chinese Text in Real-World Applications
OpenOCR is a general-purpose optical character recognition (OCR) system developed by the FVL Laboratory at Fudan University. Designed with both…