Multi-object tracking (MOT) is a cornerstone of modern computer vision systems—powering everything from autonomous vehicles to retail analytics and security surveillance. Yet, building a reliable, accurate, and efficient MOT pipeline remains challenging due to inconsistent baselines, fragmented tooling, and complex dependencies between detection, embedding, and association modules.
Enter StrongSORT: a modern re-engineering of the classic DeepSORT tracker that delivers state-of-the-art performance while maintaining simplicity, fairness, and ease of integration. Implemented in the MMTracking toolbox within the OpenMMLab ecosystem, StrongSORT isn’t just another academic novelty—it’s a practical, production-ready solution designed for engineers and researchers who need robust object tracking without unnecessary complexity.
What makes StrongSORT stand out is its commitment to being a strong and fair baseline. By upgrading core components of DeepSORT—object detection, appearance embedding, and trajectory association—with modern best practices, StrongSORT achieves significantly higher accuracy while remaining transparent and reproducible. Even better, it supports two lightweight, optional enhancements—AFLink and GSI—that address two persistent MOT pain points: broken trajectories and missing detections.
Whether you’re evaluating trackers for a robotics project, deploying surveillance software, or benchmarking algorithms for research, StrongSORT offers a compelling blend of performance, modularity, and ease of use.
Why StrongSORT Matters
A Modern Revival of a Classic Tracker
DeepSORT has long been a go-to baseline in MOT due to its simplicity and effectiveness. But as detection models and feature extractors have evolved, the original DeepSORT pipeline has become outdated—often bottlenecked by suboptimal components.
StrongSORT fixes this by integrating:
- State-of-the-art object detectors (e.g., from MMDetection)
- High-quality appearance embeddings trained on large-scale re-identification datasets
- Improved Kalman filter and association logic
The result? A tracker that maintains DeepSORT’s interpretability while delivering accuracy that rivals more complex methods.
Built for Fair Comparison—and Real Deployments
One of StrongSORT’s key contributions is its role as a reproducible, strong baseline. Many MOT papers report results using different detectors, training tricks, or evaluation protocols, making fair comparisons nearly impossible. StrongSORT standardizes these variables, enabling apples-to-apples benchmarking.
But its value isn’t limited to academia. Because it’s implemented in MMTracking—a modular, GPU-accelerated PyTorch toolbox—it’s also ready for real-world use. You can swap in your preferred detector, adjust tracking parameters via config files, and run inference at high speed without writing custom code.
Key Technical Improvements
Enhanced Object Detection Integration
StrongSORT leverages the full power of modern detectors through seamless integration with MMDetection. This means you can use any supported detector—from YOLO variants to Cascade R-CNN—simply by changing a configuration file. No re-engineering required.
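To make the "change a configuration file" claim concrete, here is a sketch of what an OpenMMLab-style config looks like. The field names follow OpenMMLab conventions, but this exact fragment is illustrative rather than copied from a released config, and the checkpoint paths are placeholders.

```python
# Illustrative MMTracking-style config: the detector is swapped by
# editing this dict, not by changing tracker code.
model = dict(
    type='StrongSORT',
    detector=dict(
        type='YOLOX',  # swap to another MMDetection detector here
        init_cfg=dict(type='Pretrained',
                      checkpoint='path/to/detector.pth'),  # placeholder
    ),
    reid=dict(
        type='BaseReID',
        init_cfg=dict(type='Pretrained',
                      checkpoint='path/to/reid.pth'),      # placeholder
    ),
)
```

Because configs are plain Python, swapping a detector is a one-line edit, and everything downstream (embedding, association) stays untouched.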
Accurate detection is the foundation of reliable tracking. StrongSORT’s performance gains start here, ensuring fewer false positives and better localization from the outset.
Smarter Feature Embedding
Appearance features are critical for distinguishing between similar-looking objects (e.g., people in a crowd). StrongSORT uses high-performance re-identification (ReID) models that provide discriminative embeddings, reducing identity switches during occlusions or close encounters.
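In practice this means each detection yields an embedding vector, tracks compare candidates via cosine similarity, and the track's appearance feature is smoothed over time with an exponential moving average so a single occluded frame doesn't corrupt it. A self-contained sketch (the smoothing factor `alpha=0.9` is an illustrative choice):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def update_track_feature(track_feat, det_feat, alpha=0.9):
    """EMA update of a track's appearance feature, then re-normalize.

    Keeps the track embedding stable across noisy per-frame detections.
    """
    fused = [alpha * t + (1 - alpha) * d for t, d in zip(track_feat, det_feat)]
    norm = math.sqrt(sum(f * f for f in fused))
    return [f / norm for f in fused]
```

The EMA keeps identity matching robust: one bad detection shifts the track feature only slightly, so the next clean frame still matches.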
Robust Trajectory Association
The core tracking logic—matching detections to existing tracks—is refined to balance motion (via Kalman filtering) and appearance cues. This leads to more stable trajectories, especially in dense or cluttered scenes.
Plug-and-Play Enhancements: AFLink and GSI
Beyond core improvements, StrongSORT introduces two optional modules that solve long-standing MOT challenges with minimal overhead:
Appearance-Free Link (AFLink)
Problem: Short tracklets often break due to occlusion or detector failure, and re-linking them typically requires expensive appearance matching or complex graph optimization.
Solution: AFLink uses motion cues alone (no visual features) to globally reconnect broken tracks. It’s lightweight (~1.7 ms per image on MOT17) and plug-and-play—just add it to any tracker without retraining.
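The core intuition is easy to sketch: extrapolate one tracklet's endpoint forward and check whether another tracklet's start is spatially and temporally consistent with it. The thresholds and the hand-rolled rule below are hypothetical; the actual AFLink learns this decision with a small temporal model.

```python
import math

def velocity(tracklet):
    """Average velocity from a tracklet of (frame, x, y) points."""
    (t0, x0, y0), (t1, x1, y1) = tracklet[0], tracklet[-1]
    dt = max(t1 - t0, 1)
    return (x1 - x0) / dt, (y1 - y0) / dt

def can_link(tracklet_a, tracklet_b, max_gap=30, max_dist=75.0):
    """Motion-only check: does tracklet_b plausibly continue tracklet_a?

    Hypothetical thresholds; no appearance features are used at all.
    """
    t_end, x_end, y_end = tracklet_a[-1]
    t_start, x_start, y_start = tracklet_b[0]
    gap = t_start - t_end
    if not 0 < gap <= max_gap:
        return False
    # Extrapolate tracklet_a's endpoint across the temporal gap.
    vx, vy = velocity(tracklet_a)
    px, py = x_end + vx * gap, y_end + vy * gap
    return math.hypot(px - x_start, py - y_start) <= max_dist
```

Because no visual features are extracted, this check costs almost nothing per candidate pair, which is why AFLink stays so cheap.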
Gaussian-Smoothed Interpolation (GSI)
Problem: Detections can vanish temporarily (e.g., when an object is behind a pillar), creating gaps in trajectories.
Solution: GSI fills these gaps using Gaussian process regression to interpolate plausible object positions. It adds only ~7.1 ms per image and works out of the box.
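A compact sketch of the idea: fit a Gaussian process over the observed (frame, position) pairs and query the posterior mean at the missing frames. The kernel width `tau` and noise level are illustrative, and this sketch handles a single coordinate, whereas the real GSI smooths full boxes per trajectory.

```python
import math

def rbf(a, b, tau=3.0):
    """Squared-exponential kernel over frame indices."""
    return math.exp(-((a - b) ** 2) / (2.0 * tau * tau))

def solve(A, b):
    """Gaussian elimination with partial pivoting for small systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_interpolate(frames, values, queries, tau=3.0, noise=1e-2):
    """GP posterior mean at the query frames for one coordinate."""
    n = len(frames)
    K = [[rbf(frames[i], frames[j], tau) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, values)   # alpha = (K + noise*I)^-1 y
    return [sum(rbf(q, frames[j], tau) * alpha[j] for j in range(n))
            for q in queries]
```

Unlike linear interpolation, the Gaussian process also smooths jitter in the observed detections, which is where much of GSI's accuracy gain comes from.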
Together, AFLink and GSI form StrongSORT++, which achieves top-tier results on MOT17, MOT20, DanceTrack, and KITTI—without sacrificing speed.
Ideal Use Cases
StrongSORT excels in scenarios where:
- Multiple objects must be tracked simultaneously (e.g., pedestrians in urban traffic)
- Occlusions and camera motion are common (e.g., sports analytics, retail stores)
- Reliable identity consistency is critical (e.g., surveillance, behavioral analysis)
- Integration simplicity matters (e.g., prototyping in research or deploying in edge systems via OpenMMLab’s modular design)
It’s particularly valuable when you need a drop-in replacement for DeepSORT that delivers significantly better performance with no code changes—just better configs and pre-trained models.
Getting Started with StrongSORT
StrongSORT is available in MMTracking, part of the OpenMMLab ecosystem. Key advantages for practitioners:
- No custom coding needed: Define your pipeline via Python config files, following OpenMMLab conventions
- GPU-accelerated: All operations (detection, embedding, tracking) run on GPU
- Pre-trained models: Official checkpoints for MOT17, MOT20, and more
- Dataset support: Works out of the box with MOTChallenge, DanceTrack, KITTI, and CrowdHuman
- Ecosystem synergy: Shares models, tools, and conventions with MMDetection, MMCV, and other OpenMMLab projects
Installation is straightforward via pip or conda, and tutorials (including Colab notebooks) guide you through your first tracking run in minutes.
Limitations and Considerations
While StrongSORT is powerful, it’s not a magic bullet:
- Detector-dependent: Tracking quality is capped by your detector’s performance; a weak detector yields weak tracking.
- AFLink and GSI are optional: You’ll need to evaluate whether trajectory re-linking or interpolation is necessary for your use case. In clean, short videos, they may add unnecessary latency.
- Not end-to-end trainable: StrongSORT follows a tracking-by-detection paradigm, so joint optimization of detection and tracking isn’t supported.
That said, for most practical applications—especially those prioritizing deployability and interpretability—these trade-offs are well worth it.
Summary
StrongSORT revitalizes DeepSORT with modern components, standardized practices, and two clever plug-and-play modules that solve real MOT pain points. It’s fast, accurate, easy to use, and backed by a mature open-source ecosystem.
If you’re building or evaluating a multi-object tracking system and need a reliable, high-performance baseline that works out of the box, StrongSORT is one of the strongest choices available today.