Reducing Annotation Cost in Vision–Language Pedestrian Re-Identification via Uncertainty-Driven Sampling
DOI: https://doi.org/10.71465/fair657

Keywords: active learning, annotation efficiency, pedestrian re-identification, uncertainty sampling, vision–language models

Abstract
Scaling pedestrian re-identification for autonomous driving is limited by the cost of identity labeling across large camera networks. Inspired by CLIP-based cross-modal uncertainty modeling, this paper proposes an active learning approach that selects labeling candidates using uncertainty in the joint vision–language embedding space. The method combines (i) uncertainty sampling for ambiguous matches, (ii) diversity sampling based on embedding coverage, and (iii) batch acquisition with redundancy control. Experiments are conducted on a large-scale dataset with 400,000 images and 50,000 identities under incremental labeling budgets from 5% to 30%. Compared with random sampling, core-set selection, and margin-based acquisition using TransReID embeddings, the proposed strategy reaches 95% of full-supervision mAP using 18%–22% fewer labeled identities, while reducing annotation time by an estimated 20%–25% under standard labeling workflows.
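The three components described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical implementation, not the authors' code: it scores unlabeled images by margin uncertainty between their two closest identity prompts in a shared embedding space, then builds the acquisition batch greedily while rejecting candidates too close to already-selected ones (a simple form of redundancy control). The array shapes, the `min_dist` threshold, and the cosine-similarity scoring are all assumptions for illustration.

```python
import numpy as np

def select_batch(img_emb, txt_emb, batch_size, min_dist=0.0):
    """Greedy uncertainty + diversity batch acquisition (illustrative sketch).

    img_emb: (n_images, d) unlabeled image embeddings
    txt_emb: (n_identities, d) identity text-prompt embeddings
    Returns indices of images to send for labeling.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = img @ txt.T                        # (n_images, n_identities)

    # Margin uncertainty: gap between the two best identity matches.
    top2 = np.sort(sims, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]          # small margin = ambiguous match

    selected = []
    for idx in np.argsort(margin):            # most uncertain first
        if len(selected) == batch_size:
            break
        # Redundancy control: skip candidates that sit too close in the
        # embedding space to an image already in the batch.
        if all(np.linalg.norm(img[idx] - img[j]) >= min_dist for j in selected):
            selected.append(int(idx))
    return selected
```

In practice the diversity term would typically use core-set-style coverage over the whole unlabeled pool rather than a fixed distance threshold; the greedy filter above is the simplest stand-in that keeps the batch from collapsing onto near-duplicate frames.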
License
Copyright (c) 2026 Michael Anderson, Daniel Rodriguez, Yi Chen (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.