Reducing Annotation Cost in Vision–Language Pedestrian Re-Identification via Uncertainty-Driven Sampling

Authors

  • Michael Anderson — Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  • Daniel Rodriguez — Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  • Yi Chen — Department of Computer Science, Stanford University, Stanford, CA 94305, USA

DOI:

https://doi.org/10.71465/fair657

Keywords:

Active learning, annotation efficiency, pedestrian re-identification, uncertainty sampling, vision–language models

Abstract

Scaling pedestrian re-identification for autonomous driving is limited by the cost of identity labeling across large camera networks. Inspired by CLIP-based multimodal uncertainty modeling, this paper proposes an active learning approach that selects labeling candidates using uncertainty in the joint vision–language embedding space. The method combines (i) uncertainty sampling for ambiguous matches, (ii) diversity sampling based on embedding coverage, and (iii) batch acquisition with redundancy control. Experiments are conducted on a large-scale dataset with 400,000 images and 50,000 identities under incremental labeling budgets from 5% to 30%. Compared with random sampling, core-set selection, and margin-based acquisition using TransReID embeddings, the proposed strategy reaches 95% of full-supervision mAP with 18%–22% fewer labeled identities, while reducing annotation time by an estimated 20%–25% under standard labeling workflows.
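The acquisition strategy in the abstract (margin-based uncertainty in a joint embedding space, plus batch redundancy control) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`select_batch`, `acquisition_scores`), the margin-based uncertainty score, and the redundancy penalty weight `lam` are all assumptions introduced here for clarity.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a (n,d) and b (m,d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def acquisition_scores(img_emb, txt_emb):
    """Uncertainty score per image: a small margin between the top-2
    similarities to identity text embeddings indicates an ambiguous match."""
    sims = np.sort(cosine_sim(img_emb, txt_emb), axis=1)
    margin = sims[:, -1] - sims[:, -2]
    return -margin  # higher score = more uncertain

def select_batch(img_emb, txt_emb, k, lam=0.5):
    """Greedy batch acquisition: pick the most uncertain samples while
    penalizing similarity to already-selected ones (redundancy control)."""
    scores = acquisition_scores(img_emb, txt_emb)
    selected = []
    for _ in range(k):
        if selected:
            chosen = img_emb[np.array(selected)]
            redundancy = cosine_sim(img_emb, chosen).max(axis=1)
        else:
            redundancy = np.zeros(len(img_emb))
        utility = scores - lam * redundancy
        for s in selected:          # never re-pick a selected sample
            utility[s] = -np.inf
        selected.append(int(np.argmax(utility)))
    return selected
```

In this sketch, `img_emb` holds image embeddings and `txt_emb` holds per-identity text embeddings from the shared vision–language space; the greedy loop trades off uncertainty against coverage, which mirrors the combination of uncertainty and diversity sampling the abstract describes.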

Downloads

Download data is not yet available.

Published

2026-02-24