Vision-Based Crack Segmentation with Geometry-Constrained Transformers for Field Concrete Inspection

Authors

  • Lei Zhang, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK

DOI:

https://doi.org/10.71465/fess561

Keywords:

Crack Segmentation, Vision Transformers, Structural Health Monitoring, Geometric Constraints, Deep Learning.

Abstract

The structural integrity of concrete infrastructure is paramount to public safety and economic stability. Automated pavement and surface inspection via computer vision has emerged as a critical alternative to labor-intensive manual surveys. However, traditional Convolutional Neural Networks (CNNs) often struggle to preserve the high-frequency topological details of thin cracks against complex, texture-heavy heterogeneous backgrounds. While Vision Transformers (ViTs) offer superior global context modeling, they frequently lack the inductive biases required to capture the fine-grained local geometry inherent to fracture mechanics. This paper proposes a novel architecture: the Geometry-Constrained Transformer (GCT). By integrating a dedicated geometric edge-alignment module within a hierarchical Transformer encoder-decoder structure, we explicitly enforce curvilinear continuity and boundary sharpness during the segmentation process. We introduce a dual-stream attention mechanism that leverages low-level morphological cues to guide high-level semantic tokens, ensuring that the global attention map remains anchored to physical structural defects. Extensive experiments on three public benchmark datasets demonstrate that the proposed GCT outperforms state-of-the-art CNN-based and Transformer-based methods, particularly in scenarios characterized by varying illumination, shadowing, and biological staining.
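As a rough illustration of the idea that low-level morphological cues can guide attention over semantic tokens, the following minimal sketch adds an edge-derived bias to single-head self-attention logits. It is not the GCT implementation: the abstract does not specify the edge-alignment module or the dual-stream mechanism, so the Sobel filter, the per-patch averaging, the additive bias, and the names `sobel_edge_magnitude`, `patch_edge_scores`, `edge_guided_attention`, and `alpha` are all assumptions made here for illustration.

```python
import numpy as np


def sobel_edge_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image using 3x3 Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the filter response one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def patch_edge_scores(edges, patch):
    """Mean edge strength of each non-overlapping patch, one score per token."""
    h, w = edges.shape
    gh, gw = h // patch, w // patch
    blocks = edges[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return blocks.mean(axis=(1, 3)).reshape(-1)


def edge_guided_attention(tokens, edge_scores, alpha=1.0):
    """Self-attention whose logits are biased toward edge-rich key tokens.

    Hypothetical guidance rule: logits[q, k] += alpha * normalized_edge[k].
    """
    d = tokens.shape[1]
    logits = tokens @ tokens.T / np.sqrt(d)
    # Rescale edge scores to [0, 1] so alpha has a consistent meaning.
    span = edge_scores.max() - edge_scores.min()
    if span > 0:
        norm = (edge_scores - edge_scores.min()) / span
    else:
        norm = np.zeros_like(edge_scores)
    logits = logits + alpha * norm[None, :]
    # Numerically stable row-wise softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights
```

With `alpha=0` this reduces to ordinary dot-product self-attention; increasing `alpha` shifts each query's attention mass toward patches that contain strong edges, which is one simple way an attention map could be kept anchored to crack-like structures.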

Published

2026-01-01