Vision-Based Crack Segmentation with Geometry-Constrained Transformers for Field Concrete Inspection
DOI:
https://doi.org/10.71465/fess561
Keywords:
Crack Segmentation, Vision Transformers, Structural Health Monitoring, Geometric Constraints, Deep Learning.
Abstract
The structural integrity of concrete infrastructure is paramount to public safety and economic stability. Automated pavement and surface inspection via computer vision has emerged as a critical alternative to labor-intensive manual surveys. However, traditional Convolutional Neural Networks (CNNs) often struggle to preserve the high-frequency topological details of thin cracks against complex, heavily textured, heterogeneous backgrounds. While Vision Transformers (ViTs) offer superior global context modeling, they frequently lack the inductive biases required to capture the fine-grained local geometry of fractures. This paper proposes a novel architecture: the Geometry-Constrained Transformer (GCT). By integrating a dedicated geometric edge-alignment module within a hierarchical Transformer encoder-decoder structure, we explicitly enforce curvilinear continuity and boundary sharpness during segmentation. We introduce a dual-stream attention mechanism in which low-level morphological cues guide high-level semantic tokens, so that the global attention map remains anchored to physical structural defects. Extensive experiments on three public benchmark datasets demonstrate that the proposed GCT outperforms state-of-the-art CNN-based and Transformer-based methods, particularly in scenarios characterized by varying illumination, shadowing, and biological staining.
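The abstract does not specify how low-level morphological cues guide the attention, so the following is only a minimal sketch of one possible reading, not the authors' GCT. It assumes the cue is a Sobel gradient magnitude averaged per patch and that it enters as an additive bias on the attention logits of each key token. The function names (`sobel_edge_strength`, `patch_mean`, `edge_guided_attention`) and the `strength` parameter are hypothetical.

```python
import numpy as np

def sobel_edge_strength(img):
    """Gradient magnitude of a 2-D grayscale image via 3x3 Sobel filters."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal and vertical derivatives written as shifted-slice sums.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def patch_mean(x, patch):
    """Average an (H, W) map over non-overlapping patch x patch cells.

    Assumes H and W are multiples of `patch`; returns one value per
    patch token in row-major order.
    """
    h, w = x.shape
    cells = x.reshape(h // patch, patch, w // patch, patch)
    return cells.mean(axis=(1, 3)).ravel()

def edge_guided_attention(q, k, v, edge_per_token, strength=1.0):
    """Scaled dot-product attention with an additive per-key edge bias.

    Assumption (not from the paper): tokens covering strong edges get a
    higher logit, so every query attends to them more.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits + strength * edge_per_token[None, :]
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy input: a bright 16x16 surface with a one-pixel-wide dark "crack".
img = np.ones((16, 16))
img[:, 5] = 0.0
edge = patch_mean(sobel_edge_strength(img), patch=4)  # 16 patch tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))  # stand-in for semantic token embeddings
out, weights = edge_guided_attention(tokens, tokens, tokens, edge)
```

In this sketch the bias only re-weights which tokens receive attention; enforcing curvilinear continuity or boundary sharpness, as the abstract describes, would require additional components that are not shown here.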
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.