Assessment of Perception Accuracy in Autonomous Driving under Complex Urban Conditions with Sensor Integration
DOI: https://doi.org/10.71465/fair259

Keywords: Multimodal fusion, BEV representation, Transformer-based model, 3D object detection, autonomous driving perception

Abstract
Dynamic and complex traffic scenarios in urban environments impose stringent requirements on the perception capability of autonomous driving systems. In this study, we develop a perception model that integrates data from LiDAR, cameras, and millimeter-wave radar through multimodal sensor fusion and employs a large-scale Transformer-based architecture. By adopting the Bird's Eye View (BEV) representation and a multi-scale feature enhancement mechanism, the proposed model significantly improves the accuracy of 3D object detection and semantic interpretation. At the architectural level, we introduce a cross-modal attention mechanism and a sparse attention module, which strengthen perception performance in challenging situations such as occlusion, abrupt lighting changes, and densely clustered targets. Experiments on the nuScenes and KITTI datasets show that the proposed model outperforms existing approaches such as BEVFormer and VoxelNet in terms of mean Average Precision (mAP), Intersection over Union (IoU), and detection stability. The model consistently achieves high recognition accuracy and robust adaptability across diverse urban driving scenarios.
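To make the cross-modal fusion idea in the abstract concrete, the following is a minimal illustrative sketch (not the authors' published code) of cross-modal attention between LiDAR and camera BEV feature maps. All names, shapes, and hyperparameters (embed_dim, num_heads, the 50x50 BEV grid) are assumptions chosen for illustration.

```python
# Illustrative sketch only: the paper does not provide an implementation, so this
# shows one plausible way to fuse LiDAR and camera BEV features with cross-modal
# attention. Module names, shapes, and hyperparameters are assumptions.
import torch
import torch.nn as nn


class CrossModalBEVFusion(nn.Module):
    """Fuse LiDAR and camera BEV features with cross-modal attention.

    LiDAR BEV cells act as queries; camera BEV cells supply keys/values,
    so each LiDAR cell attends to image-derived context at the same BEV scale.
    """

    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, lidar_bev: torch.Tensor, camera_bev: torch.Tensor) -> torch.Tensor:
        # lidar_bev, camera_bev: (B, C, H, W) BEV grids at matching resolution.
        b, c, h, w = lidar_bev.shape
        q = lidar_bev.flatten(2).transpose(1, 2)    # (B, H*W, C) LiDAR queries
        kv = camera_bev.flatten(2).transpose(1, 2)  # (B, H*W, C) camera keys/values
        attn_out, _ = self.cross_attn(q, kv, kv)    # LiDAR cells attend to camera cells
        x = self.norm1(q + attn_out)                # residual + norm
        x = self.norm2(x + self.ffn(x))             # feed-forward refinement
        return x.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    # Toy check on a small 50x50 BEV grid.
    lidar = torch.randn(1, 256, 50, 50)
    camera = torch.randn(1, 256, 50, 50)
    fused = CrossModalBEVFusion()(lidar, camera)
    print(fused.shape)  # torch.Size([1, 256, 50, 50])
```

In a full pipeline of the kind the abstract describes, a fused map like this would feed a BEV detection head; sparse attention variants would restrict the key/value set per query to reduce cost on large grids.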
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.