Building and Researching a Machine Learning Model for Identifying Corporate Tax Avoidance
DOI: https://doi.org/10.71465/fbf399
Keywords: Tax Avoidance Detection, Machine Learning, Corporate Governance, Predictive Modeling
Abstract
Corporate tax avoidance strains public finances and undermines economic equity, yet it remains hard to detect because the relevant signals are spread across both financial and non-financial corporate data. This study develops and evaluates a machine learning model that identifies corporate tax avoidance behavior from a multi-dimensional dataset. The model uses financial ratios, ownership structures, and industry characteristics as predictive features, and applies gradient boosting algorithms to classify firms by their likelihood of engaging in tax avoidance. It was trained and validated on a global dataset of more than 10,000 publicly listed companies covering 2000 to 2020. The model achieves an F1-score of 0.87, significantly outperforming traditional logistic regression benchmarks. Feature importance analysis further shows that profitability metrics, subsidiary networks, and jurisdictional attributes are among the most influential predictors. These results indicate that machine learning can strengthen regulatory oversight and inform policy-making by enabling earlier and more precise identification of tax avoidance practices.
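For readers unfamiliar with the headline metric, the F1-score is the harmonic mean of precision and recall for the positive class (here, firms flagged as tax avoiders). The following is a minimal sketch in plain Python, not the authors' evaluation code. The function name `f1_score` and the ten toy labels are assumptions made for illustration only and are not taken from the study's dataset.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score of the positive class for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # With no true positives, F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of flagged firms that are truly positive
    recall = tp / (tp + fn)     # share of truly positive firms that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = tax-avoiding firm, 0 = other firm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because the F1-score ignores true negatives, it is less flattering than plain accuracy when the positive class is rare, which is one plausible reason to report it for a detection task of this kind.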
License
Copyright (c) 2025 Zhipeng Wang, Yufei Chen (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
 
							