Exploring Clustering Algorithms for Customer Segmentation in Big Data Analytics

Zhanghua Zhu

doi:10.71465/fair340

Authors

Zhanghua Zhu, School of Information Management, Wuhan University, Wuhan 430072, China

DOI:

https://doi.org/10.71465/fair340

Keywords:

Big Data Analytics, Customer Segmentation, Clustering Algorithms, K-Means, DBSCAN, BIRCH

Abstract

In the contemporary digital economy, harnessing big data for customer segmentation has transitioned from a competitive advantage to a strategic necessity. While the volume, velocity, and variety of customer data offer unprecedented opportunities for personalization, they also pose significant computational and analytical challenges to traditional data mining techniques. Clustering, as a fundamental unsupervised learning method, remains central to segmentation, yet standard algorithms often fail to scale efficiently or accurately capture the complex structures inherent in massive datasets. This study provides a comprehensive exploration and comparative analysis of foundational clustering algorithms—specifically the partitioning method (K-Means), the density-based method (DBSCAN), and the scalable hierarchical method (BIRCH)—applied to the task of customer segmentation within a simulated big data environment. This empirical investigation utilizes a large-scale transactional dataset, focusing on feature engineering based on the Recency, Frequency, Monetary value, and Variety (RFM-V) model. Algorithm performance is systematically evaluated using internal validation metrics, including the Silhouette Coefficient and the Davies-Bouldin Index, alongside a critical assessment of computational efficiency (processing time). Our findings demonstrate that while K-Means provides a rapid baseline, it struggles with non-spherical data structures, resulting in suboptimal segment quality. Conversely, DBSCAN proves computationally intractable at scale, despite its theoretical superiority in handling noise and arbitrary cluster shapes. The study concludes that BIRCH presents the most viable solution, offering a robust balance between computational scalability and the generation of coherent, meaningful customer segments, thereby addressing the central challenge of applying unsupervised learning to big data analytics.
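The RFM-V feature engineering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define how each feature is computed, so the definitions here are assumptions: Recency is days since the last purchase relative to a chosen snapshot date, Frequency is the number of transactions, Monetary is total spend, and Variety is the number of distinct product categories purchased. The function name `rfmv_features`, the tuple layout, and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date


def rfmv_features(transactions, snapshot):
    """Aggregate (customer_id, day, amount, category) rows into RFM-V features.

    Assumed definitions (not specified in the abstract):
      recency   = days between the snapshot date and the last purchase
      frequency = number of transactions
      monetary  = total amount spent
      variety   = number of distinct product categories
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    categories = defaultdict(set)

    for customer, day, amount, category in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
        categories[customer].add(category)

    return {
        customer: {
            "recency": (snapshot - last_purchase[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
            "variety": len(categories[customer]),
        }
        for customer in last_purchase
    }


# Hypothetical toy transactions, for illustration only.
transactions = [
    ("c1", date(2024, 1, 5), 20.0, "books"),
    ("c1", date(2024, 3, 1), 35.0, "music"),
    ("c2", date(2024, 2, 10), 50.0, "books"),
    ("c2", date(2024, 2, 20), 10.0, "books"),
    ("c2", date(2024, 2, 25), 15.0, "books"),
]

features = rfmv_features(transactions, snapshot=date(2024, 3, 31))
for customer, row in sorted(features.items()):
    print(customer, row)
```

In a pipeline like the one the abstract describes, a per-customer table of this kind would typically be scaled before being passed to K-Means, DBSCAN, or BIRCH, since the four features sit on very different numeric ranges and all three algorithms rely on distance computations.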

Journal Information

Journal Name : Frontiers in Artificial Intelligence Research
