Exploring Clustering Algorithms for Customer Segmentation in Big Data Analytics
DOI:
https://doi.org/10.71465/fair340
Keywords:
Big Data Analytics, Customer Segmentation, Clustering Algorithms, K-Means, DBSCAN, BIRCH
Abstract
In the contemporary digital economy, harnessing big data for customer segmentation has transitioned from a competitive advantage to a strategic necessity. While the volume, velocity, and variety of customer data offer unprecedented opportunities for personalization, they also pose significant computational and analytical challenges to traditional data mining techniques. Clustering, as a fundamental unsupervised learning method, remains central to segmentation, yet standard algorithms often fail to scale efficiently or accurately capture the complex structures inherent in massive datasets. This study provides a comprehensive exploration and comparative analysis of foundational clustering algorithms—specifically the partitioning method (K-Means), the density-based method (DBSCAN), and the scalable hierarchical method (BIRCH)—applied to the task of customer segmentation within a simulated big data environment. This empirical investigation utilizes a large-scale transactional dataset, focusing on feature engineering based on the Recency, Frequency, Monetary value, and Variety (RFM-V) model. Algorithm performance is systematically evaluated using internal validation metrics, including the Silhouette Coefficient and the Davies-Bouldin Index, alongside a critical assessment of computational efficiency (processing time). Our findings demonstrate that while K-Means provides a rapid baseline, it struggles with non-spherical data structures, resulting in suboptimal segment quality. Conversely, DBSCAN proves computationally intractable at scale, despite its theoretical superiority in handling noise and arbitrary cluster shapes. The study concludes that BIRCH presents the most viable solution, offering a robust balance between computational scalability and the generation of coherent, meaningful customer segments, thereby addressing the central challenge of applying unsupervised learning to big data analytics.
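The RFM-V feature engineering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so the ones used here are assumptions: recency as days since the last purchase relative to a snapshot date, frequency as the number of transactions, monetary value as total spend, and variety as the number of distinct product categories purchased. The function name `rfmv_features`, the tuple layout, and the sample data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date


def rfmv_features(transactions, snapshot):
    """Aggregate per-customer RFM-V features (assumed definitions).

    transactions: iterable of (customer_id, purchase_date, amount, category)
    snapshot:     reference date used to measure recency
    """
    last_seen = {}                  # most recent purchase date per customer
    frequency = defaultdict(int)    # number of transactions per customer
    monetary = defaultdict(float)   # total spend per customer
    categories = defaultdict(set)   # distinct categories per customer

    for customer, day, amount, category in transactions:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
        categories[customer].add(category)

    return {
        customer: {
            "recency": (snapshot - last_seen[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
            "variety": len(categories[customer]),
        }
        for customer in last_seen
    }


# Hypothetical sample data, for illustration only.
transactions = [
    ("c1", date(2025, 1, 5), 40.0, "books"),
    ("c1", date(2025, 1, 20), 15.5, "toys"),
    ("c2", date(2024, 12, 1), 120.0, "books"),
]
features = rfmv_features(transactions, snapshot=date(2025, 2, 1))
```

In practice, the resulting feature vectors would usually be standardized before being passed to a distance-based method such as K-Means, DBSCAN, or BIRCH, since the four features are measured on very different scales.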
License
Copyright (c) 2025 Zhanghua Zhu (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.