Hadoop-based Online Shopping Behavior Analysis: Design and Implementation
DOI:
https://doi.org/10.71465/fair264
Keywords:
Big Data Analytics, E-commerce User Behavior, Hadoop, Hive, Pyecharts, Dashboard Visualization, Core E-commerce Metrics
Abstract
This study analyzes open-source Taobao user behavior data, published as a public dataset on Alibaba Tianchi, using the Hadoop big data analytics platform. The multi-dimensional analysis of user behavior is intended to provide actionable insights for e-commerce sales decisions.
Each row of the dataset represents an individual user action. The dataset was first uploaded to HDFS, Hadoop's distributed file system. Flume, a component of the Hadoop ecosystem, was then configured to automate data ingestion and load the data into a Hive database for analysis. Key e-commerce metrics were computed, including PV (Page Views), UV (Unique Visitors), bounce rate, and repurchase rate. User behavior patterns and activity levels were examined across several time dimensions. Further statistics covered top-selling item IDs, popular product categories, and the geographic distribution of users.
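The four core metrics named above can be illustrated with a minimal Python sketch. The abstract does not give the metric definitions or the table schema, so everything here is an assumption: the `(user_id, item_id, behavior_type)` row layout, the `"pv"` and `"buy"` labels, bounce rate taken as the share of users with exactly one recorded action, and repurchase rate taken as the share of buyers with two or more purchases. The study itself computed these metrics in Hive; the sample rows are invented for illustration.

```python
from collections import Counter

# Hypothetical sample rows: (user_id, item_id, behavior_type).
# The real dataset schema may differ; this layout is an assumption.
events = [
    ("u1", "i1", "pv"),
    ("u1", "i1", "buy"),
    ("u1", "i2", "buy"),
    ("u2", "i1", "pv"),
    ("u3", "i3", "pv"),
    ("u3", "i3", "pv"),
    ("u3", "i3", "buy"),
]


def core_metrics(events):
    """Compute PV, UV, bounce rate, and repurchase rate under assumed definitions."""
    # PV: total number of page-view actions.
    pv = sum(1 for _, _, behavior in events if behavior == "pv")

    # UV: number of distinct users with at least one action.
    actions_per_user = Counter(user for user, _, _ in events)
    uv = len(actions_per_user)

    # Bounce rate (assumed definition): users with exactly one action / UV.
    bounced = sum(1 for n in actions_per_user.values() if n == 1)

    # Repurchase rate (assumed definition): buyers with >= 2 purchases / buyers.
    buys_per_user = Counter(user for user, _, b in events if b == "buy")
    buyers = len(buys_per_user)
    repeat_buyers = sum(1 for n in buys_per_user.values() if n >= 2)

    return {
        "pv": pv,
        "uv": uv,
        "bounce_rate": bounced / uv if uv else 0.0,
        "repurchase_rate": repeat_buyers / buyers if buyers else 0.0,
    }


metrics = core_metrics(events)
print(metrics)
```

On the sample rows, only `u2` has a single action and only `u1` purchased more than once, so the sketch reports a bounce rate of one third and a repurchase rate of one half.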
The resulting analytical tables were stored in Hive and then exported automatically with Sqoop to a relational MySQL database for storage and presentation.
For visualization, Python's PyEcharts library was used to build interactive front-end charts. Multi-dimensional visualizations were generated from the result tables queried from MySQL to make the data easier to interpret. Finally, PyEcharts' `Page` class was used to assemble the charts into an interactive dashboard, and the rendered page was deployed as static HTML to serve as a large-screen visualization interface. These presentations help decision-makers derive strategic insights quickly.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.