A Deep Reinforcement Learning Framework for End-to-End Retail Supply Chain Optimization
DOI: https://doi.org/10.71465/fbf253

Keywords: Retail Supply Chain, Deep Reinforcement Learning, End-to-End Optimization, Inventory Management, Dynamic Decision-Making, Actor-Critic, Markov Decision Process, Intelligent Logistics

Abstract
Retail supply chains are increasingly challenged by volatility in consumer demand, supplier uncertainties, logistics constraints, and global disruptions. Traditional supply chain management approaches, often relying on deterministic planning or shallow learning-based heuristics, struggle to adapt dynamically to changing conditions. This paper proposes a novel end-to-end optimization framework leveraging Deep Reinforcement Learning (DRL) to improve supply chain decision-making across procurement, inventory, warehousing, and distribution.
Our proposed architecture models the entire retail supply chain as a Markov Decision Process (MDP), where each node (e.g., warehouse, store, supplier) acts as an agent interacting with a stochastic environment. The DRL framework employs a centralized actor-critic algorithm to learn optimal joint policies for multiple supply chain functions, aiming to minimize operational costs while maximizing service levels. The model is trained in a simulated environment constructed from historical retail transaction and logistics data.
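As an illustration of this formulation, the following minimal sketch (not the authors' implementation) casts a single inventory node as an MDP and applies a one-step advantage actor-critic update in PyTorch. All environment parameters, cost coefficients, and network sizes here are assumptions chosen for clarity; the paper's framework coordinates multiple nodes and trains on historical transaction data rather than the synthetic Poisson demand used below.

import numpy as np
import torch
import torch.nn as nn
from torch.distributions import Categorical

class InventoryMDP:
    """Toy single-echelon inventory environment (illustrative assumption):
    state = normalized on-hand stock, action = order quantity,
    stochastic demand drives the transition and the cost signal."""
    def __init__(self, capacity=100, hold_cost=0.1, stockout_cost=1.0, mean_demand=20):
        self.capacity = capacity
        self.hold_cost = hold_cost
        self.stockout_cost = stockout_cost
        self.mean_demand = mean_demand
        self.stock = capacity // 2

    def reset(self):
        self.stock = self.capacity // 2
        return np.array([self.stock / self.capacity], dtype=np.float32)

    def step(self, order_qty):
        demand = np.random.poisson(self.mean_demand)
        self.stock = min(self.stock + order_qty, self.capacity)
        unmet = max(demand - self.stock, 0)
        self.stock = max(self.stock - demand, 0)
        # Reward is negative cost: holding cost plus stockout penalty,
        # mirroring the cost-vs-service-level trade-off in the objective.
        reward = -(self.hold_cost * self.stock + self.stockout_cost * unmet)
        return np.array([self.stock / self.capacity], dtype=np.float32), reward

class ActorCritic(nn.Module):
    """Shared trunk with a categorical policy head over discrete order
    sizes and a scalar value head, as in standard advantage actor-critic."""
    def __init__(self, n_actions=11):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(1, 64), nn.ReLU())
        self.policy = nn.Linear(64, n_actions)
        self.value = nn.Linear(64, 1)

    def forward(self, state):
        h = self.trunk(state)
        return Categorical(logits=self.policy(h)), self.value(h)

env, net = InventoryMDP(), ActorCritic()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
state, gamma = env.reset(), 0.99
for t in range(2000):
    dist, value = net(torch.from_numpy(state))
    action = dist.sample()                      # index 0..10, scaled to units of 5
    next_state, reward = env.step(int(action) * 5)
    with torch.no_grad():
        _, next_value = net(torch.from_numpy(next_state))
        target = reward + gamma * next_value    # one-step TD target
    advantage = (target - value).detach()
    loss = -dist.log_prob(action) * advantage + (target - value).pow(2)
    opt.zero_grad(); loss.backward(); opt.step()
    state = next_state

Extending this sketch toward the paper's setting would mean enlarging the state to cover every warehouse, store, and supplier node, replacing the scalar action with a joint order/allocation vector, and letting the centralized critic evaluate the joint policy across all supply chain functions.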
Experimental results demonstrate that the DRL-based policy outperforms traditional rule-based and forecast-driven methods in terms of inventory turnover, fulfillment rate, and response to demand shocks. This study contributes to the literature by integrating dynamic learning and real-time adaptation into holistic supply chain operations, offering a promising approach to scalable, intelligent retail logistics.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.