Cross-System Transfer Learning for Root Cause Analysis via Domain-Invariant Graph Representations

Authors

  • Zihan Peng, Department of Computer Science, University of Rochester, USA
  • Junyue Ma, Department of Computer Science, University of Rochester, USA
  • Pieter Van den Broeck, Department of Computer Science, KU Leuven, Belgium

DOI:

https://doi.org/10.71465/fapm532

Keywords:

transfer learning, root cause analysis, graph neural networks, domain adaptation, fault localization, distributed systems

Abstract

Root cause analysis (RCA) in complex distributed systems becomes difficult when diagnostic models must be transferred across heterogeneous infrastructure environments. Traditional machine learning approaches to fault localization degrade substantially when applied to systems with different architectures, monitoring configurations, or operational characteristics. This paper introduces a cross-system transfer learning framework that uses domain-invariant graph representations to transfer knowledge between systems for RCA tasks. The proposed method models system behavior as attributed graphs in which nodes represent components and edges capture causal dependencies, then uses message passing neural networks to learn structural embeddings, trained with adversarial feature alignment and graph contrastive learning. Domain-adversarial training with gradient reversal disentangles system-agnostic causal patterns from domain-specific characteristics, so that the framework maintains diagnostic accuracy when models trained on well-instrumented source systems are deployed to target systems with limited training data. Experimental evaluations on production cloud infrastructure show that the approach generalizes better than conventional transfer learning baselines, reducing diagnostic errors by 31% on average across heterogeneous system environments while remaining computationally efficient enough for real-time fault diagnosis.
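
As a rough illustration of two ideas named in the abstract, the sketch below shows one round of mean-aggregation message passing over a small attributed dependency graph, plus the forward and backward rules of a gradient reversal step. It is not the authors' implementation: the component names, the two-dimensional features, the mean aggregator, and the function names `message_pass`, `grad_reverse_forward`, and `grad_reverse_backward` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy system. Each node carries [cpu_utilization, error_rate].
features: Dict[str, List[float]] = {
    "frontend": [0.2, 0.0],
    "orders": [0.4, 0.2],
    "db": [0.9, 0.4],
}

# Edge (u, v) means "u depends on v", so a fault at v can propagate to u.
edges: List[Tuple[str, str]] = [("frontend", "orders"), ("orders", "db")]


def message_pass(
    feats: Dict[str, List[float]], edge_list: List[Tuple[str, str]]
) -> Dict[str, List[float]]:
    """Run one round of mean-aggregation message passing.

    Each node's new embedding is the average of its own features and the
    mean of the features of the nodes it depends on. A node with no
    dependencies keeps its own features.
    """
    neighbors: Dict[str, List[str]] = {name: [] for name in feats}
    for src, dst in edge_list:
        neighbors[src].append(dst)

    updated: Dict[str, List[float]] = {}
    for name, own in feats.items():
        deps = neighbors[name]
        if deps:
            agg = [
                sum(feats[d][i] for d in deps) / len(deps)
                for i in range(len(own))
            ]
        else:
            agg = list(own)
        updated[name] = [(a + b) / 2 for a, b in zip(own, agg)]
    return updated


def grad_reverse_forward(x: List[float]) -> List[float]:
    """Forward pass of gradient reversal: the identity function."""
    return list(x)


def grad_reverse_backward(grad: List[float], lam: float = 1.0) -> List[float]:
    """Backward pass of gradient reversal: negate and scale the gradient.

    The domain classifier is trained on the unmodified gradient, while the
    encoder upstream receives -lam * grad and is therefore pushed toward
    embeddings from which the domain cannot be predicted.
    """
    return [-lam * g for g in grad]


embeddings = message_pass(features, edges)
```

In this toy graph, `orders` mixes in the features of `db`, while `frontend` sees only `orders` after a single round; stacking further rounds would let information from `db` reach `frontend`. The scaling factor `lam` controls how strongly the reversed domain-classifier gradient influences the encoder.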

Published

2025-12-25