Agentic AI for Autonomous Data Engineering: A Self-Healing Pipeline Framework on Google Cloud
Author(s): Selvakumar Kalyanasundaram
Publication #: 2605014
Date of Publication: 10.05.2026
Country: United States
Pages: 1-10
Published In: Volume 12 Issue 3 May-2026
DOI: https://doi.org/10.62970/IJIRCT.v12.i3.2605014
Abstract
Modern data engineering pipelines operating at scale on cloud platforms face persistent challenges including schema drift, data quality degradation, infrastructure failures, and evolving upstream dependencies. Traditional approaches to pipeline maintenance rely heavily on manual intervention, reactive monitoring, and static rule-based error handling, leading to significant operational overhead and prolonged downtime. This paper presents AgentFlow, a novel agentic AI framework for autonomous data engineering that leverages large language model (LLM)-powered agents to enable self-healing, self-optimizing data pipelines on Google Cloud Platform (GCP). AgentFlow introduces a multi-agent architecture comprising specialized agents for anomaly detection, root cause analysis, remediation planning, and execution, coordinated through a hierarchical orchestration layer built on Vertex AI. We formalize the self-healing pipeline problem as a closed-loop control system and present theoretical guarantees on convergence and stability. Experimental evaluation on production-scale workloads across BigQuery, Dataflow, Cloud Composer, and Pub/Sub demonstrates that AgentFlow reduces mean time to recovery (MTTR) by 73.2%, decreases manual interventions by 89.4%, and maintains data quality SLAs with 99.7% consistency. The framework achieves autonomous resolution of 94.6% of common pipeline failures without human intervention, representing a paradigm shift from reactive to proactive data engineering operations.
Keywords: Agentic AI, Autonomous Data Engineering, Self-Healing Pipelines, Large Language Models, Google Cloud Platform, MLOps, DataOps, Multi-Agent Systems.
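The abstract describes a closed-loop cycle of anomaly detection, root cause analysis, remediation planning, and execution. As a minimal illustrative sketch only, the control flow of such a loop might look like the following; all class and method names here are hypothetical stand-ins, since AgentFlow's actual interfaces are described in the paper itself, not in this abstract:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a detect -> diagnose -> plan -> execute
# self-healing loop, as summarized in the abstract. In a real system
# each agent would call monitoring, logging, and LLM services.

@dataclass
class Incident:
    pipeline: str
    symptom: str          # e.g. "schema_drift", "job_failure"
    resolved: bool = False

class AnomalyDetector:
    def detect(self, metrics: dict) -> Optional[Incident]:
        # Flag a pipeline whose error rate exceeds a static threshold.
        if metrics["error_rate"] > 0.05:
            return Incident(metrics["pipeline"], "job_failure")
        return None

class RootCauseAnalyzer:
    def diagnose(self, incident: Incident) -> str:
        # Stand-in for log analysis / LLM reasoning: map symptom to cause.
        causes = {"job_failure": "upstream_schema_change",
                  "schema_drift": "upstream_schema_change"}
        return causes.get(incident.symptom, "unknown")

class RemediationPlanner:
    def plan(self, cause: str) -> list:
        if cause == "upstream_schema_change":
            return ["update_schema_mapping", "backfill_partition"]
        return ["escalate_to_human"]

class Executor:
    def execute(self, incident: Incident, steps: list) -> Incident:
        # In this sketch, any plan that avoids escalation is assumed to succeed.
        incident.resolved = "escalate_to_human" not in steps
        return incident

def heal(metrics: dict) -> Optional[Incident]:
    """One pass of the closed detect/diagnose/plan/execute loop."""
    incident = AnomalyDetector().detect(metrics)
    if incident is None:
        return None  # pipeline healthy; nothing to do
    cause = RootCauseAnalyzer().diagnose(incident)
    steps = RemediationPlanner().plan(cause)
    return Executor().execute(incident, steps)

result = heal({"pipeline": "orders_etl", "error_rate": 0.12})
```

The loop closes when the executed remediation changes the monitored metrics, which the detector re-reads on the next pass; the paper formalizes this feedback as a control system with convergence guarantees.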