The industry's first open-source, multi-agent framework for DevOps automation.
Specialized AI agents that plan, provision, deploy, and monitor — on your command.
Modern infrastructure is brutally complex. Engineers juggle Terraform states, dense Kubernetes manifests, CI/CD pipelines, multi-cloud networking, and incident response — often simultaneously. The result?
- Knowledge silos — Junior engineers need years before they can safely touch production.
- Expert bottlenecks — Senior architects fight fires instead of doing strategic work.
- Documentation rot — Static runbooks can't adapt to the weird edge-case your system is experiencing right now.
TalkOps changes this. We turn your DevOps expertise into autonomous, specialized AI agents. You describe what you need in plain English — the agents negotiate, plan, and execute the work for you.
TalkOps is not a chatbot hooked up to a bash terminal. It's a structured, enterprise-grade orchestration framework powered by LangGraph.
┌─────────────────────────────────────────────────────────────────────┐
│                       YOU (Natural Language)                        │
│                 "Deploy the checkout service to AWS                 │
│                  and set up Prometheus monitoring"                  │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
                    ┌────────────────────────┐
                    │    Supervisor Agent    │
                    │ (Intent → Task Graph)  │
                    └─────┬──────────┬───────┘
                          │          │
               ┌──────────▼──┐   ┌───▼──────────┐
               │ Infra Agent │   │  Monitoring  │
               │ (Terraform) │   │    Agent     │
               └──────┬──────┘   └──────┬───────┘
                      │                 │
               ┌──────▼─────────────────▼──────┐
               │      Human Approval Gate      │
               │   (GitOps · Dry-run · RBAC)   │
               └──────────────┬────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Apply & Report  │
                     └─────────────────┘
The Supervisor Agent interprets your intent, decomposes it into a DAG of tasks, routes each task to the right specialist, and aggregates results. Nothing touches production without a Human Approval Checkpoint. Everything is GitOps-native — agents generate the changes, commit to Git, open a PR, and wait.
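To make the "DAG of tasks" idea concrete, here is a minimal sketch in plain Python using the standard library's `graphlib`. It is not TalkOps code: the task names and the routing table are hypothetical, chosen to mirror the diagram above, and the real Supervisor runs on LangGraph rather than on this helper.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph for the example request in the diagram.
# Each key is a task; its value is the set of tasks it depends on.
TASKS = {
    "provision_infra": set(),
    "configure_monitoring": set(),
    "human_approval": {"provision_infra", "configure_monitoring"},
    "apply_and_report": {"human_approval"},
}

# Hypothetical routing table: which specialist handles which task.
ROUTES = {
    "provision_infra": "infrastructure_agent",
    "configure_monitoring": "monitoring_agent",
    "human_approval": "approval_gate",
    "apply_and_report": "supervisor",
}


def batches(tasks):
    """Yield lists of tasks whose dependencies are all satisfied.

    Tasks within one batch are independent and could run in parallel.
    """
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        yield ready
        sorter.done(*ready)


if __name__ == "__main__":
    for step, batch in enumerate(batches(TASKS), start=1):
        for task in batch:
            print(f"step {step}: {ROUTES[task]} <- {task}")
```

The two specialist tasks land in the same batch because neither depends on the other, while the approval gate only becomes ready once both have finished.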
TalkOps ships with domain-expert agent swarms — each obsessively focused on a single operational domain:
| Agent | What It Does |
|---|---|
| ☁️ Infrastructure Agent | Provisions cloud resources across AWS, Azure, and GCP. Generates production-grade Terraform with built-in security compliance, version pinning, and automated validation loops. |
| 🚀 Application Agent | Manages CI/CD pipelines, orchestrates rolling / blue-green / canary deployments, and automates container lifecycle on Kubernetes. |
| 📊 Monitoring Agent | Configures Prometheus metrics, builds Grafana dashboards, and sets up alerting — all from a single sentence. |
| 🛡️ SRE Agent | 24/7 on-call responder. Tracks SLOs/SLIs, monitors cluster health, and executes safe auto-remediation before you get paged. |
Our flagship agent uses a Deep Agent architecture with a multi-stage pipeline:
User Request → Supervisor → TF Planner (3 sub-agents) → TF Generator → TF Validator → GitHub Agent
                                                             ↑              │
                                                             └──── retry ───┘
It generates complete, production-ready Terraform modules (main.tf, variables.tf, outputs.tf, versions.tf, README.md) — validated in a sandbox with terraform validate before delivery. Security best practices (least-privilege IAM, SSE with KMS, VPC flow logs) are enforced at the prompt level, not bolted on.
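The generate, validate, retry loop in the pipeline can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual implementation: `generate_with_retry`, `ValidationResult`, and the two stub functions are hypothetical names. In a real pipeline the `validate` callable would presumably run `terraform validate` in the sandbox, and `generate` would call the model; both are replaced by stand-ins here so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    ok: bool
    errors: List[str]


def generate_with_retry(
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], ValidationResult],
    request: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Regenerate until validation passes or the attempts run out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        module = generate(request, feedback)
        result = validate(module)
        if result.ok:
            return module
        feedback = result.errors  # errors are fed into the next attempt
    return None  # caller decides how to surface the failure


# Stand-in for the model call: the first draft omits the terraform block.
def fake_generate(request: str, feedback: List[str]) -> str:
    if not feedback:
        return 'resource "aws_s3_bucket" "b" {}'
    return 'terraform {}\nresource "aws_s3_bucket" "b" {}'


# Stand-in for sandboxed validation.
def fake_validate(module: str) -> ValidationResult:
    if module.startswith("terraform"):
        return ValidationResult(True, [])
    return ValidationResult(False, ["missing terraform block"])


if __name__ == "__main__":
    print(generate_with_retry(fake_generate, fake_validate, "S3 bucket"))
```

Passing the validator's errors back into the next generation attempt is the design choice worth noting: a retry without feedback would just repeat the same mistake.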
AI agents are only as useful as the tools they can safely access. We use the Model Context Protocol (MCP) — an open standard — to connect agents to your infrastructure through secure, task-scoped tool interfaces.
| MCP Server | Purpose |
|---|---|
| 🎡 Helm MCP | Install, upgrade, and rollback Helm charts on live clusters |
| ☁️ Terraform MCP | Generate plans, detect drift, and apply state across AWS/Azure/GCP |
| 🚀 ArgoCD MCP | Create projects, sync applications, and monitor GitOps delivery |
| 🔀 Traefik MCP | Manage edge routing, canary weights, middleware, and NGINX migrations |
| 🔄 Argo Rollouts MCP | Orchestrate progressive delivery — canary, blue-green, analysis, and promotion |
Security model: Agents never hold long-lived credentials. MCP servers issue ephemeral, task-scoped tokens that expire within 30 minutes and are restricted to the exact resources authorized for the active task.
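As a rough illustration of that security model, the sketch below models an ephemeral, task-scoped token in plain Python. Everything in it is an assumption made for the example: `TaskToken`, `issue_token`, and the resource identifiers are hypothetical and do not describe the actual MCP server implementation. Only the 30-minute ceiling comes from the text above.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, Optional

TOKEN_TTL_SECONDS = 30 * 60  # the 30-minute ceiling described above


@dataclass(frozen=True)
class TaskToken:
    task_id: str
    resources: FrozenSet[str]
    expires_at: float
    value: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    def allows(self, resource: str, now: float) -> bool:
        """True only while the token is unexpired and the resource is in scope."""
        return now < self.expires_at and resource in self.resources


def issue_token(
    task_id: str,
    resources: Iterable[str],
    now: Optional[float] = None,
    ttl: float = TOKEN_TTL_SECONDS,
) -> TaskToken:
    """Issue a token scoped to one task; the TTL is capped at 30 minutes."""
    now = time.time() if now is None else now
    return TaskToken(task_id, frozenset(resources), now + min(ttl, TOKEN_TTL_SECONDS))


if __name__ == "__main__":
    token = issue_token("task-1", ["s3://checkout-assets"])
    print(token.allows("s3://checkout-assets", time.time()))  # in scope, unexpired
    print(token.allows("s3://billing-data", time.time()))     # out of scope
```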
🔧 Build your own: MCP is an open standard. Write a custom server for your internal tools, and TalkOps agents will discover and use it automatically.
TalkOps is built on a three-layer protocol stack:
| Layer | Protocol | Role |
|---|---|---|
| Agent ↔ Agent | A2A (JSON-RPC 2.0) | Deterministic, validated inter-agent communication |
| Orchestration | LangGraph | Stateful DAG execution with parallel task support |
| Agent ↔ User | A2UI | Progressive streaming of rich UI components (buttons, forms, charts) |
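For readers unfamiliar with JSON-RPC 2.0, the envelope used on the Agent ↔ Agent layer looks roughly like the sketch below. The envelope fields (`jsonrpc`, `id`, `method`, `params`, `result`, `error`) follow the JSON-RPC 2.0 specification; the method name `infra.plan` and its parameters are hypothetical and are not taken from the A2A protocol or from TalkOps.

```python
import json
import uuid


def make_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request with a unique id."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": method,
        "params": params,
    }


def validate_response(request: dict, raw: str):
    """Check a raw response against the request and return its result."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if msg.get("id") != request["id"]:
        raise ValueError("response id does not match request id")
    # The spec requires exactly one of "result" or "error".
    if ("result" in msg) == ("error" in msg):
        raise ValueError("response must contain exactly one of result or error")
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"{err['code']}: {err['message']}")
    return msg["result"]


if __name__ == "__main__":
    # Hypothetical method name and parameters, for illustration only.
    req = make_request("infra.plan", {"service": "checkout", "cloud": "aws"})
    reply = json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": {"status": "planned"}})
    print(validate_response(req, reply))
```

Matching on `id` and rejecting malformed envelopes up front is what lets one agent trust that a reply belongs to the request it sent.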
Trust is non-negotiable when AI touches infrastructure:
- Multi-layered guardrails — Technical limits, policy enforcement, behavioral constraints, and LLM content safety
- Confidence-based routing — Low-risk ops auto-approve; high-risk ops halt for human review; destructive ops require multi-admin sign-off
- Immutable audit trails — Every action produces a cryptographically traceable log (who requested, who approved, which policies evaluated) for SOC 2 / HIPAA / ISO 27001 compliance
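Two of the guardrails above, tiered approvals and tamper-evident audit records, can be sketched in a few lines. Treat this as one possible reading rather than the TalkOps implementation: the `Risk` tiers, the approval counts, and the SHA-256 hash chain are all assumptions made for the example. The source only says the log is "cryptographically traceable", and a hash chain is a common way to get that property.

```python
import hashlib
import json
from enum import Enum
from typing import Iterable, List


class Risk(Enum):
    LOW = "low"
    HIGH = "high"
    DESTRUCTIVE = "destructive"


# Hypothetical policy: distinct human approvals required per risk tier.
REQUIRED_APPROVALS = {Risk.LOW: 0, Risk.HIGH: 1, Risk.DESTRUCTIVE: 2}
GENESIS = "0" * 64  # placeholder hash that precedes the first record


def route(risk: Risk, approvers: Iterable[str]) -> str:
    """Return "apply" once enough distinct approvers have signed off."""
    return "apply" if len(set(approvers)) >= REQUIRED_APPROVALS[risk] else "halt"


def _digest(prev_hash: str, entry: dict) -> str:
    body = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(log: List[dict], entry: dict) -> None:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev": prev_hash, "hash": _digest(prev_hash, entry)})


def verify(log: List[dict]) -> bool:
    """Recompute the chain; any edited record breaks every later hash."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["entry"]):
            return False
        prev = record["hash"]
    return True


if __name__ == "__main__":
    log: List[dict] = []
    decision = route(Risk.DESTRUCTIVE, ["alice", "bob"])
    append_entry(log, {"action": "delete-bucket", "approved_by": ["alice", "bob"], "decision": decision})
    print(decision, verify(log))
```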
# Clone the orchestrator
git clone https://github.com/talkops-ai/aws-orchestrator-agent.git
cd aws-orchestrator-agent
# Configure environment
cp .env.example .env
# Set GOOGLE_API_KEY, GITHUB_PERSONAL_ACCESS_TOKEN, TERRAFORM_WORKSPACE
# Launch
docker compose up -d
# Open TalkOps UI
open http://localhost:8080

Then just type what you need:
"Create an S3 bucket with versioning and customer-managed KMS encryption"
The agent will plan → generate → validate → deliver production-ready Terraform, and wait for your approval before pushing to Git.
| Repository | Description |
|---|---|
| talkops-agents-docs | 📖 Documentation site (you're reading content from here) |
| aws-orchestrator-agent | ☁️ AWS Infrastructure Orchestrator with Deep Agent pipeline |
| helm-mcp-server | 🎡 MCP server for Kubernetes & Helm operations |
| terraform-mcp-server | ☁️ MCP server for Terraform plan/apply/validate |
| argocd-mcp-server | 🚀 MCP server for ArgoCD GitOps management |
| traefik-mcp-server | 🔀 MCP server for Traefik edge routing & traffic management |
| argo-rollout-mcp-server | 🔄 MCP server for Argo Rollouts progressive delivery |
We're building this in public and we want you to be a part of it. Whether it's fixing a typo, adding a new MCP server, or proposing an entirely new agent swarm — contributions are welcome.
- Fork the relevant repository
- Create a feature branch (`git checkout -b feat/my-feature`)
- Commit your changes (`git commit -m 'feat: add my feature'`)
- Push and open a Pull Request
Building something with TalkOps? Need help integrating AI agents into your DevOps workflow? We'd love to hear from you.
🌐 talkops.ai/services · 💻 GitHub · 💼 LinkedIn
Built with ❤️ by the TalkOps team · Open source under the MIT License
