An autonomous multi-agent pipeline that takes a GitHub issue and, without human intervention, analyzes requirements, writes code, verifies it locally, opens a pull request, and reviews it against the original specification.
Software development teams spend enormous amounts of time on repetitive, well-defined tasks: translating a GitHub issue into a requirement spec, writing boilerplate code, running the same test suite, and reviewing changes against the original ask.
These tasks follow predictable patterns but still consume hours of engineering time per issue. The question I wanted to answer: how much of this can a well-orchestrated AI agent swarm handle autonomously?
The answer, it turns out, is quite a lot, especially for well-scoped tasks such as creating GitHub Actions workflows, making infrastructure-as-code changes, and updating text or configuration.
GitHub Issue
starting point
Reviewed PR
no human required
The system is event-driven with no central orchestrator. Each agent runs independently in Kubernetes, consuming from its own Azure Service Bus queue and publishing to the next. KEDA scales each agent to zero when idle — cost drops to near zero between issues.
GitHub Webhook
Issue opened, triggers FastAPI receiver
Issue Analyst
Classifies issue, extracts requirements & acceptance criteria, posts structured analysis as GitHub comment
Developer Agent
Plans changes, clones repo, writes code, runs build/lint/tests locally, fixes failures (up to 3 iterations), opens PR
Quality Agent
Waits for CI, reviews PR against original requirements, approves or requests changes — routes back to developer on failure
   GitHub Issue Created
           │
           ▼
┌─────────────────────┐
│  Webhook Receiver   │  FastAPI · HMAC signature verification
│     (FastAPI)       │
└──────────┬──────────┘
           │  agent-analyst queue
           ▼
┌─────────────────────┐
│   Issue Analyst     │  Claude multimodal · reads text + screenshots
│      Agent          │  → posts requirements comment on GitHub issue
└──────────┬──────────┘
           │  agent-developer queue
           ▼
┌─────────────────────┐
│  Developer Agent    │  Claude · reads .agentconfig/ project standards
│                     │  → clones repo → writes code → runs tests
│  ┌───────────────┐  │  → fixes failures (3x) → opens PR
│  │  3-iter fix   │  │
│  │ loop (local)  │  │
│  └───────────────┘  │
└──────────┬──────────┘
           │  agent-quality queue
           ▼
┌─────────────────────┐
│   Quality Agent     │  Polls GitHub Actions CI (up to 30 min)
│                     │  → on pass: Claude reviews PR vs requirements
│ ┌────────────────┐  │  → APPROVE or REQUEST_CHANGES
│ │ Retry tracking │  │  → on max retries: escalates to human
│ │ (Table Storage)│  │
│ └────────────────┘  │
└─────────────────────┘
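The hand-off between stages can be sketched as a small routing table in which each agent knows only its own input queue and the queue that follows it. This is a minimal illustration, not the project's code: the queue names come from the diagram, while the in-memory deques (standing in for Azure Service Bus) and the `run_agent_once` helper are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Optional

# Queue names are taken from the diagram; everything else is illustrative.
NEXT_QUEUE: dict[str, Optional[str]] = {
    "agent-analyst": "agent-developer",
    "agent-developer": "agent-quality",
    "agent-quality": None,  # last stage of the forward path
}

# In-memory stand-ins for the Service Bus queues.
queues: dict[str, deque] = {name: deque() for name in NEXT_QUEUE}


def run_agent_once(queue_name: str, handle: Callable[[dict], dict]) -> bool:
    """Consume one message from this agent's queue and publish the result onward.

    Returns False when the queue is empty, which is the state in which
    a queue-depth autoscaler would scale the agent down.
    """
    if not queues[queue_name]:
        return False
    message = queues[queue_name].popleft()
    result = handle(message)
    target = NEXT_QUEUE[queue_name]
    if target is not None:
        queues[target].append(result)
    return True
```

No agent calls another agent directly, so any stage can be stopped or restarted while messages wait in its queue. The sketch covers only the forward path; the route from the quality agent back to the developer on failure is omitted.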
Issue Analyst
Receives GitHub issue events from the webhook receiver and uses Claude's multimodal capabilities to analyze both text and embedded screenshots. Classifies the issue type, extracts structured requirements with rationale, defines acceptance criteria with verification methods, and produces a concrete implementation plan. Posts the full analysis as a GitHub comment and applies labels.
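One way to picture the analyst's structured output is as a small set of dataclasses that render to a comment. The sketch below is an assumption about shape only: the class names, field names, and Markdown layout are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    text: str
    rationale: str  # why the requirement exists


@dataclass
class AcceptanceCriterion:
    text: str
    verification: str  # how the criterion is checked, e.g. "unit test"


@dataclass
class IssueAnalysis:
    issue_type: str
    requirements: list[Requirement] = field(default_factory=list)
    acceptance_criteria: list[AcceptanceCriterion] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)

    def to_comment(self) -> str:
        """Render the analysis as Markdown suitable for a GitHub issue comment."""
        lines = [f"## Issue analysis ({self.issue_type})", "", "### Requirements"]
        lines += [f"- {r.text} (why: {r.rationale})" for r in self.requirements]
        lines += ["", "### Acceptance criteria"]
        lines += [
            f"- [ ] {c.text} (verify: {c.verification})"
            for c in self.acceptance_criteria
        ]
        lines += ["", "### Implementation plan"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.plan, start=1)]
        return "\n".join(lines)
```

Keeping the analysis as typed data rather than free text means the same object can be posted as a comment and passed along to the developer agent's queue.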
Developer Agent
The most complex agent. First evaluates feasibility, skipping ambiguous or operational tasks. Then plans concrete file changes using the repository's .agentconfig/ standards. Clones the target repo, applies changes, runs the actual build and test suite, and iterates up to three times to fix failures before opening a PR. Supports any language by reading runtime context from environment variables.
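The local verification loop might look roughly like the following. It is a sketch under assumptions: `run_checks`, `verify_with_fixes`, and the `apply_fix` callback (where the LLM would edit files based on the error output) are hypothetical names, and the real agent's commands come from the target repository rather than from this code.

```python
import subprocess
from typing import Callable

MAX_FIX_ITERATIONS = 3  # "up to three times" in the description above


def run_checks(commands: list[list[str]], cwd: str) -> tuple[bool, str]:
    """Run each check command in order; stop at the first failure and return its output."""
    for cmd in commands:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        if proc.returncode != 0:
            return False, proc.stdout + proc.stderr
    return True, ""


def verify_with_fixes(
    commands: list[list[str]],
    cwd: str,
    apply_fix: Callable[[str], None],
) -> bool:
    """Return True once all checks pass, requesting at most MAX_FIX_ITERATIONS fixes."""
    ok, output = run_checks(commands, cwd)
    attempts = 0
    while not ok and attempts < MAX_FIX_ITERATIONS:
        apply_fix(output)  # hypothetical hook: edit files based on the exact errors
        attempts += 1
        ok, output = run_checks(commands, cwd)
    return ok
```

The bound on `attempts` is what keeps a hard failure from turning into an endless edit-and-retest cycle; what the agent does when the function returns False is left open here.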
Quality Agent
Polls GitHub Actions until CI completes (up to 30 minutes), keeping the Service Bus message lock alive the entire time. On CI failure, extracts the raw log output and sends it back to the developer agent: no LLM interpretation, just the exact errors. On CI pass, Claude reviews the PR against the original requirements. Tracks retry attempts in Azure Table Storage so state survives pod restarts. Escalates to a human after three failed cycles.
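The retry and escalation decision can be sketched as a pure function over a small counter store. The `RetryStore` class below is an in-memory stand-in for the Azure Table Storage table, and `route_after_ci`, the action names, and the message fields are hypothetical; only the three-cycle limit and the verbatim log hand-off come from the description.

```python
from dataclasses import dataclass, field

MAX_CYCLES = 3  # "three failed cycles" in the description above


@dataclass
class RetryStore:
    """In-memory stand-in for the persistent retry table."""

    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, pr_key: str) -> int:
        self.counts[pr_key] = self.counts.get(pr_key, 0) + 1
        return self.counts[pr_key]


def route_after_ci(store: RetryStore, pr_key: str, ci_passed: bool, raw_log: str) -> dict:
    """Decide the next step once CI has finished for a pull request."""
    if ci_passed:
        return {"action": "review"}
    failures = store.increment(pr_key)
    if failures >= MAX_CYCLES:
        return {"action": "escalate", "failures": failures}
    # The log is forwarded verbatim, with no LLM summary in between.
    return {
        "action": "retry",
        "queue": "agent-developer",
        "failures": failures,
        "log": raw_log,
    }
```

Because the count lives in the store rather than in the process, a restarted pod would resume from the recorded number of failures instead of starting over.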
Webhook Receiver
Lightweight FastAPI service that acts as the system's front door. Validates GitHub webhook signatures via HMAC-SHA256, maps events to typed payloads, and enqueues them to Azure Service Bus. Deliberately ignores issue_comment and pull_request_review events so that the agents' own GitHub comments and reviews cannot re-trigger the pipeline.
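Signature validation and event filtering can be sketched with the standard library alone. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=` followed by the hex digest of the raw request body; the function names and the rule that only `issues` events with action `opened` are enqueued are assumptions based on the description above, not the receiver's actual code.

```python
import hashlib
import hmac

# Events the receiver drops so the agents' own activity cannot restart the pipeline.
IGNORED_EVENTS = {"issue_comment", "pull_request_review"}


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 value against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so a mismatch leaks no position information.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def should_enqueue(event_name: str, payload: dict) -> bool:
    """Return True only for events that should start the pipeline."""
    if event_name in IGNORED_EVENTS:
        return False
    # Assumption: only newly opened issues are forwarded to the analyst queue.
    return event_name == "issues" and payload.get("action") == "opened"
```

The digest must be computed over the exact bytes received, before any JSON parsing, since re-serializing the payload can change whitespace and invalidate the signature.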
Built entirely on Azure with security and cost-efficiency as first-class concerns. No long-lived credentials anywhere in the system — all authentication flows through managed identity and workload identity federation.
Compute
AKS + KEDA
Scales each agent 0→10 replicas based on queue depth. Idle cost: near zero.
Messaging
Azure Service Bus
5 named queues. Managed identity auth. Message lock renewal for long CI waits.
Secrets
Key Vault + Workload Identity
OIDC federation. No API keys in code or environment — mounted as volumes via CSI driver.
State
Azure Table Storage
Retry tracking survives pod restarts. Prevents infinite fix loops on hard failures.
Ingress
Traefik
Routes GitHub webhooks into the cluster. nip.io DNS for easy dev environments.
CI/CD
GitHub Actions
Lint, test, build, push to ACR on every merge. OIDC login — no stored credentials.
Every LLM call across all three agents is automatically traced to LangSmith via LangChain's native instrumentation. No manual callback wiring — enabling tracing requires only an API key. This gives full visibility into cost, token usage, and agent decision-making across the entire pipeline.
Cost Tracking
LangSmith calculates per-invocation cost automatically based on model and token counts. Every issue processed produces a full cost breakdown across all three agents — analyst, developer, and quality — so the real cost-per-issue is always visible, not estimated.
Token Usage per Agent
Input and output token counts are captured for every LLM call, attributed to the agent and phase that made it (evaluate, plan, verify, implement, review). This makes it straightforward to spot which agent phases are expensive and tune prompts accordingly.
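LangSmith performs this attribution itself, but the idea is easy to show in isolation. The sketch below assumes a hypothetical record shape (`agent`, `phase`, `input_tokens`, `output_tokens`) that is not LangSmith's export format, and simply sums tokens per agent and phase to find the most expensive one.

```python
from collections import defaultdict


def tokens_by_agent_phase(calls: list[dict]) -> dict[tuple[str, str], dict[str, int]]:
    """Sum input and output tokens for each (agent, phase) pair."""
    totals: dict = defaultdict(lambda: {"input": 0, "output": 0})
    for call in calls:
        key = (call["agent"], call["phase"])
        totals[key]["input"] += call["input_tokens"]
        totals[key]["output"] += call["output_tokens"]
    return dict(totals)


def most_expensive(totals: dict[tuple[str, str], dict[str, int]]) -> tuple[str, str]:
    """Return the (agent, phase) pair with the highest combined token count."""
    return max(totals, key=lambda k: totals[k]["input"] + totals[k]["output"])
```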
Decision Tracing
Full input/output traces for every agent invocation are stored in LangSmith under
the devops-agent-swarm project. When an agent makes a surprising
decision — skipping an issue, requesting changes unexpectedly — the exact prompt
and response are one click away.
Zero Instrumentation Overhead
The shared sdlc-swarm-common library calls configure_langsmith()
on every agent startup. If the API key is present, tracing is on. If not, agents
run normally. No code changes required to enable or disable observability.
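The body of configure_langsmith() is not shown here, so the following is only a guess at its shape. It assumes the key is exposed to the process as the `LANGCHAIN_API_KEY` environment variable and uses the `LANGCHAIN_TRACING_V2` and `LANGCHAIN_PROJECT` variables that LangChain reads to enable tracing; the return value and default argument are illustrative additions.

```python
import os


def configure_langsmith(project: str = "devops-agent-swarm") -> bool:
    """Turn tracing on only when an API key is available; report whether it is on."""
    if not os.environ.get("LANGCHAIN_API_KEY"):
        # No key: leave the environment untouched so the agent runs without tracing.
        return False
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    # Respect a project name that was already set by the deployment.
    os.environ.setdefault("LANGCHAIN_PROJECT", project)
    return True
```

Calling a function like this once at startup, before any LLM client is created, is what makes the on/off switch a matter of configuration rather than code.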
No central orchestrator
Agents are fully decoupled via Service Bus queues. Each can scale, fail, and restart independently without affecting the others.
Local verification before PR creation
The developer agent runs the real build and test suite on the generated code before opening a PR, so build and test failures are caught and fixed locally instead of surfacing first in GitHub review.
Raw CI logs to developer, not LLM summaries
When CI fails, the quality agent sends the verbatim error output back to the developer agent. The developer needs exact errors, not a Claude interpretation of them.
Project standards via .agentconfig/
Each target repo defines its own coding standards, build commands, branch naming, and review criteria in a .agentconfig/ directory. Agents adapt to the project — not the other way around.
Webhook loop prevention
The receiver explicitly ignores issue_comment and pull_request_review events, so the agents' own GitHub comments and reviews cannot re-trigger the pipeline.