Active Development · Python · Claude AI · Azure · Kubernetes · LangChain · Service Bus · KEDA · FastAPI · LangSmith

SDLC Agent Swarm

An autonomous multi-agent pipeline that takes a GitHub issue and, without human intervention, analyzes requirements, writes code, verifies it locally, opens a pull request, and reviews it against the original specification.

The Problem

Software development teams spend enormous amounts of time on repetitive, well-defined tasks: translating a GitHub issue into a requirement spec, writing boilerplate code, running the same test suite, and reviewing changes against the original ask.

These tasks follow predictable patterns but still consume hours of engineering time per issue. The question I wanted to answer: how much of this can a well-orchestrated AI agent swarm handle autonomously?

The answer, it turns out, is quite a lot, especially for well-scoped work such as creating GitHub Actions workflows, making infrastructure-as-code changes, and updating text or config.

GitHub Issue (starting point) → Reviewed PR (no human required)

Architecture

The system is event-driven with no central orchestrator. Each agent runs independently in Kubernetes, consuming from its own Azure Service Bus queue and publishing to the next. KEDA scales each agent to zero when idle, so compute cost drops to near zero between issues.

  • GitHub Webhook: issue opened, triggers the FastAPI receiver
  • Issue Analyst: classifies the issue, extracts requirements and acceptance criteria, posts a structured analysis as a GitHub comment
  • Developer Agent: plans changes, clones the repo, writes code, runs build/lint/tests locally, fixes failures (up to 3 iterations), opens a PR
  • Quality Agent: waits for CI, reviews the PR against the original requirements, approves or requests changes, and routes back to the developer on failure

GitHub Issue Created
        │
        ▼
┌─────────────────────┐
│  Webhook Receiver   │  FastAPI · HMAC signature verification
│     (FastAPI)       │
└──────────┬──────────┘
           │  agent-analyst queue
           ▼
┌─────────────────────┐
│   Issue Analyst     │  Claude multimodal · reads text + screenshots
│      Agent          │  → posts requirements comment on GitHub issue
└──────────┬──────────┘
           │  agent-developer queue
           ▼
┌─────────────────────┐
│   Developer Agent   │  Claude · reads .agentconfig/ project standards
│                     │  → clones repo → writes code → runs tests
│  ┌───────────────┐  │  → fixes failures (3x) → opens PR
│  │ 3-iter fix    │  │
│  │ loop (local)  │  │
│  └───────────────┘  │
└──────────┬──────────┘
           │  agent-quality queue
           ▼
┌─────────────────────┐
│    Quality Agent    │  Polls GitHub Actions CI (up to 30 min)
│                     │  → on pass: Claude reviews PR vs requirements
│  ┌────────────────┐ │  → APPROVE or REQUEST_CHANGES
│  │ Retry tracking │ │  → on max retries: escalates to human
│  │ (Table Storage)│ │
│  └────────────────┘ │
└─────────────────────┘
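
The hand-off in the diagram can be sketched with in-memory queues standing in for Azure Service Bus. This is a minimal illustration, not the project's code: the queue names come from the diagram, while the handler bodies, message fields, and the `drain` helper are hypothetical. The route from the quality agent back to the developer is left out for brevity.

```python
import queue
from typing import Callable, Dict, Optional, Tuple

# (next queue name, message) or None when the chain ends.
Route = Optional[Tuple[str, dict]]


def analyst(msg: dict) -> Route:
    # Hypothetical stand-in: attach extracted requirements and pass along.
    return "agent-developer", {**msg, "requirements": ["placeholder requirement"]}


def developer(msg: dict) -> Route:
    # Hypothetical stand-in: pretend a PR was opened.
    return "agent-quality", {**msg, "pr_opened": True}


def quality(msg: dict) -> Route:
    # End of the happy path; nothing further to publish.
    return None


# Queue names are taken from the architecture diagram.
HANDLERS: Dict[str, Callable[[dict], Route]] = {
    "agent-analyst": analyst,
    "agent-developer": developer,
    "agent-quality": quality,
}


def make_queues() -> Dict[str, queue.Queue]:
    """Create one in-memory queue per agent."""
    return {name: queue.Queue() for name in HANDLERS}


def drain(queues: Dict[str, queue.Queue], name: str) -> int:
    """Process every message waiting on one queue; return how many were handled."""
    handled = 0
    while not queues[name].empty():
        route = HANDLERS[name](queues[name].get())
        if route is not None:
            next_name, next_msg = route
            queues[next_name].put(next_msg)
        handled += 1
    return handled
```

Each handler only knows the name of the next queue, which is what lets the agents scale and restart independently.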

The Agents

01

Issue Analyst

Receives GitHub issue webhooks and uses Claude's multimodal capabilities to analyze both text and embedded screenshots. Classifies the issue type, extracts structured requirements with rationale, defines acceptance criteria with verification methods, and produces a concrete implementation plan. Posts the full analysis as a GitHub comment and applies labels.

Claude Multimodal · PyGithub · LangChain

02

Developer Agent

The most complex agent. First evaluates feasibility — skipping ambiguous or operational tasks. Then plans concrete file changes using the repository's .agentconfig/ standards. Clones the target repo, applies changes, runs the actual build and test suite, and iterates up to three times to fix failures before opening a PR. Supports any language by reading runtime context from environment variables.

Claude · Git · Local Verification · Multi-Runtime

03

Quality Agent

Polls GitHub Actions until CI completes (up to 30 minutes), keeping the Service Bus message lock alive the entire time. On CI failure, extracts the raw log output and sends it back to the developer — no LLM interpretation, just exact errors. On CI pass, Claude reviews the PR against the original requirements. Tracks retry attempts in Azure Table Storage so state survives pod restarts. Escalates to human after three failed cycles.

Claude · CI Polling · Azure Table Storage · Retry Management
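
The polling and escalation rules of the Quality Agent can be sketched as two small functions. This is a sketch under stated assumptions: the function names, the status strings, the 30-second poll interval, and the choice to treat a timeout like a failed cycle are mine, not the project's. In the real agent, `failed_cycles` would be the count read from Azure Table Storage.

```python
import time
from typing import Callable

MAX_WAIT_SECONDS = 30 * 60  # "up to 30 minutes" from the description
MAX_CYCLES = 3              # escalate after three failed cycles


def wait_for_ci(get_status: Callable[[], str],
                poll_interval: float = 30.0,
                max_wait: float = MAX_WAIT_SECONDS,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until CI reports 'success' or 'failure'; return 'timeout' otherwise."""
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status in ("success", "failure"):
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(poll_interval)


def next_action(ci_status: str, failed_cycles: int) -> str:
    """Pick the next step given this CI result and the prior failure count."""
    if ci_status == "success":
        return "review"
    # Assumption: a timeout counts as a failed cycle, same as a CI failure.
    if failed_cycles + 1 >= MAX_CYCLES:
        return "escalate_to_human"
    return "return_to_developer"
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.
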
04

Webhook Receiver

Lightweight FastAPI service that acts as the system's front door. Validates GitHub webhook signatures via HMAC-SHA256, maps events to typed payloads, and enqueues them to Azure Service Bus. Deliberately ignores issue_comment and pull_request_review events to prevent the agents' own GitHub comments from re-triggering the pipeline.

FastAPI · HMAC Verification · Azure Service Bus
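
The two checks the receiver performs can be shown with the standard library alone. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>` and the event name in `X-GitHub-Event`. The function names below are hypothetical, and the FastAPI wiring is omitted.

```python
import hashlib
import hmac

# Events the receiver drops, as listed in the description above.
IGNORED_EVENTS = {"issue_comment", "pull_request_review"}


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check an X-Hub-Signature-256 header value against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), header.encode())


def should_enqueue(event_name: str) -> bool:
    """Return False for events the agents' own GitHub activity would generate."""
    return event_name not in IGNORED_EVENTS
```

The signature must be computed over the raw bytes of the request body, before any JSON parsing, or the digest will not match.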

Infrastructure

Built entirely on Azure with security and cost-efficiency as first-class concerns. No long-lived credentials are stored in code or configuration: all authentication flows through managed identity and workload identity federation.

Compute

AKS + KEDA

Scales each agent 0→10 replicas based on queue depth. Idle cost: near zero.

Messaging

Azure Service Bus

5 named queues. Managed identity auth. Message lock renewal for long CI waits.

Secrets

Key Vault + Workload Identity

OIDC federation. No API keys in code or environment variables; secrets are mounted as volumes via the CSI driver.

State

Azure Table Storage

Retry tracking survives pod restarts. Prevents infinite fix loops on hard failures.

Ingress

Traefik

Routes GitHub webhooks into the cluster. nip.io DNS for easy dev environments.

CI/CD

GitHub Actions

Lint, test, build, push to ACR on every merge. OIDC login — no stored credentials.
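
The 0→10 scaling described under Compute can be approximated with a simple calculation: replicas grow with queue depth and are clamped to the configured bounds. This is a simplified model of queue-based autoscaling, not KEDA's actual algorithm, and the target of 5 messages per replica is an assumed value rather than one taken from the project.

```python
import math

MIN_REPLICAS = 0   # scale to zero when the queue is empty
MAX_REPLICAS = 10  # upper bound from the Compute description


def desired_replicas(queue_depth: int, messages_per_replica: int = 5) -> int:
    """Estimate the replica count for a given number of queued messages."""
    if queue_depth <= 0:
        return MIN_REPLICAS
    # One replica per `messages_per_replica` messages, rounded up.
    wanted = math.ceil(queue_depth / messages_per_replica)
    return max(1, min(wanted, MAX_REPLICAS))
```

With these assumptions, an empty queue yields zero replicas, a single message wakes one replica, and a large backlog is capped at ten.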

Observability

Every LLM call across all three agents is automatically traced to LangSmith via LangChain's native instrumentation. No manual callback wiring — enabling tracing requires only an API key. This gives full visibility into cost, token usage, and agent decision-making across the entire pipeline.

Cost Tracking

LangSmith calculates per-invocation cost automatically based on model and token counts. Every issue processed produces a full cost breakdown across all three agents — analyst, developer, and quality — so the real cost-per-issue is always visible, not estimated.

Token Usage per Agent

Input and output token counts are captured for every LLM call and attributed to the agent and phase that made it (evaluate, plan, implement, verify, review). This makes it straightforward to spot which agent phases are expensive and tune prompts accordingly.

Decision Tracing

Full input/output traces for every agent invocation are stored in LangSmith under the devops-agent-swarm project. When an agent makes a surprising decision — skipping an issue, requesting changes unexpectedly — the exact prompt and response are one click away.

Zero Instrumentation Overhead

The shared sdlc-swarm-common library calls configure_langsmith() on every agent startup. If the API key is present, tracing is on. If not, agents run normally. No code changes required to enable or disable observability.
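
A startup hook with this behaviour could look roughly like the sketch below. The real `configure_langsmith()` signature is not shown in this write-up, so the parameters here are assumptions; the `LANGCHAIN_*` names are the environment variables LangChain reads for LangSmith tracing. Where the key comes from (for example, a file mounted by the CSI driver) is left to the caller.

```python
import os
from typing import MutableMapping, Optional

PROJECT_NAME = "devops-agent-swarm"  # LangSmith project named in the text


def configure_langsmith(api_key: Optional[str],
                        env: MutableMapping[str, str] = os.environ) -> bool:
    """Enable tracing when a key is available; return whether it is enabled."""
    if not api_key:
        # No key: leave the environment untouched so the agent runs untraced.
        return False
    env["LANGCHAIN_API_KEY"] = api_key
    env["LANGCHAIN_TRACING_V2"] = "true"
    env["LANGCHAIN_PROJECT"] = PROJECT_NAME
    return True
```

Passing `env` explicitly makes the function easy to test without touching the process environment.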

Key Design Decisions

No central orchestrator

Agents are fully decoupled via Service Bus queues. Each can scale, fail, and restart independently without affecting the others.

Local verification before PR creation

The developer agent actually runs the build and test suite on the generated code before opening a PR. Broken code never hits GitHub review.
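
A bounded verify-and-fix loop of this kind can be sketched as follows. The function names are hypothetical, and `attempt_fix` stands in for the LLM call that edits the code. I read "up to three iterations" as one initial check plus at most three fix attempts, which is an interpretation rather than a confirmed detail.

```python
import subprocess
from typing import Callable, Optional, Sequence

MAX_FIX_ITERATIONS = 3  # "up to three times" from the agent description


def run_checks(commands: Sequence[Sequence[str]], cwd: str) -> Optional[str]:
    """Run build/lint/test commands in order; return None or the failing output."""
    for cmd in commands:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        if proc.returncode != 0:
            output = proc.stdout + proc.stderr
            return output or f"{cmd[0]} exited with code {proc.returncode}"
    return None


def verify_with_fixes(check: Callable[[], Optional[str]],
                      attempt_fix: Callable[[str], None],
                      max_fixes: int = MAX_FIX_ITERATIONS) -> bool:
    """Return True once checks pass, applying at most `max_fixes` fixes."""
    for _ in range(max_fixes):
        errors = check()
        if errors is None:
            return True
        attempt_fix(errors)  # hand the exact error text to the fixer
    # Final check after the last allowed fix attempt.
    return check() is None
```

A PR would only be opened when `verify_with_fixes` returns True; otherwise the agent stops instead of pushing broken code.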

Raw CI logs to developer, not LLM summaries

When CI fails, the quality agent sends the verbatim error output back to the developer agent. The developer needs exact errors, not a Claude interpretation of them.

Project standards via .agentconfig/

Each target repo defines its own coding standards, build commands, branch naming, and review criteria in a .agentconfig/ directory. Agents adapt to the project — not the other way around.

Webhook loop prevention

The receiver explicitly ignores issue_comment and pull_request_review events — preventing the agents' own GitHub comments from retriggering the pipeline.

What Works Well Today

  • GitHub Actions workflow generation and updates
  • Infrastructure-as-code changes (Terraform, Kubernetes YAML)
  • Text, copy, and configuration updates
  • Small, well-scoped feature additions
  • Dependency and package version bumps

What's Next

  • Screenshots and architecture diagrams on this page
  • Expanded language and framework support
  • Agent observability dashboard
  • Broader test coverage on multi-file changes