ark-operator

ark-operator is a Kubernetes operator that manages AI agents as first-class cluster resources, with token budgets, semantic health checks, multi-agent pipelines, and pluggable LLM providers.

Overview
ark-operator extends Kubernetes with a new category of workload: AI agents. You declare what an agent knows (system prompt), what model it runs, and what tools it has access to. The operator handles the rest — scheduling, scaling, health validation, cost enforcement, and multi-agent coordination — using the same reconciliation model as any other Kubernetes controller.
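As a sketch of what such a declaration might look like, here is a hypothetical Agent manifest. All field names, the API group, and the resource kind below are illustrative assumptions, not ark-operator's confirmed schema:

```yaml
# Hypothetical Agent manifest -- field names are assumptions for illustration,
# not ark-operator's confirmed schema.
apiVersion: ark.example.com/v1alpha1
kind: Agent
metadata:
  name: support-triage
spec:
  systemPromptFrom:
    configMapKeyRef:          # prompt lives in a ConfigMap (GitOps-friendly)
      name: support-prompts
      key: triage.txt
  model:
    provider: anthropic       # pluggable provider per agent
    name: claude-sonnet-4-5
  tools:
    - mcpServer:
        url: https://tools.internal/mcp
        credentialsSecretRef:
          name: mcp-tools-token
  budget:
    perRunUSD: "0.50"         # hard per-run cost limit
    rolling24hUSD: "100"      # rolling 24h limit, checked before each call
  healthCheck:
    semantic:
      prompt: "Classify this ticket: 'my invoice is wrong'"
      expect:
        contains: "billing"   # probe fails if output misses this
      periodSeconds: 300
```

Because the agent is an ordinary custom resource, it can be applied with kubectl, scoped to a namespace, and guarded by RBAC like any other workload.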
This is not a wrapper around existing Kubernetes resources. It introduces primitives that have no equivalent in the core API:
- Semantic health checks — probes that call the LLM and validate output quality, not just whether a port is open
- Token budgets — hard per-run and rolling 24h cost limits enforced at the infrastructure level, before any API call is made
- Multi-agent pipelines — declarative DAGs where agents pass typed data to each other, with conditional steps, loops, and parallel execution
- Dynamic delegation — agents that decide at runtime which teammate to call, with cycle detection and permission scoping enforced by the operator
- MCP tool connections — agents connect to external tool servers at startup; the operator manages credentials, auth headers, and reconnection
- Pluggable LLM providers — swap Anthropic, OpenAI, Ollama, or a custom provider per agent or per pipeline step without changing operator code
- GitOps-native prompt management — system prompts live in ConfigMaps or Secrets; a prompt change is a PR; rollback is a revert
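A multi-agent pipeline under the same assumptions might be declared like this (again, the schema names are illustrative, not the operator's confirmed API):

```yaml
# Hypothetical Pipeline manifest -- schema is assumed for illustration.
apiVersion: ark.example.com/v1alpha1
kind: Pipeline
metadata:
  name: issue-handling
spec:
  steps:
    - name: triage
      agentRef: support-triage
      output:
        schema: TriageResult        # typed data passed to later steps
    - name: escalate
      agentRef: escalation-agent
      dependsOn: [triage]
      when: "steps.triage.output.severity == 'high'"   # conditional step
      model:
        provider: openai            # provider override per pipeline step
        name: gpt-4o
    - name: respond
      agentRef: responder
      dependsOn: [triage]           # DAG edge; runs in parallel with escalate
```

The DAG shape (dependsOn edges, conditional when expressions, per-step model overrides) mirrors the feature list above; exact field names would come from the operator's CRD definitions.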
Everything integrates with the Kubernetes ecosystem you already use. RBAC, namespaces, kubectl, ArgoCD, Flux — all work without modification.