FAQ
What is ARKONIS?
ARKONIS (Agentic Reconciler for Kubernetes, Operator-Native Inference System) is a Kubernetes operator that lets you deploy, scale, and manage AI agents as first-class Kubernetes resources. Instead of managing containers, you manage agents — each defined by a model, a system prompt, and a set of tools.
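As a rough sketch of what that looks like — note the field names below are illustrative placeholders, not the exact CRD schema:

```yaml
# Hypothetical ArkAgent manifest. API group/version and field names
# are assumptions for illustration, not the published schema.
apiVersion: arkonis.dev/v1alpha1
kind: ArkAgent
metadata:
  name: summarizer
spec:
  model: llama3.2                  # any OpenAI-compatible model name
  systemPrompt: |
    You summarize incoming documents in three bullet points.
  tools:
    - web-search                   # tool names are placeholders
```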
Do I need a Kubernetes cluster to try it?
No. The ark CLI lets you run ArkTeam pipelines locally on your laptop with no cluster, no Redis, and no operator required. The same YAML files work unchanged when you deploy to Kubernetes.
```shell
go install github.com/arkonis-dev/ark-operator/cmd/ark@latest
ark run quickstart.yaml --provider mock --watch
```
Which LLM providers are supported?
Any OpenAI-compatible endpoint works — including Ollama, vLLM, LM Studio, and OpenAI itself. Set `OPENAI_BASE_URL` to point at your local server and use the model name as-is (e.g. `llama3.2`, `qwen2.5:7b`). Anthropic (`claude-*` models) is also supported natively. The provider is auto-detected from the model name; unknown model names default to the OpenAI provider. Google Gemini is planned for v0.11.
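For example, to point the runtime at a local Ollama server (the endpoint and key value here are illustrative — adjust for your setup):

```shell
# Point the OpenAI-compatible client at a local Ollama server.
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama   # Ollama ignores the key, but a value must be set
# Then run a pipeline as usual, e.g.:
#   ark run quickstart.yaml --watch
```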
How is this different from LangChain / LlamaIndex / CrewAI?
Those are Python frameworks for building agent logic inside your application code. ARKONIS is infrastructure — it runs outside your code and manages agents the same way Kubernetes manages containers. You don’t import it; you `kubectl apply` it.
How is this different from just running agents in a Deployment?
A plain Deployment knows nothing about agents. It can’t detect when an agent is producing bad output, enforce token budgets, chain agents into DAG pipelines, or route tasks by load. ARKONIS introduces purpose-built primitives (ArkAgent, ArkTeam, ArkService) that encode those concepts at the infrastructure level.
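To make the DAG-pipeline idea concrete, here is a hypothetical ArkTeam manifest — again, field names are illustrative placeholders, not the exact CRD schema:

```yaml
# Hypothetical ArkTeam manifest. API group/version and field names
# are assumptions for illustration, not the published schema.
apiVersion: arkonis.dev/v1alpha1
kind: ArkTeam
metadata:
  name: research-pipeline
spec:
  steps:                           # a simple two-step DAG
    - name: gather
      agentRef: researcher
    - name: write
      agentRef: writer
      dependsOn: [gather]          # runs only after "gather" completes
```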
Does it work with any Kubernetes distribution?
Yes. It uses standard controller-runtime and has no cloud-provider dependencies. It works on EKS, GKE, AKS, kind, k3s, and any other conformant distribution.
What task queue does it use?
Redis Streams with consumer groups. Redis is the only external dependency. A future release will add a Kubernetes-native queue option for teams that prefer not to run Redis.
Is there a Helm chart?
Yes. Add the Helm repo and install with one command:
```shell
helm repo add arkonis https://charts.arkonis.dev
helm install ark-operator arkonis/ark-operator \
  --namespace ark-system --create-namespace \
  --set taskQueueURL=redis.ark-system.svc.cluster.local:6379 \
  --set agentExtraEnv[0].name=AGENT_PROVIDER,agentExtraEnv[0].value=openai \
  --set agentExtraEnv[1].name=OPENAI_BASE_URL,agentExtraEnv[1].value=http://ollama.ollama.svc.cluster.local:11434/v1 \
  --set agentExtraEnv[2].name=OPENAI_API_KEY,agentExtraEnv[2].value=ollama
```
See the Getting Started guide for the full install options including hosted providers.
Is it production-ready?
ark-operator is in alpha. Core primitives (v0.1–v0.9) are stable and running in local clusters. OpenTelemetry tracing and metrics are fully implemented (v0.8). The main gaps before production at scale are human-in-the-loop checkpoints (planned) and multi-tenancy hardening (v1.0). See the roadmap on GitHub for the full picture.
What happens to in-flight tasks when I roll out a new systemPrompt?
Tasks that were already in the queue before the rollout complete with the old prompt. Tasks submitted after the new pods are running use the new prompt. There is no mid-task prompt change: each task is a single LLM call loop that runs to completion with the config that was injected at pod startup.
To ensure a clean cutover, scale the agent to zero first (draining the queue), update the `systemPrompt`, then scale back up. Alternatively, use `systemPromptRef` to point at a ConfigMap key — updating the ConfigMap alone is then sufficient, and no pod restart is needed for the next task.
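A sketch of the ConfigMap approach — the exact shape of `systemPromptRef` is assumed here, so check the reference docs for the real schema:

```yaml
# The prompt lives in a ConfigMap; the agent references it by key.
apiVersion: v1
kind: ConfigMap
metadata:
  name: summarizer-prompts
data:
  system: |
    You summarize incoming documents in three bullet points.
---
# Illustrative ArkAgent fragment — field names are assumed, not the exact schema.
spec:
  systemPromptRef:
    name: summarizer-prompts
    key: system
```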
Can I run multiple ArkTeams in parallel?
Yes. Each ArkTeam gets its own queue namespace (<namespace>.<team>.<role>) so teams are fully isolated. There is no built-in limit on how many teams run concurrently — the practical limit is your Redis connection count and LLM provider rate limits.
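For example, under the naming scheme above, a team named `research` with a `writer` role in the `prod` namespace reads from a stream keyed like this:

```shell
# Compose a stream key following the <namespace>.<team>.<role> scheme.
NAMESPACE=prod
TEAM=research
ROLE=writer
KEY="${NAMESPACE}.${TEAM}.${ROLE}"
echo "$KEY"   # prod.research.writer
```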
What is the operator’s own CPU and memory footprint?
The operator pod defaults to 100m CPU request / 500m limit and 128Mi memory request / 256Mi limit. In practice it runs well under 50m CPU and 80Mi memory at moderate load (10–20 active agent pods). These are configurable via resources.* in Helm values.
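For example, to raise the limits in a Helm values file (the override values shown are arbitrary):

```yaml
# values.yaml fragment — overrides the operator pod's default resources.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: "1"        # raised from the 500m default
    memory: 512Mi   # raised from the 256Mi default
```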
Does it work in air-gapped environments?
Yes, with two requirements:
- Images: push `ghcr.io/arkonis-dev/ark-operator` and `ghcr.io/arkonis-dev/ark-runtime` to your internal registry and set `operatorImage`/`agentImage` in Helm values.
- LLM provider: use Ollama, vLLM, or any OpenAI-compatible endpoint running inside your cluster. Set `OPENAI_BASE_URL` to the in-cluster endpoint. No outbound internet required.
The operator itself makes no outbound calls. All LLM traffic goes through agent pods to the configured provider endpoint.
What happens if the operator pod restarts mid-run?
The operator is stateless — all run state lives in ArkRun objects in etcd. On restart, the operator re-lists all ArkRun resources and resumes reconciliation from where it left off. Steps that completed before the restart are not re-executed. Steps that were Running at restart time may be resubmitted to the queue (at-least-once delivery).
Leader election is not available in the current release, so run a single operator replica unless you are comfortable with the at-least-once resubmission behavior. Leader election support is planned.
How do I contribute?
Open an issue or pull request on GitHub. The project is Apache 2.0 licensed.