ArkAgent

ArkAgent manages a pool of LLM agent instances with configurable models, prompts, MCP servers, token limits, and semantic health checks.

API: arkonis.dev/v1alpha1
Kind: ArkAgent
Short name: arkagent
Scope: Namespaced

You declare what an agent knows, what it can do, and what it is allowed to spend; the operator keeps that state running, healthy, and within budget.


What an agent is

An agent is a long-running process that:

  1. Reads configuration from environment variables injected by the operator
  2. Connects to configured MCP tool servers at startup
  3. Polls the task queue for work
  4. Calls the configured LLM provider with the task and available tools
  5. Runs the tool-use loop until the model stops invoking tools
  6. Returns the result to the queue

The agent binary (ark-runtime) has no Kubernetes dependencies. The same binary runs in-cluster and locally via ark run.
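The tool-use loop in step 5 can be sketched as follows. This is an illustrative sketch, not the actual ark-runtime code: `call_llm`, the message shapes, and the tool registry are hypothetical stand-ins for the runtime's internals.

```python
# Minimal sketch of the tool-use loop (step 5). call_llm(messages, tools)
# is assumed to return a dict with "content" and, when the model wants a
# tool, a "tool_calls" list of {"name", "arguments"} entries.

def run_tool_loop(call_llm, tools, task, max_rounds=10):
    """Call the model repeatedly, executing requested tools,
    until it returns a final answer with no tool calls."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_rounds):
        reply = call_llm(messages, tools)
        if not reply.get("tool_calls"):
            return reply["content"]          # model is done
        messages.append(reply)
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
    raise RuntimeError("tool loop exceeded max_rounds")
```

The `max_rounds` guard matters in practice: without it, a model that keeps invoking tools would loop until the per-task timeout fires.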


Example

apiVersion: arkonis.dev/v1alpha1
kind: ArkAgent
metadata:
  name: research-agent
  namespace: default
spec:
  replicas: 2
  model: llama3.2
  systemPromptRef:
    configMapKeyRef:
      name: research-prompt
      key: system.txt
  mcpServers:
    - name: web-search
      url: https://search.mcp.internal/sse
      headers:
        Authorization:
          secretKeyRef:
            name: mcp-credentials
            key: token
  tools:
    - name: fetch_news
      description: "Fetch the latest news for a topic."
      url: http://news-api.internal/headlines
      method: POST
      inputSchema: '{"type":"object","properties":{"topic":{"type":"string"}},"required":["topic"]}'
  limits:
    maxTokensPerCall: 8000
    maxConcurrentTasks: 5
    timeoutSeconds: 120
    maxDailyTokens: 500000
  livenessProbe:
    type: semantic
    intervalSeconds: 30
    validatorPrompt: "Reply with exactly one word: HEALTHY"
  configRef:
    name: analyst-base
  memoryRef:
    name: research-memory
  notifyRef:
    name: on-degraded-slack

System prompt: inline vs. reference

Inline works for development and short prompts:

spec:
  systemPrompt: "You are a research assistant. Be thorough and cite sources."

Reference is required for production or prompts over 50 KB. The operator watches the ConfigMap or Secret and triggers a rolling restart when the content changes:

spec:
  systemPromptRef:
    configMapKeyRef:
      name: research-prompt
      key: system.txt

systemPromptRef takes precedence when both are set.
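The precedence rule amounts to a small resolver. This sketch is illustrative; `fetch_ref` stands in for whatever reads the referenced ConfigMap or Secret key:

```python
# Illustrative precedence logic: systemPromptRef wins when both are set.

def resolve_system_prompt(spec, fetch_ref):
    """Return the effective system prompt for an agent spec.

    fetch_ref(ref) is assumed to read the referenced ConfigMap/Secret key."""
    if spec.get("systemPromptRef"):
        return fetch_ref(spec["systemPromptRef"])
    if spec.get("systemPrompt"):
        return spec["systemPrompt"]
    raise ValueError("one of systemPrompt or systemPromptRef must be set")
```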


MCP servers

MCP servers extend the agent with tools at runtime. The agent connects via SSE at startup, discovers available tools, and exposes them to the LLM. Connection failures are non-fatal — the agent starts with a reduced toolset and logs the error.

Tool names are namespaced as {server-name}__{tool-name} to avoid collisions between servers.
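The namespacing scheme can be illustrated with a short helper (hypothetical; the real runtime applies the prefix during tool discovery):

```python
# Prefix discovered tool names with their server name so two servers
# can expose the same tool name without clashing.

def namespace_tools(server_name, tool_names):
    return {f"{server_name}__{t}" for t in tool_names}

# Both servers expose "search"; the merged set keeps them distinct.
merged = namespace_tools("web-search", ["search"]) | namespace_tools("docs", ["search"])
```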

Auth credentials stay in Secrets — the operator resolves them before injecting into pods:

mcpServers:
  - name: search
    url: https://search.mcp.internal/sse
    headers:
      Authorization:
        secretKeyRef:
          name: mcp-creds
          key: token

Inline webhook tools

For simple HTTP integrations, skip the MCP server entirely and define tools inline. The agent calls the URL when the LLM invokes the tool:

tools:
  - name: get_weather
    description: "Get current weather for a city."
    url: http://weather-api.internal/current
    method: GET
    inputSchema: '{"type":"object","properties":{"city":{"type":"string"}}}'

Built-in tools

Available in every agent pod regardless of MCP or webhook tool configuration:

| Tool | Description |
| --- | --- |
| submit_subtask | Enqueue a new agent task asynchronously. Returns the task ID. Enables supervisor/worker patterns without a full ArkTeam. |
| delegate | Injected in ArkTeam context only. Routes a task to a specific team role and blocks until the role returns a result. |


Daily token budget

limits.maxDailyTokens enforces a rolling 24-hour cap. Enforcement is two-layered:

  1. Agent-side — the runtime checks accumulated token usage before each LLM call and rejects tasks immediately when the budget is exhausted.
  2. Operator-side — the reconciler scales replicas to 0 as a backstop on the next reconcile cycle.

Replicas are automatically restored when the 24-hour window rotates.
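The agent-side layer can be sketched as a rolling-window counter. The bookkeeping below is illustrative, not the actual runtime data structure:

```python
import time
from collections import deque

# Illustrative rolling 24-hour token budget (agent-side layer).
class DailyTokenBudget:
    def __init__(self, max_daily_tokens, window_seconds=24 * 3600):
        self.max = max_daily_tokens      # 0 means no limit
        self.window = window_seconds
        self.events = deque()            # (timestamp, tokens) pairs

    def _prune(self, now):
        # Drop usage that has rotated out of the 24-hour window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_reserve(self, tokens, now=None):
        """Return True if the call fits in the rolling budget; reject otherwise."""
        now = now if now is not None else time.time()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if self.max and used + tokens > self.max:
            return False                 # budget exhausted: reject the task
        self.events.append((now, tokens))
        return True
```

Because pruning happens on each check, capacity returns automatically as old usage ages out, which mirrors the documented auto-resume behavior.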


Semantic health checks

Standard Kubernetes probes cannot tell whether the LLM is producing useful output. Setting livenessProbe.type: semantic enables a /readyz endpoint that calls the LLM with a validation prompt on each probe. When /readyz returns 503, ArkService stops routing tasks to that pod.
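A semantic probe handler might look like the sketch below. `call_llm` and the HEALTHY check are assumptions modeled on the example's validatorPrompt, not the actual endpoint implementation:

```python
# Illustrative semantic /readyz check: send the validator prompt and
# verify the model answers as instructed. Returns an HTTP status code.

def semantic_readyz(call_llm, validator_prompt="Reply with exactly one word: HEALTHY"):
    try:
        reply = call_llm(validator_prompt)
    except Exception:
        return 503                     # provider unreachable or timed out
    return 200 if reply.strip().upper() == "HEALTHY" else 503
```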


Spec reference

spec

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| replicas | int32 | no | 1 | Number of agent pod replicas. Range: 0–50. |
| model | string | yes | | LLM model ID. Drives provider auto-detection. |
| systemPrompt | string | one of | | Inline system prompt text. |
| systemPromptRef | SystemPromptSource | one of | | Reference to a ConfigMap or Secret key. Takes precedence over systemPrompt. |
| mcpServers | []MCPServerSpec | no | | MCP tool servers connected at pod startup. |
| tools | []WebhookToolSpec | no | | Inline HTTP webhook tools. |
| limits | ArkonisLimits | no | | Per-agent resource and token limits. |
| livenessProbe | ArkonisProbe | no | | Semantic health check configuration. |
| configRef | LocalObjectReference | no | | Name of an ArkSettings in the same namespace. |
| memoryRef | LocalObjectReference | no | | Name of an ArkMemory in the same namespace. |
| notifyRef | LocalObjectReference | no | | Name of an ArkNotify policy for AgentDegraded events. |

spec.systemPromptRef

Exactly one sub-field must be set.

| Field | Description |
| --- | --- |
| configMapKeyRef.name | ConfigMap name in the same namespace. |
| configMapKeyRef.key | Key in the ConfigMap data. |
| secretKeyRef.name | Secret name in the same namespace. |
| secretKeyRef.key | Key in the Secret data. |

spec.mcpServers[]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Logical name. Tool names are prefixed: name__toolname. |
| url | string | yes | SSE endpoint URL. |
| headers | map[string]MCPHeaderValue | no | HTTP headers sent with every request. |

mcpServers[].headers values

| Field | Description |
| --- | --- |
| value | Literal header value. |
| secretKeyRef.name | Secret name. |
| secretKeyRef.key | Key in the Secret data. |

spec.tools[]

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | yes | | Tool identifier exposed to the LLM. |
| description | string | no | | Explains the tool's purpose to the LLM. |
| url | string | yes | | HTTP endpoint the agent calls. |
| method | string | no | POST | HTTP method: GET, POST, PUT, PATCH. |
| inputSchema | string | no | | JSON Schema (raw JSON string) for tool parameters. |

spec.limits

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxTokensPerCall | int | 8000 | Token budget (input + output) per LLM call. |
| maxConcurrentTasks | int | 5 | Max tasks a single pod processes simultaneously. |
| timeoutSeconds | int | 120 | Per-task deadline in seconds. |
| maxDailyTokens | int64 | 0 (no limit) | Rolling 24-hour token cap. Scales replicas to 0 when reached; auto-resumes when the window rotates. |

spec.livenessProbe

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| type | string | ping | ping: HTTP reachability only. semantic: enables LLM output validation via /readyz. |
| intervalSeconds | int | 30 | Probe interval in seconds. |
| validatorPrompt | string | (built-in) | Prompt sent during semantic validation. |

status

| Field | Type | Description |
| --- | --- | --- |
| replicas | int32 | Total pods managed by this agent. |
| readyReplicas | int32 | Pods passing liveness and readiness checks. |
| dailyTokenUsage | TokenUsage | Rolling 24-hour token usage (when limits.maxDailyTokens is set). |
| observedGeneration | int64 | The .metadata.generation this status reflects. |
| conditions | []Condition | Available, Progressing, Degraded, BudgetExceeded. |

Provider auto-detection

| Model prefix | Provider |
| --- | --- |
| claude-* | Anthropic (ANTHROPIC_API_KEY) |
| gpt-*, o1-*, o3-* | OpenAI (OPENAI_API_KEY) |
| anything else | OpenAI-compatible (OPENAI_API_KEY + OPENAI_BASE_URL) |

Override with AGENT_PROVIDER env var.
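The prefix rules above amount to a simple lookup. This sketch also honors the AGENT_PROVIDER override; the returned provider labels are illustrative, not values the runtime necessarily uses:

```python
import os

# Illustrative provider auto-detection from the model ID prefix.
def detect_provider(model, env=os.environ):
    if env.get("AGENT_PROVIDER"):
        return env["AGENT_PROVIDER"]             # explicit override wins
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith(("gpt-", "o1-", "o3-")):
        return "openai"
    return "openai-compatible"                   # also needs OPENAI_BASE_URL
```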


See also