# Scaling

## Manual scaling

Set `spec.replicas` directly on an ArkAgent:

```yaml
spec:
  replicas: 5
```
The operator reconciles the backing Deployment to match. To apply a patch without editing YAML:

```shell
kubectl patch arkagent research-agent -n my-org \
  --type=merge -p '{"spec":{"replicas":10}}'
```
## kubectl scale subresource
ArkAgent supports the Kubernetes scale subresource, so the standard `kubectl scale` command works:
```shell
# Scale up
kubectl scale arkagent research-agent -n my-org --replicas=5

# Drain without deleting (set replicas to 0)
kubectl scale arkagent research-agent -n my-org --replicas=0
```
This integrates with any tool that uses the scale subresource — Horizontal Pod Autoscalers, GitOps controllers, etc.
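As an illustration, a HorizontalPodAutoscaler can target the ArkAgent directly through the scale subresource. This is a sketch only: the API group/version `ark.example.com/v1` is an assumption (substitute your CRD's actual group), and the CPU metric is just the simplest HPA trigger.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: research-agent-hpa
  namespace: my-org
spec:
  scaleTargetRef:
    apiVersion: ark.example.com/v1   # assumed group/version; use your CRD's
    kind: ArkAgent
    name: research-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

As the autoscaling section below notes, CPU is a weak proxy for agent load; this example only demonstrates that standard scale-subresource tooling plugs in unchanged.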
## Daily token budget: scale-to-zero
`spec.limits.maxDailyTokens` enforces a rolling 24-hour token cap. When the limit is hit, the operator automatically scales all agent replicas to 0. No manual intervention is required: replicas are restored automatically when the 24-hour window rotates and cumulative usage drops below the limit.
```yaml
spec:
  limits:
    maxDailyTokens: 500000  # rolling 24h cap across all replicas
```
Because of this, agents can disappear under normal operation once the budget is consumed. Set the limit to a value you are comfortable hitting within a single day, or leave it unset (`0`) to disable enforcement.
### Two-layer enforcement
Budget enforcement happens at two points:
1. **Agent-side (proactive check):** Before each LLM call, the agent queries the task queue backend to sum tokens used in the last 24 hours. If the sum meets or exceeds `AGENT_DAILY_TOKEN_LIMIT`, the task is rejected immediately — no API call is made and the task is nacked back to the queue. This prevents runaway cost even between operator reconcile cycles.
2. **Operator-side (backstop):** The ArkAgent reconciler reads `.status.dailyTokenUsage` and scales replicas to 0 when the daily limit is reached. This is the backstop — it fires on the next reconcile after the agent-side check has already started rejecting tasks.
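The agent-side check can be sketched as follows. This is a hypothetical illustration, not the runtime's actual code: `usage_events` stands in for whatever per-call token records the task queue backend stores, as `(unix_timestamp, tokens)` pairs.

```python
import time

WINDOW_SECONDS = 24 * 60 * 60  # rolling 24-hour window


def tokens_last_24h(usage_events, now=None):
    """Sum tokens recorded within the rolling 24-hour window."""
    now = time.time() if now is None else now
    return sum(tokens for ts, tokens in usage_events if now - ts < WINDOW_SECONDS)


def should_reject(usage_events, daily_limit, now=None):
    """Proactive check: reject the task before the LLM call if the cap is met.

    A limit of 0 (unset) disables enforcement, matching maxDailyTokens.
    """
    if daily_limit <= 0:
        return False
    return tokens_last_24h(usage_events, now) >= daily_limit
```

A rejected task would then be nacked back to the queue rather than dropped, so it is retried once the window rotates.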
When debugging "why are my agents gone?", check:

```shell
kubectl describe arkagent research-agent -n my-org
# Look for condition: BudgetExceeded
# or event: DailyLimitReached

kubectl get arkagent research-agent -n my-org \
  -o jsonpath='{.status.dailyTokenUsage}'
```
## Resource limits (per-agent)
The `spec.limits` block controls LLM-level resource consumption, not Kubernetes CPU/memory (set those on the backing pod template separately):
```yaml
spec:
  limits:
    maxTokensPerCall: 8000
    maxConcurrentTasks: 5
    timeoutSeconds: 120
```
| Field | Type | Default | Description |
|---|---|---|---|
| `maxTokensPerCall` | int | 8000 | Maximum tokens (input + output) per LLM API call. |
| `maxConcurrentTasks` | int | 5 | Maximum tasks a single agent pod processes simultaneously. |
| `timeoutSeconds` | int | 120 | Per-task timeout; the task is abandoned and an error returned after this duration. |
These are injected as environment variables (`AGENT_MAX_TOKENS`, `AGENT_TIMEOUT_SECONDS`, `AGENT_MAX_CONCURRENT_TASKS`) into agent pods and enforced by the agent runtime.
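A minimal sketch of how a runtime might read these variables, with fallbacks matching the defaults in the table above. The variable names come from the text; the function and its return shape are hypothetical.

```python
import os


def load_limits(env=None):
    """Read the injected limit variables, falling back to the spec defaults."""
    env = os.environ if env is None else env
    return {
        "max_tokens_per_call": int(env.get("AGENT_MAX_TOKENS", "8000")),
        "max_concurrent_tasks": int(env.get("AGENT_MAX_CONCURRENT_TASKS", "5")),
        "timeout_seconds": int(env.get("AGENT_TIMEOUT_SECONDS", "120")),
    }
```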
## Queue-depth autoscaling (planned)
CPU and memory are poor proxies for agent load. What matters is task queue depth — how many tasks are waiting.
Queue-depth-based autoscaling via KEDA is planned. This will let you define scale-up/scale-down triggers on task queue depth so agent pod replicas grow automatically as work arrives and shrink when the queue drains.
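Since this feature is not implemented yet, the following is purely illustrative of what a KEDA trigger on an ArkAgent might look like, using KEDA's existing ScaledObject API. Every specific here is an assumption: the API group/version, the idea that the queue backend is RabbitMQ, the queue name, and the authentication reference.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: research-agent-scaler
  namespace: my-org
spec:
  scaleTargetRef:
    apiVersion: ark.example.com/v1   # assumed group/version
    kind: ArkAgent
    name: research-agent
  minReplicaCount: 0                 # drain to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq                 # assumed backend
      metadata:
        queueName: agent-tasks       # hypothetical queue name
        mode: QueueLength
        value: "5"                   # target waiting tasks per replica
      authenticationRef:
        name: rabbitmq-auth          # hypothetical TriggerAuthentication
```

Because ArkAgent already exposes the scale subresource, KEDA can target it the same way it targets a Deployment.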
## See also

- ArkAgent — full spec walkthrough including `limits`
- Cost Management guide — daily token budget and per-run limits
- Scaling Agents guide — step-by-step scaling patterns