Operations

Day-to-day operational concerns for running the LLM Service Daemon (LSD): auth, rate limiting, credentials, and cost tracking.

Auth

Authentication is off by default. Enable it in the config, then create and revoke bearer tokens (API keys) from the command line:

[gateway.auth]
enabled = true

[gateway.auth.cache]
enabled = true   # default true
ttl_ms = 1000    # default 1000

gateway --create-api-key                          # no expiration
gateway --create-api-key --expiration 2027-01-01T00:00:00Z
gateway --disable-api-key <public-id>

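--create-api-key prints the key to stdout once; it is not shown again, so store it right away. Requests authenticate with the header Authorization: Bearer <key>. The auth cache avoids a Postgres lookup on every request, which also means a disabled key can keep working until its cached entry expires. Lower ttl_ms if revocations need to take effect faster.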
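As a minimal sketch of the header format described above, the following Python builds (but does not send) a request carrying the bearer token. The URL and the empty JSON body are placeholders chosen for illustration, not LSD's actual endpoint or request schema; only the Authorization header shape comes from this page.

```python
import urllib.request


def build_request(url: str, api_key: str, body: bytes) -> urllib.request.Request:
    """Build a POST request that authenticates with a bearer token."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Header format from the docs: Authorization: Bearer <key>
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder URL and body (assumptions); substitute your gateway's values.
req = build_request("http://localhost:8080/placeholder", "<key>", b"{}")
print(req.get_header("Authorization"))
```

Read the key from an environment variable or a secrets file rather than hard-coding it, since it is only printed once at creation time.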

Rate limiting

Rate limits are token-bucket rules: each limit has a capacity and a refill_rate per interval. A rule is scoped either to a tag on the request or to a specific API key:

[[rate_limiting.rules]]
priority = 1
scope = { tag_key = "team", tag_value = "research" }

[[rate_limiting.rules.limits]]
resource = "token"           # model_inference | token | cost
interval = "minute"          # second | minute | hour | day | week | month
capacity = 100000
refill_rate = 100000

More than one rule can apply to a request; priority controls the order in which the rules are evaluated. The cost resource is measured in nano-dollars (billionths of a dollar), normalized from each model’s configured cost rates.
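To make the capacity and refill_rate semantics concrete, here is a small token-bucket sketch in Python. It is an illustration of the general technique, not LSD's implementation: it assumes the bucket starts full and refills continuously in proportion to elapsed time, whereas the daemon may refill in discrete steps. The TokenBucket class and the INTERVAL_SECONDS table are hypothetical names.

```python
# Seconds per interval; only a subset of the documented intervals is listed here.
INTERVAL_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class TokenBucket:
    """Hypothetical token bucket: starts full, refills continuously (assumption)."""

    def __init__(self, capacity: int, refill_rate: int, interval: str) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.interval_seconds = INTERVAL_SECONDS[interval]
        self.available = float(capacity)
        self.last_refill = 0.0

    def try_consume(self, amount: int, now: float) -> bool:
        """Refill for the time elapsed since the last call, then try to spend `amount`."""
        elapsed = max(0.0, now - self.last_refill)
        refill = elapsed * self.refill_rate / self.interval_seconds
        self.available = min(float(self.capacity), self.available + refill)
        self.last_refill = now
        if amount > self.available:
            return False  # request would be rate limited
        self.available -= amount
        return True


# Same numbers as the rule above: 100000 tokens per minute.
bucket = TokenBucket(capacity=100000, refill_rate=100000, interval="minute")
first = bucket.try_consume(60000, now=0.0)    # fits in the full bucket
second = bucket.try_consume(60000, now=0.0)   # only 40000 left, so rejected
third = bucket.try_consume(60000, now=30.0)   # half a minute refills 50000
```

With capacity equal to refill_rate, a drained bucket is full again after one interval. Setting capacity higher than refill_rate would allow larger bursts on top of the same sustained rate.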

Credentials

See Providers for the location syntax (env::, path::, path_from_env::, dynamic::, and sdk) used by every provider’s api_key_location (and equivalent) field. There’s no separate secrets store: credentials are read from the environment or filesystem at startup and are never persisted to Postgres.

Usage and cost tracking

Attach a cost table to a provider to have LSD compute a per-inference cost from token usage, recorded alongside the inference row:

[models."gpt-5p4-mini".providers.openai]
type = "openai"
model_name = "gpt-5.4-mini"
cost = [
  { pointer = "/usage/prompt_tokens", cost_per_million = 0.15, required = true },
  { pointer = "/usage/completion_tokens", cost_per_million = 0.60, required = true },
]

pointer is a JSON pointer into the provider’s raw usage response. Costs roll up into the model_provider_statistics and inference_by_function_statistics materialized views described in Observability.
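The arithmetic behind a cost table can be sketched as follows. This Python example is a hedged illustration, not LSD's code. It assumes cost_per_million is a dollar price per million units found at the pointer, and that a missing value for a required entry leaves the cost unknown (returned as None here). The resolve_pointer and inference_cost helpers and the sample response are hypothetical.

```python
from decimal import Decimal


def resolve_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer; return None if a segment is missing."""
    if pointer == "":
        return document
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" is "/", "~0" is "~" (in that order).
        key = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return None
    return current


def inference_cost(response: dict, cost_table: list):
    """Sum count * cost_per_million / 1e6 over the table, in dollars."""
    total = Decimal(0)
    for entry in cost_table:
        count = resolve_pointer(response, entry["pointer"])
        if count is None:
            if entry.get("required", False):
                return None  # assumption: a missing required field means no cost
            continue
        rate = Decimal(str(entry["cost_per_million"]))
        total += Decimal(count) * rate / Decimal(1_000_000)
    return total


COST_TABLE = [
    {"pointer": "/usage/prompt_tokens", "cost_per_million": 0.15, "required": True},
    {"pointer": "/usage/completion_tokens", "cost_per_million": 0.60, "required": True},
]
# Made-up provider response, for illustration only.
sample = {"usage": {"prompt_tokens": 1200, "completion_tokens": 300}}

cost = inference_cost(sample, COST_TABLE)
nano_dollars = int(cost * 10**9)
```

Decimal is used instead of float so that small per-token prices do not pick up rounding noise before being converted to nano-dollars.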

OpenTelemetry and Prometheus

Covered in Observability.