Operations
Day-to-day operational concerns for running the LLM Service Daemon (LSD): auth, rate limiting, credentials, and cost tracking.
Authentication is off by default. Enable it and issue bearer tokens:
    [gateway.auth]
    enabled = true

    [gateway.auth.cache]
    enabled = true   # default true
    ttl_ms = 1000    # default 1000

    gateway --create-api-key    # no expiration
    gateway --create-api-key --expiration 2027-01-01T00:00:00Z
    gateway --disable-api-key <public-id>

--create-api-key prints the key once, to stdout. Requests authenticate with Authorization: Bearer <key>. The auth cache avoids hitting Postgres on every request; lower ttl_ms if you need faster key revocation propagation.
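The trade-off behind ttl_ms can be illustrated with a small sketch. This is not LSD's implementation: the AuthCache class, the lookup callback standing in for the Postgres query, and the injectable clock are all hypothetical names chosen for illustration. It only shows why a disabled key may still be accepted for up to ttl_ms after it is revoked.

```python
import time
from typing import Callable, Dict, Tuple


class AuthCache:
    """Hypothetical TTL cache for key-validity lookups (illustration only)."""

    def __init__(self, lookup: Callable[[str], bool], ttl_ms: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup            # stand-in for the Postgres query
        self._ttl_s = ttl_ms / 1000.0
        self._clock = clock              # injectable so the demo can fake time
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_valid(self, key: str) -> bool:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self._ttl_s:
            return cached[0]             # fresh entry: no database hit
        valid = self._lookup(key)        # missing or expired: ask the database
        self._entries[key] = (valid, now)
        return valid


def demo() -> list:
    active = {"key-a"}                   # pretend table of enabled keys
    now = [0.0]                          # fake clock, in seconds
    cache = AuthCache(lambda k: k in active, ttl_ms=1000, clock=lambda: now[0])

    results = [cache.is_valid("key-a")]      # t=0.0s: looked up, valid
    active.discard("key-a")                  # key is disabled right afterwards
    now[0] = 0.5
    results.append(cache.is_valid("key-a"))  # t=0.5s: cached entry still accepted
    now[0] = 1.5
    results.append(cache.is_valid("key-a"))  # t=1.5s: entry expired, key rejected
    return results


print(demo())  # [True, True, False]
```

Under these assumptions, a smaller ttl_ms shortens the window in which the second call succeeds, at the cost of more database lookups.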
Rate limiting
Token-bucket rules (capacity + refill_rate per interval) scoped either to a tag on the request or to a specific API key:
    [[rate_limiting.rules]]
    priority = 1
    scope = { tag_key = "team", tag_value = "research" }

    [[rate_limiting.rules.limits]]
    resource = "token"     # model_inference | token | cost
    interval = "minute"    # second | minute | hour | day | week | month
    capacity = 100000
    refill_rate = 100000

Multiple rules can apply; priority controls evaluation order. Limits on the cost resource are expressed in nano-dollars, normalized from each model’s configured cost rates.
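For readers unfamiliar with token buckets, the following is a minimal sketch of how capacity and refill_rate interact. It assumes a bucket that starts full and refills continuously in proportion to elapsed time; whether LSD refills continuously or in discrete steps is not stated here, and the TokenBucket class and try_consume method are hypothetical names. The month interval is left out because its length varies.

```python
INTERVAL_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 604800,
}


class TokenBucket:
    """Hypothetical token bucket (illustration only, not LSD's code)."""

    def __init__(self, capacity: float, refill_rate: float, interval: str) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate               # units added per interval
        self.interval_s = INTERVAL_SECONDS[interval]
        self.level = capacity                        # assumption: starts full
        self.last_refill = 0.0

    def try_consume(self, amount: float, now: float) -> bool:
        # Add back what accrued since the last call, capped at capacity.
        elapsed = now - self.last_refill
        accrued = elapsed * self.refill_rate / self.interval_s
        self.level = min(self.capacity, self.level + accrued)
        self.last_refill = now
        if amount <= self.level:
            self.level -= amount
            return True
        return False                                 # over the limit: reject


def demo() -> list:
    # Mirrors the rule above: 100000 tokens per minute.
    bucket = TokenBucket(capacity=100000, refill_rate=100000, interval="minute")
    return [
        bucket.try_consume(60000, now=0.0),   # fits: 40000 left
        bucket.try_consume(60000, now=0.0),   # does not fit: rejected
        bucket.try_consume(60000, now=30.0),  # 50000 refilled after 30s: fits
    ]


print(demo())  # [True, False, True]
```

Setting capacity equal to refill_rate, as in the rule above, allows one full interval's budget to be spent in a burst; a smaller capacity would smooth usage across the interval.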
Credentials
See Providers for the env::/path::/path_from_env::/dynamic::/sdk location syntax used by every provider’s api_key_location (and equivalent) field. There’s no separate secrets store: credentials are read from the environment or filesystem at startup, never persisted to Postgres.
Usage and cost tracking
Attach a cost table to a provider to have LSD compute a per-inference cost from token usage, recorded alongside the inference row:
    [models."gpt-5p4-mini".providers.openai]
    type = "openai"
    model_name = "gpt-5.4-mini"
    cost = [
      { pointer = "/usage/prompt_tokens", cost_per_million = 0.15, required = true },
      { pointer = "/usage/completion_tokens", cost_per_million = 0.60, required = true },
    ]

pointer is a JSON pointer into the provider’s raw usage response. Costs roll up into the model_provider_statistics and inference_by_function_statistics materialized views described in Observability.
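The arithmetic implied by a cost table can be sketched as follows. Several things here are assumptions rather than documented behavior: that cost_per_million is denominated in dollars per million units, that a missing required field means no cost is recorded, and that a missing optional field is skipped. The function names resolve_pointer and inference_cost_nano are hypothetical; the pointer resolution follows the JSON Pointer escaping rules (~1 for "/", ~0 for "~").

```python
from decimal import Decimal
from typing import Any, Dict, List, Optional

NANO_PER_DOLLAR = Decimal(1_000_000_000)


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve a JSON pointer such as "/usage/prompt_tokens" against doc."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("JSON pointer must be empty or start with '/'")
    node = doc
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]      # array segment: numeric index
        else:
            node = node[token]           # object segment: key lookup
    return node


def inference_cost_nano(response: Dict[str, Any],
                        cost_table: List[Dict[str, Any]]) -> Optional[int]:
    """Sum count * cost_per_million / 1e6 over the table, in nano-dollars."""
    total = Decimal(0)
    for entry in cost_table:
        try:
            count = resolve_pointer(response, entry["pointer"])
        except (KeyError, IndexError, TypeError, ValueError):
            if entry.get("required", False):
                return None              # assumption: no cost without a required field
            continue                     # assumption: optional fields are skipped
        rate = Decimal(str(entry["cost_per_million"]))
        total += Decimal(count) * rate / Decimal(1_000_000)
    return int(total * NANO_PER_DOLLAR)


cost_table = [
    {"pointer": "/usage/prompt_tokens", "cost_per_million": 0.15, "required": True},
    {"pointer": "/usage/completion_tokens", "cost_per_million": 0.60, "required": True},
]
raw_response = {"usage": {"prompt_tokens": 1200, "completion_tokens": 300}}

print(inference_cost_nano(raw_response, cost_table))  # 360000
```

With the rates above, 1200 prompt tokens and 300 completion tokens each cost $0.00018, for a total of $0.00036, or 360000 nano-dollars.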
OpenTelemetry and Prometheus
Covered in Observability.