Eliza Runtime Deployment Runbook
This runbook covers deploy, rollback, and on-call triage for the long-lived Eliza XMTP runtime (frontend/server/agent/eliza/index.ts).
Scope
- Service: Eliza runtime container (frontend/Dockerfile.agent)
- Platform: Railway primary
- Health endpoints:
  - Liveness: /healthz (boot is allowed)
  - Readiness: /readyz (must be fully ready)
Operational assumption for this repo:
- one Railway service
- one replica
- one primary XMTP consumer
- no standby or failover deployment by default
Architecture note:
- XMTP is the live Eliza transport on Railway.
- Telegram remains a separate webhook + Mini App runtime and should not be treated as a second Eliza deployment target.
- Shared cross-channel command/conversation logic lives in the agent core, but only the XMTP transport is hosted by this long-lived Railway service.
- Vercel serves the frontend and stateless API handlers only; it is not a production XMTP worker target in the default repo posture.
  - /api/agent/process must not be scheduled on Vercel production or preview deployments.
Critical Environment Checklist
Before shipping, verify these values are configured:
- XMTP_DB_DIRECTORY points to a persistent mounted path (Railway volume: /data/.xmtp-data)
- XMTP_DB_ENCRYPTION_KEY is set and stable across restarts
- Runtime role is explicit:
  - Railway deploy: AGENT_RUNTIME_ROLE=primary and AGENT_CONSUME_XMTP=true
  - Standby remains available only for local inspection
  - Railway standby or AGENT_CONSUME_XMTP=false is treated as startup misconfiguration and fails fast
- Primary production boots are Railway-only by default:
  - set ELIZA_ALLOW_OFF_RAILWAY_PRIMARY=true only for supervised off-Railway overrides
- Off-Railway Grove registration uploads are disabled by default:
  - set ELIZA_ALLOW_OFF_RAILWAY_GROVE_UPLOAD=true only for supervised off-Railway overrides
- DB-backed runtime lease lock:
  - AGENT_RUNTIME_LOCK_REQUIRED=true
  - AGENT_RUNTIME_LOCK_KEY=xmtp-primary-runtime-lock
  - AGENT_RUNTIME_LOCK_HEARTBEAT_MS=10000
  - AGENT_RUNTIME_LOCK_STALE_MS=30000
  - On Railway primary with Postgres configured, this is expected to be enabled and defaults on.
- Hard TEE gate for privileged signing/actions (if enabled):
  - TEE_ENFORCEMENT_ENABLED=true
  - ERC8004_VALIDATION_REGISTRY + ERC8004_AGENT_ID set
  - TEE_VALIDATOR_ADDRESSES includes trusted validators
- One runtime mode is configured:
  - Multi-agent: DATABASE_URL + XMTP_AGENT_KEY_ENCRYPTION_KEY
  - Single CSW: XMTP_AGENT_CSW_ADDRESS + XMTP_AGENT_PRIVY_WALLET_ID
  - Single EOA (dev only): XMTP_AGENT_PRIVATE_KEY
- At least one LLM key for conversational fallback (GROQ_API_KEY, OPENAI_API_KEY, etc.)
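
The checklist above can be turned into a quick preflight check before a deploy. The following is a minimal sketch, not code from this repo: `checkEnv` and `isSet` are hypothetical helpers, and the sketch assumes that "one runtime mode is configured" means at least one complete set of mode variables is present. Only the two LLM keys named in the checklist are checked; the TEE and lease-lock settings are left out for brevity.

```typescript
// Hypothetical preflight helper, not part of the repo.
// Variable names are taken from the checklist; the logic is an assumption.
type Env = Record<string, string | undefined>;

const isSet = (env: Env, key: string): boolean =>
  (env[key] ?? "").trim() !== "";

function checkEnv(env: Env): string[] {
  const problems: string[] = [];

  // XMTP database persistence settings must be present.
  for (const key of ["XMTP_DB_DIRECTORY", "XMTP_DB_ENCRYPTION_KEY"]) {
    if (!isSet(env, key)) problems.push(`${key} is not set`);
  }

  // Railway deploys must run as the primary XMTP consumer.
  if (env["AGENT_RUNTIME_ROLE"] !== "primary") {
    problems.push("AGENT_RUNTIME_ROLE must be primary");
  }
  if (env["AGENT_CONSUME_XMTP"] !== "true") {
    problems.push("AGENT_CONSUME_XMTP must be true");
  }

  // Assumption: at least one complete runtime mode must be configured.
  const modes: string[][] = [
    ["DATABASE_URL", "XMTP_AGENT_KEY_ENCRYPTION_KEY"], // multi-agent
    ["XMTP_AGENT_CSW_ADDRESS", "XMTP_AGENT_PRIVY_WALLET_ID"], // single CSW
    ["XMTP_AGENT_PRIVATE_KEY"], // single EOA (dev only)
  ];
  if (!modes.some((keys) => keys.every((k) => isSet(env, k)))) {
    problems.push("no complete runtime mode is configured");
  }

  // Only the two keys named in the checklist are checked here.
  if (!["GROQ_API_KEY", "OPENAI_API_KEY"].some((k) => isSet(env, k))) {
    problems.push("no LLM key is set for conversational fallback");
  }

  return problems;
}
```

An empty result means the checked items pass; it does not prove the runtime will boot, since the runtime performs its own validation at startup.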
Deploy Procedure (Railway)
- Confirm config and image source:
  - railway.toml uses frontend/Dockerfile.agent
  - persistent volume is mounted at /data/.xmtp-data
  - healthcheckPath is /readyz (strict readiness gate)
  - Railway env is AGENT_RUNTIME_ROLE=primary
  - Railway env is AGENT_CONSUME_XMTP=true
  - Runtime lock is enabled (AGENT_RUNTIME_LOCK_REQUIRED=true, default-on when Postgres is present)
- Deploy (railway up or UI deploy).
- Watch startup logs until runtime mode and plugin/action counts print.
- Validate liveness and readiness:
  - GET /healthz should return 200
  - GET /readyz should return 200 and status: "ok"
  - status: "standby" on Railway is a no-go and should be treated as misconfiguration
Vercel Guardrail
Keep the deployment split clean:
- Vercel may serve frontend/api/* request/response handlers.
- Railway is the only production-primary XMTP consumer.
- Do not add a Vercel cron for /api/agent/process.
- If /api/agent/process starts firing from a Vercel deployment, treat that as config drift. The usual symptom is repeated 503 noise because XMTP-primary env such as XMTP_AGENT_KEY_ENCRYPTION_KEY is intentionally not present there.
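
One way to catch this drift early is to lint the Vercel config before merging. The sketch below assumes the project declares cron jobs in a vercel.json `crons` array of `{ path, schedule }` entries, which is how Vercel cron jobs are normally configured; whether this repo has such a file is not stated in this runbook. `findForbiddenCrons` is a hypothetical helper.

```typescript
// Hypothetical lint helper, not part of the repo.
// Assumes cron jobs are declared in vercel.json under "crons".
interface VercelCron {
  path: string;
  schedule: string;
}

interface VercelConfig {
  crons?: VercelCron[];
}

const FORBIDDEN_CRON_PATH = "/api/agent/process";

// Returns every cron entry that targets the XMTP processing endpoint.
function findForbiddenCrons(config: VercelConfig): VercelCron[] {
  return (config.crons ?? []).filter((cron) => {
    // Ignore any query string when comparing paths.
    const pathOnly = cron.path.split("?")[0] ?? "";
    return (
      pathOnly === FORBIDDEN_CRON_PATH ||
      pathOnly.startsWith(FORBIDDEN_CRON_PATH + "/")
    );
  });
}
```

A CI step could parse vercel.json with `JSON.parse`, pass the result to `findForbiddenCrons`, and fail the build when the returned array is non-empty.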
Go / No-Go Gates
Ship only if all pass:
- /readyz is 200 with no blocking readinessReasons
- /readyz reports status: "ok"; status: "standby" is a deploy failure on Railway
- dependencies.xmtp.ready is true
- dependencies.queueWorker.running is true in multi-agent mode
- dependencies.queueWorker.stats.staleProcessing is 0
- runtime.role is primary
- runtime.consumeXmtp is true
- If TEE gate is enabled, teeAttestation.passed is true from /api/v1/agents/identity/verification
- /keepr status succeeds end-to-end in XMTP chat for a known configured vault
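
The /readyz gates above can be evaluated mechanically once the response is parsed. This is a minimal sketch under stated assumptions: the `ReadyzPayload` shape is inferred from the dotted field names in the list and may not match the real payload exactly; any entry in readinessReasons is treated as blocking, because this runbook does not say which reasons are non-blocking; and a missing staleProcessing value is treated as 0. `evaluateGates` is a hypothetical helper. The TEE attestation and /keepr status gates are not covered and still need separate checks.

```typescript
// Hypothetical gate evaluator, not part of the repo.
// Payload shape is inferred from the field names in the gate list.
interface ReadyzPayload {
  status?: string;
  readinessReasons?: string[];
  runtime?: { role?: string; consumeXmtp?: boolean };
  dependencies?: {
    xmtp?: { ready?: boolean };
    queueWorker?: { running?: boolean; stats?: { staleProcessing?: number } };
  };
}

function evaluateGates(
  httpStatus: number,
  body: ReadyzPayload,
  multiAgent: boolean,
): string[] {
  const failures: string[] = [];

  if (httpStatus !== 200) failures.push(`/readyz returned ${httpStatus}`);

  // Assumption: every listed reason is treated as blocking.
  const reasons = body.readinessReasons ?? [];
  if (reasons.length > 0) {
    failures.push(`readinessReasons: ${reasons.join(", ")}`);
  }

  if (body.status !== "ok") {
    failures.push(`status is ${String(body.status)}, expected ok`);
  }
  if (body.dependencies?.xmtp?.ready !== true) {
    failures.push("dependencies.xmtp.ready is not true");
  }

  const worker = body.dependencies?.queueWorker;
  if (multiAgent && worker?.running !== true) {
    failures.push("dependencies.queueWorker.running is not true");
  }
  // Assumption: a missing staleProcessing value counts as 0.
  if ((worker?.stats?.staleProcessing ?? 0) !== 0) {
    failures.push("dependencies.queueWorker.stats.staleProcessing is not 0");
  }

  if (body.runtime?.role !== "primary") {
    failures.push("runtime.role is not primary");
  }
  if (body.runtime?.consumeXmtp !== true) {
    failures.push("runtime.consumeXmtp is not true");
  }

  return failures;
}
```

An empty array means the payload-level gates pass. Any entry is a no-go for the deploy.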
Rollback Procedure
- Roll back to previous Railway deployment.
- Keep the same XMTP DB volume and encryption key (do not rotate during rollback).
- Re-check /healthz then /readyz.
- Re-run /keepr status smoke test.
Health Triage
Use /readyz payload first; map readinessReasons to action:
- booting: wait for startup completion and initial sync.
- no_agents: verify selected startup mode and agent registration rows.
- env_validation_failed: fix required env vars; restart deploy.
- db_unavailable: check database connectivity and credentials.
- xmtp_not_running: inspect XMTP start logs and installation persistence.
- queue_stale_leases: verify worker health; stale leases are auto-reclaimed, but sustained growth indicates handler failures.
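
For on-call tooling, the reason-to-action mapping above fits naturally into a lookup table. The sketch below is illustrative only: `TRIAGE_ACTIONS` and `triage` are hypothetical names, the action text is copied from the list, and the fallback message for unrecognized reasons is an addition that does not appear in this runbook.

```typescript
// Hypothetical triage helper, not part of the repo.
// Reason codes and actions are copied from the triage list.
const TRIAGE_ACTIONS = new Map<string, string>([
  ["booting", "wait for startup completion and initial sync"],
  ["no_agents", "verify selected startup mode and agent registration rows"],
  ["env_validation_failed", "fix required env vars; restart deploy"],
  ["db_unavailable", "check database connectivity and credentials"],
  ["xmtp_not_running", "inspect XMTP start logs and installation persistence"],
  ["queue_stale_leases", "verify worker health; sustained growth indicates handler failures"],
]);

// Maps each readiness reason to a one-line action for the on-call engineer.
function triage(readinessReasons: string[]): string[] {
  return readinessReasons.map((reason) => {
    // Assumption: unknown reasons fall back to a generic instruction.
    const action = TRIAGE_ACTIONS.get(reason) ?? "unrecognized reason; inspect runtime logs";
    return `${reason}: ${action}`;
  });
}
```

Passing the readinessReasons array from a parsed /readyz response yields one action line per reason, in the order the runtime reported them.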
XMTP Installation Churn Recovery
Symptoms: repeated new installations, approaching 10/10 installation limit, or degraded reconnect behavior.
- Verify DB persistence:
  - mounted volume exists and is writable
  - .db3 files persist across restarts
- Verify XMTP_DB_ENCRYPTION_KEY is unchanged.
- Do not repeatedly restart while volume is broken.
- If inbox is near installation limit, perform controlled recovery and only then temporarily enable revoke mode (XMTP_REVOKE_OTHER_INSTALLATIONS=true) for one supervised boot.
- Disable revoke mode after recovery.
Post-Deploy Smoke
- Send "/keepr status" in XMTP and confirm response returns.
- Trigger a plain /ai question and confirm non-empty response (or explicit budget/rate-limit message).
- If Telegram bot flows matter for the release, verify them separately; Telegram is not served by this Railway XMTP runtime.
Optional Telemetry And Channels
- Telemetry:
  - Set ELIZA_TELEMETRY_ENABLED=true to emit structured runtime/LLM/action events.
  - Optional webhook sink: ELIZA_TELEMETRY_WEBHOOK_URL.
- Feature-flagged channel context plugins:
  - ELIZA_CHANNEL_TELEGRAM_ENABLED=true
  - ELIZA_CHANNEL_DISCORD_ENABLED=true
  - ELIZA_CHANNEL_TWITTER_ENABLED=true
  - Keep channel bot tokens server-side only.