Operate and Scale

LyftData operations focus on keeping the control plane healthy, the job fleet productive, and telemetry flowing to the right places. Use this page as the jumping-off point for your runbooks.

Daily checklist

Confirm the server is reachable (for example, with GET /api/liveness) and that you can sign in.
Watch the live job status feed for stalled deploys, long retries, or sudden error spikes.
Track worker health in the UI; investigate offline workers and growing backlogs quickly.
Review errors and warnings in Logs & Issues and in your host logging system (systemd journal, Windows Event Log, or your central logging sink).
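
The reachability check at the top of this list can be scripted so it runs from cron or a monitoring agent. The following is a minimal sketch, not an official client: only the GET /api/liveness path comes from this page. The base URL, the function name check_liveness, and the injectable opener parameter are illustrative assumptions, and the sketch looks only at the HTTP status code because the response body format is not described here.

```python
import urllib.error
import urllib.request

# Assumption: placeholder address; replace with your own server's URL.
BASE_URL = "http://localhost:8080"


def check_liveness(base_url=BASE_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, detail) for GET <base_url>/api/liveness.

    `opener` is a hypothetical injection point so the check can be
    exercised without a live server; it defaults to urllib's urlopen.
    """
    url = base_url.rstrip("/") + "/api/liveness"
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection refused, DNS failures, and timeouts.
        return False, f"unreachable: {exc}"
    return 200 <= status < 300, f"HTTP {status}"


# Example use in a scheduled job:
#   ok, detail = check_liveness("https://lyftdata.example.internal")
#   if not ok:
#       raise SystemExit(f"liveness check failed: {detail}")
```

A passing liveness check only shows that the server answers; it does not replace the sign-in, job feed, and worker health checks in the rest of the list.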

Runbooks by theme

Daily operations - Daily operations playbook keeps the control plane healthy with checklists and drills.
Observability & alerts - Monitoring LyftData covers metrics, dashboards, and alert wiring.
Logs and live events - Logs & Issues and Messages are your first stop for triage.
Resilience & recovery - Backup & recovery explains snapshot cadence, restores, and disaster recovery tests.
Account recovery - Reset an admin password covers the break-glass host-side flow when the only admin account is locked out.
Worker provisioning - Worker auto enrollment covers shared-secret bootstrap flows and what to disable afterwards.
Capacity planning - Scaling LyftData walks through worker sizing, channel fan-out strategies, and deployment hygiene.
Security posture - Security hardening documents TLS, secret rotation, and RBAC guidance.
Telemetry - Telemetry explains what LyftData collects locally and how to access it.

Releases and change management

Before upgrades, note your current version (lyftdata --version) and review the release notes.
Use the downloads portal for current builds and checksums.
Keep a simple change log for your environment (what changed, who approved it, and how to roll back).
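
One way to keep such a change log is an append-only file with one JSON record per change. The sketch below is an illustration under stated assumptions: the lyftdata --version command is the one mentioned above, while the ChangeEntry fields, the function names, and the JSON Lines file layout are hypothetical choices, not a LyftData feature. The version output is stored verbatim because its exact format is not documented on this page.

```python
import json
import subprocess
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def current_version(run=subprocess.run):
    """Return the raw output of `lyftdata --version`, or None if it cannot be run."""
    try:
        result = run(
            ["lyftdata", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


@dataclass
class ChangeEntry:
    # Hypothetical fields mirroring "what changed, who approved it, how to roll back".
    what_changed: str
    approved_by: str
    rollback: str
    version_before: Optional[str] = None


def append_entry(path, entry):
    """Append one change record, stamped with the current UTC time, as a JSON line."""
    record = asdict(entry)
    record["recorded_at"] = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Example use before an upgrade (names and values are placeholders):
#   entry = ChangeEntry(
#       what_changed="Upgrade server to the latest build",
#       approved_by="ops-lead",
#       rollback="Reinstall the previous build and restore the last backup",
#       version_before=current_version(),
#   )
#   append_entry("lyftdata-changes.jsonl", entry)
```

An append-only file keeps the history easy to review and diff; a ticketing system or wiki page serves the same purpose if your team already uses one.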

Where to go next

Follow the Daily operations playbook for your everyday checklist and weekly reviews.
Set up dashboards using the Monitoring guide and plan capacity with the Scaling runbook.
Harden your deployment via Security guidance and Backup & recovery.
Track upcoming changes in the release notes and communicate planned upgrades to stakeholders.