Daily Operations

This playbook is aimed at operators and SREs who need a repeatable cadence for keeping LyftData healthy. Use it alongside the detailed runbooks in Operate & Scale.

Daily checklist (15 minutes)

Review the dashboard for worker status, queue depth, and recent alerts.
Scan the job status feed for stalled deployments or retry storms.
Check licensing state in the UI or via lyftdata license show so Community Edition limits or expiring keys are flagged early.
Spot-check server and worker logs for new error signatures (for example, journalctl -u lyftdata-server for the server, plus the worker logs).
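
One way to make the log spot-check repeatable is to collapse each error line into a signature and report only the signatures you have not seen before. The sketch below is a minimal illustration, not part of LyftData: the ERROR_MARKERS values and the idea of a "known" set are assumptions, and the actual log format may need different normalization.

```python
import re
from collections import Counter

# Hypothetical markers; adjust to whatever your server and worker logs emit.
ERROR_MARKERS = ("ERROR", "FATAL", "panic")


def signature(line):
    """Collapse variable parts of a log line so similar errors group together."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)  # addresses, hex ids
    line = re.sub(r"\d+", "<n>", line)               # timestamps, counters, ids
    return line.strip()


def new_error_signatures(lines, known):
    """Count error signatures in `lines` that are not in the `known` set."""
    counts = Counter(
        signature(line)
        for line in lines
        if any(marker in line for marker in ERROR_MARKERS)
    )
    return {sig: n for sig, n in counts.items() if sig not in known}
```

A typical use would be to pipe the output of journalctl -u lyftdata-server into a small wrapper that calls new_error_signatures(sys.stdin, known) and prints the result, with the known set loaded from a file your team maintains.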

Tip

Automate these checks where possible. Scripts that call GET /api/liveness and GET /api/health with an admin bearer token, alongside worker and job queries, reduce manual toil. Prefer https://… by default; add -k only in evaluation or self-signed environments.
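
The tip above could be scripted roughly as follows. This is a sketch under stated assumptions: only the two endpoint paths come from this page; the base URL, sending the token to both endpoints, and treating the HTTP status code as the health signal are guesses, since the response format is not described here. The insecure flag plays the role of -k and should stay off outside evaluation setups.

```python
import ssl
import urllib.error
import urllib.request

# Endpoint paths named in the tip above.
ENDPOINTS = ("/api/liveness", "/api/health")


def build_request(base_url, path, token=None):
    """Build a GET request, adding a bearer token header when one is given."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def check(base_url, token=None, insecure=False, timeout=5.0):
    """Return {path: HTTP status code or error string} for each endpoint."""
    ctx = ssl.create_default_context()
    if insecure:
        # Skips certificate verification; evaluation/self-signed setups only.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    results = {}
    for path in ENDPOINTS:
        req = build_request(base_url, path, token)
        try:
            with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as exc:
            results[path] = exc.code  # server answered with an error status
        except (urllib.error.URLError, OSError) as exc:
            results[path] = f"unreachable: {exc}"
    return results
```

Calling check() with your server's base URL and an admin token read from the environment returns a small dictionary that a cron job or CI step can compare against 200 and alert on.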

Weekly tasks

Review worker utilization trends in the monitoring guide and plan scale-up if CPU or queue depth is trending high.
Validate backups and retention using the backup & recovery checklist.
Audit user accounts and API keys (rotate stale credentials, remove unused workers).
Capture notable changes (new connectors, job migrations) in your team runbook.
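
To make "trending high" less subjective during the weekly review, you can fit a straight line to recent samples and look at its slope. The sketch below assumes you have already exported evenly spaced queue depth or CPU samples from your monitoring system; the function names and the threshold are hypothetical, and a least-squares slope is only one simple way to judge a trend.

```python
def slope_per_sample(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0  # not enough data to fit a line
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def trending_high(values, threshold):
    """True when the fitted slope exceeds `threshold` units per sample."""
    return slope_per_sample(values) > threshold
```

For example, with one queue depth sample per day, trending_high(samples, 5) flags a week in which the backlog grows by more than about five items per day on average.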

Before deploying changes

Stage jobs and verify via Run & Trace in lower environments.
Check the release notes for upgrade guidance or known issues.
Ensure alerting/metrics dashboards reflect any new jobs or channels.

Incident drills & readiness

Rehearse the escalation path for worker failures (who owns remediation?).
Test the process for draining jobs and restarting workers safely.
Validate that error budgets or SLIs are defined and monitored (pair with scaling guidance).
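
As a reminder of what an error budget means in practice, the arithmetic can be sketched as below. The SLO target and window are illustrative values, not LyftData defaults, and the helper names are hypothetical.

```python
def error_budget_minutes(slo, window_days):
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo, window_days, downtime_minutes):
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget
```

A 99.9% target over 30 days allows roughly 43.2 minutes of downtime, so an incident that lasted about 21.6 minutes would leave about half of the budget.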

Resources

Monitoring runbook: /operate/monitoring
Scaling playbook: /operate/scaling
Backup & recovery: /operate/backup
Security hardening: /operate/security

Keep this page bookmarked as the starting point for day-to-day operations and link it in your incident response handbook.
