Backup and Recovery
A resilient Lyft Data deployment needs regular backups and rehearsed recovery procedures. This runbook captures what to back up, how often to do it, and how to validate restores.
What to back up
| Component | Why it matters | Suggested cadence |
|---|---|---|
| Job definitions | Source of truth for every pipeline | Nightly |
| Server configuration (server.yaml, env vars) | Controls ports, auth, storage paths | Weekly or whenever changed |
| Worker configuration | Required to restore external workers quickly | Weekly |
| Staging database / metadata | Tracks deployments, state, and history | Daily snapshots |
| SSL certificates & API keys | Needed for secure communication | Aligned to rotation schedule |
| Logs / audit trails | Useful for forensics and compliance | Daily export with 30–90 day retention |
Quick export commands
```bash
# Export all jobs to a dated directory and compress it
EXPORT_ROOT=backups/jobs-$(date +%Y%m%d)
lyftdata jobs export --dir "$EXPORT_ROOT"
tar -czf "${EXPORT_ROOT}.tar.gz" -C "$(dirname "$EXPORT_ROOT")" "$(basename "$EXPORT_ROOT")"

# Snapshot server configuration
tar -czf backups/server-config-$(date +%Y%m%d).tar.gz \
  /etc/lyftdata/server.yaml \
  /etc/lyftdata/env \
  /etc/lyftdata/certs

# Dump built-in SQLite metadata
tar -czf backups/staging-db-$(date +%Y%m%d).tar.gz /var/lib/lyftdata/staging.db
```

Store backups in two places: fast local storage for quick restores and offsite object storage for disasters. Encrypt sensitive archives before upload.
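The encryption step can be sketched with `openssl`; the archive name and key location below are placeholders, and the key should live outside the backup tree so an attacker with the archives cannot also read the key:

```bash
# Encrypt an archive before offsite upload (AES-256, key material read from a file).
# ARCHIVE and KEY_FILE are example paths -- adjust to your layout.
ARCHIVE="backups/jobs-20240101.tar.gz"
KEY_FILE="/etc/lyftdata/backup.key"

openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in "$ARCHIVE" -out "$ARCHIVE.enc" \
  -pass "file:$KEY_FILE"

# Decrypt during a restore drill:
# openssl enc -d -aes-256-cbc -pbkdf2 \
#   -in "$ARCHIVE.enc" -out "$ARCHIVE" -pass "file:$KEY_FILE"
```

Delete the plaintext archive after verifying the encrypted copy uploads successfully.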
Automate daily configuration backup
```bash
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR="/var/backups/lyftdata"
DATE=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"

# Export jobs, archive the export, then remove the uncompressed copy
EXPORT_DIR="$BACKUP_DIR/jobs-$DATE"
lyftdata jobs export --dir "$EXPORT_DIR"
tar -czf "$BACKUP_DIR/jobs-$DATE.tar.gz" -C "$BACKUP_DIR" "jobs-$DATE"
rm -rf "$EXPORT_DIR"

# Snapshot configuration and the staging database
cp /etc/lyftdata/server.yaml "$BACKUP_DIR/server-$DATE.yaml"
cp /etc/lyftdata/env "$BACKUP_DIR/env-$DATE"
tar -czf "$BACKUP_DIR/staging-db-$DATE.tar.gz" /var/lib/lyftdata/staging.db

# Prune backups older than 30 days
find "$BACKUP_DIR" -type f -mtime +30 -delete
```

Validate backups
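Assuming the script above is saved as `/usr/local/bin/lyftdata-backup.sh` (the path and log location are examples), a crontab entry can run it daily:

```
# Run the backup script daily at 02:15; append output to a log for troubleshooting.
15 2 * * * /usr/local/bin/lyftdata-backup.sh >> /var/log/lyftdata-backup.log 2>&1
```

Pair the cron job with an alert on the log or on backup freshness so silent failures are caught.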
Run automated checks to ensure archives are usable:
```bash
#!/usr/bin/env bash
set -euo pipefail

BACKUP="$1"
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

tar -xzf "$BACKUP" -C "$TEMP_DIR"
EXPORT_DIR=$(find "$TEMP_DIR" -maxdepth 1 -type d -name 'jobs-*' -print -quit)
if [ -z "$EXPORT_DIR" ]; then
  echo "could not locate exported jobs directory" >&2
  exit 1
fi

# Dry-run import confirms the job definitions still load
lyftdata jobs import --dry-run --dir "$EXPORT_DIR"

# Optional: lint server.yaml with your preferred YAML tool before redeploying
```

Schedule validation weekly; surface failures in monitoring.
Recovery steps
- Restore the server:
  - Rebuild the host or container
  - Copy back `server.yaml`, env files, and certificates
  - Restore the staging database and restart the service
- Re-register workers:
  - Install the worker binaries
  - Restore worker configuration and API keys
  - Confirm they appear via the Workers page or `curl -s http://<server>:3000/api/workers | jq '.[].id'`
- Redeploy jobs:
  - Extract the latest job archive and run `lyftdata jobs import --dir <path> --update`
  - Confirm job state matches expectations in the UI
- Validate:
  - Run canary jobs or sample pipelines
  - Watch metrics and logs for the first hour
Disaster recovery tips
- Keep infrastructure-as-code scripts handy to recreate servers and workers in new regions.
- Document RPO/RTO expectations (e.g., at most one hour of data loss, a four-hour recovery window).
- Test restore procedures quarterly to ensure runbooks stay current.
See also: Monitoring Lyft Data for detection signals and the troubleshooting guide for incident triage.