
Backup and Recovery

A resilient Lyft Data deployment needs regular backups and rehearsed recovery procedures. This runbook captures what to back up, how often to do it, and how to validate restores.

What to back up

Component                                      Why it matters                                  Suggested cadence
---------------------------------------------  ----------------------------------------------  -------------------------------------
Job definitions                                Source of truth for every pipeline              Nightly
Server configuration (server.yaml, env vars)   Controls ports, auth, storage paths             Weekly or whenever changed
Worker configuration                           Required to restore external workers quickly    Weekly
Staging database / metadata                    Tracks deployments, state, and history          Daily snapshots
SSL certificates & API keys                    Needed for secure communication                 Aligned to rotation schedule
Logs / audit trails                            Useful for forensics and compliance             Daily export with 30–90 day retention

Quick export commands

# Export all jobs to a dated directory and compress it
mkdir -p backups
EXPORT_ROOT=backups/jobs-$(date +%Y%m%d)
lyftdata jobs export --dir "$EXPORT_ROOT"
tar -czf "${EXPORT_ROOT}.tar.gz" -C "$(dirname "$EXPORT_ROOT")" "$(basename "$EXPORT_ROOT")"
# Snapshot server configuration
tar -czf backups/server-config-$(date +%Y%m%d).tar.gz \
  /etc/lyftdata/server.yaml \
  /etc/lyftdata/env \
  /etc/lyftdata/certs
# Snapshot built-in SQLite metadata (sqlite3 .backup produces a consistent
# copy even while the server is running; tarring a live database file can
# capture a mid-write, corrupt state)
sqlite3 /var/lib/lyftdata/staging.db ".backup 'backups/staging-db-$(date +%Y%m%d).db'"
gzip -f "backups/staging-db-$(date +%Y%m%d).db"

Store backups in two places: fast local storage for quick restores and offsite object storage for disasters. Encrypt sensitive archives before upload.
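Encryption before upload can be a simple symmetric pass. A minimal sketch using OpenSSL — the key file path and the bucket in the comment are assumptions, not part of Lyft Data:

```shell
#!/usr/bin/env bash
# Sketch: encrypt an archive before offsite upload. The key file location
# (and any bucket name) is an assumption for illustration.
encrypt_archive() {
  local archive="$1" keyfile="$2"
  # AES-256-CBC with PBKDF2 key derivation; writes <archive>.enc alongside
  openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in "$archive" -out "$archive.enc" -pass "file:$keyfile"
}

# Upload the encrypted file afterwards with your object-store CLI, e.g.:
#   aws s3 cp "$archive.enc" s3://<your-backup-bucket>/
```

To restore, decrypt with `openssl enc -d` plus the same cipher options and key file before extracting.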

Automate daily configuration backup

#!/usr/bin/env bash
set -euo pipefail
BACKUP_DIR="/var/backups/lyftdata"
DATE=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
EXPORT_DIR="$BACKUP_DIR/jobs-$DATE"
lyftdata jobs export --dir "$EXPORT_DIR"
tar -czf "$BACKUP_DIR/jobs-$DATE.tar.gz" -C "$BACKUP_DIR" "jobs-$DATE"
rm -rf "$EXPORT_DIR"
cp /etc/lyftdata/server.yaml "$BACKUP_DIR/server-$DATE.yaml"
cp /etc/lyftdata/env "$BACKUP_DIR/env-$DATE"
# sqlite3 .backup gives a consistent snapshot even while the server is running
sqlite3 /var/lib/lyftdata/staging.db ".backup '$BACKUP_DIR/staging-db-$DATE.db'"
gzip -f "$BACKUP_DIR/staging-db-$DATE.db"
find "$BACKUP_DIR" -type f -mtime +30 -delete
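One way to schedule the script above is a crontab entry; the install path and log location here are assumptions for illustration:

```shell
# Daily at 02:30 -- run the backup script and append output to a log
30 2 * * * /usr/local/bin/lyftdata-backup.sh >> /var/log/lyftdata-backup.log 2>&1
```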

Validate backups

Run automated checks to ensure archives are usable:

#!/usr/bin/env bash
set -euo pipefail
BACKUP="${1:?usage: validate-backup.sh <jobs-archive.tar.gz>}"
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT
tar -xzf "$BACKUP" -C "$TEMP_DIR"
EXPORT_DIR=$(find "$TEMP_DIR" -maxdepth 1 -type d -name 'jobs-*' -print -quit)
if [ -z "$EXPORT_DIR" ]; then
  echo "could not locate exported jobs directory" >&2
  exit 1
fi
# Dry-run import confirms the job definitions still load
lyftdata jobs import --dry-run --dir "$EXPORT_DIR"
# Optional: lint server.yaml with your preferred YAML tool before redeploying

Schedule validation weekly; surface failures in monitoring.
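One way to surface failures, sketched as a hypothetical wrapper: run the validation through a function that posts to a monitoring webhook (ALERT_URL is an assumption, not a Lyft Data setting) and logs locally whenever the command exits non-zero.

```shell
#!/usr/bin/env bash
# Hypothetical failure-surfacing wrapper; ALERT_URL is your monitoring webhook.
notify() {
  # Post to the webhook if one is configured; always try local syslog too.
  if [ -n "${ALERT_URL:-}" ]; then
    curl -fsS -X POST -d "$1" "$ALERT_URL" >/dev/null 2>&1 || true
  fi
  logger -t lyftdata-backup "$1" 2>/dev/null || true
}

run_checked() {
  # Run a command; on failure, raise an alert and propagate a non-zero exit.
  if ! "$@"; then
    notify "lyftdata backup check failed: $*"
    return 1
  fi
}

# Example (invoked weekly from cron):
#   run_checked ./validate-backup.sh /var/backups/lyftdata/jobs-<date>.tar.gz
```

The wrapper keeps the validation script itself free of alerting concerns, so the same script works interactively and under cron.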

Recovery steps

  1. Restore the server:
    • Rebuild host or container
    • Copy back server.yaml, env files, certificates
    • Restore the staging database and restart the service
  2. Re-register workers:
    • Install worker binaries
    • Restore worker configuration/API keys
    • Confirm they appear via the Workers page or curl -s http://<server>:3000/api/workers | jq '.[].id'
  3. Redeploy jobs:
    • Extract the latest job archive and run lyftdata jobs import --dir <path> --update
    • Confirm job state matches expectations in the UI
  4. Validate:
    • Run canary jobs or sample pipelines
    • Watch metrics/logs for the first hour
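Steps 1–3 can be stitched into a small restore helper. A sketch, assuming the jobs-archive layout produced by the backup script in this runbook:

```shell
#!/usr/bin/env bash
# Sketch: extract a jobs archive and re-import its definitions.
restore_jobs() {
  local archive="$1" restore_dir export_dir
  restore_dir=$(mktemp -d)
  tar -xzf "$archive" -C "$restore_dir"
  # Archives from this runbook contain a single jobs-<date> directory
  export_dir=$(find "$restore_dir" -maxdepth 1 -type d -name 'jobs-*' -print -quit)
  if [ -z "$export_dir" ]; then
    echo "no jobs-* directory in $archive" >&2
    return 1
  fi
  # --update re-applies definitions over any existing jobs (step 3 above)
  lyftdata jobs import --dir "$export_dir" --update
}
```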

Disaster recovery tips

  • Keep infrastructure-as-code scripts handy to recreate servers and workers in new regions.
  • Document RPO/RTO expectations (for example, at most one hour of data loss and a four-hour recovery window).
  • Test restore procedures quarterly to ensure runbooks stay current.

See also: Monitoring Lyft Data for detection signals and the troubleshooting guide for incident triage.