Overview
Lyft Data is a unified data operations platform that lets teams connect sources, transform records, and deliver results without stitching together bespoke scripts. Describe the pipeline once and the platform owns scheduling, retries, and observability.
Quick start checklist
- Confirm prerequisites – Review system requirements to make sure the target machine has enough CPU, RAM, and disk headroom.
- Download the release bundle – Grab the latest binaries from Downloads; Community Edition runs without a license, but you can review licensing upgrades if you plan a production rollout. If you just need a trial, jump straight to the Evaluation Quickstart.
- Bring up the control plane – Follow the platform-specific guide (e.g., Linux server install) to start the server; the built-in worker comes online automatically in Community Edition. Evaluators can complete the single-host setup in ~15 minutes using the quickstart (a rough configuration sketch follows this checklist).
- Ship your first job – Use Getting Started for a guided tour of the UI, then complete Running a job to validate end-to-end execution.
- Harden for production – Consult the Install & configure section for TLS, backups, and multi-worker runbooks before touching real datasets.
Each step above links to a short walkthrough so you can stay in-context without hunting across the docs.
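Before you run the installer, it can help to see roughly what a control-plane configuration might contain. The sketch below is illustrative only; the file layout and every key in it are assumptions rather than documented syntax, so treat the platform-specific install guide as the source of truth.

```yaml
# Hypothetical minimal control-plane config; every field name here is
# an illustrative assumption, not documented Lyft Data syntax.
server:
  listen: 0.0.0.0:8080        # UI and API served from one port (assumed)
  data-dir: /var/lib/lyftdata
worker:
  built-in: true              # Community Edition's bundled worker
```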
Why teams choose Lyft Data
- Connect data fast: pull from databases, SaaS APIs, files, or object stores using built-in connectors.
- Shape pipelines safely: mix built-in actions (filters, enrichers, preprocessors, etc.) with Run & Trace so you can verify every change.
- Operate with confidence: centralized logging, metrics, and alerting keep operations predictable even as workloads grow.
Architecture at a glance
```
Sources ──► Lyft Data Server ──► Destinations
  │              │                   │
  │              ├─ Jobs             ├─ Warehouses / lakes
  ├─ APIs        ├─ Actions          ├─ Search indexes / queues
  └─ Files       └─ Workers          └─ Downstream services
```

The Server stores job definitions, coordinates deployments, and exposes both the UI and API. Jobs describe how data moves, and Workers run the pipelines close to the data or users. Add more Workers to scale horizontally or to place execution near external systems.
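To make the worker-placement idea concrete, the sketch below shows one plausible way a deployment might register additional workers near specific systems. The `workers` list and its fields are assumptions for illustration, not confirmed configuration syntax.

```yaml
# Hypothetical worker registration; field names are assumptions.
workers:
  - name: worker-eu-sources    # placed next to the EU databases
    url: https://worker-eu.internal:9090
  - name: worker-us-exports    # placed close to the S3 export bucket
    url: https://worker-us.internal:9090
```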
Example pipeline
```yaml
name: analytics-sync
input:
  http-poll:
    url: https://api.example.com/events
    json: true
    response:
      status-field: http_status
      response-field: body
    trigger:
      interval:
        duration: 1h
actions:
  - filter:
      condition: http_status == 200
  - add:
      output-fields:
        exported_at: "${time|now_time_iso}"
output:
  s3:
    bucket: lyftdata-exports
    path: analytics/${time|now_time_fmt %Y/%m/%d}/events.jsonl
```

This job polls an API hourly, filters successful responses, stamps each record, and persists to S3. Chain additional jobs through Worker channels when you need fan-out, enrichment, or multi-step processing.
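As a rough illustration of that chaining pattern, the job below consumes records from one named channel and republishes them to another. The `channel` input and output blocks are assumed syntax modeled on the example above; see the Build overview for the platform's actual fan-out constructs.

```yaml
# Hypothetical downstream job; the `channel` connector is assumed syntax.
name: events-enrich
input:
  channel:
    name: raw-events               # published by an upstream job's output
actions:
  - add:
      output-fields:
        pipeline_stage: "enriched"   # illustrative enrichment step
output:
  channel:
    name: enriched-events          # any number of downstream jobs can subscribe
```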
Your first five-minute pipeline
- Prerequisites – Confirm the server is running and the built-in worker shows Online using the Getting Started checklist. You do not need a license while you remain on Community Edition.
- Walkthrough – Follow the Day 0 → First pipeline guide for the full download, install, and deploy flow.
- What you’ll build – Create a simple pipeline in the visual editor, inspect it with Run & Trace, then deploy it to your worker (a rough sketch of such a pipeline follows below).
Want the UI-only tour? Jump to Running a job.
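For a feel of what the five-minute pipeline might look like as configuration, here is a deliberately tiny sketch modeled on the example pipeline above. The `file` connector and its fields are assumptions; in practice the visual editor generates the real definition for you.

```yaml
# Hypothetical minimal job; the `file` connector fields are assumed.
name: hello-pipeline
input:
  file:
    path: /tmp/demo/input.jsonl
    json: true
actions:
  - filter:
      condition: status == "ok"
output:
  file:
    path: /tmp/demo/output.jsonl
```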
Preparing for production
- Observability baseline – Turn on metrics shipping and review the Troubleshooting guide so you know what “healthy” looks like before launch (a starting-point sketch follows this list).
- Scaling playbooks – Plan how many workers you need once you upgrade beyond Community Edition, and decide whether they live near sources or destinations. The advanced scheduling guide covers common patterns.
- Security review – Walk through TLS, RBAC, and secret storage checklists in the Install section prior to onboarding real customers.
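As a starting point for that observability baseline, a configuration block might look like the sketch below. The `metrics` and `logging` sections and all of their keys are illustrative assumptions; confirm the exact settings in the Install and Troubleshooting guides.

```yaml
# Hypothetical observability settings; section and key names are assumptions.
metrics:
  enabled: true
  ship-to: https://metrics.internal:9090   # your metrics backend
  interval: 30s
logging:
  level: info
  retain-days: 14
```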
Where to go next
- Install & configure – Deep dives on system requirements, TLS, and deployment patterns live in the Install overview.
- Build resilient jobs – The Build overview and advanced scheduling pages walk through orchestration patterns such as fan-out and retries.
- Connect integrations – Start with the Integrations catalog for per-source quick starts.
- Operate at scale – Use the Troubleshooting guide plus upcoming monitoring, backup, and security playbooks to stay ahead of incidents.
- Work as a team – Share the Team benefits guide with data engineers, operators, and reviewers to align on workflows.
With these foundations in place, you can iterate quickly on pipelines while the platform handles orchestration, scaling, and observability.