Core Concepts

Understanding how Lyft Data models control, orchestration, and execution helps you design pipelines that scale cleanly from a local experiment to production workloads. The three pillars are the Server (control plane), Jobs (pipeline definitions), and Workers (runtime execution).

Server: control plane

The Server is Lyft Data’s source of truth. It stores configuration, maintains system health, and orchestrates work across Workers. Because the Server also fronts the web UI and API, it is the hub for both day-one configuration and day-two operations.

Responsibilities

  • Persist and version Job definitions and associated assets
  • Schedule work by matching Jobs to eligible Workers and enforcing concurrency policies
  • Aggregate metrics, logs, and alerts so operators have a single pane of glass
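The scheduling responsibility can be pictured with a small sketch. All names here (`Worker`, `Job`, `required_connector`, `max_concurrent`) are illustrative assumptions, not Lyft Data's actual API; the point is the matching logic — pair each Job with an eligible Worker while respecting a concurrency cap.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    connectors: set            # connector types this Worker advertises
    max_concurrent: int        # illustrative concurrency policy
    assigned: list = field(default_factory=list)

@dataclass
class Job:
    name: str
    required_connector: str    # connector the Job's input/output needs

def schedule(jobs, workers):
    """Assign each Job to the least-loaded eligible Worker; return what couldn't be placed."""
    unscheduled = []
    for job in jobs:
        eligible = [w for w in workers
                    if job.required_connector in w.connectors
                    and len(w.assigned) < w.max_concurrent]
        if eligible:
            target = min(eligible, key=lambda w: len(w.assigned))
            target.assigned.append(job.name)
        else:
            unscheduled.append(job.name)   # no capable Worker with spare capacity
    return unscheduled
```

A real control plane layers retries, priorities, and health checks on top, but the core matching step looks like this.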

Jobs: pipeline units

A Job describes how one stream of data moves from an input through optional processing steps to a single output. Jobs can be authored as YAML, managed through the visual editor, or generated through automation.

Lifecycle

  1. Define the input (files, APIs, queues, databases, and more) and the trigger that determines when it should run
  2. Add Actions—filters, parsers, enrichers, or Lua scripts—to shape the payload
  3. Select a single output destination (warehouse, lake, search index, message queue, etc.)
  4. Stage and deploy; the Server handles scheduling, retries, and observability
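The lifecycle above implies a simple shape for a Job definition: exactly one input (with its trigger), zero or more Actions, exactly one output. The field names below are hypothetical, not Lyft Data's actual YAML schema — this sketch only shows the one-input/one-output invariant as data.

```python
# Hypothetical Job definition as plain data; field names are illustrative only.
job = {
    "name": "ingest-events",
    "input": {"type": "file", "path": "/var/log/events", "trigger": "poll:30s"},
    "actions": [
        {"type": "parse", "format": "json"},
        {"type": "filter", "expr": "level != 'debug'"},
    ],
    "output": {"type": "warehouse", "table": "events_raw"},
}

def validate(job):
    """Enforce the shape described above: one input, one output, optional actions."""
    assert isinstance(job["input"], dict), "exactly one input"
    assert isinstance(job["output"], dict), "exactly one output"
    assert isinstance(job.get("actions", []), list), "zero or more actions"
    return True
```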

Composing flows

A single Job always links one input to one output, but you can compose multi-step or branching flows by wiring Jobs together with Worker channels. Channels are in-memory pathways that let one Job’s output feed another Job’s input without external brokers.

Job: ingest-events -> Channel: ingest-feed
Job: enrich-events <- Channel: ingest-feed -> Channel: analytics-fanout
Job: load-warehouse <- Channel: analytics-fanout
Job: push-alerts <- Channel: analytics-fanout

This pattern keeps each pipeline step isolated while enabling fan-out and enrichment. For fan-in, point multiple Jobs to the same channel and downstream processor. See Build overview and advanced scheduling for orchestration patterns, back-pressure controls, and retry strategies.
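The wiring in the diagram can be sketched as in-memory publish/subscribe. This is a conceptual model, not Lyft Data's channel implementation (real Worker channels add buffering and back-pressure), but it shows how one Job's output fans out to several downstream Jobs without an external broker.

```python
class Channel:
    """In-memory pathway: one Job publishes, any number of Jobs subscribe."""
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, record):
        for handler in self.subscribers:   # fan-out: every subscriber sees the record
            handler(record)

# Wire the flow from the diagram above.
ingest_feed = Channel("ingest-feed")
analytics_fanout = Channel("analytics-fanout")

warehouse, alerts = [], []
# enrich-events: consumes ingest-feed, publishes to analytics-fanout
ingest_feed.subscribe(lambda r: analytics_fanout.publish({**r, "enriched": True}))
analytics_fanout.subscribe(warehouse.append)   # load-warehouse
analytics_fanout.subscribe(alerts.append)      # push-alerts

ingest_feed.publish({"event": "login"})        # ingest-events emits a record
```

Fan-in is the mirror image: several upstream Jobs publish into the same channel, and one downstream Job subscribes.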

Workers: execution layer

Workers are stateless runtimes that pull assignments from the Server, materialize Job pipelines, and emit telemetry. You can run Workers next to data sources for low-latency ingestion or in a central plane for simplified networking.

How Workers operate

  1. Register with the Server and advertise capabilities (connectors, capacity, and optional labels)
  2. Receive Job payloads and instantiate the runtime (inputs, actions, outputs, triggers)
  3. Execute the pipeline, streaming metrics, logs, and traces back to the Server
  4. Tear down gracefully, reporting status so the Server can reschedule or retry
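The four steps above can be condensed into a loop. The protocol here (`register`, `next_job`, `report`) is an assumption for illustration, not Lyft Data's actual Worker API; the `FakeServer` is a stand-in for the control plane.

```python
class FakeServer:
    """Stand-in for the control plane, just enough to drive the loop."""
    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.registered = None
        self.reported = None

    def register(self, capabilities):
        self.registered = capabilities

    def next_job(self):
        return self.jobs.pop(0) if self.jobs else None

    def report(self, statuses):
        self.reported = statuses

def run_worker(server, capabilities):
    server.register(capabilities)                  # 1. advertise connectors/labels
    statuses = []
    while (job := server.next_job()) is not None:  # 2. receive Job payloads
        try:
            result = job["run"]()                  # 3. execute the pipeline
            statuses.append((job["name"], "ok", result))
        except Exception as exc:                   # 4. report so the Server can retry
            statuses.append((job["name"], "failed", str(exc)))
    server.report(statuses)
    return statuses
```

Note that a failed Job is reported rather than crashing the loop — this is the isolation property described below: one bad Job does not take down its neighbors on the same Worker.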

Worker isolation limits blast radius: a failure in one Worker typically impacts only the Jobs it owns. Additional Workers can be added at any time for horizontal scale or to segment workloads (e.g., by region or data sensitivity).

Scaling and operational guidance

  • Capacity planning – Establish a baseline by tracking throughput, concurrency, and memory per Job. Add Workers when queue depth or execution latency grows faster than expected.
  • Observability – Forward logs and metrics from Workers to your monitoring stack. Familiarize yourself with the Troubleshooting guide and set alerts on retry spikes or sustained channel backlogs.
  • Resilience – Use staging and canary deployments before promoting configuration changes. Pair Worker channels with retries to confine failures to specific pipeline stages.
  • Security – Apply TLS, RBAC, and secret management practices outlined in the Install runbooks before attaching production datasets.
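For the capacity-planning point, a back-of-the-envelope sizing helper makes the idea concrete. The formula and the 30% headroom default are illustrative assumptions, not a Lyft Data sizing rule — calibrate both against your own baseline throughput and memory metrics.

```python
import math

def workers_needed(events_per_sec, per_worker_throughput, headroom=0.3):
    """Rough estimate: size the Worker pool for observed load plus headroom.

    events_per_sec        -- observed aggregate ingest rate
    per_worker_throughput -- measured sustainable rate of one Worker
    headroom              -- fraction of spare capacity to keep (assumed 30%)
    """
    required = events_per_sec * (1 + headroom) / per_worker_throughput
    return max(1, math.ceil(required))
```

When queue depth or latency grows faster than this estimate predicts, that gap is the signal to re-measure per-Worker throughput rather than just adding capacity blindly.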

Configuration and management

Most teams iterate through the Server UI during development, then promote Jobs via source control and CI. Environment variables and configuration files govern cluster-wide behavior (licensing, telemetry sinks, connector credentials). For deeper reference, explore the Configuration docs and topic-specific guides across Install, Build, and Operate sections.

By internalizing these concepts—Server authority, Job composition, Worker execution, and channel-driven orchestration—you can assemble reliable pipelines without re-implementing scheduling, scaling, or observability from scratch.

Where to go next