Executing Commands
Lyft Data exposes an exec family of building blocks so you can reuse existing shell scripts, operating system tools, and CLIs inside a job. Exec-based inputs, actions, and outputs each target a different stage of the pipeline, but they all launch commands through the host shell and stream data between Lyft Data and the child process.
When to use exec
- Reuse system utilities or legacy scripts without building a custom connector
- Gather diagnostics (uptime, disk usage) alongside streaming telemetry
- Fan results into tools that expect newline-delimited JSON or plain text
- Prototype quick integrations before you invest in a dedicated input or output
Exec input
Exec inputs run commands on a schedule or once at startup and turn the command's output into incoming events. Lyft Data runs the command with /bin/sh on Unix platforms or PowerShell/cmd on Windows, so multi-line command blocks and shell features such as pipes and redirection are available.
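The following Python sketch shows what running a multi-line command block through the Unix shell with a bounded runtime looks like. It assumes a Unix host and does not model the Windows (PowerShell/cmd) path or Lyft Data's own process handling. The helper name run_like_exec_input and its 30-second default are illustrative.

```python
import subprocess

# A multi-line block, like a YAML "command: |" literal. The trailing backslash
# is a shell line continuation, so /bin/sh sees the second printf as one command.
COMMAND = r"""printf 'first line\n'
printf '%s\n' \
  "second line"
"""


def run_like_exec_input(command: str, timeout_s: float = 30.0) -> str:
    """Run `command` through /bin/sh -c and return its stdout.

    Illustrative sketch only, not Lyft Data's implementation.
    """
    completed = subprocess.run(
        ["/bin/sh", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired (child is killed) if exceeded
        check=True,         # raises subprocess.CalledProcessError on a non-zero exit
    )
    return completed.stdout


if __name__ == "__main__":
    print(run_like_exec_input(COMMAND), end="")
```

Passing the whole block to sh -c, rather than splitting it into arguments, keeps line continuations, pipes, and other shell syntax intact.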
Key capabilities:
- Preserve multi-line command strings with the no_strip_linefeeds switch
- Control output framing with the json (treat each line as JSON) and ignore_line_breaks (emit the entire run as one event) flags
- Schedule recurring runs via trigger.interval, and bound execution with timeout
- Inject environment variables from a file or inline map using the env block
```yaml
input:
  exec:
    command: |
      ./bin/collect-metrics \
        --tenant retail-eu
    trigger:
      interval: 2m
    ignore-line-breaks: true
    env:
      values:
        API_TOKEN: YOUR_API_TOKEN
    timeout: 30s
```

When json is true, each line is parsed as a JSON event instead of being wrapped in the _raw field. Enable ignore_line_breaks to combine multi-line output (for example, certificate dumps) into a single event for downstream parsing.
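The framing rules just described can be modeled as a small pure function that maps one run's stdout to a list of events. This is a hedged sketch, not Lyft Data's code. The function name is hypothetical, and three behaviors are assumptions: skipping blank lines, representing plain-text events as {"_raw": line}, and letting ignore_line_breaks take precedence when both flags are set.

```python
import json


def frame_output(stdout: str, *, json_lines: bool = False,
                 ignore_line_breaks: bool = False) -> list:
    """Split one command run's stdout into events (illustrative model).

    Assumptions: blank lines are dropped, and ignore_line_breaks wins over
    json_lines when both are set.
    """
    if ignore_line_breaks:
        # One event for the whole run; inner line breaks stay inside _raw.
        return [{"_raw": stdout.rstrip("\n")}] if stdout.strip() else []

    events = []
    for line in stdout.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # json mode: each line is its own JSON document; otherwise wrap the text.
        events.append(json.loads(line) if json_lines else {"_raw": line})
    return events
```

For example, a PEM certificate dump spread over many lines stays in one event with ignore_line_breaks, while a tool that prints one JSON object per line maps naturally onto json mode.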
Exec action
The exec action shells out once for each event that flows through the job. By default the entire event body is piped to the child process's stdin; set input_field to send just one field instead. Populate result_status_field, result_stdout_field, or result_stderr_field to capture the exit status, stdout, or stderr back into the event.
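The per-event stdin/stdout round trip can be sketched in Python. The exec_action helper and its keyword names are hypothetical stand-ins for the input_field and result_*_field settings. Several details are assumptions, not documented behavior: the whole event is sent as one line of JSON, a selected field is sent as plain text, and trailing newlines are trimmed from the captured output.

```python
import json
import subprocess
from typing import Optional


def exec_action(event: dict, command: str, *,
                input_field: Optional[str] = None,
                status_field: Optional[str] = None,
                stdout_field: Optional[str] = None,
                stderr_field: Optional[str] = None) -> dict:
    """Run `command` once for `event` and copy the results into a new dict.

    Illustrative only; serialization and trimming rules are assumptions.
    """
    if input_field is None:
        payload = json.dumps(event)                  # whole event as one JSON line
    else:
        payload = str(event.get(input_field, ""))    # just the selected field

    proc = subprocess.run(
        ["/bin/sh", "-c", command],  # Unix shell; Windows is not modeled
        input=payload + "\n",        # piped to the child's stdin
        capture_output=True,
        text=True,
    )

    enriched = dict(event)  # shallow copy: the caller's event is not mutated
    if status_field:
        enriched[status_field] = proc.returncode
    if stdout_field:
        enriched[stdout_field] = proc.stdout.rstrip("\n")
    if stderr_field:
        enriched[stderr_field] = proc.stderr.rstrip("\n")
    return enriched


if __name__ == "__main__":
    event = {"host": "web-1", "msg": "disk almost full"}
    print(exec_action(event, "tr a-z A-Z", input_field="msg",
                      status_field="exit_status", stdout_field="msg_upper"))
```

Capturing the exit status as a field, rather than failing the event, lets later actions branch on it. That is one way to gate processing on an external script.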
```yaml
actions:
  - exec:
      command: uptime
      result:
        status-field: exit_status
        stdout-field: uptime_raw
        stderr-field: uptime_err
  - extract:
      input-field: uptime_raw
      pattern: "load average: ([^,]+)"
      output-fields: [load_avg_1m]
```

Use the exec action to enrich events with on-box measurements, ping downstream services, or gate processing based on an external script.
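The extract pattern in this example uses only basic regex syntax, so you can check it against a sample uptime line in Python before deploying. The sample line is an assumed Linux procps format, and load_avg_1m is a hypothetical helper. On BSD/macOS, uptime prints "load averages:" without commas, so this pattern finds no match there.

```python
import re

# Same pattern as the extract action; group 1 captures the 1-minute load average.
LOAD_AVG = re.compile(r"load average: ([^,]+)")


def load_avg_1m(uptime_raw: str):
    """Return the first load-average figure, or None if the line does not match."""
    match = LOAD_AVG.search(uptime_raw)
    return match.group(1) if match else None


sample = " 10:14:03 up 3 days,  4:02,  2 users,  load average: 0.42, 0.38, 0.30"
print(load_avg_1m(sample))  # 0.42
```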
Exec output
Exec outputs send processed events to a command. Lyft Data spawns the shell, writes each event (followed by a newline) to the process's stdin, and keeps retries and batching in sync with the rest of the job.
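On the receiving side, the command simply reads newline-delimited records from stdin until EOF. Below is a hedged Python sketch of such a consumer, standing in for a script like ./bin/forward-to-legacy.sh. It assumes each line is a JSON event, which is an assumption since the text does not specify the wire format. The consume function and its forward callback are hypothetical.

```python
import json
import sys
from typing import Callable, Iterable


def consume(lines: Iterable[str], forward: Callable[[dict], None]) -> int:
    """Read newline-delimited events until EOF and hand each one to `forward`.

    Assumes one JSON event per line; malformed lines are reported on stderr
    and skipped. Returns the number of events forwarded.
    """
    forwarded = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            print(f"skipping non-JSON line: {line!r}", file=sys.stderr)
            continue
        forward(event)  # placeholder for the real delivery call
        forwarded += 1
    return forwarded
```

In a real script you would call consume(sys.stdin, forward) from the entry point. Iterating over sys.stdin processes each event as soon as its line arrives, instead of waiting for the whole stream.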
```yaml
output:
  exec:
    command: ./bin/forward-to-legacy.sh
    retry:
      count: 5
      pause: 500ms
    batch:
      size: 50
      wrap-as-json: true
```

The optional input_field sends only the named field to the command instead of the whole event, while retry and batch mirror the behavior of other outputs. This makes it easy to wrap jobs around existing ingestion scripts, syslog forwarders, or one-off transformations while keeping operational controls (retries, timeouts, scheduling) inside Lyft Data.
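A retry block like count: 5 with pause: 500ms suggests a fixed-pause retry loop. The Python sketch below is a generic version of that idea, not Lyft Data's implementation. In particular, it assumes count means retries after the first attempt, which may not match the product's semantics. The with_retries helper is hypothetical.

```python
import time


def with_retries(send, payload, *, count: int = 5, pause_s: float = 0.5):
    """Call send(payload), retrying up to `count` more times with a fixed pause.

    Generic fixed-pause retry; assumes `count` excludes the first attempt.
    """
    for attempt in range(count + 1):
        try:
            return send(payload)
        except Exception:
            if attempt == count:
                raise  # out of retries: surface the last error
            time.sleep(pause_s)
```

Because a failed attempt may already have delivered part of its input before erroring, the same payload can reach the command more than once. This is why the operational considerations below recommend idempotent commands.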
Operational considerations
- Commands run inside the worker runtime, so ensure the binary or script is present on every worker host
- Treat exec jobs like any other external dependency: add alerts around exit codes and captured stderr when using the action variant
- For long-lived outputs, prefer idempotent commands, because Lyft Data restarts processes on error during retries
- Store secrets in environment files or context variables; avoid embedding credentials directly in the command string
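To keep a credential out of the command string, the command can refer to an environment variable that is set only in the child's environment. The Python sketch below shows the pattern on Unix. API_TOKEN mirrors the env example earlier on this page, and run_with_secret is a hypothetical helper. One motivation is that command-line arguments are typically visible to other users on a host through tools such as ps, while a process's environment is normally readable only by its owner.

```python
import os
import subprocess


def run_with_secret(command: str, token: str) -> str:
    """Run `command` with API_TOKEN set only in the child's environment.

    The literal token never appears in `command`, so it stays out of the
    argument list that process listings display.
    """
    env = dict(os.environ, API_TOKEN=token)  # copy of the parent env plus the secret
    proc = subprocess.run(
        ["/bin/sh", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits non-zero
    )
    return proc.stdout


if __name__ == "__main__":
    # The command only references $API_TOKEN; the value is supplied separately.
    print(run_with_secret('test -n "$API_TOKEN" && echo token-present', "example-token"), end="")
```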