Executing Commands

Lyft Data exposes an exec family of building blocks so you can reuse existing shell scripts, operating system tools, and CLIs inside a job. Exec-based inputs, actions, and outputs each target a different stage of the pipeline, but they all launch commands through the host shell and stream data between Lyft Data and the child process.

When to use exec

  • Reuse system utilities or legacy scripts without building a custom connector
  • Gather diagnostics (uptime, disk usage) alongside streaming telemetry
  • Fan results into tools that expect newline-delimited JSON or plain text
  • Prototype quick integrations before you invest in a dedicated input or output

Exec input

Exec inputs run commands on a schedule or once at startup and treat the output as incoming events. Lyft Data runs the command with /bin/sh on Unix platforms or PowerShell/cmd on Windows, so multi-line command blocks and shell features are available.

Key capabilities:

  • Preserve multi-line command strings with the no_strip_linefeeds switch
  • Control output framing with the json (treat each line as JSON) and ignore_line_breaks (emit the entire run as one event) flags
  • Schedule recurring runs via trigger.interval, and bound execution with timeout
  • Inject environment variables from a file or inline map using the env block

input:
  exec:
    command: |
      ./bin/collect-metrics \
        --tenant retail-eu
    trigger:
      interval: 2m
    ignore-line-breaks: true
    env:
      values:
        API_TOKEN: YOUR_API_TOKEN
    timeout: 30s

When json is true, each line is parsed as a JSON event instead of being wrapped in the _raw field. Enable ignore_line_breaks to combine multi-line output (for example, certificate dumps) into a single event for downstream parsing.
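The three framing modes described above can be compared with a small simulation. This is a minimal sketch, not Lyft Data's implementation: the function name frame_output is hypothetical, and it assumes that blank lines produce no event and that the combined text from ignore_line_breaks is stored in _raw.

```python
import json


def frame_output(stdout_text, json_mode=False, ignore_line_breaks=False):
    """Turn the stdout of one command run into a list of event dicts.

    Illustrative only: it mirrors the documented flags, not Lyft Data's code.
    """
    if ignore_line_breaks:
        # The whole run becomes a single event; inner line breaks are kept.
        # Assumption: the combined text is stored in the _raw field.
        return [{"_raw": stdout_text.rstrip("\n")}]

    events = []
    for line in stdout_text.splitlines():
        if not line.strip():
            continue  # Assumption: blank lines produce no event.
        if json_mode:
            events.append(json.loads(line))  # each line is parsed as a JSON event
        else:
            events.append({"_raw": line})  # default: wrap the line in _raw
    return events


sample = '{"host": "a", "load": 0.4}\n{"host": "b", "load": 1.2}\n'
print(frame_output(sample))                           # two events with _raw strings
print(frame_output(sample, json_mode=True))           # two parsed JSON events
print(frame_output(sample, ignore_line_breaks=True))  # one event for the whole run
```

Running the same sample through each mode makes the trade-off visible: json suits commands that already emit one object per line, while ignore_line_breaks suits output that only makes sense as a whole.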

Exec action

The exec action shells out once for each event that flows through the job. By default the entire event body is piped to the child process's stdin; set input_field to send a single field instead. Populate result_status_field, result_stdout_field, or result_stderr_field to write the exit status, stdout, or stderr of the process back into the event.

actions:
  - exec:
      command: uptime
      result:
        status-field: exit_status
        stdout-field: uptime_raw
        stderr-field: uptime_err
  - extract:
      input-field: uptime_raw
      pattern: "load average: ([^,]+)"
      output-fields: [load_avg_1m]

Use the exec action to enrich events with on-box measurements, ping downstream services, or gate processing based on an external script.
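Gating on an external script could look like the sketch below. It shows the logic of a hypothetical gate script that receives one event and reports a verdict through its exit status, stdout, and stderr, which are the three results the action can capture. The function name check_event, the max_load threshold, and the assumption that the event arrives on stdin as a JSON object are illustrative and are not taken from the Lyft Data documentation.

```python
import json


def check_event(stdin_text, max_load=4.0):
    """Return (exit_status, stdout, stderr) for one event read from stdin.

    Assumption: the event body arrives as a single JSON object.
    """
    try:
        event = json.loads(stdin_text)
    except json.JSONDecodeError as exc:
        return 2, "", f"invalid JSON on stdin: {exc}"
    if not isinstance(event, dict):
        return 2, "", "expected a JSON object"

    load = event.get("load_avg_1m")
    if load is None:
        return 2, "", "missing field: load_avg_1m"
    try:
        load = float(load)  # extracted fields are often strings such as "0.52"
    except (TypeError, ValueError):
        return 2, "", f"load_avg_1m is not numeric: {load!r}"

    if load > max_load:
        return 1, "overloaded", ""  # non-zero status signals "stop processing"
    return 0, "ok", ""


print(check_event('{"load_avg_1m": "0.52"}'))
print(check_event('{"load_avg_1m": 9.7}'))
print(check_event("not json"))
```

A real script would read sys.stdin, print the second element to stdout and the third to stderr, and exit with the first. Keeping the decision in a pure function makes it straightforward to test without launching a process.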

Exec output

Exec outputs send processed events to a command. Lyft Data spawns the shell, writes each event followed by a newline to the process's stdin, and applies the same retry and batching controls as the rest of the job.

output:
  exec:
    command: ./bin/forward-to-legacy.sh
    retry:
      count: 5
      pause: 500ms
    batch:
      size: 50
      wrap-as-json: true

The optional input_field narrows the payload to a single field before it is written to the command, while retry and batch behave as they do on other outputs. This makes it easy to wrap jobs around existing ingestion scripts, syslog forwarders, or one-off transformations while keeping operational controls (retries, timeouts, scheduling) inside Lyft Data.
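On the receiving side, a command only needs to read its stdin line by line. The following is a minimal sketch of that consumer logic, assuming each event is serialized as one JSON object per line. The function name consume and the choice to record malformed lines instead of failing are illustrative.

```python
import io
import json


def consume(stream):
    """Parse newline-delimited events; return (events, bad_line_numbers)."""
    events, bad = [], []
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)  # remember the line instead of aborting the run
    return events, bad


# In-memory stand-in for the stdin a real command would receive.
demo = io.StringIO('{"id": 1}\n{"id": 2}\nnot-json\n')
events, bad = consume(demo)
print(len(events), "events parsed; malformed lines:", bad)
```

A standalone script would call consume(sys.stdin) and exit non-zero when the list of malformed lines is not empty, so that the failure is visible to the retry logic.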

Operational considerations

  • Commands run inside the worker runtime, so ensure the binary or script is present on every worker host
  • Treat exec jobs like any other external dependency: add alerts around exit codes and captured stderr when using the action variant
  • For long-lived outputs, prefer idempotent commands, because Lyft Data restarts the process on error during retries and the same events may be delivered again
  • Store secrets in environment files or context variables; avoid embedding credentials directly in the command string
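As an illustration of the last point, a child script can read its credential from the environment instead of receiving it on the command line. This is a minimal sketch: API_TOKEN matches the variable name used in the exec input example, while load_token and redact are hypothetical helpers.

```python
import os


def load_token(environ=None, name="API_TOKEN"):
    """Fetch a secret from the environment and fail loudly if it is absent."""
    env = os.environ if environ is None else environ
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; refusing to run without credentials")
    return token


def redact(token, keep=4):
    """Mask all but the last few characters so logs never hold the full secret."""
    if len(token) <= keep:
        return "*" * len(token)
    return "*" * (len(token) - keep) + token[-keep:]


# A plain dict stands in for the process environment in this demo.
token = load_token({"API_TOKEN": "s3cr3t-token-1234"})
print("using token", redact(token))
```

Failing fast when the variable is missing produces a non-zero exit and a message on stderr, which fits the alerting advice above.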