Working With Data
Lyft Data assumes every event is a JSON object. Inputs that read text wrap it in the `_raw` field, and actions mutate that JSON in place. This guide shows how to pull structure out of raw streams, convert values into meaningful units, and emit alternate formats when a downstream system insists on CSV or key/value text.
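For example (the line itself is invented), a plain-text input line such as `service started on port 8080` enters the pipeline as:

```json
{ "_raw": "service started on port 8080" }
```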
JSON as the baseline
- Each exec input line becomes `{ "_raw": "..." }` unless you set `json: true`. Turn long command output into a single event with `ignore-linebreaks: true`.
- Capture stdout and stderr separately by naming the fields up front:

```yaml
input:
  exec:
    command: uptime
    result-stdout-field: uptime_raw
    result-stderr-field: uptime_errors
    ignore-linebreaks: true
```

- When these convenience fields exist you can feed them directly into parsing actions without touching `_raw` (a sample event follows this list).
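With the configuration above, one run of `uptime` might produce an event like this (the value is illustrative; anything written to stderr would land in `uptime_errors`):

```json
{ "uptime_raw": "12:00 up 3 days, 2 users, load average: 0.12, 0.28, 0.30" }
```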
Extract unstructured text
Use `extract` when data only becomes structured after a regular expression match. Named capture groups become fields; unnamed groups map to `output-fields` in order. The action is tolerant by default: events that do not match fall through unchanged.
```yaml
- extract:
    input-field: uptime_raw
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [load1, load5, load15]
    remove: true     # drop uptime_raw after parsing
    warnings: true   # attach a warning if the pattern fails
    drop: false      # keep non-matching events moving
```

Chaining `convert` immediately after `extract` turns string captures into real numbers:
```yaml
- convert:
    conversions:
      - field: load1
        conversion: num
      - field: load5
        conversion: num
      - field: load15
        conversion: num
```
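Applied to the illustrative uptime line above, the extract-then-convert chain leaves three numeric fields on the event (a sketch, not guaranteed output):

```json
{ "load1": 0.12, "load5": 0.28, "load15": 0.30 }
```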
Convert values and units

`convert` can coerce many fields at once. Mix simple type conversions with unit-aware transforms so you do not carry around “512K” or “250ms” strings.
```yaml
- convert:
    conversions:
      - field: retries
        conversion: num
    units:
      - field: mem_bytes
        from: K
        to: M
      - field: latency
        from: ms
        to: s
```

If a conversion encounters empty values, pick a null-behaviour (`keep` or `default`) to avoid surprises.
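As a hedged before/after sketch (assuming the unit transforms parse suffixed strings and rewrite each value as a number in the target unit), an event like:

```json
{ "retries": "3", "mem_bytes": "2048K", "latency": "250ms" }
```

would come out as:

```json
{ "retries": 3, "mem_bytes": 2, "latency": 0.25 }
```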
Expand structured text
CSV and delimited tables
Turn delimited rows into JSON by pointing `expand` at the field and enabling its CSV-mode features:
```yaml
- expand:
    input-field: uptime_raw
    remove: true
    header: true   # first line is a header row
    delim: ','
    name-type-pairs:
      - port:num
      - throughput:num
```

- Set `gen-headers: true` when files omit a header; the runtime emits `_0`, `_1`, … with inferred types.
- Provide explicit schemas with `name-type-pairs` or `field-file` (one `name:type` entry per line).
- Use `header-field`, `header-field-types`, and `header-field-on-change` to read or emit schema rows alongside the data so downstream jobs can rehydrate the headers later.
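As a worked sketch with invented data, suppose the input field holds the rows `port,throughput`, `8080,125.5`, and `443,90.2`. With `header: true`, each data row becomes its own event (assuming `num` produces JSON numbers):

```json
{ "port": 8080, "throughput": 125.5 }
{ "port": 443, "throughput": 90.2 }
```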
Key/value logs
Key/value formats become JSON with one flag:
```yaml
- expand:
    input-field: _raw
    key-value-pairs: true
    autoconvert: true
    delim: ' '
    key-value-delim: '='
    remove: true
```

If the producer repeats keys, pick a strategy with `multiple`: `array` (collect all values), `first`, or `last`. Supply `name-type-pairs` to enforce types on specific keys.
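For example (invented log line), `tenant=acme status=200 bytes=512` parsed with the settings above becomes:

```json
{ "tenant": "acme", "status": 200, "bytes": 512 }
```

with `autoconvert: true` turning the numeric strings into numbers.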
Multi-line records
Some tools print blocks of text where each block represents a single event. Capture them with `ignore-linebreaks: true` on the input, or configure the multiline collector on `expand`:
```yaml
- expand:
    input-field: _raw
    remove: true
    multiline-var-patterns:
      - 'summary:^Summary: (.*)$'
      - 'details:^Details:$'
    multiline-reset-pattern: '^--+$'
```

The collector buffers lines until the next pattern matches, then emits a JSON object built from the captured groups.
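A rough illustration (the block is invented, and exact grouping semantics may differ): given the lines

```
Summary: disk check passed
Details:
/dev/sda1 ok
--------
```

the `summary` pattern captures its group, so the emitted event would contain at least `{ "summary": "disk check passed" }` when the reset line flushes the block.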
Embedded JSON strings
When a field contains quoted JSON, set `json: true` and let the action merge the decoded object into the event:
```yaml
- expand:
    input-field: payload
    json: true
    remove: true
```
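For instance (payload contents invented), an event such as:

```json
{ "payload": "{\"user\":\"kim\",\"ok\":true}" }
```

ends up with the decoded object merged in and, thanks to `remove: true`, the original string dropped:

```json
{ "user": "kim", "ok": true }
```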
Splitting one record into many

`expand` can also fan out events:
- Delimited strings: set `event-field` to the destination field name and provide a delimiter.

```yaml
- expand:
    input-field: hosts
    event-field: host
    delim: '\n'
    remove: true
# Emits one event per host while preserving other fields
```

- JSON arrays: enable `event-expand: true` and optionally list JSON Pointers in `event-skip-list` to keep nested arrays intact.

```yaml
- expand:
    input-field: results
    event-expand: true
    event-skip-list:
      - /metadata
```
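To make the delimited-string case concrete (invented data), an event like `{ "hosts": "web1\nweb2", "dc": "us-east" }` fans out into two events, each keeping the other fields:

```json
{ "host": "web1", "dc": "us-east" }
{ "host": "web2", "dc": "us-east" }
```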
Preparing non-JSON output
Hotrod’s `raw` action is gone; instead, use `collapse` or templates to stage formatted payloads before sending them to exec, HTTP, or message outputs.
- Build CSV or key/value strings with `collapse`:

```yaml
- collapse:
    output-field: payload
    format: kv
    delim: ' '
    key-value-delim: '='
    input-fields: [tenant, host, status]
output:
  http-post:
    url: https://collector.example.com/ingest
    body-field: payload
```

- Prefix structured logs (for example, `@cee:`) with `add` templates:

```yaml
- add:
    template-result-field: cee_line
    template: |
      @cee: {"status":"${status}","detail":"${detail}"}
output:
  exec:
    command: 'logger -t lyftdata'
    input-field: cee_line
```

- For multi-line bodies, render them into a `template-result-field` and point `http-post.body-field` or `exec.input-field` at that field so the action sends your custom text untouched.
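Given the `collapse` example above and invented field values, the staged `payload` field would hold a single key/value line such as:

```
tenant=acme host=web1 status=200
```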
Handy toggles and safeguards
- `remove`: drop source fields after expansion so you do not keep both `_raw` and the parsed JSON.
- `warnings` / `suppress-warnings`: control whether parsing errors are recorded as job attachments.
- `document-mode` (inputs) and `document-start` (actions): reset state when reading batched files or archives.
- `drop` on `extract`: stop events cold when the pattern fails, which is useful for validation pipelines.
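As a closing sketch (the pattern and field names are invented, and the named-group syntax is an assumption), several of these toggles combine naturally in a strict validation step:

```yaml
- extract:
    input-field: _raw
    pattern: '^(?P<ts>\S+) (?P<level>\S+) (?P<msg>.*)$'
    remove: true     # keep only the parsed fields, not _raw
    warnings: true   # attach a warning when a line fails to parse
    drop: true       # discard events that do not match at all
```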
Mastering these primitives gives you the same reach Hotrod offered, but with Lyft Data’s structured runtime. Parse raw feeds, reshape them into clean JSON, and emit exactly what your downstream systems expect.