Reducing Data Volume

Edge workers can discard noise and compress payloads before data leaves a site. The actions below mirror the Hotrod guidance but use Lyft Data’s DSL and runtime semantics so remote jobs can keep bandwidth and licensing costs in check.

Drop events early

Use the filter action to keep only events that match field patterns or Lua conditions. Multiple filters can be chained: one to match fields, another to gate on a numeric threshold.

actions:
  - filter:
      field-pattern-pairs:
        - severity: 'high'
        - source: '^GAUTENG-'
  - filter:
      condition: speed > 1
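
For example, assuming the pattern pairs inside a single filter must all match, only the last of these illustrative events survives both stages:

{"severity": "low",  "source": "GAUTENG-01", "speed": 5}    dropped: severity is not 'high'
{"severity": "high", "source": "CAPE-07",    "speed": 5}    dropped: source does not match ^GAUTENG-
{"severity": "high", "source": "GAUTENG-01", "speed": 0.5}  dropped: speed <= 1
{"severity": "high", "source": "GAUTENG-01", "speed": 5}    forwarded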

If only specific keys should survive, switch the filter into schema mode so that every other field is dropped from each event:

actions:
  - filter:
      schema:
        - source
        - destination
        - sent_kilobytes_per_sec
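
For instance, an event carrying extra bookkeeping fields (values invented for illustration) is cut down to exactly the schema keys:

before: {"source": "GAUTENG-01", "destination": "CAPE-07", "sent_kilobytes_per_sec": 42.5, "debug": "on", "_tmp": 1}
after:  {"source": "GAUTENG-01", "destination": "CAPE-07", "sent_kilobytes_per_sec": 42.5}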

Combine this with the remove and rename actions (covered under Trim payload fields below) to strip temporary fields or shorten key names before handing events to the next hop.

Forward only changes

The stream action watches a field and emits the delta between successive samples. Set only-changes so that identical samples are suppressed, and name an elapsed-field so each emitted event also records the time since the previous one:

actions:
  - stream:
      delta: true
      watch: throughput
      only-changes: true
      output-field: delta
      elapsed-field: elapsed_ms
  - filter:
      condition: delta != 0
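
As a sketch of the intended behaviour (field contents and timings here are invented), a run of throughput samples collapses to just the events whose value actually moved, each annotated with the delta and the milliseconds since the last emission; the trailing filter guards against zero deltas such as the initial baseline sample:

in:  {"throughput": 100}   baseline, delta 0, removed by the filter
in:  {"throughput": 100}   unchanged, suppressed by only-changes
in:  {"throughput": 130}   emitted as {"throughput": 130, "delta": 30, "elapsed_ms": 2000}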

This pattern is ideal for forwarding counters or gauges that rarely change but must be monitored continuously.

Trim payload fields

After filtering, the remove action drops helper keys (optionally matched by regular expressions) and rename shortens long field names, so JSON payloads shrink on the wire:

actions:
  - remove:
      fields: ["_raw", "debug"]
  - rename:
      key-value-pairs:
        - source=s
        - destination=d
        - sent_kilobytes_per_sec=sent
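
Applied to an illustrative event (values invented), the pair of actions shrinks five keys to three short ones:

before: {"_raw": "...", "debug": "on", "source": "GAUTENG-01", "destination": "CAPE-07", "sent_kilobytes_per_sec": 42.5}
after:  {"s": "GAUTENG-01", "d": "CAPE-07", "sent": 42.5}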

Compact payloads for transport

To squeeze multi-field events into a single blob, collapse can emit CSV, key/value, or array formats. The header helpers keep schema information alongside the payload so the pipeline can inflate it later.

actions:
  - collapse:
      output-field: payload_csv
      csv:
        header-field: payload_header
        header-field-types: true
        header-field-on-change: true
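
The collapsed event carries one CSV blob plus a header describing field names and types. The header syntax below is purely illustrative; the runtime's actual encoding may differ:

{"payload_header": "s:str,d:str,sent:num", "payload_csv": "GAUTENG-01,CAPE-07,42.5"}

Because header-field-on-change is set, the header should reappear only when the field set changes, saving further bytes on steady streams.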

On the receiving job, often a central collector, expand reverses the process by reading the header field and rebuilding the original event. The remove flag drops the intermediate CSV fields once reconstruction succeeds, and additional enrichments can move to that downstream pipe so edge jobs stay lean.

actions:
  - expand:
      input-field: payload_csv
      remove: true
      csv:
        header-field: payload_header
        header-field-types: true
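
Continuing the same illustrative payload, the collector rebuilds the event and, because remove is set, strips the transport fields once reconstruction succeeds:

in:  {"payload_header": "s:str,d:str,sent:num", "payload_csv": "GAUTENG-01,CAPE-07,42.5"}
out: {"s": "GAUTENG-01", "d": "CAPE-07", "sent": 42.5}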

With these pieces, edge workers can minimize bandwidth and storage costs while still delivering complete records to the central collector.