# Reducing Data Volume
Edge workers can discard noise and compress payloads before data leaves a site. The actions below mirror the Hotrod guidance but use Lyft Data’s DSL and runtime semantics so remote jobs can keep bandwidth and licensing costs in check.
## Drop events early
Use the `filter` action to keep only events that match fixed patterns or Lua conditions. Multiple filters can be chained: one to match fields, another to gate by a numeric threshold.
```yaml
actions:
- filter:
    field-pattern-pairs:
    - severity: 'high'
    - source: '^GAUTENG-'
- filter:
    condition: speed > 1
```

If you only want specific keys to survive, switch the filter into schema mode so that any other fields are dropped in place:
```yaml
actions:
- filter:
    schema:
    - source
    - destination
    - sent_kilobytes_per_sec
```

Combine this with the `remove` and `rename` actions to strip temporary fields or shorten key names before handing events to the next hop.
## Forward only changes
The `stream` action keeps a running value and emits deltas. Set `only-changes` so that identical samples disappear; Lyft Data will also include an elapsed-time field for context.
```yaml
actions:
- stream:
    delta: true
    watch: throughput
    only-changes: true
    output-field: delta
    elapsed-field: elapsed_ms
- filter:
    condition: delta != 0
```

This pattern is ideal for forwarding counters or gauges that rarely change but must be monitored continuously.
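As a rough illustration of the reduction (the sample values and the exact output layout below are assumptions, not taken from a real run):

```yaml
# throughput samples arriving at the stream action: 100, 100, 120, 120, 90
# events surviving the stream + filter pair (sketch):
#   { delta: 20,  elapsed_ms: 2000 }   # 100 -> 120
#   { delta: -30, elapsed_ms: 2000 }   # 120 -> 90
# The repeated samples are suppressed by only-changes, and the
# trailing filter catches any zero deltas that would otherwise pass.
```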
## Trim payload fields
After filtering, `remove` drops helper keys (optionally via regular expressions) and `rename` shortens long field names so JSON payloads shrink on the wire:
```yaml
actions:
- remove:
    fields: ["_raw", "debug"]
- rename:
    key-value-pairs:
    - source=s
    - destination=d
    - sent_kilobytes_per_sec=sent
```
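As a rough before-and-after (the record contents here are made up for illustration), the rename above shrinks every record on the wire:

```yaml
# Before: {"source": "GAUTENG-1", "destination": "JHB-2", "sent_kilobytes_per_sec": 42}
# After:  {"s": "GAUTENG-1", "d": "JHB-2", "sent": 42}
```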
## Compact payloads for transport

If your downstream systems accept batches, the biggest wins usually come from batching and compressing at the output boundary:
- Use the output setting `batch.wrap-as-json` to send JSON arrays instead of many small JSON documents (see the sketch after this list).
- Use the object-store output setting `preprocessors: [gzip]` to compress the payload before upload.
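To make the effect of wrapping concrete (the records themselves are illustrative):

```yaml
# Without batching: three separate JSON documents on the wire
#   {"s": "GAUTENG-1", "sent": 42}
#   {"s": "GAUTENG-2", "sent": 7}
#   {"s": "GAUTENG-1", "sent": 40}
# With batch.wrap-as-json: one JSON array per batch, ready for gzip
#   [{"s": "GAUTENG-1", "sent": 42}, {"s": "GAUTENG-2", "sent": 7}, {"s": "GAUTENG-1", "sent": 40}]
```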
## Example: batch + gzip to S3
```yaml
output:
  s3:
    bucket-name: logs-archive
    object-name:
      name: "@{job}/run-${stat|_BATCH_NUMBER}.json.gz"
    preprocessors:
    - gzip
    batch:
      mode: fixed
      fixed-size: 500
      timeout: 1s
      wrap-as-json: true
    retry:
      timeout: 30s
      retries: 5
```

With these primitives, edge workers can minimize bandwidth and storage costs while still delivering complete records to central collectors.
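For reference, the pieces above compose into a single job. The sketch below reuses only constructs shown in this section; the field names, patterns, and sizes are illustrative rather than a recommended configuration.

```yaml
# Sketch: a full edge pipeline built from the actions and output shown above.
actions:
- filter:
    field-pattern-pairs:
    - source: '^GAUTENG-'       # keep only events from matching sources
- filter:
    schema:                     # drop every field not listed here
    - source
    - destination
    - sent_kilobytes_per_sec
- rename:
    key-value-pairs:            # shorten keys before they hit the wire
    - source=s
    - destination=d
    - sent_kilobytes_per_sec=sent
output:
  s3:
    bucket-name: logs-archive
    object-name:
      name: "@{job}/run-${stat|_BATCH_NUMBER}.json.gz"
    preprocessors:
    - gzip                      # compress each uploaded object
    batch:
      mode: fixed
      fixed-size: 500           # assumed: flush at 500 events or after 1s
      timeout: 1s
      wrap-as-json: true        # one JSON array per object instead of many documents
```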