Scripting

Lyft Data ships a Lua 5.3 runtime for the script action. Use it when you need to derive fields, normalize payloads, or branch on complex logic without shelling out to external tooling. Scripts run on each event the action receives, working in-place on the JSON document.

Core syntax

- script:
    let:
      - total: amount * tax_rate
      - normalized_status: string.upper(status)
    set:
      - site: '{{worker}}'
    merge: overwrite
    condition: amount ~= nil
  • let lists field: expression pairs whose values are evaluated for every event.
  • set assigns literal values. Context expansions such as {{job}} and {{now}} are available.
  • merge controls how existing fields are handled:
    • unless-exists (default) keeps the original value if the field already exists.
    • overwrite always replaces the value.
    • error leaves the event untouched and records an attachment when a scripted field already exists.
  • condition guards the entire action. When it evaluates to false, none of the let or set expressions run.
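The merge modes above can be modelled in a few lines of plain Lua. This is only a sketch of the described behaviour, not the Lyft Data implementation: apply_field and MODES are hypothetical names, and the model works per field, whereas the real error mode also records an attachment.

```lua
-- Hypothetical model of the merge modes; not the Lyft Data implementation.
local MODES = { ["unless-exists"] = true, overwrite = true, error = true }

-- Writes `value` into `event[field]` according to `merge`.
-- Returns true when the write is accepted (or silently skipped), and
-- false plus a message when the "error" mode rejects it.
function apply_field(event, field, value, merge)
  merge = merge or "unless-exists"
  assert(MODES[merge], "unknown merge mode: " .. tostring(merge))
  if event[field] == nil or merge == "overwrite" then
    event[field] = value
    return true
  end
  if merge == "error" then
    return false, "field already exists: " .. field
  end
  return true -- unless-exists: keep the original value
end

local event = { site = "edge-1" }
apply_field(event, "site", "edge-2")               -- default mode
print(event.site)                                  --> edge-1
apply_field(event, "site", "edge-2", "overwrite")
print(event.site)                                  --> edge-2
local ok, err = apply_field(event, "site", "edge-3", "error")
print(ok)                                          --> false
print(err)                                         --> field already exists: site
print(event.site)                                  --> edge-2
```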

Field names must be valid Lua identifiers (start with a letter or _, then contain only letters, digits, or _). Nested fields use dot notation (http.status), and arrays are 1-indexed (hosts[1]). The pseudo field _E exposes the entire event for cloning or inspection.
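The access rules are ordinary Lua table semantics, which the following plain-Lua sketch illustrates. The event table, its values, and the first_and_last helper are made up for the example; inside a script action the fields are referenced directly, without the event prefix.

```lua
-- Plain Lua table standing in for a decoded event; the field names and
-- values are illustrative.
local event = {
  http = { status = 502 },
  hosts = { "edge-1", "edge-2", "edge-3" },
}

-- Returns the first and last elements of a 1-indexed Lua array.
function first_and_last(list)
  return list[1], list[#list]
end

print(event.http.status)            --> 502
print(event.hosts[0])               --> nil (there is no element at index 0)
local first, last = first_and_last(event.hosts)
print(first)                        --> edge-1
print(last)                         --> edge-3
```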

Runtime helpers

The runtime preloads helpers before your script executes. Selected categories:

  • Core helpers: count() (per-action counter), round(x), cond(condition, a, b), condn(...) (multi-branch), array(...), map(...), len(value), json(value) (pass-through), and a NULL sentinel usable with is_null(value).
  • Randomness: rand(n) returns a random integer between 1 and n; pick_random(...) selects one argument at random. Because randomness is seeded per action execution, downstream steps behave deterministically within a run.
  • Aggregation: sum(accumulator, value, [keep_running]) maintains running totals keyed by the string in accumulator. Pass the optional keep_running condition to reset the total when that condition stops holding.
  • Time: sec_s() and sec_ms() return the current epoch time in seconds or milliseconds.
  • Network & matching: cidr(ip, "10.0.0.0/24") checks membership in an IPv4 CIDR range.
  • Hashing: md5(text), sha1(text), sha256(text), sha512(text) return lowercase hex digests.
  • Identifiers: uuid() emits a version 4 UUID.
  • Base64: encode_base64(text) and decode_base64(text) encode or decode UTF-8 strings.
  • Encryption (requires the binary to ship with the corresponding features):
    • encrypt(plaintext, key) and decrypt(blob, key) provide backwards-compatible AES-CBC wrappers that now route through the AEAD implementation.
    • encrypt_s(plaintext, key, scheme) / decrypt_s(blob, key) use AEAD (default chacha20poly1305, pass "aes256gcm" for AES).
    • decrypt_auto(blob, key) accepts either legacy or AEAD payloads.
    • encrypt_for(plaintext, recipient_pub_b64) / decrypt_with(blob, recipient_priv_b64) expose HPKE X25519 + ChaCha20-Poly1305 when built with the hpke feature.
    • encrypt_age(text, recipients_csv) / decrypt_age(blob, identities_csv) integrate with age recipients when the age feature is enabled.
  • Job metrics: scripts can query runtime counters (error_count(), warning_count(), input_event_count(), output_event_count(), run_count(), batch_number(), and related byte counters) for telemetry-aware logic.
  • State store: store_set(key, condition, value) saves a string in a per-job in-memory cache when condition is true; store_get(key, default) retrieves it. Use this for lightweight state between events.

Lua’s base libraries (math, string, table, base) are available. Sandbox safety removes require, dofile, load, and collectgarbage. Referencing a missing field raises an error by default; support can flip the runtime flag to downgrade these to null assignments during troubleshooting.
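For intuition, strict field access of this kind can be modelled in stock Lua with an __index metamethod. The sketch below is an assumption about how such behaviour could be built, not a description of the runtime's internals; strict_event is a hypothetical name.

```lua
-- Sketch: wrap a table of fields so that reading an absent field raises an
-- error instead of quietly yielding nil. `strict_event` is a hypothetical
-- name, not a Lyft Data helper.
function strict_event(fields)
  return setmetatable({}, {
    __index = function(_, key)
      local value = fields[key]
      if value == nil then
        error("missing field: " .. tostring(key), 2)
      end
      return value
    end,
  })
end

local event = strict_event({ amount = 10 })
print(event.amount)                                  --> 10
local ok, err = pcall(function() return event.tax_rate end)
print(ok)                                            --> false
print(err)  -- message ends with "missing field: tax_rate"
```

Wrapping the access in pcall, as in the last lines, is the standard Lua way to turn such an error into a value a caller can inspect.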

Note: The older Hotrod pipelines exposed helpers such as emit() and ip2asn(). Lyft Data no longer ships those bindings; scripts work on the current event only. Use the expand-events action when you need to fan out arrays into multiple events.

Examples

Derived fields and normalization

name: normalize-orders
input:
  text: |
    {"amount": 125.50, "currency": "usd", "customer": "ALICE"}
actions:
  - script:
      let:
        - total_cents: round(amount * 100)
        - currency: string.upper(currency)
        - customer: string.lower(customer)
        - observed_at: sec_ms()
      merge: overwrite
output:
  write: console
# {"amount":125.5,"currency":"USD","customer":"alice","total_cents":12550,"observed_at":1720000000000}
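The same transformations can be checked outside a pipeline with stock Lua. In this sketch model_round is a stand-in for the runtime's round() and assumes half-up rounding, which the documentation does not specify; normalize_order is a hypothetical wrapper, and observed_at is left out because it depends on the clock.

```lua
-- Stand-in for the runtime's round(); half-up rounding is an assumption.
function model_round(x)
  return math.floor(x + 0.5)
end

-- Applies the three deterministic let expressions to a plain Lua table.
function normalize_order(event)
  event.total_cents = model_round(event.amount * 100)
  event.currency = string.upper(event.currency)
  event.customer = string.lower(event.customer)
  return event
end

local order = normalize_order({ amount = 125.50, currency = "usd", customer = "ALICE" })
print(order.total_cents)  --> 12550
print(order.currency)     --> USD
print(order.customer)     --> alice
```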

Rolling sums with conditional resets

- script:
    let:
      - batch_total: sum("orders", revenue, status == 'ok')
      - batch_seq: count()
    condition: revenue ~= nil

When status stops equalling ok, the accumulator resets; the next matching event starts a fresh running sum.
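The reset behaviour can be pictured with a small plain-Lua model. running_sum is a hypothetical stand-in for sum(), and two details are assumptions rather than documented behaviour: the event that triggers the reset contributes nothing, and the call returns 0 for that event.

```lua
-- Hypothetical model of sum(accumulator, value, keep_running).
-- Assumptions: a false keep_running clears the total, ignores `value`,
-- and returns 0; omitting keep_running never resets.
local totals = {}

function running_sum(name, value, keep_running)
  if keep_running == false then
    totals[name] = nil
    return 0
  end
  totals[name] = (totals[name] or 0) + value
  return totals[name]
end

print(running_sum("orders", 10, true))   --> 10
print(running_sum("orders", 5, true))    --> 15
print(running_sum("orders", 7, false))   --> 0  (reset)
print(running_sum("orders", 3, true))    --> 3  (fresh running sum)
```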

State between events

- script:
    let:
      - last_status: store_get('status', 'unknown')
      - status_changed: last_status ~= status
      - _ignored: store_set('status', status_changed, status)

The helper returns the previous status while updating the cache only when it actually changed.
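To trace how the three expressions interact across events, here is a plain-Lua model of the store. model_store_get, model_store_set, and track_status are hypothetical names, and the return value of model_store_set is an assumption; only the conditional-write behaviour comes from the description above.

```lua
-- In-memory stand-in for the per-job state store.
local cache = {}

function model_store_get(key, default)
  local value = cache[key]
  if value == nil then
    return default
  end
  return value
end

-- Writes only when `condition` is truthy; returning the condition is an
-- assumption made for this sketch.
function model_store_set(key, condition, value)
  if condition then
    cache[key] = value
  end
  return condition
end

-- Mirrors the three let expressions for a single event.
function track_status(key, status)
  local last_status = model_store_get(key, "unknown")
  local status_changed = last_status ~= status
  model_store_set(key, status_changed, status)
  return last_status, status_changed
end

for _, status in ipairs({ "ok", "ok", "failed" }) do
  local last, changed = track_status("status", status)
  print(status, last, changed)
end
```

For the three sample events, status_changed comes out as true, false, true: the first event differs from the 'unknown' default, the second repeats ok, and the third switches to failed.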

Guarded execution

- script:
    condition: condn(env == 'prod', true, env == 'staging', run_count() % 10 == 0, false)
    let:
      - census: map('count', input_event_count(), 'warnings', warning_count())

Production runs on every event; staging runs only when run_count() is a multiple of ten; every other environment skips the action entirely.
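The multi-branch evaluation can be sketched in stock Lua. model_condn is a hypothetical stand-in that assumes condn takes condition/value pairs followed by an optional default, as the call above suggests; should_run and its runs parameter (standing in for run_count()) are also illustrative.

```lua
-- Hypothetical model of condn(c1, v1, c2, v2, ..., default).
function model_condn(...)
  local args = table.pack(...)
  for i = 1, args.n - 1, 2 do
    if args[i] then
      return args[i + 1]
    end
  end
  if args.n % 2 == 1 then
    return args[args.n] -- trailing default
  end
  return nil
end

-- Same shape as the condition above, with run_count() passed in as `runs`.
function should_run(env, runs)
  return model_condn(env == "prod", true, env == "staging", runs % 10 == 0, false)
end

print(should_run("prod", 3))      --> true
print(should_run("staging", 20))  --> true
print(should_run("staging", 7))   --> false
print(should_run("dev", 10))      --> false
```

Because this model is an ordinary Lua function, every argument is evaluated before the call, so the run-count expression is computed even for production events. Whether the runtime's condn short-circuits is not stated here.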

Extending the environment

init.lua

If your job package includes an init.lua file, Lyft Data loads it before any script runs. Use it to declare shared functions:

-- init.lua
function every(n)
  return count() % n == 0
end

function normalize_country(code)
  local normalized = string.upper(code or '')
  if normalized == 'UK' then
    return 'GB'
  end
  return normalized
end

- script:
    let:
      - counter: count()
      - should_emit: every(5)
      - country: normalize_country(country)
    condition: should_emit

Bundle init.lua under the job’s files: section so workers download it alongside the spec. Scripts run inside the same interpreter, so keep helper names unique to avoid collisions.
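Shared helpers that do not depend on runtime state can be exercised in stock Lua before they are bundled. The sketch below repeats normalize_country from the init.lua listing so that it runs on its own; the sample inputs are illustrative.

```lua
-- Copy of normalize_country from init.lua, repeated so the snippet is
-- self-contained.
function normalize_country(code)
  local normalized = string.upper(code or '')
  if normalized == 'UK' then
    return 'GB'
  end
  return normalized
end

print(normalize_country('uk'))        --> GB
print(normalize_country('de'))        --> DE
print(normalize_country(nil) == '')   --> true (a missing code becomes an empty string)
```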

Loading additional Lua modules

Set the load attribute to import another Lua file bundled with the job:

- script:
    load: lib/string_utils.lua
    let:
      - segments: split_path(url)
      - tenant: segments[2]

The referenced file is read from the job package before the action executes. This gives you a place to stage larger helper libraries while keeping init.lua for global bootstrap code.

-- lib/string_utils.lua
function split_path(url)
  local segments = {}
  for segment in string.gmatch(url or '', "[^/]+") do
    table.insert(segments, segment)
  end
  return segments
end

Load helpers like this alongside the job so every worker sees the same implementation.
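It helps to check what split_path returns for the inputs you expect, because segments[2] depends on whether url holds a bare path or a full URL. The sketch repeats the function so that it runs in stock Lua; the sample inputs are made up.

```lua
-- Copy of split_path from lib/string_utils.lua, repeated so the snippet is
-- self-contained.
function split_path(url)
  local segments = {}
  for segment in string.gmatch(url or '', "[^/]+") do
    table.insert(segments, segment)
  end
  return segments
end

local path = split_path("/api/acme/orders")
print(#path)      --> 3
print(path[2])    --> acme

-- With a full URL the scheme becomes the first segment.
local full = split_path("https://example.com/acme")
print(full[1])    --> https:
print(full[2])    --> example.com

print(#split_path(nil))  --> 0
```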

Side-effect scripts with run

The run option executes a Lua chunk for each event without mutating the payload. Use it for callbacks defined in init.lua or modules loaded via load:

- script:
    run: >
      if error_count() > 0 and run_job_errors() % 50 == 0 then
        store_set('error_alert_marker', true, tostring(run_job_errors()))
        return true
      end
      return false

run scripts can still access and modify globals, but because they bypass let/set, events flow through unchanged.

Troubleshooting tips

  • Missing fields or helpers raise runtime errors that appear in the job attachments. When debugging, operations can flip the “suppress script errors” toggle to coerce failures to null assignments.
  • Remember that Lua arrays start at 1. When you need zero-based math, subtract 1 explicitly.
  • Use the filter or assert actions when you need to drop or block events. Scripts only modify the document; they do not control flow.
  • Keep cryptographic keys outside the spec; fetch them from the environment or licensing system and inject through context expansions.

With these helpers and patterns, the script action remains the workhorse for pipeline-specific business logic in Lyft Data without sacrificing determinism or sandbox safety.