Transform Google Analytics 4 Exports
This how-to walks through reshaping a Google Analytics 4 export into the event format used inside Lyft Data. You will ingest a daily export from object storage, explode nested arrays into individual events, enrich each event with context metadata, and deliver the results to a downstream system.
Prerequisites
- GA4 export files stored in an object store bucket (S3, GCS, Azure, or FileStore). Each file contains JSON documents with an `events` array.
- Access credentials for that bucket (access key and secret, service account JSON, or shared key).
- A target destination such as another bucket or a worker channel that feeds subsequent jobs.
- Operator access to the Lyft Data visual editor.
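For orientation, a simplified sketch of the kind of document each export file is assumed to contain is shown below. The field names follow the GA4 export schema and the references later in this guide (`events`, `name`, `event_timestamp`, `user_pseudo_id`, `event_params`); the parameter keys and values are illustrative only.

```json
{
  "events": [
    {
      "name": "purchase",
      "event_timestamp": 1735693200000000,
      "user_pseudo_id": "1234567.8901234",
      "event_params": [
        { "key": "transaction_id", "value": { "string_value": "T-1001" } },
        { "key": "currency", "value": { "string_value": "EUR" } }
      ]
    },
    {
      "name": "page_view",
      "event_timestamp": 1735693262000000,
      "user_pseudo_id": "1234567.8901234",
      "event_params": [
        { "key": "page_location", "value": { "string_value": "https://example.com/checkout" } }
      ]
    }
  ]
}
```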
1. Create the job and point it at the export bucket
- Open Jobs and create a new job named `ga4-normalised`.
- Choose S3 (or the object store provider you use).
- Fill in Endpoint, Bucket, and credentials.
- Set Object names to the GA4 prefix, for example `exports/ga4/daily/`. Object store inputs treat these values as prefixes, so `exports/ga4/daily/2025-01-01` matches the entire folder hierarchy beneath that key.
- Choose Mode: list and download so the job lists matching objects, filters them, and then downloads each file.
- Enable Fingerprinting (default) to skip files that were already processed. If you need to replay a day, clear the fingerprint cache or stage a new job version.
- Under Response handling, enable Ignore line breaks and set Events field to `events` so the runtime splits the GA4 event array into individual events automatically.
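Putting those settings together, the input portion of the job definition looks roughly like the sketch below. Treat the key names as placeholders: the visual editor generates the real definition from the form fields, so use this only to sanity-check the values you entered.

```yaml
# Illustrative sketch only -- the key names are placeholders for the
# fields you fill in through the visual editor.
input:
  s3:
    endpoint: https://s3.eu-west-1.amazonaws.com   # example endpoint
    bucket: analytics-exports                      # example bucket
    object-names: exports/ga4/daily/               # treated as a prefix
    mode: list-download                            # list, filter, then download
    fingerprinting: true                           # skip files already processed
    response-handling:
      ignore-line-breaks: true
      events-field: events                         # split the GA4 events array
```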
2. Add actions to reshape the payload
Use the following action stack as a baseline. Adjust field names to match your internal schema.
```yaml
actions:
  - json:
      input-field: data
  - expand-events:
      array-field: events
  - flatten:
      input-field: events
      separator: "."
  - rename:
      fields:
        "events.event_params.key": param_key
        "events.event_params.value.string_value": param_value
  - filter:
      how:
        expression: "events.name == 'purchase'"
  - convert:
      fields:
        events.event_timestamp: num
  - time:
      input-field: events.event_timestamp
      input-formats:
        - epoch_msecs
      output-field: '@timestamp'
      output-format: default_iso
  - add:
      output-fields:
        dataset: "{{dataset}}"
        environment: "{{environment}}"
        ga4_source_file: "${msg|message_content.object_name||unknown}"
```

Why these actions?
- `json` parses the GA4 file body once, putting the document into the `data` field.
- `expand-events` creates a new event for every entry in the GA4 `events` array, which keeps downstream analytics from having to unpack nested arrays manually.
- `flatten` promotes nested keys (such as `event_params.value.string_value`) into dotted names.
- `rename` shortens verbose keys and creates friendlier field names for your analysts.
- `filter` drops GA4 events you do not care about (in this example, everything except `purchase`).
- `convert` turns the microsecond timestamp into a number so the next action can parse it cleanly.
- `time` emits a canonical ISO timestamp, respecting the scheduler window and honouring offsets if you apply them later.
- `add` stamps deployment context (`{{dataset}}`, `{{environment}}`) and captures the source object name via `${msg|...}` when the job is triggered by a message.
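To make the reshaping concrete, this is roughly what one event from the sample document in the prerequisites looks like after the stack runs. The dotted names come from `flatten`, the `param_*` fields from `rename`, and the last three fields from `add`. Only one parameter is shown, the `@timestamp` assumes the `time` action resolves the GA4 microsecond timestamp correctly for your configured input format, and the context values are examples, so treat this as an illustration rather than a guaranteed shape.

```json
{
  "@timestamp": "2025-01-01T01:00:00Z",
  "events.name": "purchase",
  "events.event_timestamp": 1735693200000000,
  "events.user_pseudo_id": "1234567.8901234",
  "param_key": "transaction_id",
  "param_value": "T-1001",
  "dataset": "ga4_daily",
  "environment": "staging",
  "ga4_source_file": "exports/ga4/daily/2025-01-01/part-000.json"
}
```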
3. Guard the schema
Add assertions so schema drift fails fast:
```yaml
  - assert:
      condition: "exists(events.user_pseudo_id)"
      message: "GA4 event missing user identifier"
  - assert:
      condition: "exists(events.event_params)"
      message: "event_params array missing; GA4 export format changed"
```

If you prefer to continue processing while dropping malformed records, swap the assert actions for filter expressions that stop bad events from reaching the output.
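For example, the drop-instead-of-fail variant can reuse the `filter` action from the stack above. The sketch below keeps only events that carry the fields the assertions check for; it relies on the same `exists()` helper and expression syntax shown earlier, so confirm both against your runtime before depending on them.

```yaml
  - filter:
      how:
        expression: "exists(events.user_pseudo_id)"
  - filter:
      how:
        expression: "exists(events.event_params)"
```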
4. Choose an output
During development, route events to Print so you can inspect the payloads in Run & Trace. When the results look correct, replace the output with the destination you need:
- S3 or another object store bucket for long-term storage.
- Worker channel if you plan to enrich the events further in a downstream job.
- HTTP POST to send the normalised data to a reporting API.
Remember that every job still has exactly one output. Use worker channels to fan out to multiple sinks.
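As a sketch, the development and production variants of the output might look like the following. The stanza names are placeholders for whatever the output panel writes into the job definition, so adjust them to match your editor.

```yaml
# Development: print events so they show up in Run & Trace.
output:
  print: {}

# Production alternative (placeholder keys): write normalised events
# back to object storage for long-term retention.
# output:
#   s3:
#     bucket: analytics-normalised
#     prefix: ga4/normalised/
```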
5. Test and iterate
- With the actions in place, click Run & Trace. The editor sends the job definition to `/api/jobs/run`, executes it once on a worker, and streams the trace back so you can inspect each step.
- Verify that a GA4 export file produces the expected number of events, that the flattened field names align with your schema, and that `@timestamp` matches the original GA4 timestamp.
- Adjust filters, renames, or conversions until the trace looks right.
6. Stage, deploy, and monitor
- Save and Stage the job to capture an immutable revision.
- Deploy it to a non-production worker first. Watch Operate > Job status to confirm throughput and check for assertion failures.
- Configure the trigger so the job runs on your desired cadence (for example, `interval: 1h` to pick up hourly exports, or a cron expression for daily runs); see the sketch after this list.
- Promote to production workers once the metrics look healthy, and wire the job into your CI/CD flow using the steps in CI/CD automation.
- Add monitoring and alerts via Operate monitoring so operators know when object downloads fail or assertion rates spike.
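A trigger sketch for the cadence bullet above: the `interval: 1h` form comes from this guide, while the surrounding key names and the cron expression are placeholders to adapt to your scheduler's actual fields.

```yaml
# Hourly pickup of new export objects.
trigger:
  interval: 1h

# Daily alternative (placeholder cron syntax -- check the scheduler panel):
# trigger:
#   cron: "15 02 * * *"
```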
By layering declarative actions and guarding the schema with assertions, you keep GA4 transformations predictable and auditable. When the export format changes, the job fails fast during staging runs, and your team can adjust the transformation pipeline before it ever reaches production.