
Transform Google Analytics 4 Exports

This how-to walks through reshaping a Google Analytics 4 export into the event format used inside Lyft Data. You will ingest a daily export from object storage, explode nested arrays into individual events, enrich each event with context metadata, and deliver the results to a downstream system.

Prerequisites

  • GA4 export files stored in an object store bucket (S3, GCS, Azure, or FileStore). Each file contains JSON documents with an events array.
  • Access credentials for that bucket (access key and secret, service account JSON, or shared key).
  • A target destination such as another bucket or a worker channel that feeds subsequent jobs.
  • Operator access to the Lyft Data visual editor.

1. Create the job and point it at the export bucket

  1. Open Jobs and create a new job named ga4-normalised.
  2. Choose S3 (or the object store provider you use).
  3. Fill in Endpoint, Bucket, and credentials.
  4. Set Object names to the GA4 prefix, for example exports/ga4/daily/. Object store inputs treat these values as prefixes, so exports/ga4/daily/2025-01-01 matches the entire folder hierarchy beneath that key.
  5. Choose Mode: list and download so the job lists matching objects, filters them, and then downloads each file.
  6. Enable Fingerprinting (default) to skip files that were already processed. If you need to replay a day, clear the fingerprint cache or stage a new job version.
  7. Under Response handling, enable Ignore line breaks and set Events field to events so the runtime splits the GA4 event array into individual events automatically. A configuration sketch of these settings follows this list.
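
If you keep job definitions as configuration alongside the visual editor, the input half of this job might look roughly like the sketch below. The key names (endpoint, object-names, fingerprint, events-field, and so on) are illustrative assumptions rather than the editor's exact schema; treat the sketch as a checklist of the settings above.

# Hypothetical input sketch; key names are illustrative, not the exact schema.
input:
  s3:
    endpoint: https://s3.example.com        # your object store endpoint
    bucket: analytics-exports               # hypothetical bucket name
    access-key: "{{s3_access_key}}"         # credentials, ideally from secrets
    secret-key: "{{s3_secret_key}}"
    object-names: exports/ga4/daily/        # treated as a prefix (step 4)
    mode: list and download                 # list, filter, then download (step 5)
    fingerprint: true                       # skip already-processed files (step 6)
    ignore-line-breaks: true                # response handling (step 7)
    events-field: events                    # split the GA4 events array (step 7)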

2. Add actions to reshape the payload

Use the following action stack as a baseline. Adjust field names to match your internal schema.

actions:
  - json:
      input-field: data
  - expand-events:
      array-field: events
  - flatten:
      input-field: events
      separator: "."
  - rename:
      fields:
        "events.event_params.key": param_key
        "events.event_params.value.string_value": param_value
  - filter:
      how:
        expression: "events.name == 'purchase'"
  - convert:
      fields:
        events.event_timestamp: num
  - time:
      input-field: events.event_timestamp
      input-formats:
        - epoch_msecs
      output-field: '@timestamp'
      output-format: default_iso
  - add:
      output-fields:
        dataset: "{{dataset}}"
        environment: "{{environment}}"
        ga4_source_file: "${msg|message_content.object_name||unknown}"

Why these actions?

  • json parses the GA4 file body once, putting the document into the data field.
  • expand-events creates a new event for every entry in the GA4 events array, so downstream consumers never have to unpack nested arrays themselves (see the worked example after this list).
  • flatten promotes nested keys (such as event_params.value.string_value) into dotted names.
  • rename shortens verbose keys and creates friendlier field names for your analysts.
  • filter drops GA4 events you do not care about (in this example, everything except purchase).
  • convert turns the microsecond timestamp into a number so the time action can parse it cleanly.
  • time emits a canonical ISO timestamp, respecting the scheduler window and honouring offsets if you apply them later.
  • add stamps deployment context ({{dataset}}, {{environment}}) and captures the source object name via ${msg|...} when the job is triggered by a message.
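
To make the reshaping concrete, here is a hypothetical, heavily trimmed GA4 document and one event it could become after the stack above. The exact flattened names depend on how your runtime expands arrays, so verify them against a real trace.

# Trimmed, hypothetical GA4 document as downloaded (real exports carry far more fields)
data:
  user_pseudo_id: "123.456"
  events:
    - name: purchase
      event_timestamp: 1735689600000000        # GA4 exports use microseconds
      event_params:
        - key: transaction_id
          value:
            string_value: "T-1001"

# One event it could become after expand-events, flatten, rename, time, and add
events.user_pseudo_id: "123.456"
events.name: purchase
param_key: transaction_id
param_value: "T-1001"
'@timestamp': "2025-01-01T00:00:00Z"           # assumes the time format matches the export's unit
dataset: "{{dataset}}"
environment: "{{environment}}"
ga4_source_file: exports/ga4/daily/2025-01-01.json   # hypothetical object name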

3. Guard the schema

Add assertions so schema drift fails fast:

- assert:
    condition: "exists(events.user_pseudo_id)"
    message: "GA4 event missing user identifier"
- assert:
    condition: "exists(events.event_params)"
    message: "event_params array missing; GA4 export format changed"

If you prefer to continue processing while dropping malformed records, swap the assert actions for filter expressions that stop bad events from reaching the output.
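
As a sketch, the drop-instead-of-fail variant could fold the same checks into a single filter; the boolean syntax (&&) and the availability of exists() inside filter expressions are assumptions to verify against your expression language.

# Hypothetical drop-instead-of-fail variant; verify the expression syntax first.
- filter:
    how:
      expression: "exists(events.user_pseudo_id) && exists(events.event_params)"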

4. Choose an output

During development, route events to Print so you can inspect the payloads in Run & Trace. When the results look correct, replace the output with the destination you need:

  • S3 or another object store bucket for long-term storage.
  • Worker channel if you plan to enrich the events further in a downstream job.
  • HTTP POST to send the normalised data to a reporting API.

Remember that every job still has exactly one output. Use worker channels to fan out to multiple sinks.
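
Because a job carries a single output, the switch is a swap rather than an addition. A minimal sketch, assuming illustrative key names for the Print and S3 outputs:

# Development: print payloads so Run & Trace can show them
output:
  print: {}

# Later, swap in the real destination (hypothetical S3 output keys shown):
# output:
#   s3:
#     endpoint: https://s3.example.com
#     bucket: analytics-normalised
#     object-name: "ga4/normalised/"          # prefix for written objects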

5. Test and iterate

  1. With the actions in place, click Run & Trace. The editor sends the job definition to /api/jobs/run, executes it once on a worker, and streams the trace back so you can inspect each step.
  2. Verify that a GA4 export file produces the expected number of events, that the flattened field names align with your schema, and that the @timestamp matches the original GA4 timestamp.
  3. Adjust filters, renames, or conversions until the trace looks right.

6. Stage, deploy, and monitor

  1. Save and Stage the job to capture an immutable revision.
  2. Deploy it to a non-production worker first. Watch Operate > Job status to confirm throughput and check for assertion failures.
  3. Configure the trigger so the job runs on your desired cadence (for example, interval: 1h to pick up hourly exports or a cron expression for daily runs); see the sketch after this list.
  4. Promote to production workers once the metrics look healthy, and wire the job into your CI/CD flow using the steps in CI/CD automation.
  5. Add monitoring and alerts via Operate monitoring so operators know when object downloads fail or assertion rates spike.
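
For example, a trigger stanza along these lines would cover both cadences mentioned above; the keys are assumptions, so mirror whatever your scheduler configuration actually expects.

# Hypothetical trigger sketches; exact keys depend on your scheduler.
trigger:
  interval: 1h          # pick up hourly exports

# or, for a daily run after the export lands:
# trigger:
#   cron: "15 2 * * *"  # 02:15 every day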

By layering declarative actions and guarding the schema with assertions, you keep GA4 migrations predictable and auditable. When the export format changes, the job fails fast during staging runs, and your team can adjust the transformation pipeline before it ever reaches production.