Google Analytics (GA4)

Lyft Data supports importing Google Analytics 4 exports that land in Google Cloud Storage or Amazon S3.

Configure Lyft Data to read GA4 exports from Google Cloud Storage

Add the gcs input to a job. Common fields:

  • bucket-name – GA4 export bucket (required).
  • object-names – prefix pointing at the export folder (e.g., analytics_123456/events/).
  • mode – set to list-and-download to enumerate daily files.
  • include-regex – narrow results to GA4 export format, typically "\\.parquet$".
  • timestamp-mode – set to last-modified to process the newest exports first.
  • fingerprinting – deduplicate files across reruns.
  • credentials – GA4 exports live in GCP; use a service account with roles/storage.objectViewer.
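How include-regex and timestamp-mode interact can be sketched in plain Python. The object names and modification times below are invented for illustration; they are not real GA4 export output:

```python
import re

# Hypothetical listing of the export prefix; GA4 exports also drop
# non-data files (e.g. markers) that the regex should exclude.
objects = [
    {"name": "analytics_123456/events/events_2024-05-01.parquet", "last_modified": 1714550400},
    {"name": "analytics_123456/events/events_2024-05-02.parquet", "last_modified": 1714636800},
    {"name": "analytics_123456/events/_SUCCESS", "last_modified": 1714636801},
]

# include-regex keeps only files matching the export format.
include = re.compile(r"\.parquet$")
matched = [o for o in objects if include.search(o["name"])]

# timestamp-mode: last-modified orders candidates by modification
# time, newest first.
matched.sort(key=lambda o: o["last_modified"], reverse=True)
for o in matched:
    print(o["name"])
```

Note that `"\\.parquet$"` in YAML becomes the regex `\.parquet$`: the backslash escapes the dot so it matches a literal `.` rather than any character.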

Example: ingest daily parquet exports from GCS

```yaml
input:
  gcs:
    bucket-name: analytics-prod
    object-names:
      - analytics_123456/events/
    mode: list-and-download
    include-regex:
      - "\\.parquet$"
    maximum-age: 3d
    fingerprinting: true
    timestamp-mode: last-modified
    credentials:
      service-account:
        key: ${secrets.ga4_gcs_reader}
preprocessors:
  - parquet
```

Configure Lyft Data to read GA4 exports from Amazon S3

If GA4 exports are mirrored to S3, use the s3 input:

  • bucket-name – destination bucket.
  • object-names – export prefix (for example ga4/events/).
  • mode – set to list-and-download.
  • include-regex – match .parquet or .json.gz depending on the export.
  • access-key / secret-key – credentials with list/get access.
  • preprocessors – include parquet or extension so events are decoded into JSON.
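For the .json.gz case, the decoding a preprocessor performs can be sketched with the standard library: strip the gzip layer, then parse one JSON event per line. The event fields below are illustrative, not the full GA4 schema:

```python
import gzip
import json

# Hypothetical GA4-style payload; in a real .json.gz export each line
# is one event record.
raw = "\n".join(
    json.dumps(e)
    for e in [
        {"event_name": "page_view", "event_timestamp": 1714636800000000},
        {"event_name": "purchase", "event_timestamp": 1714636860000000},
    ]
).encode()
payload = gzip.compress(raw)

# Conceptually what an extension-based preprocessor does: detect the
# gzip layer, decompress, and emit one JSON object per line.
events = [json.loads(line) for line in gzip.decompress(payload).splitlines()]
for e in events:
    print(e["event_name"])
```

Parquet files need a columnar reader rather than line-by-line decoding, which is why the parquet preprocessor is listed separately in the examples.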

Example: ingest GA4 parquet exports from S3

```yaml
input:
  s3:
    bucket-name: analytics-s3-mirror
    object-names:
      - ga4/events/
    mode: list-and-download
    include-regex:
      - "\\.parquet$"
    maximum-age: 3d
    fingerprinting: true
    timestamp-mode: last-modified
    access-key: ${secrets.ga4_s3_access_key}
    secret-key: ${secrets.ga4_s3_secret_key}
preprocessors:
  - parquet
```
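The effect of fingerprinting: true can be sketched as deriving a stable ID per object so reruns skip files already ingested. The hashing scheme below is an assumption for illustration, not Lyft Data's documented implementation:

```python
import hashlib

def fingerprint(bucket: str, key: str, etag: str) -> str:
    # Assumed scheme: a stable hash over bucket, key, and ETag, so the
    # same object yields the same ID across reruns.
    return hashlib.sha256(f"{bucket}/{key}@{etag}".encode()).hexdigest()

seen = set()
listing = [
    ("analytics-s3-mirror", "ga4/events/events_2024-05-01.parquet", "abc123"),
    ("analytics-s3-mirror", "ga4/events/events_2024-05-01.parquet", "abc123"),  # same file on a rerun
]
for obj in listing:
    fp = fingerprint(*obj)
    if fp in seen:
        continue  # already processed in an earlier run
    seen.add(fp)
    print(obj[1])
```

Because the ID incorporates the ETag, a re-exported file with new contents produces a new fingerprint and is ingested again, while unchanged files are skipped.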